GW PPA termination with no error

Concerns issues with computing quasiparticle corrections to the DFT eigenvalues, i.e., the self-energy within the GW approximation (-g n), or considering only the Hartree-Fock exchange (-x)

Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano

shobhit21287
Posts: 13
Joined: Sat Aug 05, 2023 6:10 am

GW PPA termination with no error

Post by shobhit21287 » Thu Aug 10, 2023 7:45 pm

I am running gw_ppa calculations for a fairly large system. I have already run the Hartree-Fock calculations for the same system at 130 Ry and 150 Ry. I simply run my calculations with mpirun; however, lately they have been terminating prematurely, and without any error message. I am attaching the input files here.
report_for_MoSe2_SOC.txt
MoSe2_GW_PPA_report_110b_3Ry.txt
Input_MoSe2_gw_ppa_110b_3Ry.txt

Regards
Shobhit Pandey
Shobhit Pandey
Undergraduate Researcher
QEL Lab, Indraprastha Institute of Information Technology, Delhi

Daniele Varsano
Posts: 3816
Joined: Tue Mar 17, 2009 2:23 pm

Re: GW PPA termination with no error

Post by Daniele Varsano » Fri Aug 25, 2023 7:59 am

Dear Shobhit,
looking at your MoSe2_GW_PPA_report_110b_3Ry.txt it seems that the calculation is running (it is actually computing dipoles);
can you check whether you have any error message in one of the log files or in your job output?
It is possible, though not certain, that you have a memory issue; you can try to mitigate it by defining a parallel strategy that distributes the memory, e.g. by inserting in the input:

Code:

X_and_IO_CPU= "1 1 1 10 5"     # [PARALLEL] CPUs for each role
X_and_IO_ROLEs= "q g k c v"    # [PARALLEL] CPUs roles (q,g,k,c,v)
DIP_CPU= "1 10 5"              # [PARALLEL] CPUs for each role
DIP_ROLEs= "k c v"             # [PARALLEL] CPUs roles (k,c,v)
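As a quick sanity check (a sketch, not a Yambo tool): the per-role CPU counts in each *_CPU variable must multiply to the total number of MPI tasks passed to mpirun; for the example above that is 1*1*1*10*5 = 50.

```shell
# Sketch: verify that the per-role CPU counts multiply to the MPI task count.
check_roles () {                     # usage: check_roles "<cpu string>" <n_mpi>
  prod=1
  for n in $1; do prod=$((prod * n)); done
  if [ "$prod" -eq "$2" ]; then
    echo "OK: \"$1\" -> $prod tasks"
  else
    echo "MISMATCH: \"$1\" -> $prod tasks, expected $2"
  fi
}
check_roles "1 1 1 10 5" 50          # X_and_IO_CPU
check_roles "1 10 5" 50              # DIP_CPU
```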
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

shobhit21287
Posts: 13
Joined: Sat Aug 05, 2023 6:10 am

Re: GW PPA termination with no error

Post by shobhit21287 » Fri Sep 01, 2023 12:04 pm

Dear Daniele,
I reran the calculations with 16-core parallelisation. When I checked with "top" the next day, yambo was no longer running. The calculation stopped at [5] Dipole again, although the log files suggest that the dipole part was completed.
Another unusual thing I spotted is that at the top of the report file the total number of threads is 512, which isn't right. I am attaching the files for reference.
l-forum_solution_HF_and_locXC_gw0_dyson_em1d_ppa_CPU_1.txt
r-test2_HF_and_locXC_gw0_dyson_em1d_ppa.txt
gw_ppa.txt
All log files show the same thing. Thank you for your time.
Best,
Shobhit
Shobhit Pandey
Undergraduate Researcher
QEL Lab, Indraprastha Institute of Information Technology, Delhi

Daniele Varsano
Posts: 3816
Joined: Tue Mar 17, 2009 2:23 pm

Re: GW PPA termination with no error

Post by Daniele Varsano » Thu Sep 07, 2023 5:28 pm

Dear Shobhit,
dipoles are calculated, and the code stops while calculating the response function. Do you know how much memory per node you have?
The conduction wfs in the log you posted use 3.75 GB; in case you have 4 GB per core, it is possible that some tasks exceed your memory resources.
Can you try to run with just 1 node (as you did) but using half of the cores (e.g. 8)?

The number of threads is correct, as you have 32 threads per core. You can set it to 1 with *_threads=1 in your input, or by adding to your job script:
export OMP_NUM_THREADS=1
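For reference, a minimal job-script fragment could look like the sketch below (the task count, input file name, and job label are placeholders; adjust them to your run and scheduler):

```shell
#!/bin/bash
# Pin each MPI task to a single OpenMP thread to avoid oversubscription.
export OMP_NUM_THREADS=1
# 16 MPI tasks, matching the product of the per-role CPU counts in the input.
mpirun -np 16 yambo -F gw_ppa.in -J gw_run
```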

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

shobhit21287
Posts: 13
Joined: Sat Aug 05, 2023 6:10 am

Re: GW PPA termination with no error

Post by shobhit21287 » Sun Sep 10, 2023 7:37 pm

Dear Daniele,
While using 8 cores worked for the dipoles, the calculation now stopped midway through the next step. The gw_ppa file used is the same as above, except that some extra threads were allocated to the dipoles. I am attaching the log file for reference.
l-8core2_HF_and_locXC_gw0_dyson_em1d_ppa_CPU_1.txt
Best,
Shobhit
Shobhit Pandey
Undergraduate Researcher
QEL Lab, Indraprastha Institute of Information Technology, Delhi

Daniele Varsano
Posts: 3816
Joined: Tue Mar 17, 2009 2:23 pm

Re: GW PPA termination with no error

Post by Daniele Varsano » Mon Sep 11, 2023 1:23 pm

Dear Shobhit,
it seems you are still at the limit of your memory capacity. You can try to raise the number of CPUs, still parallelizing over c and v in X_and_IO_CPU:
X_and_IO_CPU= "1 1 1 4 4"      # [PARALLEL] CPUs for each role
X_and_IO_ROLEs= "q g k c v"    # [PARALLEL] CPUs roles (q,g,k,c,v)

and, if possible, raise the number of CPUs, changing that variable consistently.
You should not delete the databases already produced.

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

shobhit21287
Posts: 13
Joined: Sat Aug 05, 2023 6:10 am

Re: GW PPA termination with no error

Post by shobhit21287 » Thu Sep 21, 2023 8:17 am

Dear Daniele,
After some trial and error I have successfully completed the GW-PPA calculation up to the dielectric-matrix calculation [5]; however, I am now getting an error at the 6th step that I am unable to debug. Earlier, when I allotted 4 cores to qp in the SE parallelism, another error occurred: it stated that too many cores were allotted (only 2 were required) and the calculation stopped. I am attaching the report file along with the inputs and the logs for reference. Thanks a lot for your time.
gw_ppa.txt
l-gw_ppa_10_124343_50_50_parallelised10_HF_and_locXC_gw0_dyson_em1d_ppa_CPU_2_01.txt
r-gw_ppa_10_124343_50_50_parallelised10_HF_and_locXC_gw0_dyson_em1d_ppa.txt
Best,
Shobhit
Shobhit Pandey
Undergraduate Researcher
QEL Lab, Indraprastha Institute of Information Technology, Delhi

Daniele Varsano
Posts: 3816
Joined: Tue Mar 17, 2009 2:23 pm

Re: GW PPA termination with no error

Post by Daniele Varsano » Fri Sep 22, 2023 5:15 pm

Dear Shobhit,

you are calculating two QP corrections, so you cannot assign more than two CPUs to the "qp" role.
The problem is most probably due to memory. You can try to parallelize over "b" as much as you can, e.g.
SE_ROLEs= "q qp b"    # [PARALLEL] CPUs roles (q,qp,b)
SE_CPU= "1 1 8"       # [PARALLEL] CPUs for each role
or more CPUs in case you have the resources.
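As a rough illustration only (the 3.75 GB figure is the conduction-wavefunction allocation from the earlier log, and the even split is an assumption): distributing the band summation over N tasks in the "b" role divides that per-task memory roughly by N.

```shell
# Rough sketch: per-task share of a ~3.75 GB band-resolved allocation
# (the figure from the earlier log) when split evenly over the "b" role.
for nb in 1 2 4 8; do
  awk -v nb="$nb" 'BEGIN { printf "b-role tasks: %d -> ~%.2f GB per task\n", nb, 3.75 / nb }'
done
```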

Best,
Daniele

PS: I then suggest you check your convergence with respect to the band summation. 50 total bands with 46 occupied bands are most probably not enough to reach convergence.
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

shobhit21287
Posts: 13
Joined: Sat Aug 05, 2023 6:10 am

Re: GW PPA termination with no error

Post by shobhit21287 » Mon Oct 02, 2023 8:35 am

Dear Daniele,
We found the optimal CPU configuration for running gw_ppa. The file was working fine about a week ago, but now we are experiencing a new error. The input files are the same as above, except that the parallelisation has changed.

<03s> P1: [DIP] Writing dipoles header
P1: [ERROR] STOP signal received while in[04] Dipoles
P1: [ERROR] Writing File ./gw_ppa_30_1_40//ndb.dipoles_fragment_1; Variable DIP_iR_k_0001_spin_0001; NetCDF: HDF error

We can't find a fix for it, as it is a writing error, whereas most of the errors reported on the forum are reading ones.

Best,
Shobhit
Shobhit Pandey
Undergraduate Researcher
QEL Lab, Indraprastha Institute of Information Technology, Delhi

Daniele Varsano
Posts: 3816
Joined: Tue Mar 17, 2009 2:23 pm

Re: GW PPA termination with no error

Post by Daniele Varsano » Tue Oct 03, 2023 8:01 am

Dear Shobhit,

it is not easy to spot the problem with so little information. Anyway, I suggest you update to the latest release of the code and restart the calculation from scratch. If the problem persists, reply here attaching the input/report/log files of the calculation, together with the config.log file from the compilation.

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
