GW PPA termination with no error
Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano
- Shobhit Pandey
- Posts: 17
- Joined: Sat Aug 05, 2023 6:10 am
GW PPA termination with no error
I am running GW-PPA calculations for a fairly large system. I have already run the Hartree-Fock calculations for the same system at 130 Ry and 150 Ry. I simply run my calculations with mpirun; however, lately they have been terminating prematurely, without any error message. I am attaching the input files here.
Regards
Shobhit Pandey
Undergraduate Researcher
QEL Lab, Indraprastha Institute of Information Technology, Delhi
- Daniele Varsano
- Posts: 4209
- Joined: Tue Mar 17, 2009 2:23 pm
Re: GW PPA termination with no error
Dear Shobhit,
looking at your MoSe2_GW_PPA_report_110b_3Ry.txt, it seems that the calculation is running (currently computing dipoles).
Can you check whether you have any error message in one of the log files or in your job output?
It is possible, though not certain, that you have a memory issue; you can try to mitigate it by defining a parallel strategy that distributes the memory, e.g. by inserting in the input:
X_and_IO_CPU= "1 1 1 10 5" # [PARALLEL] CPUs for each role
X_and_IO_ROLEs= "q g k c v" # [PARALLEL] CPUs roles (q,g,k,c,v)
DIP_CPU= "1 10 5" # [PARALLEL] CPUs for each role
DIP_ROLEs= "k c v" # [PARALLEL] CPUs roles (k,c,v)
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- Shobhit Pandey
- Posts: 17
- Joined: Sat Aug 05, 2023 6:10 am
Re: GW PPA termination with no error
Dear Daniele,
I reran the calculations with 16-core parallelisation. When I checked with "top" the next day, YAMBO was no longer running. The calculation stopped at [5] Dipole again, although the log files suggest that the Dipole part was completed.
Another unusual thing I spotted is that, at the top of the report file, the total number of threads is 512, which doesn't seem right. I am attaching the files for reference; all the log files show the same thing. Thank you for your time.
Best,
Shobhit
Shobhit Pandey
Undergraduate Researcher
QEL Lab, Indraprastha Institute of Information Technology, Delhi
- Daniele Varsano
- Posts: 4209
- Joined: Tue Mar 17, 2009 2:23 pm
Re: GW PPA termination with no error
Dear Shobhit,
dipoles are calculated, and the code stops when calculating the response function. Do you know how much memory per node you have?
The conduction wavefunctions in the log you posted are using 3.75 GB; in case you have 4 GB per core, it is possible that some tasks exceed your memory resources.
Can you try to run with just 1 node (as you did) but using half of the cores (e.g. 8)?
The thread count is correct, as you have 32 threads per core. You can set them to 1 by setting *_threads=1 in your input, or by setting in your job script:
export OMP_NUM_THREADS=1
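For example, a minimal launch script might look like the sketch below (the input file name, job string and core count are only placeholders; adapt them to your run):
export OMP_NUM_THREADS=1 # one OpenMP thread per MPI task
mpirun -np 8 yambo -F gw_ppa.in -J gw_ppa_run # 8 MPI tasks on a single node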
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- Shobhit Pandey
- Posts: 17
- Joined: Sat Aug 05, 2023 6:10 am
Re: GW PPA termination with no error
Dear Daniele,
While using 8 cores worked for the dipoles, the calculation now stopped midway at the next step. The gw_ppa input file is the same as above, except that some extra threads were allocated to the dipoles. I am attaching the log files for reference.
Best,
Shobhit
Shobhit Pandey
Undergraduate Researcher
QEL Lab, Indraprastha Institute of Information Technology, Delhi
- Daniele Varsano
- Posts: 4209
- Joined: Tue Mar 17, 2009 2:23 pm
Re: GW PPA termination with no error
Dear Shobhit,
it seems you are still at the limit of your memory capacity; you can try to raise the number of CPUs while still parallelizing on c and v in X_and_IO_CPU:
X_and_IO_CPU= "1 1 1 4 4" # [PARALLEL] CPUs for each role
X_and_IO_ROLEs= "q g k c v"
and, if possible, raise the number of CPUs, changing that variable consistently.
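As an illustration (the task counts below are hypothetical), the product of the entries in X_and_IO_CPU has to match the total number of MPI tasks you launch, so with 32 MPI tasks you could use, e.g.:
X_and_IO_CPU= "1 1 1 8 4" # 1*1*1*8*4 = 32 MPI tasks, memory distributed over c and v
X_and_IO_ROLEs= "q g k c v" # [PARALLEL] CPUs roles (q,g,k,c,v)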
You should not delete the databases already produced.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- Shobhit Pandey
- Posts: 17
- Joined: Sat Aug 05, 2023 6:10 am
Re: GW PPA termination with no error
Dear Daniele,
After trial and error I have successfully run the GW-PPA calculation up to the dielectric matrix calculation [5]; however, I am now getting an error at the 6th step that I am unable to debug. Earlier, when I allotted 4 cores to qp in the SE parallelism, another error occurred: it stated that too many cores were allotted (only 2 were required) and the calculation stopped. I am attaching the report file along with the inputs and the logs for reference. Thanks a lot for your time.
Best,
Shobhit
Shobhit Pandey
Undergraduate Researcher
QEL Lab, Indraprastha Institute of Information Technology, Delhi
- Daniele Varsano
- Posts: 4209
- Joined: Tue Mar 17, 2009 2:23 pm
Re: GW PPA termination with no error
Dear Shobhit,
you are calculating two QP corrections, so you cannot assign more than two CPUs to the "qp" role.
The problem is most probably due to memory. You can try to parallelize on "b" as much as you can, e.g.:
SE_ROLEs= "q qp b" # [PARALLEL] CPUs roles (q,qp,b)
SE_CPU= "1 1 8" # [PARALLEL] CPUs for each role
or more CPUs in case you have resources.
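For instance, with 16 MPI tasks (the counts are only illustrative), keeping at most 2 tasks on "qp" and putting the rest on "b":
SE_CPU= "1 2 8" # 1*2*8 = 16 MPI tasks; "qp" cannot exceed the 2 QP corrections computed
SE_ROLEs= "q qp b" # [PARALLEL] CPUs roles (q,qp,b)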
Best,
Daniele
PS: I suggest you then check your convergence with respect to the band summation. 50 total bands with 46 occupied bands are most probably not enough to reach convergence.
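A convergence test along these lines usually means repeating the run with an increasing number of bands in the two summations, e.g. with input fragments like the ones below (the value 200 is only a placeholder, and the exact variables depend on your runlevel):
% BndsRnXp
   1 | 200 | # [Xp] Polarization function bands
%
% GbndRnge
   1 | 200 | # [GW] G[W] bands range
%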
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- Shobhit Pandey
- Posts: 17
- Joined: Sat Aug 05, 2023 6:10 am
Re: GW PPA termination with no error
Dear Daniele,
We have found the optimal CPU configuration for running the gw_ppa calculation. The file was working fine about a week ago, but now we are experiencing a new error. The input files are the same as above, except that the parallelisation has changed:
<03s> P1: [DIP] Writing dipoles header
P1: [ERROR] STOP signal received while in[04] Dipoles
P1: [ERROR] Writing File ./gw_ppa_30_1_40//ndb.dipoles_fragment_1; Variable DIP_iR_k_0001_spin_0001; NetCDF: HDF error
We can't find a fix for it, as it is a writing error, whereas most of the errors reported on the forum are reading ones.
Best,
Shobhit
Shobhit Pandey
Undergraduate Researcher
QEL Lab, Indraprastha Institute of Information Technology, Delhi
- Daniele Varsano
- Posts: 4209
- Joined: Tue Mar 17, 2009 2:23 pm
Re: GW PPA termination with no error
Dear Shobhit,
it is not easy to spot the problem with so little information. Anyway, I suggest you update to the latest release of the code and start the calculation from scratch. If the problem persists, reply here attaching the input/report/log files of the calculation together with the config.log file of the compilation.
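If you build from source, updating could look roughly like the sketch below (this assumes the GitHub sources and is only indicative; keep the configure options you normally use on your machine):
git clone https://github.com/yambo-code/yambo.git # or download the latest release tarball
cd yambo
./configure # add your usual options (compilers, MPI, HDF5/NetCDF paths)
make core # rebuilds yambo and the post-processing tools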
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/