Dear All,
I am running a Yambo calculation on a small gold cluster (4 atoms, 76 electrons). I am doing some convergence tests, in particular on the NgsBlkXp variable for the GW correction in the PPA. Calculations up to NgsBlkXp= 9 Ry run quickly (a few minutes at most). If I select 10 Ry, the code starts the calculation but then sits there indefinitely. I let the cluster run this calculation for about 10 hours (compared with the roughly 10 minutes needed for 9 Ry) before killing it. No sign of a crash or any other error appeared in the report or log files, and the cluster did not signal any issue either (memory, or crashed processes). Where can I look for the possible problem? How can I proceed?
A few pieces of information:
1) Running on Yambo 5.2.0
2) About 8 nodes, 384 processes in total (MPI and OpenMP)
I attach here the log of CPU 1 and report for the 9 Ry case. The 10 Ry case looks similar but it sits at X@q[1] forever.
All the best,
Roberto
Weird behaviour with NgsBlkXp
Roberto D'Agosta
Nano-Bio Spectroscopy Group
Av de Tolosa 72
Donostia-San Sebastian
Spain
- Daniele Varsano
Re: Weird behaviour with NgsBlkXp
Ciao Roberto,
I can see from the report that at 9 Ry you have an X matrix larger than 20k x 20k (the report gives "X matrix size : 20875").
At 10 Ry it will be larger, and the inversion of such a matrix scales very badly with its dimension.
You can try to use more CPUs and increase the number assigned to the ScaLAPACK procedure:
X_and_IO_nCPU_LinAlg_INV= 16
but I'm not sure that this will solve the problem, as ScaLAPACK is not super efficient.
We have faced the same issue in the past, and people who have analyzed the problem in depth can provide you with more insight.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
Re: Weird behaviour with NgsBlkXp
Hi Roberto,
Just to follow up on Daniele's reply:
since you are at Gamma with about 384 cores, all processors are available during the inversion of the Dyson equation for P/Chi/W.
This means that, if the ScaLAPACK strategy works, you have a lot of room for improvement.
You can even use something like:
X_and_IO_nCPU_LinAlg_INV= 100
that is, a 10x10 ScaLAPACK grid.
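To get a feeling for what a 10x10 grid means in terms of memory, here is a rough Python sketch of how large the local block of the X matrix would be on each process. It is only an estimate under stated assumptions, not a description of how Yambo distributes the matrix internally: the process count is taken to form a square grid (as in the 4x4 and 10x10 examples), the matrix is taken to be dense with 16 bytes per element (double-precision complex; a single-precision build would need half of that), and workspace and other arrays are ignored. The function names are hypothetical.

```python
import math


def square_grid(n_procs: int) -> tuple[int, int]:
    """Return the (rows, cols) of a square process grid.

    Assumption: the count is a perfect square, as in the
    16 (4x4) and 100 (10x10) examples from this thread.
    """
    side = math.isqrt(n_procs)
    if side * side != n_procs:
        raise ValueError(f"{n_procs} is not a perfect square")
    return side, side


def local_block_gib(n: int, n_procs: int, bytes_per_element: int = 16) -> float:
    """Estimate the memory (GiB) of one process's share of a dense n x n matrix.

    Assumption: an even split over a square grid, 16 bytes per element
    (double-precision complex). Workspace buffers are not counted.
    """
    rows, cols = square_grid(n_procs)
    local_rows = math.ceil(n / rows)
    local_cols = math.ceil(n / cols)
    return local_rows * local_cols * bytes_per_element / 2**30


if __name__ == "__main__":
    n = 24357  # X matrix size at 10 Ry, as reported in this thread
    for n_procs in (16, 100):
        rows, cols = square_grid(n_procs)
        print(f"{rows}x{cols} grid: ~{local_block_gib(n, n_procs):.3f} GiB per process")
```

The share per process shrinks as the grid grows, so a larger grid should ease the memory pressure of the inversion step itself. Whether it also reduces the wall time depends on how well ScaLAPACK performs on the machine.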
cheers
Andrea
Andrea Ferretti, PhD
CNR-NANO-S3 and MaX Centre
via Campi 213/A, 41125, Modena, Italy
Tel: +39 059 2055322; Skype: andrea_ferretti
URL: http://www.nano.cnr.it
Re: Weird behaviour with NgsBlkXp
Dear Daniele and Andrea,
Thanks for your fast and informative answers. I have checked, and from 9 Ry to 10 Ry the X matrix size increases from 20875 to 24357 (a factor of about 1.2), which even in the worst-case scenario for matrix inversion should at most double the computational time.
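As a quick check of this estimate, the arithmetic can be sketched in a few lines of Python. The assumptions are mine and not taken from the Yambo output: the inversion cost is taken to grow as the cube of the matrix size (the usual scaling for dense LU-based inversion), and the matrix is taken to be dense with 16 bytes per element (double-precision complex; a single-precision build would need half of that). The function names are hypothetical.

```python
def inversion_cost_ratio(n_old: int, n_new: int, exponent: float = 3.0) -> float:
    """Ratio of inversion costs, assuming cost ~ N**exponent (cubic by default)."""
    return (n_new / n_old) ** exponent


def dense_matrix_gib(n: int, bytes_per_element: int = 16) -> float:
    """Memory (GiB) of one dense n x n matrix, assuming 16 bytes per element."""
    return n * n * bytes_per_element / 2**30


if __name__ == "__main__":
    n_9ry, n_10ry = 20875, 24357  # X matrix sizes quoted in this thread
    print(f"size ratio:           {n_10ry / n_9ry:.3f}")
    print(f"cost ratio (cubic):   {inversion_cost_ratio(n_9ry, n_10ry):.2f}")
    print(f"full matrix at 9 Ry:  {dense_matrix_gib(n_9ry):.2f} GiB")
    print(f"full matrix at 10 Ry: {dense_matrix_gib(n_10ry):.2f} GiB")
```

Under these assumptions the cost grows by well under a factor of two, so the scaling of the inversion alone would not explain going from about 10 minutes to more than 10 hours.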
I will try your recommended strategy. However, I am still quite limited in memory, so I am not sure I can assign that many cores to ScaLAPACK.
I have also noticed that, looking at the output of 'top', apparently only the MPI processes are still active while the OpenMP threads are not working (I see a user CPU usage of 100% rather than the 1200% of the previous calculations).
Regards,
Roberto
Roberto D'Agosta
Nano-Bio Spectroscopy Group
Av de Tolosa 72
Donostia-San Sebastian
Spain