Weird behaviour with NgsBlkXp

Concerns issues with computing quasiparticle corrections to the DFT eigenvalues - i.e., the self-energy within the GW approximation (-g n), or considering the Hartree-Fock exchange only (-x)

Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano

dagosta
Posts: 11
Joined: Thu Mar 22, 2018 8:45 am

Weird behaviour with NgsBlkXp

Post by dagosta » Fri Nov 17, 2023 10:38 am

Dear All,

I am running a Yambo calculation on a small gold cluster (4 atoms, 76 electrons). I am doing some convergence tests, in particular on the NgsBlkXp variable for the GW correction in the PPA. Up to NgsBlkXp= 9 Ry the runs complete quickly (a few minutes at most). With 10 Ry the code starts the calculation but then appears to hang. I let the cluster run this calculation for about 10 hours (compared with about 10 minutes for the 9 Ry case) before killing it. No sign of a crash or any other error appeared in the report or log files, and the cluster did not signal any issue either (memory or crashed processes). Where can I look for the possible problem? How can I proceed?

A few pieces of information:
1) Running on Yambo 5.2.0
2) About 8 nodes, 384 processes in total (MPI and OpenMP combined)

I attach here the log of CPU 1 and report for the 9 Ry case. The 10 Ry case looks similar but it sits at X@q[1] forever.

All the best,
Roberto
Roberto D'Agosta
Nano-Bio Spectroscopy Group
Av de Tolosa 72
Donostia-San Sebastian
Spain

Daniele Varsano
Posts: 3816
Joined: Tue Mar 17, 2009 2:23 pm

Re: Weird behaviour with NgsBlkXp

Post by Daniele Varsano » Fri Nov 17, 2023 11:32 am

Ciao Roberto,

I can see from the report at 9 Ry that you have an X matrix which is larger than 20k x 20k.

X matrix size                                    :  20875
At 10 Ry it will be larger, and the cost of inverting such a matrix scales very badly with its dimension.
You can try to use more CPUs and increase the number assigned to the ScaLAPACK procedure:
X_and_IO_nCPU_LinAlg_INV= 16
but I'm not sure that this will solve the problem, as ScaLAPACK is not very efficient.
We have faced the same issue in the past, and people who have analyzed the problem in depth can give you more insight.

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

andrea.ferretti
Posts: 206
Joined: Fri Jan 31, 2014 11:13 am

Re: Weird behaviour with NgsBlkXp

Post by andrea.ferretti » Fri Nov 17, 2023 12:07 pm

Hi Roberto,

just to follow up on Daniele's reply,

since you are at Gamma with about 384 cores, all processors are available during the inversion of the Dyson equation for P/Chi/W.
This means that if the strategy with ScaLAPACK works, you have a lot of room for improvement.

you can even use something like:
X_and_IO_nCPU_LinAlg_INV= 100
that is a 10x10 ScaLAPACK grid
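
To see which values map onto a full square grid, here is a minimal Python sketch. It assumes the process grid is square, as in the 10x10 example; the helper name `square_grid` is hypothetical, and how Yambo treats values that are not perfect squares is not stated in this thread.

```python
import math


def square_grid(n_cpu: int) -> tuple[int, int]:
    """Return the largest p x p grid that fits within n_cpu processes.

    Assumption: a square layout, as in the 10x10 example for 100 CPUs.
    """
    p = math.isqrt(n_cpu)  # integer square root, rounded down
    return p, p


if __name__ == "__main__":
    for n in (16, 100, 384):
        rows, cols = square_grid(n)
        print(f"{n} CPUs -> {rows}x{cols} grid ({rows * cols} processes in the grid)")
```

Under this assumption, 16 and 100 fill their grids exactly, while 384 would fit at most a 19x19 grid (361 processes).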

cheers
Andrea
Andrea Ferretti, PhD
CNR-NANO-S3 and MaX Centre
via Campi 213/A, 41125, Modena, Italy
Tel: +39 059 2055322; Skype: andrea_ferretti
URL: http://www.nano.cnr.it

dagosta
Posts: 11
Joined: Thu Mar 22, 2018 8:45 am

Re: Weird behaviour with NgsBlkXp

Post by dagosta » Fri Nov 17, 2023 2:38 pm

Dear Daniele and Andrea,

Thanks for your fast and informative answers. I have checked, and from 9 Ry to 10 Ry the X matrix size increases from 20875 to 24357 (a factor of about 1.2), which even in the worst-case scenario for matrix inversion should at most double the computational time.
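
As a rough check of this estimate, the short Python sketch below assumes that dense matrix inversion costs on the order of N^3 operations. That is a textbook figure, not something measured in Yambo, and the function name `cubic_cost_ratio` is only illustrative.

```python
def cubic_cost_ratio(n_old: int, n_new: int) -> float:
    """Ratio of inversion costs, assuming the cost grows as N**3."""
    return (n_new / n_old) ** 3


if __name__ == "__main__":
    n_9ry, n_10ry = 20875, 24357  # X matrix sizes quoted above
    print(f"size ratio: {n_10ry / n_9ry:.3f}")
    print(f"cost ratio (N^3 assumption): {cubic_cost_ratio(n_9ry, n_10ry):.2f}")
```

With these sizes the cost ratio comes out at roughly 1.6, which is consistent with "at most double" and far from the jump from 10 minutes to more than 10 hours.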

I will try your recommended strategy. However, I am still quite limited in memory, so I am not sure I can assign that many cores to ScaLAPACK.
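
For a sense of the memory involved, the sketch below estimates the size of one full dense copy of the X matrix. It assumes 16 bytes per element (double-precision complex); a single-precision build would need half of that. It ignores workspace, extra copies and how the matrix is distributed over processes, none of which are described in this thread, so it is only a lower-bound illustration.

```python
def dense_matrix_gb(n: int, bytes_per_element: int = 16) -> float:
    """Memory in GB (10**9 bytes) for one dense n x n matrix.

    Assumption: 16 bytes per element, i.e. double-precision complex.
    """
    return n * n * bytes_per_element / 1e9


if __name__ == "__main__":
    for label, n in (("9 Ry", 20875), ("10 Ry", 24357)):
        print(f"{label}: N = {n}, one dense copy ~ {dense_matrix_gb(n):.2f} GB")
```

Under these assumptions one copy grows from about 7 GB at 9 Ry to about 9.5 GB at 10 Ry.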

I have also noticed, looking at the output of ‘top’, that apparently only the MPI processes are still active while the OpenMP threads are not working (I see a user CPU usage of 100% rather than the 1200% of the previous calculations).

Regards,
Roberto
