Out of memory in Static Dielectric Matrix calculation

varrassi
Posts: 4
Joined: Mon Nov 29, 2021 3:08 pm

Out of memory in Static Dielectric Matrix calculation

Post by varrassi » Fri Apr 22, 2022 11:43 am

Dear All,
I'm trying to run Yambo on a cluster with GPU nodes, but I'm getting an out-of-memory error and I don't understand why.
The node has 30 cores, 350 GB of RAM, and 4 A100 GPU cards with 80 GB of GPU memory each. I'm running an em1s calculation on a spin-polarized 2D system with 10 atoms and 512 bands (with terminator), using 2 GPUs and requesting 300 GB of RAM (#SBATCH --mem=300G).

The LOG file says:

Code: Select all

<12m-18s> P1: [MEMORY] Alloc WF%c( 15.22250 [Gb]) (HOST) TOTAL: 15.85183 [Gb] (traced)
<12m-18s> P1: [ERROR] Allocation of WF%c_d failed with code 1
P1: [ERROR] STOP signal received while in[07] Static Dielectric Matrix
P1: [ERROR]out of memory
I can't understand the origin of the error: both the node and the GPU card have more memory than the ~16 GB being allocated. Have I made some mistake in the input file / submission job, or have I misunderstood something?

Attached to the post you can find the report/log files and the compilation report.
Thanks a lot!
Kind regards,
Lorenzo
Lorenzo Varrassi
Ph.D. Student
Department of Physics and Astronomy
University of Bologna - Bologna, Italy

Daniele Varsano
Posts: 3773
Joined: Tue Mar 17, 2009 2:23 pm

Re: Out of memory in Static Dielectric Matrix calculation

Post by Daniele Varsano » Fri Apr 22, 2022 1:10 pm

Dear Lorenzo,

Most probably the reason for the failure is that XTermKind= "BG" is not ported to GPU, or at least not tested.
The reason is that it is very memory intensive and, unlike GTermKind for QP calculations, we did not observe a great speed-up of the convergence. Moreover,
the GPU porting is very efficient, so it is more convenient to add bands rather than use the terminator.

So, I suggest you set XTermKind= "none" and, if needed, add bands until convergence.
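For reference, a sketch of what the relevant lines in the em1s input could look like is below; the upper band in BndsRnXs is purely illustrative (your current value) and should be increased until the screening is converged:

Code: Select all

XTermKind= "none"              # [X] X terminator ("none","BG" Bruneval-Gonze)
% BndsRnXs
    1 | 512 |                  # [Xs] Polarization function bands
%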

I also noticed you are using:

Code: Select all

% QpntsRXs
   70 | 130 |                 # [Xs] Transferred momenta
%
If you need the static screening matrix to build the Bethe-Salpeter kernel, you need to compute the screening for all the q-points.
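For illustration only (assuming, from the upper bound in your input, that the grid has 130 q-points in total; please check the actual number in your setup report), the block should span the full range:

Code: Select all

% QpntsRXs
    1 | 130 |                 # [Xs] Transferred momenta
%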

Best,

Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

varrassi
Posts: 4
Joined: Mon Nov 29, 2021 3:08 pm

Re: Out of memory in Static Dielectric Matrix calculation

Post by varrassi » Fri Apr 22, 2022 3:56 pm

Thanks for the advice and for the quick answer!
I tried setting XTermKind= "none" but the out-of-memory error is still there (I attached the input and output files to the post). The amount of memory Yambo tries to allocate is now much less than before:

Code: Select all

<12s> P2: [MEMORY] Alloc WF%c( 9.565918 [Gb]) (HOST) TOTAL: 10.54426 [Gb] (traced)

where previously the total was around 16 GB.
Could it be due to a compilation problem?

Best,
Lorenzo
Lorenzo Varrassi
Ph.D. Student
Department of Physics and Astronomy
University of Bologna - Bologna, Italy

Daniele Varsano
Posts: 3773
Joined: Tue Mar 17, 2009 2:23 pm

Re: Out of memory in Static Dielectric Matrix calculation

Post by Daniele Varsano » Sat Apr 23, 2022 6:59 am

Dear Lorenzo,

Even if it cannot be excluded, it does not look like a compilation problem: from the report, the GPU cards are correctly recognized.
Can you try a job using 4 GPU cards, i.e. 4 MPI processes?

X_and_IO_CPU = "1 1 1 4 1"
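Just as a sketch (assuming the five fields correspond to the "q g k c v" roles; please check against the parallel strings written in your report file), the corresponding block in the input would be:

Code: Select all

X_and_IO_CPU= "1 1 1 4 1"      # 4 MPI tasks distributed over conduction bands
X_and_IO_ROLEs= "q g k c v"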

Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

varrassi
Posts: 4
Joined: Mon Nov 29, 2021 3:08 pm

Re: Out of memory in Static Dielectric Matrix calculation

Post by varrassi » Sun Apr 24, 2022 12:18 pm

Dear Daniele,
I tried with 4 GPUs, but the log files of all 4 MPI processes say: [ERROR]Allocation attempt of WF%c of negative size.
Thanks,
Lorenzo
Lorenzo Varrassi
Ph.D. Student
Department of Physics and Astronomy
University of Bologna - Bologna, Italy

Nicola Spallanzani
Posts: 62
Joined: Thu Nov 21, 2019 10:15 am

Re: Out of memory in Static Dielectric Matrix calculation

Post by Nicola Spallanzani » Tue Apr 26, 2022 10:04 am

Dear Lorenzo,
could you try to set these variables in the input file?

Code: Select all

X_and_IO_nCPU_LinAlg_INV= 1
DIP_CPU= "1 4 1"
DIP_ROLEs= "k c v"
DIP_Threads= 0

If this doesn't work, could you try running with only one GPU (and one MPI task)?
Another option would be to use a different version of OpenMPI.
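For the single-GPU test, a minimal sketch of the corresponding strings (again assuming the "q g k c v" and "k c v" roles) would be:

Code: Select all

X_and_IO_CPU= "1 1 1 1 1"
X_and_IO_ROLEs= "q g k c v"
DIP_CPU= "1 1 1"
DIP_ROLEs= "k c v"
DIP_Threads= 0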

Best regards,
Nicola
Nicola Spallanzani, PhD
S3 Centre, Istituto Nanoscienze CNR and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu

varrassi
Posts: 4
Joined: Mon Nov 29, 2021 3:08 pm

Re: Out of memory in Static Dielectric Matrix calculation

Post by varrassi » Tue Apr 26, 2022 9:07 pm

Dear developers,
I tried both ideas but sadly got "Allocation attempt of WF%c of negative size" and "out of memory", respectively.
I will try to install a new OpenMPI version - do you suggest specific versions of the NVIDIA HPC SDK and OpenMPI?

Thanks again.
Lorenzo Varrassi
Ph.D. Student
Department of Physics and Astronomy
University of Bologna - Bologna, Italy

Daniele Varsano
Posts: 3773
Joined: Tue Mar 17, 2009 2:23 pm

Re: Out of memory in Static Dielectric Matrix calculation

Post by Daniele Varsano » Wed Apr 27, 2022 8:54 am

Dear Lorenzo,

Most probably your problem is related to a compiler bug that some of us have already experienced.
Can you try to update your source and retry? Possibly the workaround has already been implemented in the newer version.
If it does not work, I will arrange to have a patch sent to you to overcome this problem.

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
