Out of memory in Static Dielectric Matrix calculation

Post by varrassi » Fri Apr 22, 2022 11:43 am

Dear All,
I'm trying to run Yambo on a cluster with GPU nodes, but I'm getting an out-of-memory error and I don't understand why.
The node has 30 cores, 350 GB of RAM and 4 A100 GPU cards with 80 GB of GPU memory each. I'm trying to run an em1s calculation on a spin-polarized 2D system with 10 atoms and 512 bands (with terminator), using 2 GPUs and requesting 300 GB of RAM (#SBATCH --mem=300G).

The LOG file says:
<12m-18s> P1: [MEMORY] Alloc WF%c( 15.22250 [Gb]) (HOST) TOTAL: 15.85183 [Gb] (traced)
<12m-18s> P1: [ERROR] Allocation of WF%c_d failed with code 1
P1: [ERROR] STOP signal received while in[07] Static Dielectric Matrix
P1: [ERROR]out of memory
I can't understand the origin of the error: both the node and the GPU card have more memory than the 16 Gb allocated. Have I made a mistake in the input file or the job submission, or have I misunderstood something?
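
For anyone comparing runs, the [MEMORY] line quoted above can be pulled out of a log with a short script. This is only a minimal sketch: the regular expression is inferred from the log lines quoted in this thread, not from any official description of the Yambo log format, and `parse_memory_line` is a hypothetical helper, not part of Yambo.

```python
import re

# Pattern inferred from the [MEMORY] lines quoted in this thread (an assumption,
# not an official specification of the log format).
MEMORY_RE = re.compile(
    r"P(?P<rank>\d+): \[MEMORY\] Alloc (?P<name>\S+?)\(\s*(?P<size>[\d.]+) \[Gb\]\)"
    r"\s*\((?P<where>\w+)\)\s*TOTAL:\s*(?P<total>[\d.]+) \[Gb\]"
)


def parse_memory_line(line):
    """Return a dict describing one [MEMORY] allocation line, or None if it does not match."""
    match = MEMORY_RE.search(line)
    if match is None:
        return None
    return {
        "rank": int(match.group("rank")),          # MPI process label (P1, P2, ...)
        "name": match.group("name"),               # allocated array, e.g. WF%c
        "size_gb": float(match.group("size")),     # size of this allocation
        "where": match.group("where"),             # tag printed in the log, e.g. HOST
        "total_gb": float(match.group("total")),   # running traced total
    }


if __name__ == "__main__":
    sample = ("<12m-18s> P1: [MEMORY] Alloc WF%c( 15.22250 [Gb]) (HOST) "
              "TOTAL: 15.85183 [Gb] (traced)")
    print(parse_memory_line(sample))
```

Note that the traced line is tagged (HOST), while the allocation that fails in the next log line is WF%c_d, so the traced total alone does not say how much memory was requested on the GPU.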

Attached to the post you can find the report/log files and the compilation report.
Thanks a lot!
Kind regards,
Lorenzo Varrassi
Ph.D. Student
Department of Physics and Astronomy
University of Bologna - Bologna, Italy

Post by Daniele Varsano » Fri Apr 22, 2022 1:10 pm

Dear Lorenzo,

most probably the reason for the failure is that XTermKind= "BG" is not ported to GPU, or at least not tested.
The reason is that it is very memory intensive and, unlike the GTermKind terminator for QP calculations, we did not observe a great speed-up of the convergence. Moreover,
the GPU port is very efficient, so it is more convenient to add bands than to use the terminator.

So I suggest setting XTermKind= "none" and, if necessary, adding bands until convergence.

I also noticed you are using:

% QpntsRXs
   70 | 130 |                 # [Xs] Transferred momenta
If you need the static screening matrix to build the Bethe-Salpeter kernel, you need to compute the screening for all the q-points.


Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale

Post by varrassi » Fri Apr 22, 2022 3:56 pm

Thanks for the advice and for the quick answer!
I tried setting XTermKind= "none", but the out-of-memory error is still there (I attached the input and output files to the post). The amount of memory Yambo tries to allocate is now indeed much smaller than before:
<12s> P2: [MEMORY] Alloc WF%c( 9.565918 [Gb]) (HOST) TOTAL: 10.54426 [Gb] (traced)
whereas previously the total was about 16 Gb.
Maybe it's due to a compilation problem?

Post by Daniele Varsano » Sat Apr 23, 2022 6:59 am

Dear Lorenzo,

Even though it cannot be excluded, it does not seem to be a compilation problem: according to the report, the GPU cards are correctly recognized.
Can you try a job using 4 GPU cards, i.e. 4 MPI processes?

X_and_IO_CPU = "1 1 1 4 1"
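
A quick way to sanity-check a string like the one above before submitting: as far as I understand Yambo's parallel setup, the product of the entries is expected to match the number of MPI tasks (here 4, one per GPU). Treat that rule as an assumption and check it against the Yambo parallelization documentation; `cpu_product` and `cpu_string_matches_tasks` are hypothetical helpers, not part of Yambo.

```python
from math import prod


def cpu_product(cpu_string):
    """Multiply the integer entries of a string such as "1 1 1 4 1"."""
    return prod(int(entry) for entry in cpu_string.split())


def cpu_string_matches_tasks(cpu_string, n_mpi_tasks):
    """True if the entries multiply to the requested number of MPI tasks.

    Assumption: the product of the entries should equal the MPI task count.
    """
    return cpu_product(cpu_string) == n_mpi_tasks


if __name__ == "__main__":
    # The string suggested above, for a job with 4 MPI processes.
    print(cpu_string_matches_tasks("1 1 1 4 1", 4))
```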

Post by varrassi » Sun Apr 24, 2022 12:18 pm

Dear Daniele,
I tried with 4 GPUs, but the log files of all 4 MPI processes say: [ERROR]Allocation attempt of WF%c of negative size.

Post by Nicola Spallanzani » Tue Apr 26, 2022 10:04 am

Dear Lorenzo,
could you try setting these variables in the input file?

X_and_IO_nCPU_LinAlg_INV= 1
DIP_CPU= "1 4 1"
DIP_ROLEs= "k c v"
DIP_Threads= 0

If this doesn't work, could you try running with only one GPU (and only one MPI task)?
Another thing to try would be a different version of OpenMPI.

Best regards,
Nicola Spallanzani, PhD
S3 Centre, Istituto Nanoscienze CNR and MaX Center, Italy
MaX - Materials design at the Exascale

Post by varrassi » Tue Apr 26, 2022 9:07 pm

Dear developers,
I tried both ideas, but sadly I got "Allocation attempt of WF%c of negative size" and "out of memory", respectively.
I will try to install a new OpenMPI version. Do you recommend specific versions of the NVIDIA HPC SDK and OpenMPI?

Thanks again.

Post by Daniele Varsano » Wed Apr 27, 2022 8:54 am

Dear Lorenzo,

most probably your problem is related to a compiler bug that some of us have already experienced.
Can you try updating your source and running again? The workaround may already be implemented in the newer version.
If it does not work, I will ask for a patch to be sent to you to overcome this problem.

Post by Dean » Thu Apr 18, 2024 1:40 am

Dear Daniele,

I ran into the same problem. When I set "BndsRnXp=980, NGsBlkXp= 13 Ry", Yambo runs fine.
But when I set "BndsRnXp=600, NGsBlkXp= 13 Ry", Yambo crashes with "Allocation attempt of WF%c of negative size".
I use Yambo 5.2.2 on a GPU node with 2 A6000 cards.
Is there a patch to overcome this problem?
Thanks a lot.
Dr. Yimin Ding
Soochow University, China.

Post by Daniele Varsano » Thu Apr 18, 2024 7:50 am

Dear Yimin,
can you attach your input and report files, together with the config.log from your compilation?



