
Out of memory in Static Dielectric Matrix calculation

Posted: Fri Apr 22, 2022 11:43 am
by varrassi
Dear All,
I'm trying to run Yambo on a cluster with GPU nodes, but I'm getting an out-of-memory error and I don't understand why.
Each node has 30 cores, 350 GB of RAM and 4 A100 GPU cards with 80 GB of GPU RAM each. I'm running an em1s calculation on a spin-polarized 2D system with 10 atoms and 512 bands (with terminator), using 2 GPUs and requesting 300 GB of RAM (#SBATCH --mem=300G).
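
For reference, a minimal sketch of the kind of submission script I'm using (the task layout and yambo command line here are illustrative):

Code: Select all

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2   # one MPI task per GPU
#SBATCH --gres=gpu:2          # 2 of the 4 A100 cards
#SBATCH --mem=300G            # the 300 GB request mentioned above

mpirun -np 2 yambo -F yambo.in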

The LOG file says:

Code: Select all

<12m-18s> P1: [MEMORY] Alloc WF%c( 15.22250 [Gb]) (HOST) TOTAL: 15.85183 [Gb] (traced)
<12m-18s> P1: [ERROR] Allocation of WF%c_d failed with code 1
P1: [ERROR] STOP signal received while in[07] Static Dielectric Matrix
P1: [ERROR]out of memory
I can't understand the origin of the error: both the node and the GPU card have more RAM than the ~16 GB being allocated. Have I made some mistake in the input file / submission script, or have I misunderstood something?

Attached to the post you can find the report/log files and the compilation report.
Thanks a lot!
Kind regards,
Lorenzo

Re: Out of memory in Static Dielectric Matrix calculation

Posted: Fri Apr 22, 2022 1:10 pm
by Daniele Varsano
Dear Lorenzo,

most probably the reason for the failure is that XTermKind= "BG" is not ported to GPU, or at least not tested there.
The reason is that it is very memory intensive and, unlike GTermKind for QP calculations, we did not observe a great speed-up of the convergence. Moreover, the GPU porting is very efficient, so it is more convenient to add bands than to use the terminator.

So, I suggest you set XTermKind= "none" and, if needed, add bands until convergence.
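
In practice, something along these lines in the em1s input (the band range here is only an illustrative starting point, to be converged):

Code: Select all

XTermKind= "none"              # [X] X terminator ("none","BG" Bruneval-Gonze)
% BndsRnXs
   1 | 600 |                   # [Xs] Polarization function bands
%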

I also noticed you are using:

Code: Select all

% QpntsRXs
   70 | 130 |                 # [Xs] Transferred momenta
%
If you need the static screening matrix to build the Bethe-Salpeter kernel, you need to compute the screening for all the q-points.
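
That is, the range should span the whole grid, e.g. (assuming 130 is the total number of q-points, as your setup report should confirm):

Code: Select all

% QpntsRXs
   1 | 130 |                  # [Xs] Transferred momenta
%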

Best,

Daniele

Re: Out of memory in Static Dielectric Matrix calculation

Posted: Fri Apr 22, 2022 3:56 pm
by varrassi
Thanks for the advice and for the quick answer!
I tried setting XTermKind= "none" but the out-of-memory error is still there (I attached the input and output files to this post). The amount of memory Yambo now tries to allocate is indeed much smaller than before:

Code: Select all

<12s> P2: [MEMORY] Alloc WF%c( 9.565918 [Gb]) (HOST) TOTAL: 10.54426 [Gb] (traced)

where previously the total was around 16 GB.
Could it be due to a compilation problem?

Best,
Lorenzo

Re: Out of memory in Static Dielectric Matrix calculation

Posted: Sat Apr 23, 2022 6:59 am
by Daniele Varsano
Dear Lorenzo,

Even if it cannot be excluded, it does not look like a compilation problem: from the report, the GPU cards are correctly recognized.
Can you try a job using 4 GPU cards, i.e. 4 MPI processes, with:

Code: Select all

X_and_IO_CPU= "1 1 1 4 1"
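
together with the matching roles string (a reminder here, assuming the standard five X roles of recent Yambo versions):

Code: Select all

X_and_IO_ROLEs= "q g k c v"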

Daniele

Re: Out of memory in Static Dielectric Matrix calculation

Posted: Sun Apr 24, 2022 12:18 pm
by varrassi
Dear Daniele,
I tried with 4 GPUs, but the log files of all 4 MPI processes say: [ERROR]Allocation attempt of WF%c of negative size.
Thanks,
Lorenzo

Re: Out of memory in Static Dielectric Matrix calculation

Posted: Tue Apr 26, 2022 10:04 am
by Nicola Spallanzani
Dear Lorenzo,
could you try to set these variables in the input file?

Code: Select all

X_and_IO_nCPU_LinAlg_INV= 1
DIP_CPU= "1 4 1"
DIP_ROLEs= "k c v"
DIP_Threads= 0

If this doesn't work, could you try running with only one GPU (and a single MPI task)?
Another option is to try a different version of OpenMPI.

Best regards,
Nicola

Re: Out of memory in Static Dielectric Matrix calculation

Posted: Tue Apr 26, 2022 9:07 pm
by varrassi
Dear developers,
I tried both ideas, but sadly I got "Allocation attempt of WF%c of negative size" and "out of memory" respectively.
I will try to install a new OpenMPI version. Do you suggest specific versions of the NVIDIA HPC SDK and OpenMPI?

Thanks again.

Re: Out of memory in Static Dielectric Matrix calculation

Posted: Wed Apr 27, 2022 8:54 am
by Daniele Varsano
Dear Lorenzo,

most probably your problem is related to a compiler bug that some of us have already experienced.
Can you update your source and retry? Possibly the workaround has already been implemented in the newer version.
If it does not work, I will ask for a patch to be sent to you to overcome this problem.
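
To update, you can fetch the latest source and reconfigure, roughly along these lines (a sketch only: adapt the CUDA version and compute-capability flags to your compiler and cards):

Code: Select all

git clone https://github.com/yambo-code/yambo.git
cd yambo
./configure --enable-cuda=cuda11.0,cc80
make yambo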

Best,
Daniele

Re: Out of memory in Static Dielectric Matrix calculation

Posted: Thu Apr 18, 2024 1:40 am
by Dean
Dear Daniele,

I ran into the same problem. When I set "BndsRnXp= 980, NGsBlkXp= 13 Ry", yambo ran well.
But when I set "BndsRnXp= 600, NGsBlkXp= 13 Ry", yambo crashed with "Allocation attempt of WF%c of negative size".
I am using Yambo 5.2.2 on a GPU node with 2 A6000 cards.
Is there any patch to overcome this problem?
Thanks a lot.
Best,

Re: Out of memory in Static Dielectric Matrix calculation

Posted: Thu Apr 18, 2024 7:50 am
by Daniele Varsano
Dear Yimin,
can you post as attachments your input/report files together with the config.log of your compilation?

Best,

Daniele