BSE stop with "Allocation of K_slk%blc failed"
Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano, Conor Hogan
- Posts: 299
- Joined: Fri Apr 09, 2010 12:30 pm
BSE stop with "Allocation of K_slk%blc failed"
Dear developers,
I have run into a new problem when running a BSE calculation; it stops as follows:
...
<04d-15h-25m-39s> P0003: Kernel |################# | [085%] 03d-17h-09m-34s(E) 04d-08h-53m-09s(X)
<04d-18h-32m-47s> P0003: Kernel |################## | [090%] 03d-20h-16m-43s(E) 04d-06h-31m-32s(X)
<04d-21h-34m-55s> P0003: Kernel |################### | [095%] 03d-23h-18m-50s(E) 04d-04h-19m-21s(X)
<05d-00h-03m-35s> P0003: Kernel |####################| [100%] 04d-01h-47m-30s(E) 04d-01h-47m-30s(X)
<05d-04h-00m-33s> P0003: [07] BSE solver(s)
<05d-04h-00m-33s> P0003: [LA] SERIAL linear algebra
<05d-04h-00m-33s> P0003: [07.01] Inversion solver
P0003: [ERROR] STOP signal received while in :[07.01] Inversion solver
P0003: [ERROR]Allocation of K_slk%blc failed
....
I generated the input with ./yambo -b -o b -k sex -y i -V all, and the parallel settings in the input are:
PAR_def_mode= "balanced" # [PARALLEL] Default distribution mode ("balanced"/"memory"/"workload")
X_all_q_CPU= "1 1 1 28 1" # [PARALLEL] CPUs for each role
X_all_q_ROLEs= "q g k c v" # [PARALLEL] CPUs roles (q,g,k,c,v)
X_all_q_nCPU_LinAlg_INV= 1 # [PARALLEL] CPUs for Linear Algebra
BS_CPU= "1 14 2" # [PARALLEL] CPUs for each role
BS_ROLEs= "k eh t" # [PARALLEL] CPUs roles (k,eh,t)
BS_nCPU_LinAlg_INV= 1 # [PARALLEL] CPUs for Linear Algebra
BS_nCPU_LinAlg_DIAGO= 1 # [PARALLEL] CPUs for Linear Algebra
Thanks!
SD
- Daniele Varsano
- Posts: 4201
- Joined: Tue Mar 17, 2009 2:23 pm
- Contact:
Re: BSE stop with "Allocation of K_slk%blc failed"
Dear Shudong,
we will have a look at it. In the meantime, does the calculation run smoothly when using the BSE solver "diago" (-y d) or "haydock" (-y h)?
Many thanks,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- Posts: 299
- Joined: Fri Apr 09, 2010 12:30 pm
Re: BSE stop with "Allocation of K_slk%blc failed"
Ciao Daniele,
Actually, this is the question I want to ask here. The calculation stops. I found that the BSE calculation with -y d or -y i uses all of my memory: the memory keeps increasing while the BSE calculation runs, until it is exhausted and the calculation dies. The GW calculation is OK, but why does the BSE use so much more memory, even for my 3-atom 2D MoS2? It seems the BSE cannot distribute the memory over the cores. My BSE input for MoS2 is:
GPL Version 4.3.2 Revision 134. (Based on r.15658 h.afdb12
# MPI+SLK Build
# http://www.yambo-code.org
#
rim_cut # [R RIM CUT] Coulomb potential
optics # [R OPT] Optics
bss # [R BSS] Bethe Salpeter Equation solver
em1d # [R Xd] Dynamical Inverse Dielectric Matrix
bse # [R BSE] Bethe Salpeter Equation.
bsk # [R BSK] Bethe Salpeter Equation kernel
StdoHash= 20 # [IO] Live-timing Hashes
Nelectro= 26.00000 # Electrons number
ElecTemp= 0.000000 eV # Electronic Temperature
BoseTemp=-1.000000 eV # Bosonic Temperature
OccTresh=0.1000E-4 # Occupation treshold (metallic bands)
NLogCPUs=0 # [PARALLEL] Live-timing CPU`s (0 for all)
DBsIOoff= "none" # [IO] Space-separated list of DB with NO I/O. DB=(DIP,X,HF,COLLs,J,GF,CARRIERs,W,SC,BS,ALL)
DBsFRAGpm= "none" # [IO] Space-separated list of +DB to FRAG and -DB to NOT FRAG. DB=(DIP,X,W,HF,COLLS,K,BS,QINDX,RT,ELP
FFTGvecs= 45 Ry # [FFT] Plane-waves
#WFbuffIO # [IO] Wave-functions buffered I/O
PAR_def_mode= "memory" # [PARALLEL] Default distribution mode ("balanced"/"memory"/"workload")
X_all_q_CPU= "1 1 32 1" # [PARALLEL] CPUs for each role
X_all_q_ROLEs= "q k c v" # [PARALLEL] CPUs roles (q,g,k,c,v)
X_all_q_nCPU_LinAlg_INV= 1 # [PARALLEL] CPUs for Linear Algebra
BS_CPU= "1 1 32" # [PARALLEL] CPUs for each role
BS_ROLEs= "k eh t" # [PARALLEL] CPUs roles (k,eh,t)
BS_nCPU_LinAlg_INV= 1 # [PARALLEL] CPUs for Linear Algebra
BS_nCPU_LinAlg_DIAGO= 1 # [PARALLEL] CPUs for Linear Algebra
NonPDirs= "none" # [X/BSS] Non periodic chartesian directions (X,Y,Z,XY...)
RandQpts= 1000000 # [RIM] Number of random q-points in the BZ
I also tried assigning the cores to the k role, but that took even more memory than the setting above.
Thanks!
Shudong
- Daniele Varsano
- Posts: 4201
- Joined: Tue Mar 17, 2009 2:23 pm
- Contact:
Re: BSE stop with "Allocation of K_slk%blc failed"
Dear Shudong,
can you post the complete input and report files? What is the dimension of your matrix, and what are the kernel parameters?
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- Posts: 299
- Joined: Fri Apr 09, 2010 12:30 pm
Re: BSE stop with "Allocation of K_slk%blc failed"
Dear Daniele,
Attached are the input and log files. Please note that I used a block size of 1 Ry, but the run still needs 256 GB of memory for 2D MoS2.
Thanks!
Ciao
Shudong
- Daniele Varsano
- Posts: 4201
- Joined: Tue Mar 17, 2009 2:23 pm
- Contact:
Re: BSE stop with "Allocation of K_slk%blc failed"
Dear Shudong,

> Please note that I used a block size of 1 Ry

OK, but here the problem is not the building of the kernel, which is done correctly, but the solver. I can see that, despite this being a simple 2D MoS2 system, you have a BSE matrix of dimension 19200. The inversion solver, I think, allocates two matrices of this size, which together correspond to approximately 11 GB, and this memory is not distributed. Do you have 11 GB of memory per core?
My suggestions are:
1) Try to run the inversion runlevel in serial: in this way, you will have all the memory of the node at your disposal.
2) Reduce the BSE matrix, e.g. restrict the bands to (23-30), and see if the calculation runs successfully (see the sketch below for how the matrix dimension scales with the band range).
3) I do not know if it can help, but you can try to update the code to a more recent version: this will help us in case some debugging is needed.
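[Editor's note] To make suggestion 2 concrete, here is a minimal Python sketch of how the BSE matrix dimension scales with the band range. The 20x20 k-grid and the specific valence/conduction partitions below are assumptions for illustration only, chosen to reproduce the dimensions discussed in this thread; they are not values read from the actual report.

```python
# Sketch (not yambo code): the resonant BSE block has dimension
# N = Nk_bz * Nv * Nc, so every extra valence or conduction band in the
# BSE band range grows N linearly (and the matrix memory grows as N^2).

def bse_dim(n_k_bz, n_val, n_cond):
    """Dimension of the resonant BSE block."""
    return n_k_bz * n_val * n_cond

n_k_bz = 20 * 20  # hypothetical 20x20 k-grid in the full BZ

# With SOC (spinor bands) and 26 electrons, bands 23-26 would be the top
# four valence and 27-30 the bottom four conduction bands:
print(bse_dim(n_k_bz, n_val=4, n_cond=4))  # 6400  (the reduced 23-30 range)
# A wider range, e.g. 6 valence x 8 conduction bands, already gives:
print(bse_dim(n_k_bz, n_val=6, n_cond=8))  # 19200
```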
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- Posts: 299
- Joined: Fri Apr 09, 2010 12:30 pm
Re: BSE stop with "Allocation of K_slk%blc failed"
Dear Daniele,
I removed the parallel settings and it still does not work; when I reduce the BSE matrix to 6400 (bands 23-30), it works. But if I need more transitions, I have to include more than bands 23-30, and the matrix dimension increases again... I used the double-precision version and included SOC for MoS2: does this have anything to do with the problem?
ps: how did you figure out that a matrix dimension of 19200 corresponds to about 11 GB?
Thanks!
Best,
Shudong
- Daniele Varsano
- Posts: 4201
- Joined: Tue Mar 17, 2009 2:23 pm
- Contact:
Re: BSE stop with "Allocation of K_slk%blc failed"
Dear Shudong,

> I used the double-precision version and included SOC for MoS2: does this have anything to do with the problem?

Yes, it matters, as both features contribute to increasing the memory needed. SOC is needed for MoS2 anyway; maybe you can try to work in single precision, you will probably not lose accuracy.

> ps: how did you figure out that a matrix dimension of 19200 corresponds to about 11 GB?

Just a rough calculation: the matrix has N x N elements, each a 16-byte complex number; dividing by 1024^3 gives the size in GB. In the case of inversion you need to allocate two matrices of that size.
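[Editor's note] As a sanity check, the same arithmetic in a few lines of Python (a sketch of the estimate above, not yambo code):

```python
# Memory estimate for the inversion solver: two full N x N matrices of
# double-precision complex numbers, 16 bytes per element.

n = 19200                             # BSE matrix dimension
one_matrix_gb = n * n * 16 / 1024**3  # size of one matrix in GB

print(f"one matrix:   {one_matrix_gb:.2f} GB")      # ~5.49 GB
print(f"two matrices: {2 * one_matrix_gb:.2f} GB")  # ~10.99 GB, i.e. ~11 GB
```

In single precision each complex element takes 8 bytes instead of 16, which is why the suggestion above roughly halves the memory footprint.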
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- Posts: 6
- Joined: Wed Aug 21, 2019 5:26 am
- Location: University of Central Florida
Parallel runs with Yambo
Dear Andre,
First, I should apologize for posting under this subject; I could not find the tab to post a new question.
I am new to yambo. I am calculating the absorption spectrum of a nanoparticle with over 70 atoms, so all the files are large. I did the relaxations of this system using Quantum ESPRESSO. To avoid "segmentation fault" errors, I read about parallelism in yambo and tried to use it on the cluster where yambo is installed. This has been unsuccessful so far, so here are my questions:
1. This is what I added to my job file: mpirun -np 8 /home/zhg/APPS/yambo-4.1.5/bin/yambo -F yambo.in -J test
After a few minutes it stops with NO message in the log or report files. I tried the tutorial on the website (h-BN-2D) with and without parallelism: without it the run works well, with it, it stops. What could cause this?
2. In the output file o.eps_q1_inv_rpa_dyson only one line is printed, which I am guessing is because of issue number 1. How can I resolve this and get the output completely?
3. Why are there several o.eps_q1_inv_rpa_dyson files (_01, _02, ...), and which one should be used for the final analysis?
4. Finally, for linear-response absorption spectrum calculations, which kernel must be used? The tutorials are not clear on this.
Thanks in advance for your help.
Zahra Hooshmand
University of Central Florida
- Daniele Varsano
- Posts: 4201
- Joined: Tue Mar 17, 2009 2:23 pm
- Contact:
Re: BSE stop with "Allocation of K_slk%blc failed"
Dear Zahra,

> 1. This is what I added to my job file: mpirun -np 8 /home/zhg/APPS/yambo-4.1.5/bin/yambo -F yambo.in -J test
> After a few minutes it stops with NO message in the log or report files. What could cause this?

If the tutorial example does not run in parallel, I suspect that the MPI version of yambo is not properly installed, or that something changed in the MPI modules with respect to the ones used at installation time; you should ask the cluster administrator. One question: do you get no messages at all in the report/log, or do you get error messages? In the first case, please post the report and the logs.

> 2. In the output file o.eps_q1_inv_rpa_dyson only one line is printed, which I am guessing is because of issue number 1.

No, I think this is unrelated to point one; you should check the energy range and energy steps in your input file (EnRngeXd and EnStps).

> 3. Why are there several o.eps_q1_inv_rpa_dyson files (_01, _02, ...), and which one should be used for the final analysis?

The number of output files is equal to the number of q vectors you asked for (QpntsRXd). For optics you should set it to (1|1), i.e. q=0. If you find the same file with suffixes _01, _02, etc., it means you repeated the calculation several times: yambo does not overwrite previous output, it increases the suffix number instead.

> 4. Finally, for linear-response absorption spectrum calculations, which kernel must be used?

For a linear-response calculation including local-field effects, Chimod should be set to "Hartree".
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/