
Re: stuck in calculation of static inverse dielectric matrix

Posted: Mon Apr 22, 2019 9:25 am
by ljzhou86
Dear Daniele

I have fixed the memory issue by setting
"X_all_q_CPU= "1 1 1 32 36" # [PARALLEL] CPUs for each role
X_all_q_ROLEs= "q g k c v" # [PARALLEL] CPUs roles (q,g,k,c,v)
X_all_q_nCPU_LinAlg_INV= 4 # [PARALLEL] CPUs for Linear Algebra
X_Threads=0 # [OPENMP/X] Number of threads for response functions
DIP_Threads=0 # [OPENMP/X] Number of threads for dipoles"
..."
I then proceeded to the BSE calculation using "yambo -o b -k sex -y d -r -V par" with the following settings:
"optics # [R OPT] Optics
bss # [R BSS] Bethe Salpeter Equation solver
rim_cut # [R RIM CUT] Coulomb potential
bse # [R BSE] Bethe Salpeter Equation.
bsk # [R BSK] Bethe Salpeter Equation kernel
NLogCPUs=8 # [PARALLEL] Live-timing CPU`s (0 for all)
PAR_def_mode= "memory" # [PARALLEL] Default distribution mode ("balanced"/"memory"/"workload")
BS_CPU= "32 1 36" # [PARALLEL] CPUs for each role
BS_ROLEs= "k eh t" # [PARALLEL] CPUs roles (k,eh,t)
BS_nCPU_LinAlg_INV= 1 # [PARALLEL] CPUs for Linear Algebra
BS_nCPU_LinAlg_DIAGO= 1 # [PARALLEL] CPUs for Linear Algebra
X_Threads=0 # [OPENMP/X] Number of threads for response functions
DIP_Threads=0 # [OPENMP/X] Number of threads for dipoles
K_Threads=0 # [OPENMP/BSK] Number of threads for response functions
RandQpts= 3000000 # [RIM] Number of random q-points in the BZ
RandGvec= 123 RL # [RIM] Coulomb interaction RS components
CUTGeo= "box z" # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws X/Y/Z/XY..
% CUTBox
0.00000 | 0.00000 | 55.00000 | # [CUT] [au] Box sides
%
CUTRadius= 0.000000 # [CUT] [au] Sphere/Cylinder radius
CUTCylLen= 0.000000 # [CUT] [au] Cylinder length
CUTwsGvec= 0.700000 # [CUT] WS cutoff: number of G to be modified
BSEmod= "resonant" # [BSE] resonant/retarded/coupling
BSKmod= "SEX" # [BSE] IP/Hartree/HF/ALDA/SEX
BSSmod= "h" # [BSS] (h)aydock/(d)iagonalization/(i)nversion/(t)ddft`
BSENGexx= 68539 RL # [BSK] Exchange components
BSENGBlk= 4 Ry # RL # [BSK] Screened interaction block size
#WehCpl # [BSK] eh interaction included also in coupling
% KfnQP_E
0.8000000 | 1.000000 | 1.000000 | # [EXTQP BSK BSS] E parameters (c/v) eV|adim|adim
% BEnRange
0.00000 | 8.00000 | eV # [BSS] Energy range
%
% BDmRange
0.050000 | 0.050000 | eV # [BSS] Damping range
%
BEnSteps=600 # [BSS] Energy steps
% BLongDir
1.000000 | 0.000000 | 0.000000 | # [BSS] [cc] Electric Field
%
% BSEBands
375 | 396 | # [BSK] Bands range
%
#WRbsWF
BSHayTrs= -0.02000 # [BSS] Relative [o/o] Haydock treshold. Strict(>0)/Average(<0)
..."

However, the log file contains the following messages:
"<04s> P0001: [MEMORY] Alloc px%element_1D( 2.150000Mb) TOTAL: 43.31600Mb (traced) 37.02000Mb (memstat)
<04s> P0001: [MEMORY] Alloc px%element_1D( 2.150000Mb) TOTAL: 45.47000Mb (traced) 37.02000Mb (memstat)
<04s> P0001: [MEMORY] Alloc px%element_1D( 1.075000Mb) TOTAL: 46.54900Mb (traced) 37.02000Mb (memstat)
<04s> P0001: [MEMORY] Alloc px%element_1D( 1.075000Mb) TOTAL: 47.62800Mb (traced) 37.02000Mb (memstat)
<04s> P0001: [MEMORY] Alloc px%element_1D( 2.150000Mb) TOTAL: 49.78200Mb (traced) 37.02000Mb (memstat)
<04s> P0001: [MEMORY] Alloc px%element_1D( 2.150000Mb) TOTAL: 51.93600Mb (traced) 37.02000Mb (memstat)
<04s> P0001: [PARALLEL Response_T_space for (e/h) Groups on 1080 CPU] Loaded/Total (Percentual):10/10800(0%)
<04s> P0001: [MEMORY] Alloc px%element_2D( 455.6250Mb) TOTAL: 507.6920Mb (traced) 37.02000Mb (memstat)
<05s> P0001: [PARALLEL Response_T_space for (e/h)->(e/h)' Transitions (ordered) on 1 CPU] Loaded/Total (Percentual):54005/58325400(0%)
<05s> P0001: [PARALLEL Response_T_space for CON bands on 1080 CPU] Loaded/Total (Percentual):1/715(0%)
<05s> P0001: [PARALLEL Response_T_space for VAL bands on 1 CPU] Loaded/Total (Percentual):385/385(100%)
<05s> P0001: [06.01] Transition Groups build-up
<05m-34s> P0001: [06.02] CPU-dependent Block structure
<05m-35s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.016000Mb) TOTAL: 1.219194Gb (traced) 37.02000Mb (memstat)
<05m-35s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.016000Mb) TOTAL: 1.220210Gb (traced) 37.02000Mb (memstat)
<05m-35s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.016000Mb) TOTAL: 1.221226Gb (traced) 37.02000Mb (memstat)
<05m-35s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.016000Mb) TOTAL: 1.222242Gb (traced) 37.02000Mb (memstat)
<05m-35s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.016000Mb) TOTAL: 1.223258Gb (traced) 37.02000Mb (memstat)
<05m-35s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.016000Mb) TOTAL: 1.224274Gb (traced) 37.02000Mb (memstat)
<05m-35s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.016000Mb) TOTAL: 1.225290Gb (traced) 37.02000Mb (memstat)
<05m-35s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.016000Mb) TOTAL: 1.226306Gb (traced) 37.02000Mb (memstat)
<05m-35s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.016000Mb) TOTAL: 1.227322Gb (traced) 37.02000Mb (memstat)
..."

The report file terminated at step [06] as follows:
" [K]Fermi Level [ev]: 0.000000
[K]VBM / CBm [ev]: 0.000000 1.990708
[K]Electronic Temp. [ev K]: 0.00 0.00
[K]Bosonic Temp. [ev K]: 0.00 0.00
[K]Finite Temperature mode: no
[K]El. density [cm-3]: 0.108E+24
[K]States summary : Full Metallic Empty
0001-0385 0386-1100
[K]Indirect Gaps [ev]: 1.990708 2.240225
[K]Direct Gaps [ev]: 1.997653 2.240225
[QP apply] Ind. Gap Correction [ev]: 0.800000

[06] Response Functions in Transition space
===========================================


[WARNING]Allocation attempt of W%p of zero size.

[ERROR] STOP signal received while in :[06] Response Functions in Transition space
"

I guess this issue is also memory-related. I have tried many different parallelization schemes and also reduced the values of "BSENGexx", "BSENGBlk" and "BSEBands"; however, all attempts failed at the step "[06] Response Functions in Transition space".

Do you have any suggestions to address this issue? Attached are the input (ljbse), log and report files. I hope this helps. Thanks a lot

Best

Re: stuck in calculation of static inverse dielectric matrix

Posted: Tue Apr 23, 2019 9:40 am
by Daniele Varsano
Dear Zhou Liu-Jiang,
looking at your report, you have a 10 k-point sampling in the IBZ, which is not compatible with the 32 CPUs you assigned to the k role in the parallelization.
I do not know whether this is causing problems in the run, but it is surely a waste of resources. Moreover, your BSE matrix (Nc x Nv x Nbz) is rather small (dim ~2000), so I would reduce the total number of CPUs for this run. By the way, if this is a 0D system, why are you using a k-point grid instead of performing a Gamma-only calculation?
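For illustration only (the counts below are hypothetical and must multiply to the number of MPI tasks you actually launch), a layout consistent with 10 k-points could look like:

Code: Select all

 BS_CPU= "10 2 1"     # 10(k) x 2(eh) x 1(t) = 20 MPI tasks; at most 10 CPUs on the k role
 BS_ROLEs= "k eh t"   # [PARALLEL] CPUs roles (k,eh,t)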
Best,
Daniele

Re: stuck in calculation of static inverse dielectric matrix

Posted: Tue Apr 23, 2019 12:18 pm
by ljzhou86
Hi Daniele

I have tried that kind of setting, with 10 assignments on the k role to match the 10 k-points, but it did not give me any improvement. The system, containing 115 atoms, is a 2D monolayer interfaced with a 0D cluster. I will run with a reduced number of CPUs.

Thanks a lot

Re: stuck in calculation of static inverse dielectric matrix

Posted: Wed Apr 24, 2019 11:40 pm
by ljzhou86
Dear Daniele

Daniele Varsano wrote: looking at your report, you have a 10 k-point sampling in the IBZ, which is not compatible with the 32 CPUs you assigned to the k role in the parallelization.
I do not know whether this is causing problems in the run, but it is surely a waste of resources. Moreover, your BSE matrix (Nc x Nv x Nbz) is rather small (dim ~2000), so I would reduce the total number of CPUs for this run.
I reduced the total number of CPUs and set the parallelization as
"BS_CPU= "10 3 3" # [PARALLEL] CPUs for each role
BS_ROLEs= "k eh t" # [PARALLEL] CPUs roles (k,eh,t)
BS_nCPU_LinAlg_INV= 1 # [PARALLEL] CPUs for Linear Algebra
BS_nCPU_LinAlg_DIAGO= 1 # [PARALLEL] CPUs for Linear Algebra
X_Threads=4 # [OPENMP/X] Number of threads for response functions
DIP_Threads=4 # [OPENMP/X] Number of threads for dipoles
K_Threads=4 # [OPENMP/BSK] Number of threads for response functions"

However, it still got stuck at the step "[06.02] CPU-dependent Block structure". I have attached the input, report and log files here. Please help me spot the problem. Thanks a lot.

Re: stuck in calculation of static inverse dielectric matrix

Posted: Thu Apr 25, 2019 7:12 am
by Daniele Varsano
Dear Dr. Zhou Liu-Jiang,
you need to reduce the number of CPUs consistently: the BS_CPU setting is now inconsistent with the number of MPI tasks actually used.
In this case, yambo ignores the setting and switches to the default. Anyway, I have the impression that you are running out of memory;
how much memory per core do you have available?

I suggest you run the calculation with the same input but with 90 CPUs instead of 360. If possible, try to use fewer CPUs per node in order to reserve more
memory per core.
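For instance (a sketch only, not a prescription), keeping the roles you already use, 90 MPI tasks would correspond to:

Code: Select all

 BS_CPU= "10 3 3"     # 10(k) x 3(eh) x 3(t) = 90, matching a run launched with 90 MPI tasks
 BS_ROLEs= "k eh t"   # [PARALLEL] CPUs roles (k,eh,t)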

Best,
Daniele

Re: stuck in calculation of static inverse dielectric matrix

Posted: Thu Apr 25, 2019 10:49 pm
by ljzhou86
Dear Daniele

Thanks. I will further reduce the number of CPUs in my calculation. The reason I made such a setting is the relation: "number of BS_CPU (90)" times "number of threads (4)" = total number of cores used (360). Is that wrong for the MPI+OpenMP strategy?

Re: stuck in calculation of static inverse dielectric matrix

Posted: Fri Apr 26, 2019 7:36 am
by Daniele Varsano
Dear Zhou Liu-Jiang,
the strategy is right, but the setting depends on the number of cores inside a node and on your submission script.
An example: on a node with e.g. 16 cores, you can reserve the entire node, ask for 4 MPI tasks and set 4 threads each (a sketch of such a script follows the report excerpt below).
In your previous run you instead asked for 360 MPI processes, as shown in the report:

Code: Select all

* CPU-Threads     :360(CPU)-1(threads)-4(threads@X)-4(threads@DIP)-4(threads@K)
 * CPU-Threads     :BS(environment)-10 3 3(CPUs)-k eh t(ROLEs)
 * MPI CPU         :  360
 * THREADS    (max):  4
 * THREADS TOT(max): 1440
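A minimal SLURM sketch of the 16-core-node example above (4 MPI tasks x 4 OpenMP threads per node; the node count, input and job names are placeholders, not a definitive recipe):

Code: Select all

 #!/bin/bash
 #SBATCH --nodes=4              # illustrative: 4 nodes x 4 tasks/node = 16 MPI tasks in total
 #SBATCH --ntasks-per-node=4    # only 4 MPI tasks on each 16-core node
 #SBATCH --cpus-per-task=4      # reserve 4 cores per task for the OpenMP threads

 export OMP_NUM_THREADS=4       # 4 threads per MPI task
 mpirun -np 16 yambo -F <your_input> -J <your_jobdir>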
Best,
Daniele

Re: stuck in calculation of static inverse dielectric matrix

Posted: Tue Apr 30, 2019 12:49 am
by ljzhou86
Hi Daniele
ljzhou86 wrote: Hi Daniele

I have tried that kind of setting, with 10 assignments on the k role to match the 10 k-points, but it did not give me any improvement. The system, containing 115 atoms, is a 2D monolayer interfaced with a 0D cluster. I will run with a reduced number of CPUs.
I have applied the settings you suggested, but the memory issue still persists. The system contains 115 atoms with 770 electrons. The total memory per CPU is 3.56 GB. No matter how many or how few CPUs I used, the calculation ended at the step "[06.01] Transition Groups build-up" with a message like "[MEMORY] Alloc BS_blk(iB)%mat( 164.4510Mb) TOTAL: 2.987147Gb (traced) 36.43600Mb (memstat)". Could you help me build a parallelization strategy?

By the way, how should I understand the keywords related to memory, namely TOTAL, traced and memstat? Does "TOTAL" mean the total memory size that one CPU should have? I found many lines showing an increasing TOTAL value rather than a fixed one. And what about "memstat" and "traced"?

Re: stuck in calculation of static inverse dielectric matrix

Posted: Tue Apr 30, 2019 7:57 am
by Daniele Varsano
Dear Zhou,
the TOTAL is the amount of memory allocated at that point of the calculation. As you can see, it keeps increasing until it reaches the maximum memory you have at your disposal; then the run goes out of memory and crashes.
What you can do is:
1) Allow more memory per node. This can be done in your submission script by using fewer cores per node. I do not know which queue system you are using, but usually it is possible by reserving more nodes and assigning fewer tasks per node.
2) Try to reduce the number of bands in the BSE (BSEBands). Start with a small number of bands and increase it to check whether you reach convergence with fewer bands than you are using now; see the example below.
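For example, a smaller window around the gap (the range below is purely illustrative; your report puts the last occupied band at 385 and you currently use 375-396):

Code: Select all

 % BSEBands
   381 |  390 |   # [BSK] Bands range (hypothetical starting point, to be converged)
 %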

Daniele

Re: stuck in calculation of static inverse dielectric matrix

Posted: Thu May 02, 2019 11:12 pm
by ljzhou86
Dear Daniele
Daniele Varsano wrote: the TOTAL is the amount of memory allocated at that point of the calculation. As you can see, it keeps increasing until it reaches the maximum memory you have at your disposal; then the run goes out of memory and crashes.
What you can do is:
1) Allow more memory per node. This can be done in your submission script by using fewer cores per node. I do not know which queue system you are using, but usually it is possible by reserving more nodes and assigning fewer tasks per node.
My submission script is as follows:
"
#!/bin/bash
##SBATCH -t 16:00:00
#SBATCH -t 0:10:00
#SBATCH -N 32
#SBATCH --partition=standard
#SBATCH --ntasks-per-node=6
#SBATCH --ntasks=192
#SBATCH -A s17_cint

module load intel/17.0.4 intel-mpi/2017.1 mkl/11.4.1 hdf5-serial/1.8.16 netcdf-serial/4.4.0

MPIRUN="mpirun -np 192"

#$MPIRUN yambo -F Inputs/05w
$MPIRUN yambo -F Inputs/ljbse -J ljbse"

The total memory per node is up to 128 GB. I use fewer cores per node (6 of the 36 cores), so each task has about six times more memory available (roughly 128 GB / 6 ≈ 21 GB per task instead of 128 GB / 36 ≈ 3.6 GB). Are these submission-script settings right?

However, I still ran into the memory issue, as reported in the log file:
"
...
<12s> P0001: [PARALLEL Response_T_space for (e/h) Groups on 24 CPU] Loaded/Total (Percentual):2/240(1%)
<12s> P0001: [PARALLEL Response_T_space for (e/h)->(e/h)' Transitions (ordered) on 1 CPU] Loaded/Total (Percentual):242/28920(1%)
<12s> P0001: [PARALLEL Response_T_space for CON bands on 24 CPU] Loaded/Total (Percentual):30/715(4%)
<12s> P0001: [PARALLEL Response_T_space for VAL bands on 1 CPU] Loaded/Total (Percentual):385/385(100%)
<12s> P0001: [06.01] Transition Groups build-up
<15s> P0001: [06.02] CPU-dependent Block structure
<15s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.027819Gb) TOTAL: 1.219881Gb (traced) 36.43600Mb (memstat)
""
The TOTAL decreased, but my job still crashed very quickly. Reducing the number of bands in the BSE also did not help with this issue.

Do you have any other suggestions? Thanks a lot