GW and BSE calculations with larger k-point mesh
Hello YAMBO developers and users,
I am trying to calculate GW quasiparticle energies and BSE spectra for a 2D material with a 7-atom unit cell, using a 24x24x1 k-point grid. When I run it with 8 CPUs on 1 node, it works, but the calculation takes too long. When I increase the number of CPUs across more nodes, the calculation stops (maybe a memory issue). Please help me find a suitable CPU parallelization for this specific calculation.
Thank you
Daniele Varsano
Re: GW and BSE calculations with larger k-point mesh
Dear Nilesh,
If it is a memory issue, you can set a suitable parallelization strategy in the input file to distribute memory.
Can you post the input file, together with the submission script and the error message of the problematic case?
Re: GW and BSE calculations with larger k-point mesh
Hello Mr. Daniele,
I tried to attach the files, but the forum does not accept them, so I have included all the necessary details here:
1. The input file for the GW calculation
2. The submit script
3. The output error file
4. The log file
The calculation stopped at the line "[DIP] Checking dipoles header".
Please let me know what I can do to solve this issue.
#PBS -A OPEN-24-46
#PBS -N yambo
#PBS -q qlong
#PBS -l select=8:mpiprocs=36
#PBS -l walltime=144:00:00
#PBS -j oe
module load OpenMPI/4.0.3-GCC-9.3.0
#yambo -r -fatlog -x -p p -g n -V all -F
mpirun -np 256 yambo -F -J gw.out -C report
Sat May 6 12:10:16 CEST 2023
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpirun noticed that process rank 215 with PID 1446841 on node cn98 exited on signal 9 (Killed).
<---> P1: [01] MPI/OPENMP structure, Files & I/O Directories
<---> MPI Cores-Threads : 256(CPU)-1(threads)-1(threads@X)-1(threads@DIP)-1(threads@SE)-1(threads@RT)-1(threads@K)-1(threads@NL)
<---> MPI Cores-Threads : DIP(environment)-16 4 4(CPUs)-k c v(ROLEs)
<---> MPI Cores-Threads : X_and_IO(environment)-1 1 16 4 4(CPUs)-q g k c v(ROLEs)
<---> MPI Cores-Threads : SE(environment)-16 4 4(CPUs)-q qp b(ROLEs)
<---> [02] CORE Variables Setup
<---> [02.01] Unit cells
<---> [02.02] Symmetries
<---> [02.03] Reciprocal space
<---> [02.04] K-grid lattice
<---> Grid dimensions : 24 24
<---> [02.05] Energies & Occupations
<---> [03] Transferred momenta grid and indexing
<---> [04] Coloumb potential Random Integration (RIM)
<---> [04.01] RIM initialization
<---> Random points | | [000%] --(E) --(X)
<---> Random points |########################################| [100%] --(E) --(X)
<02s> [04.02] RIM integrals
<02s> Momenta loop | | [000%] --(E) --(X)
<04s> Momenta loop |########################################| [100%] 02s(E) 02s(X)
<04s> [05] Coloumb potential CutOffbox
<04s> Box | | [000%] --(E) --(X)
<06s> Box |########################################| [100%] --(E) --(X)
<06s> [06] Dipoles
<06s> [PARALLEL DIPOLES for K(ibz) on 16 CPU] Loaded/Total (Percentual):4/61(7%)
<06s> [PARALLEL DIPOLES for CON bands on 4 CPU] Loaded/Total (Percentual):75/300(25%)
<06s> [PARALLEL DIPOLES for VAL bands on 4 CPU] Loaded/Total (Percentual):14/56(25%)
<06s> [DIP] Checking dipoles header
<06s> [DIP] Database not correct or missing. To be computed
<06s> [x,Vnl] computed using 262 projectors
<06s> [WARNING] [x,Vnl] slows the Dipoles computation. To neglect it rename the ns.kb_pp file
<07s> Dipoles: P, V and iR (T) | | [000%] --(E) --(X)
<07s> [PARALLEL distribution for Wave-Function states] Loaded/Total(Percentual):75/300(25%)
<18s> Dipoles: P, V and iR (T) |# | [002%] 11s(E) 07m-47s(X)
<32s> Dipoles: P, V and iR (T) |########## | [025%] 25s(E) 01m-39s(X)
<45s> Dipoles: P, V and iR (T) |#################### | [050%] 38s(E) 01m-17s(X)
<58s> Dipoles: P, V and iR (T) |############################## | [075%] 51s(E) 01m-09s(X)
<01m-00s> Dipoles: P, V and iR (T) |########################################| [100%] 53s(E) 53s(X)
<01m-08s> [DIP] Writing dipoles header
<01m-09s> [07] Dynamic Dielectric Matrix (PPA)
<01m-16s> [WARNING] Response block size reduced to 1299 RL (7124 mHa)
<01m-16s> [PARALLEL Response_G_space_and_IO for K(bz) on 16 CPU] Loaded/Total (Percentual):36/576(6%)
<01m-16s> [PARALLEL Response_G_space_and_IO for Q(ibz) on 1 CPU] Loaded/Total (Percentual):61/61(100%)
<01m-16s> [PARALLEL Response_G_space_and_IO for K-q(ibz) on 1 CPU] Loaded/Total (Percentual):61/61(100%)
<01m-16s> [LA] SERIAL linear algebra
<01m-16s> [PARALLEL Response_G_space_and_IO for K(ibz) on 1 CPU] Loaded/Total (Percentual):61/61(100%)
<01m-16s> [PARALLEL Response_G_space_and_IO for CON bands on 4 CPU] Loaded/Total (Percentual):75/300(25%)
<01m-16s> [PARALLEL Response_G_space_and_IO for VAL bands on 4 CPU] Loaded/Total (Percentual):14/56(25%)
<01m-16s> [PARALLEL distribution for RL vectors(X) on 1 CPU] Loaded/Total (Percentual):1687401/1687401(100%)
<01m-17s> [DIP] Checking dipoles header
Daniele Varsano
Re: GW and BSE calculations with larger k-point mesh
Dear Nilesh,
this is not really comfortable to read; you can share files by renaming them with a supported suffix, e.g. file.txt.
Anyway, a good parallelization strategy to distribute memory among the CPUs, considering you are using 288 CPUs:
Furthermore, I strongly advise you to set:
as this is very memory intensive and not particularly efficient.
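As a rough illustration of how the role assignment controls what each MPI task has to hold, the "Loaded/Total" figures in the log above can be reproduced with a simple ceiling division. This is only a sketch under the assumption of an even split; the function name largest_share is made up for the example, and the real distribution inside Yambo may differ in detail.

```python
from math import ceil

def largest_share(total, ncpu):
    """Largest number of items one MPI task holds when `total` items
    are split as evenly as possible over `ncpu` tasks (assumed even split)."""
    return ceil(total / ncpu)

# (label, total items, tasks assigned to that role), all read off the log above
roles = [
    ("DIP K(ibz)", 61, 16),
    ("DIP CON bands", 300, 4),
    ("DIP VAL bands", 56, 4),
    ("X K(bz)", 576, 16),
]

for label, total, ncpu in roles:
    loaded = largest_share(total, ncpu)
    print(f"{label}: {loaded}/{total} ({round(100 * loaded / total)}%)")
```

Note also that the submit script requests select=8:mpiprocs=36, i.e. 8 x 36 = 288 slots, while mpirun -np 256 starts 256 tasks, which is the 16 x 4 x 4 layout reported in the log.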
Re: GW and BSE calculations with larger k-point mesh
Dear Team,
I am also facing a similar problem while interpolating band structures, which supposedly arises from a virtual memory issue. I tried using more nodes and fewer cores per node, so as to have more memory available, but all in vain.
The calculation stopped every time at
<14s> P1: [05.01] G0W0 on the real axis
<14s> P1: Self_Energy parallel ENVIRONMENT is incomplete. Switching to defaults
<14s> P1: [PARALLEL Self_Energy for QPs on 3 CPU] Loaded/Total (Percentual):10260/30780(33%)
<14s> P1: [PARALLEL Self_Energy for Q(ibz) on 1 CPU] Loaded/Total (Percentual):81/81(100%)
<14s> P1: [PARALLEL Self_Energy for G bands on 2 CPU] Loaded/Total (Percentual):190/380(50%)
<15s> P1: [PARALLEL distribution for Wave-Function states] Loaded/Total(Percentual):20520/30780(67%)
It was creating the error file "yambo.80s-56187,node9.btr" which shows:
yambo:55185 terminated with signal 11 at PC=4af805 SP=7ffd064bf6c0. Backtrace: /apps/scratch/compile/yambo-5.1/yambo-5.1.1/bin/yambo[0x4af805] /apps/scratch/compile/yambo-5.1/yambo-5.1.1/bin/yambo[0x497eeb] /apps/scratch/compile/yambo-5.1/yambo-5.1.1/bin/yambo[0x493469] /apps/scratch/compile/yambo-5.1/yambo-5.1.1/bin/yambo[0x40bc25] /apps/scratch/compile/yambo-5.1/yambo-5.1.1/bin/yambo[0x73f525] /apps/scratch/compile/yambo-5.1/yambo-5.1.1/bin/yambo[0x406875] /usr/lib64/[0x2b59338823d5] /apps/scratch/compile/yambo-5.1/yambo-5.1.1/bin/yambo[0x4066a9]
After seeing this forum, I tried using:
X_and_IO_CPU= "1 1 1 48 4"
X_and_IO_ROLEs= "q g k c v"
SE_CPU= "1 4 48" # [PARALLEL] CPUs for each role
SE_ROLEs= "q qp b" # [PARALLEL] CPUs roles (q,qp,b)
Now the "yambo.80s-56187,node9.btr" files are no longer created, but the calculation still stops at the same place.
Any help would be highly appreciated.
Thanks and regards,
Harshita, Research Scholar, INST
Daniele Varsano
Re: GW and BSE calculations with larger k-point mesh
Dear Harshita,
to distribute memory, the best parallelization strategy is to assign all the tasks to the b channel:
SE_CPU= "1 1 ncpu" # [PARALLEL] CPUs for each role
SE_ROLEs= "q qp b" # [PARALLEL] CPUs roles (q,qp,b)
Please note that, from the log you posted, it seems you are using 6 MPI tasks, whereas in the input file you are assigning 192 tasks.
The input variables should be consistent with the resources used; otherwise Yambo switches to its default distribution, which may not be the most efficient in terms of memory distribution.
BTW, how many quasiparticles are you calculating (%QPkrange)?
If the problem persists, you can try to recompile Yambo using the flag --enable-memory-profile:
./configure --enable-memory-profile --other options
In this way, you can track the memory and see how many resources you would need.
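The consistency requirement can be checked before submitting the job. The following is a minimal sketch, not part of Yambo: the helper names tasks_requested and check_layout are hypothetical, and the values are the ones posted earlier in this thread (X_and_IO_CPU= "1 1 1 48 4", SE_CPU= "1 4 48", and the 6 MPI tasks seen in the log).

```python
from math import prod

def tasks_requested(cpu_string):
    """Product of the per-role counts in a *_CPU string such as "1 4 48"."""
    return prod(int(n) for n in cpu_string.split())

def check_layout(layouts, mpi_tasks):
    """Return the names of the layouts whose product differs from mpi_tasks."""
    return [name for name, cpus in layouts.items()
            if tasks_requested(cpus) != mpi_tasks]

# Values taken from the input posted earlier in the thread
layouts = {
    "X_and_IO_CPU": "1 1 1 48 4",
    "SE_CPU": "1 4 48",
}

print({name: tasks_requested(c) for name, c in layouts.items()})
print(check_layout(layouts, mpi_tasks=6))    # the run seen in the log
print(check_layout(layouts, mpi_tasks=192))  # a run matching the input
```

Both strings multiply to 192, so a job started with 6 MPI tasks does not match either of them, and an empty list is returned only when the job is launched with 192 tasks.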
Re: GW and BSE calculations with larger k-point mesh
Dear Daniele,
Thanks for your response. I have corrected the number of MPI tasks in accordance with the input file, but the error still persists.
Below is my input file:
HF_and_locXC # [R] Hartree-Fock
gw0 # [R] GW approximation
dyson # [R] Dyson Equation solver
EXXRLvcs= 165 RL # [XX] Exchange RL components
VXCRLvcs= 165 RL # [XC] XCpotential RL components
% GbndRnge
0 | 380 | # [GW] G[W] bands range
% BndsRnXd
0 | 380 | # [Xd] Polarization function bands
NGsBlkXd= 4 Ry # [Xd] Response block size
% DmRngeXd
0.050000 | 0.050000 | eV # [Xd] Damping range
ETStpsXd= 800 # [Xd] Total Energy steps
EMStpsXd= 100.0000 # [Xd] [o/o] Memory Energy steps
DrudeWXd= ( 0.000000 , 0.000000 ) eV # [Xd] Drude plasmon
% LongDrXd
1.000000 | 0.000000 | 0.000000 | # [Xd] [cc] Electric Field
GTermKind= "none" # [GW] GW terminator ("none","BG" Bruneval-Gonze,"BRS" Berger-Reining-Sottile)
DysSolver= "n" # [GW] Dyson Equation solver ("n","s","g")
%QPkrange # [GW] QP generalized Kpoint/Band indices
X_and_IO_CPU= "1 1 1 16 1"
X_and_IO_ROLEs= "q g k c v"
SE_CPU= "1 1 16" # [PARALLEL] CPUs for each role
SE_ROLEs= "q qp b" # [PARALLEL] CPUs roles (q,qp,b)
And the script:
mpirun -ppn 4 -hostfile $PBS_NODEFILE -np 16 /apps/scratch/compile/yambo-5.1/yambo-5.1.1/bin/yambo -F -J all_Bz
Just to mention, the system is metallic and I haven't added the Drude term; I hope that isn't causing the calculation to stop.
Thanks and regards,
Daniele Varsano
Re: GW and BSE calculations with larger k-point mesh
Dear Harshita,
the Drude term is harmless, and it is not the source of the problem.
Some comments on your input file:
1) It seems that you are doing a full-frequency calculation (real axis). This is computationally very intensive; are you sure that you cannot opt for the plasmon-pole approximation, which is somewhat the standard for a GW calculation? The input file for that is generated using yambo -gw0 p
2) Bands range: the band indices should start from 1, not from zero.
3) You can attach the report and log files to the forum (by renaming them with e.g. *.txt); this can help to spot the problem.
4) To check whether it is a memory problem, you can reduce the QPkrange, calculating, for instance, just a couple of bands. Note that it is possible to split the calculation into many runs with different band ranges and then merge the calculated databases.
5) Not related to your problem: EXXRLvcs and VXCRLvcs have very low values and are most probably not converged.
6) You can update the code to the latest version and, if needed, compile it with the --enable-memory-profile option.
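For point 4, the splitting into several runs could be prepared along these lines. This is only a sketch: the helper names band_chunks and qpkrange_block are hypothetical, the choice of 4 runs is arbitrary, and the ranges (k-points 1 to 81, bands 1 to 380) are an assumption inferred from the 30780 QPs reported in the log (81 x 380), since the actual %QPkrange values were not posted.

```python
def band_chunks(first_band, last_band, n_runs):
    """Split [first_band, last_band] into up to n_runs contiguous,
    nearly equal band ranges."""
    n_bands = last_band - first_band + 1
    base, extra = divmod(n_bands, n_runs)
    chunks, start = [], first_band
    for i in range(n_runs):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more runs than bands: skip empty ranges
        chunks.append((start, start + size - 1))
        start += size
    return chunks

def qpkrange_block(first_k, last_k, first_band, last_band):
    """Format one %QPkrange block (first k | last k | first band | last band)."""
    return f"%QPkrange\n {first_k}| {last_k}| {first_band}| {last_band}|\n%"

# Assumed ranges: k-points 1..81, bands 1..380, split over 4 runs
for lo, hi in band_chunks(1, 380, 4):
    print(qpkrange_block(1, 81, lo, hi))
```

Each printed block would go into a separate input file, run with its own -J directory, and the resulting QP databases merged afterwards as described in point 4.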
