I/O activity from yambo jobs is too high for the filesystem
Posted: Thu Jan 30, 2020 10:27 pm
Dear yambo developers and users,
I was running a GW calculation with yambo in parallel over 4232 CPUs. However, the cluster administrators told me that the job causes I/O problems: the I/O activity is too heavy for the filesystem to handle. Initially I thought the cause might be the number of LOG files produced. Using the NLogCPUs variable, I reduced the number of actively written LOG files by a factor of about 46: 4232 LOG files are still generated, but only 92 of them are written to frequently. However, the problem was not fixed. The I/O problem seems to appear at step [06.01] G0W0 (W PPA), when yambo reads the "ndb.pp_fragment_x" files. My input file is pasted below.
HF_and_locXC # [R XX] Hartree-Fock Self-energy and Vxc
ppa # [R Xp] Plasmon Pole Approximation
gw0 # [R GW] GoWo Quasiparticle energy levels
em1d # [R Xd] Dynamical Inverse Dielectric Matrix
EXXRLvcs= 90 Ry # [XX] Exchange RL components
Chimod= "HARTREE" # [X] IP/Hartree/ALDA/LRC/PF/BSfxc
GfnQPdb= "E < ./SAVE/ndb.QP"
XfnQPdb= "E < ./SAVE/ndb.QP"
% BndsRnXp
1 | 598 | # [Xp] Polarization function bands
%
NGsBlkXp= 10 Ry # [Xp] Response block size
% LongDrXp
1.000000 | 1.000000 | 1.000000 | # [Xp] [cc] Electric Field
%
PPAPntXp= 21 eV # [Xp] PPA imaginary energy
% GbndRnge
1 | 598 | # [GW] G[W] bands range
%
GDamping= 0.10000 eV # [GW] G[W] damping
dScStep= 0.10000 eV # [GW] Energy step to evaluate Z factors
DysSolver= "n" # [GW] Dyson Equation solver ("n","s","g")
%QPkrange # # [GW] QP generalized Kpoint/Band indices
1|47|37|64|
%
X_all_q_CPU= "1 2 46 46" # [PARALLEL] CPUs for each role
X_all_q_ROLEs= "q k c v" # [PARALLEL] CPUs roles (q,k,c,v)
X_all_q_nCPU_LinAlg_INV= 4232 # [PARALLEL] CPUs for Linear Algebra
X_Threads= 0 # [OPENMP/X] Number of threads for response functions
DIP_Threads= 0 # [OPENMP/X] Number of threads for dipoles
SE_CPU= "47 1 46" # [PARALLEL] CPUs for each role
SE_ROLEs= "q qp b" # [PARALLEL] CPUs roles (q,qp,b)
SE_Threads= 0
NLogCPUs= 92 # [PARALLEL] Live-timing CPU`s (0 for all)
The system has 47 k-points and 46 valence bands. Here I set SE_CPU= "47 1 46", so the number of CPUs used for the self-energy calculation (47 x 1 x 46 = 2162) is about half of the total. The reason is that this step needs more memory than the preceding polarization calculation (X_all_q_CPU= "1 2 46 46"). If I set SE_CPU= "46 2 46" instead, I run out of memory during the self-energy step. I have attached two LOG files and the standard output file for more details.
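Just to make the task counts explicit, this is the arithmetic I am referring to (a plain Python check of the role products from my input file, nothing to do with yambo itself):

from math import prod  # Python 3.8+

# CPU counts per parallel role, copied from the input file above
x_roles  = [1, 2, 46, 46]   # X_all_q_CPU = "1 2 46 46"  (roles q, k, c, v)
se_roles = [47, 1, 46]      # SE_CPU      = "47 1 46"    (roles q, qp, b)

print(prod(x_roles))   # 4232 -> all MPI tasks active in the response (X) step
print(prod(se_roles))  # 2162 -> about half of the 4232 tasks active in the self-energy step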
Please give me some advice to reduce the I/O activities. Thank you.