MPI + OpenMP in yambo
Posted: Sun May 31, 2020 10:51 am
Dear all,
I am trying to run a parallel GW calculation with Yambo 4.5.1. Since my system is large, I use hybrid MPI + OpenMP on 256 cores (= 4 nodes * 64 cores), but the jobs always fail.
Each node has 256 GB of memory.
I used the following parallelization settings:
'X_Threads= 16 # [OPENMP/X] Number of threads for response functions
DIP_Threads= 16 # [OPENMP/X] Number of threads for dipoles
SE_Threads= 16 # [OPENMP/GW] Number of threads for self-energy
X_CPU= "1 1 2 4 2" # [PARALLEL] CPUs for each role
X_ROLEs= "q g k c v" # [PARALLEL] CPUs roles (q,g,k,c,v)
X_nCPU_LinAlg_INV= 32 # [PARALLEL] CPUs for Linear Algebra
DIP_CPU= "2 4 2" # [PARALLEL] CPUs for each role
DIP_ROLEs= "k c v" # [PARALLEL] CPUs roles (k,c,v)
SE_CPU= "1 2 8" # [PARALLEL] CPUs for each role
SE_ROLEs= "q qp b" # [PARALLEL] CPUs roles (q,qp,b)'
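A quick consistency check of these settings can be sketched in Python. It assumes two rules: in Yambo, the product of the numbers in each *_CPU string should equal the number of MPI tasks, and MPI tasks times OpenMP threads should not exceed the physical cores. The helper name `check` is hypothetical, not part of Yambo.

```python
from math import prod

# Values copied from the input file above
roles = {
    "X":   [1, 1, 2, 4, 2],  # q g k c v
    "DIP": [2, 4, 2],        # k c v
    "SE":  [1, 2, 8],        # q qp b
}
threads = 16           # X_Threads = DIP_Threads = SE_Threads
cores_total = 4 * 64   # 4 nodes * 64 cores

def check(n_mpi_tasks):
    """Hypothetical helper: list inconsistencies for a given MPI task count.

    Assumed rules: each *_CPU product must equal n_mpi_tasks, and
    n_mpi_tasks * threads must not exceed the available cores.
    """
    problems = []
    for name, cpus in roles.items():
        if prod(cpus) != n_mpi_tasks:
            problems.append(f"{name}_CPU product {prod(cpus)} != {n_mpi_tasks} MPI tasks")
    if n_mpi_tasks * threads > cores_total:
        problems.append(
            f"{n_mpi_tasks} tasks x {threads} threads = "
            f"{n_mpi_tasks * threads} > {cores_total} cores"
        )
    return problems

for problem in check(256):
    print(problem)
```

Under these assumptions, `check(256)` (the task count reported as "256(CPU)" in the log) flags all three *_CPU strings, whose products are 16, plus a 4096-thread oversubscription, while `check(16)` returns an empty list. That is consistent with the "parallel ENVIRONMENT is incomplete. Switching to defaults" message in the log.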
The LOG file shows that the job stops here:
' <04s> P1: [01] CPU structure, Files & I/O Directories
<04s> P1-n1303.para.bscc: CPU-Threads:256(CPU)-1(threads)-16(threads@X)-16(threads@DIP)-16(threads@SE)
<04s> P1-n1303.para.bscc: CPU-Threads:DIP(environment)-2 4 2(CPUs)-k c v(ROLEs)
<04s> P1-n1303.para.bscc: CPU-Threads:X(environment)-1 1 2 4 2(CPUs)-q g k c v(ROLEs)
<04s> P1-n1303.para.bscc: CPU-Threads:SE(environment)-1 2 8(CPUs)-q qp b(ROLEs)
<04s> P1-n1303.para.bscc: [02] CORE Variables Setup
<04s> P1-n1303.para.bscc: [02.01] Unit cells
<06s> P1-n1303.para.bscc: [02.02] Symmetries
<06s> P1-n1303.para.bscc: [02.03] RL shells
<06s> P1-n1303.para.bscc: [02.04] K-grid lattice
<06s> P1-n1303.para.bscc: Grid dimensions : 4 4 2
<06s> P1-n1303.para.bscc: [02.05] Energies [ev] & Occupations
<06s> P1-n1303.para.bscc: [WARNING][X] Metallic system
<11s> P1-n1303.para.bscc: [03] Transferred momenta grid
<11s> P1-n1303.para.bscc: [04] Dipoles
<11s> P1-n1303.para.bscc: [DIP] Checking dipoles header
<12s> P1-n1303.para.bscc: [WARNING] DIPOLES database not correct or not present
<12s> P1-n1303.para.bscc: DIPOLES parallel ENVIRONMENT is incomplete. Switching to defaults
<12s> P1-n1303.para.bscc: [PARALLEL DIPOLES for K(ibz) on 1 CPU] Loaded/Total (Percentual):20/20(100%)
<12s> P1-n1303.para.bscc: [PARALLEL DIPOLES for CON bands on 2 CPU] Loaded/Total (Percentual):182/364(50%)
<12s> P1-n1303.para.bscc: [PARALLEL DIPOLES for VAL bands on 128 CPU] Loaded/Total (Percentual):2/176(1%)
<12s> P1-n1303.para.bscc: [x,Vnl] computed using 600 projectors
<12s> P1-n1303.para.bscc: [WARNING] [x,Vnl] slows the Dipoles computation. To neglect it rename the ns.kb_pp file
<12s> P1-n1303.para.bscc: [MEMORY] Alloc kbv( 4.178512Gb) TOTAL: 4.429481Gb (traced) 10.06800Mb (memstat)
<12s> P1-n1303.para.bscc: Dipoles: P, V and iR (T): | | [000%] --(E) --(X)
<12s> P1-n1303.para.bscc: Reading kb_pp_pwscf_fragment_1
<12s> P1-n1303.para.bscc: [PARALLEL distribution for Wave-Function states] Loaded/Total(Percentual):500/500(100%)'
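The [MEMORY] line above can be turned into a rough per-node estimate. The sketch below assumes every MPI task allocates the same 4.429481 GB reported as traced on P1 and that tasks are spread evenly over the nodes. It only counts memory allocated up to the dipoles step, so later X and SE stages may need more. `node_memory` is a hypothetical helper.

```python
# Traced per-task memory from the log line "[MEMORY] Alloc kbv(...) TOTAL: 4.429481Gb (traced)"
per_task_gb = 4.429481
node_mem_gb = 256  # memory per node, from the post

def node_memory(tasks_per_node, per_task=per_task_gb):
    """Hypothetical estimate: total traced memory on one node, assuming
    each task allocates the same amount as P1."""
    return tasks_per_node * per_task

for tasks in (64, 4):  # 256 tasks / 4 nodes vs 16 tasks / 4 nodes
    used = node_memory(tasks)
    status = "exceeds" if used > node_mem_gb else "fits in"
    print(f"{tasks} tasks/node: {used:.1f} GB {status} {node_mem_gb} GB")
```

With 64 tasks per node the estimate is about 283.5 GB, above the 256 GB available, while 4 tasks per node need about 17.7 GB at this stage.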
Could this be a memory issue? Is something wrong with my parallelization settings? Can anyone help me fix it?
The related files are attached.
Many thanks.