BSE parallel Issue

Posted: Wed Dec 23, 2020 10:09 am
by jasonhan0710
Dear all,

I tried to run a BSE calculation in parallel, but an error occurred during the parallel run. The cluster has 64 cores per node with 192 GB of RAM. When I run on 5 nodes with 16 MPI tasks per node and 4 OpenMP threads per task,

Code:

DIP_CPU= "5 4 4"        # [PARALLEL] CPUs for each role
DIP_ROLEs= "k c v"      # [PARALLEL] CPUs roles (k,c,v)
DIP_Threads= 4          # [OPENMP/X] Number of threads for dipoles
X_CPU= "1 1 4 4 5"      # [PARALLEL] CPUs for each role
X_ROLEs= "q g k c v"    # [PARALLEL] CPUs roles (q,g,k,c,v)
X_nCPU_LinAlg_INV= 80   # [PARALLEL] CPUs for Linear Algebra
X_Threads= 4            # [OPENMP/X] Number of threads for response functions
SE_CPU= " 1 8 10"       # [PARALLEL] CPUs for each role
SE_ROLEs= "q qp b"      # [PARALLEL] CPUs roles (q,qp,b)
SE_Threads= 4
the calculation stops during the screening step and produces no output file. However, when I use 8 nodes with 16 MPI tasks per node,

Code:

DIP_CPU= "16 2 4"       # [PARALLEL] CPUs for each role
DIP_ROLEs= "k c v"      # [PARALLEL] CPUs roles (k,c,v)
DIP_Threads= 4          # [OPENMP/X] Number of threads for dipoles
X_CPU= "4 1 4 2 4"      # [PARALLEL] CPUs for each role
X_ROLEs= "q g k c v"    # [PARALLEL] CPUs roles (q,g,k,c,v)
X_nCPU_LinAlg_INV= 128  # [PARALLEL] CPUs for Linear Algebra
X_Threads= 4            # [OPENMP/X] Number of threads for response functions
SE_CPU= " 1 16 8"       # [PARALLEL] CPUs for each role
SE_ROLEs= "q qp b"      # [PARALLEL] CPUs roles (q,qp,b)
SE_Threads= 4
the static screening step completes successfully and finishes in 2 hours. Does yambo require a particular number of cores when running in parallel?
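
A quick sanity check of both layouts can be sketched in Python. It assumes (this is not stated in the thread) that the product of the entries in each *_CPU string should equal the total number of MPI tasks, and that "16 cores on each node" means 16 MPI tasks per node. The helper name mpi_tasks is made up for illustration.

```python
from math import prod


def mpi_tasks(cpu_string):
    """Product of the per-role counts in a *_CPU string such as "1 1 4 4 5"."""
    return prod(int(n) for n in cpu_string.split())


# Total MPI tasks (nodes * tasks per node) and the *_CPU strings quoted above.
runs = {
    "5 nodes": (5 * 16, {"DIP_CPU": "5 4 4", "X_CPU": "1 1 4 4 5", "SE_CPU": " 1 8 10"}),
    "8 nodes": (8 * 16, {"DIP_CPU": "16 2 4", "X_CPU": "4 1 4 2 4", "SE_CPU": " 1 16 8"}),
}

for label, (total, settings) in runs.items():
    for name, value in settings.items():
        # Assumption: a layout is consistent when the product matches the task count.
        status = "ok" if mpi_tasks(value) == total else "MISMATCH"
        print(f"{label}: {name} -> {mpi_tasks(value)} of {total} tasks ({status})")

# 16 tasks per node with 4 OpenMP threads each should fill the 64 cores of a node.
print("cores used per node:", 16 * 4)
```

Under that assumption both layouts are self-consistent (80 and 128 tasks, each with 4 threads, filling 64 cores per node), so a plain miscount of tasks is unlikely to explain the failure on 5 nodes.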

In addition, the BSE calculation aborted unexpectedly, reporting insufficient memory. Can anyone tell me how to reduce the memory usage of a BSE calculation?

Best,
Jason

Re: BSE parallel Issue

Posted: Wed Dec 23, 2020 2:02 pm
by Daniele Varsano
Dear Jason,

In the calculation of the screening, memory is also distributed among the cores; parallelizing over "c" and "v" is particularly advantageous for this. I would instead discourage parallelizing over "q", as it can introduce some load imbalance, especially in the presence of symmetries.
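
As an illustration of this advice, here is a small Python sketch that builds an X_CPU line placing every MPI task on "c" and "v" and none on "q", "g" or "k". The function names, the choice of the most balanced factor pair, and giving the larger factor to "c" are assumptions made for the example, not yambo rules; in a real run each factor is also limited by the number of conduction and valence bands in the calculation.

```python
from math import isqrt


def cv_split(n_tasks):
    """Return (n_c, n_v) with n_c * n_v == n_tasks, as balanced as possible, n_c >= n_v."""
    # Walk down from the integer square root until a divisor is found;
    # n_v = 1 always divides, so the loop always returns.
    for n_v in range(isqrt(n_tasks), 0, -1):
        if n_tasks % n_v == 0:
            return n_tasks // n_v, n_v


def x_cpu_line(n_tasks):
    """Build an X_CPU line for the role order "q g k c v" with q = g = k = 1."""
    n_c, n_v = cv_split(n_tasks)
    return f'X_CPU= "1 1 1 {n_c} {n_v}"'


print(x_cpu_line(80))    # 5 nodes * 16 tasks
print(x_cpu_line(128))   # 8 nodes * 16 tasks
```

The resulting line would be paired with X_ROLEs= "q g k c v", as in the inputs quoted earlier in the thread.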

Best,
Daniele