
Error in MPI in BSE calculations

Posted: Thu Sep 26, 2024 9:52 pm
by DmitrySkachkov
Hello,

I get an error in hybrid OpenMP-MPI BSE calculations at the 3rd step (BSE solver), whereas the 1st step (BSE screening) and the 2nd step (BSE kernel) complete without any errors.

The error:
[r2x08:04448] *** An error occurred in MPI_Allreduce
[r2x08:04448] *** reported by process [1228734465,28]
[r2x08:04448] *** on communicator MPI_COMM_WORLD
[r2x08:04448] *** MPI_ERR_COUNT: invalid count argument
[r2x08:04448] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[r2x08:04448] *** and potentially your MPI job)


The OpenMP-MPI version of Yambo was compiled from the GitHub version with the following modules:
intel/2020u4
intel-mkl/2020u4
openmpi/4.1.5:intel-2020
using the following configuration:
> ./configure --enable-memory-profile --enable-dp --enable-open-mp --enable-par-linalg \
FC=ifort F77=ifort CC=icc MPICC=mpicc MPIFC=mpifort
The compiled version:
This is yambo - MPI+OpenMP+SLK+HDF5_MPI_IO - Ver. 5.2.0 Revision 23096 Hash f147e08b32
The input file for the 3rd step (BSE solver):

# __ __ ________ ___ __ __ _______ ______
# /_/\/_/\ /_______/\ /__//_//_/\ /_______/\ /_____/\
# \ \ \ \ \\::: _ \ \\::\| \| \ \\::: _ \ \\:::_ \ \
# \:\_\ \ \\::(_) \ \\:. \ \\::(_) \/_\:\ \ \ \
# \::::_\/ \:: __ \ \\:.\-/\ \ \\:: _ \ \\:\ \ \ \
# \::\ \ \:.\ \ \ \\. \ \ \ \\::(_) \ \\:\_\ \ \
# \__\/ \__\/\__\/ \__\/ \__\/ \_______\/ \_____\/
#
#
# Version 5.2.0 Revision 23096 Hash (prev commit) f147e08b32
# Branch is master
# MPI+OpenMP+SLK+HDF5_MPI_IO Build
# http://www.yambo-code.eu
#
bss # [R] BSE solver
optics # [R] Linear Response optical properties
dipoles # [R] Oscillator strenghts (or dipoles)
bse # [R][BSE] Bethe Salpeter Equation.
BSKmod= "SEX" # [BSE] IP/Hartree/HF/ALDA/SEX/BSfxc
BSEmod= "retarded" # [BSE] resonant/retarded/coupling
BSSmod= "d" # [BSS] (h)aydock/(d)iagonalization/(s)lepc/(i)nversion/(t)ddft`
BSENGexx= 40 Ry # [BSK] Exchange components
BSENGBlk=-1 RL # [BSK] Screened interaction block size [if -1 uses all the G-vectors of W(q,G,Gp)]
#WehCpl # [BSK] eh interaction included also in coupling
KfnQPdb= "E < SAVE/ndb.QP" # [EXTQP BSK BSS] Database action
KfnQP_INTERP_NN= 1 # [EXTQP BSK BSS] Interpolation neighbours (NN mode)
KfnQP_INTERP_shells= 20.00000 # [EXTQP BSK BSS] Interpolation shells (BOLTZ mode)
KfnQP_DbGd_INTERP_mode= "NN" # [EXTQP BSK BSS] Interpolation DbGd mode
% KfnQP_E
0.000000 | 1.000000 | 1.000000 | # [EXTQP BSK BSS] E parameters (c/v) eV|adim|adim
%
KfnQP_Z= ( 1.000000 , 0.000000 ) # [EXTQP BSK BSS] Z factor (c/v)
KfnQP_Wv_E= 0.000000 eV # [EXTQP BSK BSS] W Energy reference (valence)
% KfnQP_Wv
0.000000 | 0.000000 | 0.000000 | # [EXTQP BSK BSS] W parameters (valence) eV| 1|eV^-1
%
KfnQP_Wv_dos= 0.000000 eV # [EXTQP BSK BSS] W dos pre-factor (valence)
KfnQP_Wc_E= 0.000000 eV # [EXTQP BSK BSS] W Energy reference (conduction)
% KfnQP_Wc
0.000000 | 0.000000 | 0.000000 | # [EXTQP BSK BSS] W parameters (conduction) eV| 1 |eV^-1
%
KfnQP_Wc_dos= 0.000000 eV # [EXTQP BSK BSS] W dos pre-factor (conduction)
% BSEQptR
1 | 1 | # [BSK] Transferred momenta range
%
% BSEBands
111 | 122 | # [BSK] Bands range
%
% BEnRange
0.00000 | 10.00000 | eV # [BSS] Energy range
%
% BDmRange
0.100000 | 0.100000 | eV # [BSS] Damping range
%
BEnSteps= 100 # [BSS] Energy steps
% BLongDir
0.000000 | 1.000000 | 0.000000 | # [BSS] [cc] Electric Field
%
BSEprop= "abs" # [BSS] Can be any among abs/jdos/kerr/magn/dich/photolum/esrt
BSEdips= "none" # [BSS] Can be "trace/none" or "xy/xz/yz" to define off-diagonal rotation plane
WRbsWF # [BSS] Write to disk excitonic the WFs

DIP_Threads= 0 # [OPENMP/X] Number of threads for dipoles
X_Threads= 0 # [OPENMP/X] Number of threads for response functions
K_Threads= 0 # [OPENMP/BSK] Number of threads for response functions
NLogCPUs= 10 # [PARALLEL] Live-timing CPU`s (0 for all)
PAR_def_mode= "balanced" # [PARALLEL] Default distribution mode ("balanced"/"memory"/"workload")

Could you please suggest how to fix this?

Thank you,
Dmitry

Re: Error in MPI in BSE calculations

Posted: Fri Sep 27, 2024 8:42 am
by Daniele Varsano
Dear Dmitry,

The report/log files would be useful to understand at which point the error appears and the dimension of the BS matrix.
In the meantime, since you are opting for a full diagonalization, you can run the diagonalization in serial unless the BS matrix is very large.
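For instance (a sketch, where the input file and job directory names are placeholders for your own), the solver step can be relaunched on a single MPI task, reusing the kernel already computed in parallel:

> mpirun -np 1 yambo -F yambo_bse_solver.in -J BSE_run # serial solver run; -F selects the input file, -J the job directory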

Best,
Daniele

Re: Error in MPI in BSE calculations

Posted: Fri Sep 27, 2024 2:57 pm
by DmitrySkachkov
Dear Daniele,

Thank you for the reply.

Here are the report and log files.
The calculation runs on 1 node with 64 cores and 2000 GB of memory.

Re: Error in MPI in BSE calculations

Posted: Mon Sep 30, 2024 7:49 am
by Daniele Varsano
Dear Dmitry,

It seems to be a problem due to ScaLAPACK: note that you are running with 32 MPI tasks, using a 4x4 ScaLAPACK grid.
You can try a different parallel strategy for the diagonalization, e.g. 4, 16, 64, etc. MPI tasks, setting the corresponding resources in the job script, and see if the problem persists. This can be done by setting the BS_nCPU_LinAlg_DIAGO variable (generate the input with the -V par option and the variables governing the parallelization strategy will appear).
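As a sketch (file and job names are placeholders, assuming 16 MPI tasks are requested in the job script), the relevant input line and launch command would look roughly like:

BS_nCPU_LinAlg_DIAGO= 16 # square number (4, 16, 64, ...) matching the MPI tasks requested
> mpirun -np 16 yambo -F yambo_bse_solver.in -J BSE_run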

Having said that, in your case the BS matrix is quite large (dimension = 46656), and I strongly suggest you try iterative algorithms, i.e. Haydock if you are interested in the spectrum only, or SLEPc if you are also interested in the first eigenvectors. Note that to use the SLEPc algorithm the libraries need to be linked, which is done by adding the SLEPc option to the configure command (--enable-slepc-linalg).
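For example (a sketch based on the solver modes already listed in your input), Haydock only needs the solver mode changed, while SLEPc also needs the library linked at configure time:

BSSmod= "h" # [BSS] Haydock iterative solver (spectrum only)

or, after reconfiguring with SLEPc support and rebuilding:

> ./configure --enable-memory-profile --enable-dp --enable-open-mp --enable-par-linalg \
--enable-slepc-linalg FC=ifort F77=ifort CC=icc MPICC=mpicc MPIFC=mpifort

BSSmod= "s" # [BSS] SLEPc iterative solver (lowest eigenvalues and eigenvectors)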

Best,
Daniele