Error on Allocation of X_mat for parallel computation

Various technical topics, such as parallelism and efficiency, netCDF problems, and the Yambo code structure itself, are posted here.

Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano, Conor Hogan, Nicola Spallanzani


Error on Allocation of X_mat for parallel computation

Post by DavidPolito93 » Wed Sep 30, 2020 4:27 pm

Dear all,

I am trying to run a G0W0 calculation on a linear chain of 200 atoms along the z axis, performing the full-frequency computation. I am running on the GALILEO machine, with mixed MPI+OpenMP parallelization in order to avoid out-of-memory errors.

Here are the submission script and the input file for the parallel job:

#!/bin/bash
#SBATCH -N 5 # number of nodes
#SBATCH --mem=118000 # memory 86000MB for cache/flat nodes
#SBATCH --time=24:00:00 # time limits: 24 hour
#SBATCH --tasks-per-node=6
#SBATCH --cpus-per-task=6
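
The actual launch line is not shown above; a minimal sketch of one that would make use of the 6 OpenMP threads per task requested by the SLURM directives (assuming srun and a yambo executable in the PATH) is:

Code: Select all

# minimal sketch, not the original line: 6 MPI tasks per node, 6 OpenMP threads per task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun --cpu-bind=cores yambo -F gw_all_BZ_ff.in -J all_BZ_ff

The yambo input file follows.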

gw0 # [R GW] GoWo Quasiparticle energy levels
rim_cut # [R RIM CUT] Coulomb potential
HF_and_locXC # [R XX] Hartree-Fock Self-energy and Vxc
em1d # [R Xd] Dynamical Inverse Dielectric Matrix
X_nCPU_LinAlg_INV= $ncpu
X_Threads=0 # [OPENMP/X] Number of threads for response functions
SE_Threads=0 # [OPENMP/GW] Number of threads for self-energy
DIP_Threads=0
RandQpts=0 # [RIM] Number of random q-points in the BZ
RandGvec= 1 RL # [RIM] Coulomb interaction RS components
CUTGeo= "ws Z" # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws X/Y/Z/XY..
CUTwsGvec= 1.1000 # [CUT] WS cutoff: number of G to be modified
EXXRLvcs= 50 Ry # [XX] Exchange RL components
VXCRLvcs= 424401 RL # [XC] XCpotential RL components
Chimod= "HARTREE" # [X] IP/Hartree/ALDA/LRC/PF/BSfxc
% GbndRnge
1 | 500 | # [GW] G[W] bands range
%
XTermKind = "BG"
GDamping= 0.10000 eV # [GW] G[W] damping
dScStep= 0.10000 eV # [GW] Energy step to evaluate Z factors
% BndsRnXd
1 | 500 | # [Xd] Polarization function bands
%
GTermKind = "BG"
NGsBlkXd= 3 Ry # [Xd] Response block size
% DmRngeXd
0.20000 | 0.20000 | eV # [Xd] Damping range
%
ETStpsXd= 100 # [Xd] Total Energy steps
% LongDrXd
1.000000 | 1.000000 | 1.000000 | # [Xd] [cc] Electric Field
%
DysSolver= "n" # [GW] Dyson Equation solver ("n","s","g")
%QPkrange # # [GW] QP generalized Kpoint/Band indices
1|1|399|402|
%

Before starting the computation I reduced the number of RL vectors from 42441312 to 400000 RL, in order to reduce the computational time.

The output is the following:

[01] CPU structure, Files & I/O Directories
===========================================

* CPU-Threads :30(CPU)-1(threads)-1(threads@X)-1(threads@DIP)-1(threads@SE)-1(threads@RT)-1(threads@K)-1(threads@NL)
* MPI CPU : 30
* THREADS (max): 1
* THREADS TOT(max): 30
* I/O NODES : 5
* NODES(computing): 5
* (I/O): 1
* Fragmented WFs : yes

CORE databases in .
Additional I/O in .
Communications in .
Input file is gw_all_BZ_ff.in
Report file is ./r-all_BZ_ff_em1d_HF_and_locXC_gw0_rim_cut
Precision is SINGLE
Log files in ./LOG

Job string(s)-dir(s) (main): all_BZ_ff

[RD./SAVE//ns.db1]------------------------------------------
Bands : 500
K-points : 1
G-vectors [RL space]: 42441312
Components [wavefunctions]: 1513945
Symmetries [spatial+T-rev]: 16
Spinor components : 1
Spin polarizations : 1
Temperature [ev]: 0.000000
Electrons : 800.0000
WF G-vectors : 1513945
Max atoms/species : 200
No. of atom species : 1
Exact exchange fraction in XC : 0.000000
Exact exchange screening in XC : 0.000000
Magnetic symmetries : no
- S/N 000347 -------------------------- v.04.05.01 r.00165 -

[04] Coloumb potential CutOff :ws
=================================

Cut directions :Z
WS Cutoff [units to be defined]: 1.100000
Symmetry test passed :yes

Cutoff: 1.100000
n grid: 4 4 84

WS Direct Lattice(DL) unit cell [iru / cc(a.u.)]
A1 = 1.000000 0.000000 0.000000 18.89727 0.000000 0.000000
A2 = 0.000000 1.000000 0.000000 0.000000 18.89727 0.000000
A3 = 0.000000 0.000000 1.000000 0.000000 0.000000 478.8568

[WR./all_BZ_ff//ndb.cutoff]---------------------------------
Brillouin Zone Q/K grids (IBZ/BZ): 1 1 1 1
CutOff Geometry :ws z
Coulomb cutoff potential :ws z 1.100
Box sides length [au]: 0.00 0.00 0.00
Sphere/Cylinder radius [au]: 0.000000
Cylinder length [au]: 0.000000
RL components : 399997
RL components used in the sum : 399997
RIM corrections included :no
RIM RL components :0
RIM random points :0
- S/N 000347 -------------------------- v.04.05.01 r.00165 -

[05] Dipoles
============


[WARNING] DIPOLES database not correct or not present
[RD./SAVE//ns.kb_pp_pwscf]----------------------------------
Fragmentation :yes
- S/N 000347 -------------------------- v.04.05.01 r.00165 -

[WARNING] [x,Vnl] slows the Dipoles computation. To neglect it rename the ns.kb_pp file
[WF-Oscillators/G space] Performing Wave-Functions I/O from ./SAVE

[WF-Oscillators/G space loader] Normalization (few states) min/max :0.865E-11 1.00

[WR./all_BZ_ff//ndb.dipoles]--------------------------------
Brillouin Zone Q/K grids (IBZ/BZ): 1 1 1 1
RL vectors (WF): 399997
Fragmentation :yes
Electronic Temperature [K]: 0.000000
Bosonic Temperature [K]: 0.000000
X band range : 1 500
X band range limits : 400 1
X e/h energy range [ev]:-1.000000 -1.000000
RL vectors in the sum : 399997
[r,Vnl] included :yes
Bands ordered :yes
Direct v evaluation :no
Field momentum norm :0.1000E-4
Approach used :G-space v
Dipoles computed :R V P
Wavefunctions :Perdew, Burke & Ernzerhof(X)+Perdew, Burke & Ernzerhof(C)
- S/N 000347 -------------------------- v.04.05.01 r.00165 -

Timing [Min/Max/Average]: 02h-15m-40s/02h-15m-43s/02h-15m-42s

[06] Dynamical Dielectric Matrix
================================

However, the computation stops with the following error:

[ERROR] STOP signal received while in :[06] Dynamical Dielectric Matrix

[ERROR]Allocation of X_mat failed

In the LOG directory I found:

<02h-16m-35s> P1-r039c02s08: [06] Dynamical Dielectric Matrix
<03h-39m-08s> P1-r039c02s08: Response_G_space parallel ENVIRONMENT is incomplete. Switching to defaults
<03h-39m-11s> P1-r039c02s08: [PARALLEL Response_G_space for K(bz) on 1 CPU] Loaded/Total (Percentual):1/1(100%)
<03h-39m-11s> P1-r039c02s08: [PARALLEL Response_G_space for Q(ibz) on 1 CPU] Loaded/Total (Percentual):1/1(100%)
<03h-39m-11s> P1-r039c02s08: [PARALLEL Response_G_space for K-q(ibz) on 1 CPU] Loaded/Total (Percentual):1/1(100%)
<03h-39m-11s> P1-r039c02s08: [LA] SERIAL linear algebra
<03h-39m-11s> P1-r039c02s08: [PARALLEL Response_G_space for K(ibz) on 1 CPU] Loaded/Total (Percentual):1/1(100%)
<03h-39m-11s> P1-r039c02s08: [PARALLEL Response_G_space for CON bands on 5 CPU] Loaded/Total (Percentual):100/500(20%)
<03h-39m-11s> P1-r039c02s08: [PARALLEL Response_G_space for VAL bands on 3 CPU] Loaded/Total (Percentual):134/400(34%)
P1-r039c02s08: [ERROR] STOP signal received while in :[06] Dynamical Dielectric Matrix
P1-r039c02s08: [ERROR]Allocation of X_mat failed

Am I doing something wrong? Do you have any suggestions on how to overcome this problem?

Sincerely,
Davide Romanin
-----------------------------------------------------
PhD student in Physics XXXIII cycle
Representative of the PhD students in Physics
Applied Science and Technology department (DiSAT)
Politecnico di Torino
Corso Duca degli Abruzzi, 24
10129 Torino ITALY
------------------------------------------------------
-----------------------------------------------------
Assistant Professor
Polytech - Paris-Saclay University
C2N, CNRS
10 Bd Thomas Gobert
91120 Palaiseau


Re: Error on Allocation of X_mat for parallel computation

Post by Daniele Varsano » Thu Oct 01, 2020 7:40 am

Dear Davide,
in general, full-frequency calculations are very demanding and you are dealing with a large system; moreover, they are quite hard to converge with respect to the number of frequencies. Are you sure you need the full-frequency treatment instead of the plasmon-pole approximation? In general, I would discourage its use unless it is known that the plasmon pole fails.
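
For reference, a plasmon-pole run replaces the full-frequency Xd block with the Xp variables, along these lines (illustrative values only, to be converged; check the variable names generated by yambo for your version):

Code: Select all

ppa                                  # [R Xp] Plasmon Pole Approximation
% BndsRnXp
  1 | 500 |                          # [Xp] Polarization function bands
%
NGsBlkXp= 3 Ry                       # [Xp] Response block size
% LongDrXp
 1.000000 | 1.000000 | 1.000000 |    # [Xp] [cc] Electric Field
%
PPAPntXp= 27.21138 eV                # [Xp] PPA imaginary energy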

Anyway, what makes the calculation intense here are:

Code: Select all

BndsRnXd
NGsBlkXd
ETStpsXd
You can try to reduce one of these parameters and see if the calculation fits in your machine's memory.
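For example (illustrative values only, to be checked for convergence afterwards), one could first try something like:

Code: Select all

% BndsRnXd
  1 | 300 |                          # [Xd] Polarization function bands
%
NGsBlkXd= 2 Ry                       # [Xd] Response block size
ETStpsXd= 60                         # [Xd] Total Energy steps

Roughly speaking, X_mat is an N_G x N_G complex matrix per frequency point (N_G being the number of G-vectors within NGsBlkXd), so the allocation shrinks quadratically with N_G and linearly with ETStpsXd.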
My suggestion is to compile the code using the following flag (if you have not already done so):

Code: Select all

--enable-memory-profile 
and in the log files you will find some info on the memory allocated so far, which will give you an idea of how much memory you need.
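
The corresponding configure step would be something like (a sketch only; keep whatever other options you normally use):

Code: Select all

# reconfigure with memory profiling enabled, then rebuild yambo
./configure --enable-memory-profile [your usual options]
make yambo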

Please note that the terminator techniques (GTermKind, XTermKind) do not apply to full-frequency calculations.

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/


Re: Error on Allocation of X_mat for parallel computation

Post by DavidPolito93 » Thu Oct 01, 2020 8:52 am

Dear Daniele,

Thank you for your reply!

Yeah I thought about using the plasmon pole approximation, but some of the chains that I have to study are metals and I read that the PP approximation fails in that case. Am I wrong?

Anyway, I will try to adjust the parameters you suggested and I will let you know! :)

Thanks,

Davide
-----------------------------------------------------
PhD student in Physics XXXIII cycle
Representative of the PhD students in Physics
Applied Science and Technology department (DiSAT)
Politecnico di Torino
Corso Duca degli Abruzzi, 24
10129 Torino ITALY
------------------------------------------------------
-----------------------------------------------------
Assistant Professor
Polytech - Paris-Saclay University
C2N, CNRS
10 Bd Thomas Gobert
91120 Palaiseau
