GW calculation on 0D system

Concerns issues with computing quasiparticle corrections to the DFT eigenvalues - i.e., the self-energy within the GW approximation (-g n), or considering the Hartree-Fock exchange only (-x)

Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano

ljzhou86
Posts: 85
Joined: Fri May 03, 2013 10:20 am

GW calculation on 0D system

Post by ljzhou86 » Sun Sep 23, 2018 11:31 am

Dear YAMBO developers,

I am doing a GW calculation on a 0D system with 120 electrons using "yambo -g n -p p -r", but I get the following error in the log file:

<---> P0128: [01] CPU structure, Files & I/O Directories
<---> P0128: CPU-Threads:128(CPU)-1(threads)-4(threads@X)-4(threads@DIP)-4(threads@SE)
<---> P0128: [02] CORE Variables Setup
<---> P0128: [02.01] Unit cells
<03s> P0128: [02.02] Symmetries
<03s> P0128: [02.03] RL shells
<03s> P0128: [02.04] K-grid lattice
<03s> P0128: [02.05] Energies [ev] & Occupations
<03s> P0128: [03] Transferred momenta grid
<03s> P0128: [04] Coloumb potential Random Integration (RIM)
<03s> P0128: [04.01] RIM initialization
<03s> P0128: Random points | | [000%] --(E) --(X)
<06s> P0128: Random points |########################################| [100%] 03s(E) 03s(X)
<06s> P0128: [04.02] RIM integrals
<06s> P0128: [WARNING] Empty workload for CPU 128
<06s> P0128: Momenta loop | | [***%] --(E) --(X)
<23s> P0128: [05] Coloumb potential CutOff :box
<23s> P0128: Box | | [000%] --(E) --(X)
<28s> P0128: Box |################## | [045%] 05s(E) 11s(X)
<33s> P0128: Box |#################################### | [090%] 10s(E) 11s(X)
<34s> P0128: Box |########################################| [100%] 11s(E) 11s(X)
<35s> P0128: [06] Dynamic Dielectric Matrix (PPA)
<35s> P0128: Response_G_space parallel ENVIRONMENT is incomplete. Switching to defaults
<35s> P0128: [PARALLEL Response_G_space for K(bz) on 1 CPU] Loaded/Total (Percentual):1/1(100%)
<35s> P0128: [PARALLEL Response_G_space for Q(ibz) on 1 CPU] Loaded/Total (Percentual):1/1(100%)
<35s> P0128: [PARALLEL Response_G_space for K-q(ibz) on 1 CPU] Loaded/Total (Percentual):1/1(100%)
<35s> P0128: [LA] SERIAL linear algebra
<35s> P0128: [PARALLEL Response_G_space for CON bands on 4 CPU] Loaded/Total (Percentual):113/452(25%)


<35s> P0128: [PARALLEL Response_G_space for VAL bands on 4 CPU] Loaded/Total (Percentual):15/60(25%)
P0128: [ERROR] STOP signal received while in :[06] Dynamic Dielectric Matrix (PPA)
P0128: [ERROR]Allocation of X_mat failed

#################
The r (report) file stops as follows:

......
Timing [Min/Max/Average]: 20s/20s/20s

[05] Coloumb potential CutOff :box
==================================

Cut directions :XYZ
Box sides [au]: 45.50000 45.50000 45.50000
Symmetry test passed :yes

[WR./04gw//ndb.cutoff]--------------------------------------
Brillouin Zone Q/K grids (IBZ/BZ): 1 1 1 1
CutOff Geometry :box xyz
Coulomb cutoff potential :box xyz 45.50045.50045.500
Box sides length [au]: 45.50000 45.50000 45.50000
Sphere/Cylinder radius [au]: 0.000000
Cylinder length [au]: 0.000000
RL components : 150699
RL components used in the sum : 150699
RIM corrections included :yes
RIM RL components : 123
RIM random points : 3000000
- S/N 001555 -------------------------- v.04.03.00 r.00129 -

Timing [Min/Max/Average]: 11s/11s/11s

[06] Dynamic Dielectric Matrix (PPA)
====================================


[ERROR] STOP signal received while in :[06] Dynamic Dielectric Matrix (PPA)

#################

The input file for the GW calculation is as follows:

gw0 # [R GW] GoWo Quasiparticle energy levels
ppa # [R Xp] Plasmon Pole Approximation
rim_cut # [R RIM CUT] Coulomb potential
HF_and_locXC # [R XX] Hartree-Fock Self-energy and Vxc
em1d # [R Xd] Dynamical Inverse Dielectric Matrix
NLogCPUs=0 # [PARALLEL] Live-timing CPU`s (0 for all)
X_Threads= 4 # [OPENMP/X] Number of threads for response functions
DIP_Threads= 4 # [OPENMP/X] Number of threads for dipoles
SE_Threads= 4 # [OPENMP/GW] Number of threads for self-energy
RandQpts= 3000000 # [RIM] Number of random q-points in the BZ
RandGvec= 123 RL # [RIM] Coulomb interaction RS components
CUTGeo= "box xyz" # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere X/Y/Z/XY..
% CUTBox
45.50000 | 45.50000 | 45.50000 | # [CUT] [au] Box sides
%
CUTRadius= 0.000000 # [CUT] [au] Sphere/Cylinder radius
CUTCylLen= 0.000000 # [CUT] [au] Cylinder length
EXXRLvcs= 150991 RL # [XX] Exchange RL components
Chimod= "Hartree" # [X] IP/Hartree/ALDA/LRC/BSfxc
% BndsRnXp
1 | 512 | # [Xp] Polarization function bands
%
NGsBlkXp= 6 Ry # [Xp] Response block size
% LongDrXp
1.000000 | 0.000000 | 0.000000 | # [Xp] [cc] Electric Field
%
PPAPntXp= 27.21138 eV # [Xp] PPA imaginary energy
% GbndRnge
1 | 512 | # [GW] G[W] bands range
%
GDamping= 0.10000 eV # [GW] G[W] damping
dScStep= 0.10000 eV # [GW] Energy step to evaluate Z factors
DysSolver= "n" # [GW] Dyson Equation solver ("n","s","g")
%QPkrange # [GW] QP generalized Kpoint/Band indices
1| 1| 55| 66|
%

#########
Could you help me understand and fix this error?

Best,
Liujiang
Dr. Zhou Liu-Jiang
Fujian Institute of Research on the Structure of Matter
Chinese Academy of Sciences
Fuzhou, Fujian, 350002

Daniele Varsano
Posts: 4198
Joined: Tue Mar 17, 2009 2:23 pm

Re: GW calculation on 0D system

Post by Daniele Varsano » Sun Sep 23, 2018 3:47 pm

Dear Dr. Zhou Liu-Jiang,

it seems that you are experiencing a memory allocation problem.
What you can do is lower the convergence parameters for the calculation of the screening, if possible:

Code: Select all

% BndsRnXp
1 | 512 | # [Xp] Polarization function bands
%
NGsBlkXp= 6 Ry # [Xp] Response block size
See if you can lower these parameters without losing too much accuracy.

Another option is to parallelize as much as you can over bands using MPI, raising the number of CPUs:

Code: Select all

X_all_q_CPU= "1 1 8 4"                # [PARALLEL] CPUs for each role
X_all_q_ROLEs= "q k c v"              # [PARALLEL] CPUs roles (q,g,k,c,v)
This is just an example using 32 CPUs (1x1x8x4); it distributes the memory load over the CPUs. Next, most probably you will need to do the same for the self-energy:

Code: Select all

SE_CPU= "1 1 32"                     # [PARALLEL] CPUs for each role
SE_ROLEs= "q qp b"                   # [PARALLEL] CPUs roles (q,qp,b)
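As a side note (a general remark on the Yambo parallel setup, as far as I understand it): the product of the entries in each *_CPU string should match the number of MPI tasks you actually launch, otherwise Yambo falls back to its default distribution, which is what the "Response_G_space parallel ENVIRONMENT is incomplete. Switching to defaults" message in your log indicates.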
It seems to me that this is quite a heavy calculation, and you will probably need to adopt both strategies.
I do not know what kind of machine you are running on, but it is important that you reserve enough memory on your nodes.
Consider also lowering the number of G vectors used in the FFTs: this is set by the FFTGvecs variable.
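For example (a minimal sketch; the value here is only illustrative and has to be checked for convergence on your system):

Code: Select all

FFTGvecs= 30 Ry               # [FFT] Plane-waves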

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

ljzhou86
Posts: 85
Joined: Fri May 03, 2013 10:20 am

Re: GW calculation on 0D system

Post by ljzhou86 » Wed Oct 17, 2018 7:34 am

Dear Daniele

I have tried the measures you suggested. However, no matter how small a "BndsRnXp" I use, the GW job is still aborted very quickly with the message "application called MPI_Abort(MPI_COMM_WORLD, 0) - process 235
srun: Job step aborted: Waiting up to 32 seconds for job step to finish".

If I run the calculation of the "Inverse Dielectric Matrix" using "yambo -b", it likewise stops very quickly at "[06] Static Dielectric Matrix". Strangely, when I ran GW+BSE calculations on other systems, such as 2D or 3D systems with a similar number of atoms, those jobs went through the corresponding steps without errors. Note that our High Performance Computing clusters are powerful and should not suffer from memory issues for a system with only 120 electrons. Are there any special settings in the input files for calculations on 0D systems with Yambo? Thanks a lot.

Liujiang
Dr. Zhou Liu-Jiang
Fujian Institute of Research on the Structure of Matter
Chinese Academy of Sciences
Fuzhou, Fujian, 350002

Daniele Varsano
Posts: 4198
Joined: Tue Mar 17, 2009 2:23 pm

Re: GW calculation on 0D system

Post by Daniele Varsano » Wed Oct 17, 2018 8:29 am

Dear Zhou Liu-Jiang,
There is no special setting for treating 0D systems. The parameters are naturally different, since you have just one k-point and many G vectors because of the large supercell needed.

Regarding the fact that "yambo -b" also stops very quickly at "[06] Static Dielectric Matrix": I see from the previous log file that the problem is in the allocation of the response matrix.

Regarding the expectation that your machines "should not suffer from memory issues for a system with only 120 electrons": the number of electrons determines only the occupied bands; the memory load comes from the large number of conduction bands and G vectors that need to be included (the response matrix alone scales with the square of the number of G vectors in the response block).

If you post your report and input file I will have a look and see if there is something odd.

Best,

Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

ljzhou86
Posts: 85
Joined: Fri May 03, 2013 10:20 am

Re: GW calculation on 0D system

Post by ljzhou86 » Wed Oct 17, 2018 6:31 pm

Dear Daniele

Please see the attached input and r (report) files for the calculation of the "Inverse Dielectric Matrix".
Dr. Zhou Liu-Jiang
Fujian Institute of Research on the Structure of Matter
Chinese Academy of Sciences
Fuzhou, Fujian, 350002

Daniele Varsano
Posts: 4198
Joined: Tue Mar 17, 2009 2:23 pm

Re: GW calculation on 0D system

Post by Daniele Varsano » Thu Oct 18, 2018 9:32 am

Dear Zhou Liu-Jiang,
I really suspect that you have a memory problem in allocating the response matrix.
You have a large box and a 5 Ry energy cutoff, which I suspect translates into an N x N matrix with N of around 15K-20K G vectors.
Note that when you assign G components through an energy cutoff, their number depends on the size of your box (in 2D and 3D systems the same cutoff translates into a much smaller number).
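As a rough back-of-the-envelope count (free-electron style, and assuming the same 45.5 au box as in your first input): the number of plane waves below a kinetic-energy cutoff E_cut is about V*k_cut^3/(6*pi^2), with k_cut = sqrt(2*E_cut) in atomic units; with V ≈ 45.5^3 ≈ 94,000 au^3 and E_cut = 5 Ry = 2.5 Ha this gives roughly 18,000 G vectors, which is where the estimate above comes from.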
Next, I can see that in the input you assign 256 CPUs, but the job seems to have been launched with 80 CPUs.
I do not know the architecture of your machine; please check how much RAM you have per CPU in your run.
For this kind of calculation, I roughly estimate that you would need more than 5 GB per CPU.
So the solution here is to lower your NGsBlkXs value, or to use fewer CPUs per node in order to reserve more memory.
Note that this memory estimate is quite rough.
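Just to show where that figure comes from (a back-of-the-envelope estimate only): a double-complex N x N matrix takes N^2 x 16 bytes, so with N ≈ 17,000-18,000 G vectors the response matrix alone is about 4.5-5 GB, and within the plasmon-pole approximation the response is needed at two frequencies, which can add further to this.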

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
