Memory and CPU time from yambo-3.4.1 to yambo-4.0.0
Posted: Thu Apr 30, 2015 4:27 pm
Dear all,
I am computing the independent-particle optical properties (yambo -o
c) of a metal slab with adsorbates. All discussions below refer to
calculations on a single CPU.
I started with a problem in yambo-3.4.1 which I was able to reproduce
with the new version, yambo-4.0.0. But before coming to that (in
another thread), I noticed something strange in the RAM/CPU cost while
running the new version: that is the focus of the present message.
The system is moderately large: the SAVE folder generated by p2y takes
21 GB with 430 bands. There are 96 k-points and ~700 electrons. I can
handle the problem in-house, but some calculations were also tested on
the HPC system galileo.cineca.it.
With yambo-3.4.1:
-----------------
The RAM taken by the calculation (yambo -o c, to compute the IP-RPA at
gamma) is about the size of the SAVE folder.
After the evaluation of the dipoles (about 1 h), the calculation of
Xo@q1 took only a few seconds (<10 s).
With yambo-4.0.0:
-----------------
The calculation of the dipoles proceeds as above, and the final
spectrum is consistent. But when the calculation enters the evaluation
of X0@q1, unexpected issues appear:
1) the occupied RAM grows to 53 GB;
2) the calculation of X0@q1 takes about 5 hours, slowing down sharply at about 42%
(see below the extract from l_optics_chi). This is reproducible at
the very same point by rerunning the calculation on the same machine
as well as from scratch (also the pw.x run) on galileo.cineca.it.
<16m-58s> Xo@q[1] |############### | [037%] 09m-06s(E) 24m-17s(X)
<20m-16s> Xo@q[1] |################ | [040%] 12m-24s(E) 31m-01s(X)
<31m-30s> Xo@q[1] |################# | [042%] 23m-38s(E) 55m-37s(X)
<05h-33m-15s> Xo@q[1] |################## | [045%] 05h-25m-22s(E) 12h-03m-03s(X)
<05h-36m-46s> Xo@q[1] |################### | [047%] 05h-28m-54s(E) 11h-32m-24s(X)
<05h-37m-55s> Xo@q[1] |#################### | [050%] 05h-30m-03s(E) 11h-00m-06s(X)
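As an aside, the stall is easy to quantify from the timestamps in the extract above. A minimal parsing sketch (the `to_seconds` helper is mine, not part of yambo):

```python
import re

def to_seconds(stamp):
    """Convert a yambo progress timestamp like '05h-33m-15s' or '16m-58s' to seconds."""
    units = {'h': 3600, 'm': 60, 's': 1}
    return sum(int(v) * units[u] for v, u in re.findall(r'(\d+)([hms])', stamp))

# Timestamps and completed percentages copied from the l_optics_chi extract above
log = [
    ('16m-58s', 37), ('20m-16s', 40), ('31m-30s', 42),
    ('05h-33m-15s', 45), ('05h-36m-46s', 47), ('05h-37m-55s', 50),
]

for (t0, p0), (t1, p1) in zip(log, log[1:]):
    dt = to_seconds(t1) - to_seconds(t0)
    print(f'{p0:3d}% -> {p1:3d}%: {dt:6d} s')
```

Each 2-3% step takes a few hundred seconds, except the 42% -> 45% step, which takes 18105 s, i.e. about five hours on its own.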
Is there something wrong in the way I invoke the calculation? In
both cases, the input files automatically generated by yambo already
fit my needs (they are reported below).
Many thanks,
Guido
INPUT FOR 3.4.1:
======================================================================
optics # [R OPT] Optics
chi # [R CHI] Dyson equation for Chi.
Chimod= "IP" # [X] IP/Hartree/ALDA/LRC/BSfxc
% QpntsRXd
1 | 1 | # [Xd] Transferred momenta
%
% BndsRnXd
1 | 430 | # [Xd] Polarization function bands
%
% EnRngeXd
0.00000 | 10.00000 | eV # [Xd] Energy range
%
% DmRngeXd
0.10000 | 0.10000 | eV # [Xd] Damping range
%
ETStpsXd= 1001 # [Xd] Total Energy steps
% LongDrXd
0.000000 | 0.000000 | 1.000000 | # [Xd] [cc] Electric Field
%
======================================================================
INPUT FOR 4.0.0:
======================================================================
optics # [R OPT] Optics
chi # [R CHI] Dyson equation for Chi.
X_q_0_CPU= "1 1 1" # [PARALLEL] CPUs for each role
X_q_0_ROLEs= "k c v" # [PARALLEL] CPUs roles (k,c,v)
X_q_0_nCPU_invert=1 # [PARALLEL] CPUs for matrix inversion
X_finite_q_CPU= "" # [PARALLEL] CPUs for each role
X_finite_q_ROLEs= "" # [PARALLEL] CPUs roles (q,k,c,v)
X_finite_q_nCPU_invert=0 # [PARALLEL] CPUs for matrix inversion
Chimod= "IP" # [X] IP/Hartree/ALDA/LRC/BSfxc
NGsBlkXd= 1 RL # [Xd] Response block size
% QpntsRXd
1 | 1 | # [Xd] Transferred momenta
%
% BndsRnXd
1 | 430 | # [Xd] Polarization function bands
%
% EnRngeXd
0.00000 | 20.00000 | eV # [Xd] Energy range
%
% DmRngeXd
0.10000 | 0.10000 | eV # [Xd] Damping range
%
ETStpsXd= 2001 # [Xd] Total Energy steps
% LongDrXd
1.000000 | 0.000000 | 0.000000 | # [Xd] [cc] Electric Field
%
======================================================================