Advice on parameters
Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano
- Davide Sangalli
- Posts: 640
- Joined: Tue May 29, 2012 4:49 pm
- Location: Via Salaria Km 29.3, CP 10, 00016, Monterotondo Stazione, Italy
- Contact:
Re: Advice on parameters
Dear Bjoern,
I confirm the input variable for the number of CPUs to be used for the linear algebra has been renamed.
We are still working on the linear algebra implementation.
A more efficient interface with ScaLAPACK will likely be released soon with yambo 4.2.
D.
Davide Sangalli, PhD
CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX Centre
https://sites.google.com/view/davidesangalli
http://www.max-centre.eu/
-
- Posts: 42
- Joined: Wed Aug 04, 2010 8:39 am
- Location: Eindhoven, The Netherlands
- Contact:
Re: Advice on parameters
Hey Daniele,
I'm still struggling to get my head around the G0W0 step. I had to increase the k-point sampling a bit. As you suggested, I split up the QPkrange, and used the MPI/OpenMP setup as we discussed:
Code: Select all
#
# __ __ ________ ___ __ __ _______ ______
# /_/\/_/\ /_______/\ /__//_//_/\ /_______/\ /_____/\
# \ \ \ \ \\::: _ \ \\::\| \| \ \\::: _ \ \\:::_ \ \
# \:\_\ \ \\::(_) \ \\:. \ \\::(_) \/_\:\ \ \ \
# \::::_\/ \:: __ \ \\:.\-/\ \ \\:: _ \ \\:\ \ \ \
# \::\ \ \:.\ \ \ \\. \ \ \ \\::(_) \ \\:\_\ \ \
# \__\/ \__\/\__\/ \__\/ \__\/ \_______\/ \_____\/
#
#
# GPL Version 4.1.1 Revision 112
# MPI+OpenMP Build
# http://www.yambo-code.org
#
gw0 # [R GW] GoWo Quasiparticle energy levels
ppa # [R Xp] Plasmon Pole Approximation
HF_and_locXC # [R XX] Hartree-Fock Self-energy and Vxc
em1d # [R Xd] Dynamical Inverse Dielectric Matrix
NLogCPUs=0 # [PARALLEL] Live-timing CPU`s (0 for all)
X_all_q_CPU= "1 7 2 1" # [PARALLEL] CPUs for each role
X_all_q_ROLEs= "q k c v" # [PARALLEL] CPUs roles (q,k,c,v)
X_all_q_nCPU_LinAlg_INV= 1 # [PARALLEL] CPUs for Linear Algebra
X_Threads= 2 # [OPENMP/X] Number of threads for response functions
DIP_Threads= 2 # [OPENMP/X] Number of threads for dipoles
SE_CPU= "1 7 2" # [PARALLEL] CPUs for each role
SE_ROLEs= "q qp b" # [PARALLEL] CPUs roles (q,qp,b)
SE_Threads= 2 # [OPENMP/GW] Number of threads for self-energy
EXXRLvcs= 47591 RL # [XX] Exchange RL components
Chimod= "" # [X] IP/Hartree/ALDA/LRC/BSfxc
% BndsRnXp
1 | 200 | # [Xp] Polarization function bands
%
NGsBlkXp= 7 Ry # [Xp] Response block size
% LongDrXp
1.000000 | 0.000000 | 0.000000 | # [Xp] [cc] Electric Field
%
% GbndRnge
1 | 400 | # [GW] G[W] bands range
%
GDamping= 0.10000 eV # [GW] G[W] damping
dScStep= 0.10000 eV # [GW] Energy step to evaluate Z factors
DysSolver= "n" # [GW] Dyson Equation solver ("n","s","g")
%QPkrange # [GW] QP generalized Kpoint/Band indices
1| 60| 31|32|
%
It takes about 20 hours until it enters the G0W0 step, which appears fine to me. But then it starts doing the strange thing again.
Code: Select all
<19h-42m-43s> P0001: [07] Dyson equation: Newton solver
<19h-42m-43s> P0001: [07.01] G0W0 (W PPA)
<19h-42m-44s> P0001: [PARALLEL Self_Energy for QPs on 7 CPU] Loaded/Total (Percentual):18/120(15%)
<19h-42m-44s> P0001: [PARALLEL Self_Energy for Q(ibz) on 1 CPU] Loaded/Total (Percentual):85/85(100%)
<19h-42m-44s> P0001: [PARALLEL Self_Energy for G bands on 2 CPU] Loaded/Total (Percentual):200/400(50%)
<19h-42m-44s> P0001: [M 1.854 Gb] Alloc WF ( 1.760)
<19h-42m-44s> P0001: [PARALLEL distribution for Wave-Function states] Loaded/Total(Percentual):12000/24000(50%)
<19h-42m-45s> P0001: [WF] Performing Wave-Functions I/O from ./SAVE
<19h-42m-45s> P0001: [FFT-GW] Mesh size: 27 27 27
<19h-42m-45s> P0001: [M 1.872 Gb] Alloc wf_disk ( 0.018)
<19h-42m-45s> P0001: Reading wf_fragments_1_1
<19h-42m-45s> P0001: Reading wf_fragments_2_1
<19h-42m-49s> P0001: Reading wf_fragments_3_1
<19h-42m-49s> P0001: Reading wf_fragments_4_1
<19h-42m-49s> P0001: Reading wf_fragments_5_1
<19h-42m-50s> P0001: Reading wf_fragments_6_1
<19h-42m-50s> P0001: Reading wf_fragments_7_1
<19h-42m-51s> P0001: Reading wf_fragments_8_1
<19h-42m-51s> P0001: Reading wf_fragments_9_1
<19h-42m-51s> P0001: Reading wf_fragments_10_1
<19h-42m-52s> P0001: Reading wf_fragments_11_1
<19h-42m-52s> P0001: Reading wf_fragments_12_1
<19h-42m-52s> P0001: Reading wf_fragments_13_1
<19h-42m-52s> P0001: Reading wf_fragments_14_1
<19h-42m-52s> P0001: Reading wf_fragments_15_1
<19h-42m-53s> P0001: Reading wf_fragments_16_1
<19h-42m-53s> P0001: Reading wf_fragments_17_1
<19h-42m-53s> P0001: Reading wf_fragments_18_1
<19h-42m-53s> P0001: Reading wf_fragments_19_1
<19h-42m-54s> P0001: Reading wf_fragments_20_1
<19h-42m-54s> P0001: Reading wf_fragments_21_1
<19h-42m-54s> P0001: Reading wf_fragments_22_1
<19h-42m-54s> P0001: Reading wf_fragments_23_1
<19h-42m-55s> P0001: Reading wf_fragments_24_1
<19h-42m-55s> P0001: Reading wf_fragments_25_1
<19h-42m-55s> P0001: Reading wf_fragments_26_1
<19h-42m-55s> P0001: Reading wf_fragments_27_1
<19h-42m-56s> P0001: Reading wf_fragments_28_1
<19h-42m-56s> P0001: Reading wf_fragments_29_1
<19h-42m-56s> P0001: Reading wf_fragments_30_1
<19h-42m-56s> P0001: Reading wf_fragments_31_1
<19h-42m-56s> P0001: Reading wf_fragments_32_1
<19h-42m-56s> P0001: Reading wf_fragments_33_1
<19h-42m-57s> P0001: Reading wf_fragments_34_1
<19h-42m-57s> P0001: Reading wf_fragments_35_1
<19h-42m-57s> P0001: Reading wf_fragments_36_1
<19h-42m-57s> P0001: Reading wf_fragments_37_1
<19h-42m-57s> P0001: Reading wf_fragments_38_1
<19h-42m-58s> P0001: Reading wf_fragments_39_1
<19h-42m-58s> P0001: Reading wf_fragments_40_1
<19h-42m-58s> P0001: Reading wf_fragments_41_1
<19h-42m-58s> P0001: Reading wf_fragments_42_1
<19h-42m-59s> P0001: Reading wf_fragments_43_1
<19h-42m-59s> P0001: Reading wf_fragments_44_1
<19h-42m-59s> P0001: Reading wf_fragments_45_1
<19h-42m-59s> P0001: Reading wf_fragments_46_1
<19h-42m-59s> P0001: Reading wf_fragments_47_1
<19h-43m-00s> P0001: Reading wf_fragments_48_1
<19h-43m-00s> P0001: Reading wf_fragments_49_1
<19h-43m-00s> P0001: Reading wf_fragments_50_1
<19h-43m-00s> P0001: Reading wf_fragments_51_1
<19h-43m-01s> P0001: Reading wf_fragments_52_1
<19h-43m-01s> P0001: Reading wf_fragments_53_1
<19h-43m-01s> P0001: Reading wf_fragments_54_1
<19h-43m-01s> P0001: Reading wf_fragments_55_1
<19h-43m-02s> P0001: Reading wf_fragments_56_1
<19h-43m-02s> P0001: Reading wf_fragments_57_1
<19h-43m-02s> P0001: Reading wf_fragments_58_1
<19h-43m-02s> P0001: Reading wf_fragments_59_1
<19h-43m-03s> P0001: Reading wf_fragments_60_1
<19h-43m-03s> P0001: [M 1.854 Gb] Free wf_disk ( 0.018)
<19h-43m-03s> P0001: Reading pp_fragment_1
<19h-43m-03s> P0001: G0W0 (W PPA) | | [000%] --(E) --(X)
<19h-43m-03s> P0001: Reading pp_fragment_1
<19h-45m-15s> P0001: Reading pp_fragment_2
<20h-11m-47s> P0001: Reading pp_fragment_3
<20h-12m-03s> P0001: G0W0 (W PPA) |# | [002%] 29m-00s(E) 19h-20m-22s(X)
<20h-29m-28s> P0001: Reading pp_fragment_4
<20h-42m-43s> P0001: Reading pp_fragment_5
<20h-43m-17s> P0001: G0W0 (W PPA) |## | [005%] 01h-00m-13s(E) 20h-04m-34s(X)
<21h-35m-48s> P0001: Reading pp_fragment_6
<22h-28m-53s> P0001: Reading pp_fragment_7
<22h-29m-43s> P0001: G0W0 (W PPA) |### | [007%] 02h-46m-39s(E) 01d-13h-02m-10s(X)
<22h-55m-23s> P0001: Reading pp_fragment_8
<23h-48m-21s> P0001: Reading pp_fragment_9
<23h-49m-27s> P0001: G0W0 (W PPA) |#### | [010%] 04h-06m-24s(E) 01d-17h-04m-05s(X)
<01d-00h-06m-01s> P0001: Reading pp_fragment_10
<01d-01h-52m-16s> P0001: Reading pp_fragment_11
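At the end, the timing for 60*2=120 QP corrections approaches 2 days. From your experience, is that final run time realistic for the set of parameters I chose? I'm somewhat concerned that there is some slowdown due to allocation or something...
Cheers,
Bjoern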
Dr. Bjoern Baumeier
Eindhoven University of Technology
Eindhoven, The Netherlands
- Daniele Varsano
- Posts: 4198
- Joined: Tue Mar 17, 2009 2:23 pm
- Contact:
Re: Advice on parameters
Dear Bjoern,
I cannot advise you much. If you need a large k-point sampling, the QP calculation can be quite time-consuming, so the run time is not unreasonable.
It does not look like you have a swapping problem due to allocation, since you seem to allocate less than 2 Gb, which is not much, but I do not know how much memory per core you have.
You can try to play with the SE_CPU variable: use a divisor of 120 for the qp role to distribute the workload more evenly among the CPUs, and move more processors to the bands.
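To make the divisor argument concrete, here is a minimal Python sketch (not part of yambo) that lists how 120 QP corrections and 400 bands would be shared for each way of splitting the MPI tasks between the qp and b roles. It assumes 14 MPI tasks in total, inferred from SE_CPU= "1 7 2" above, and an as-even-as-possible distribution; the helper name `load_per_task` is made up for this illustration.

```python
from math import ceil


def load_per_task(n_items: int, n_tasks: int) -> tuple[int, int]:
    """Return (max, min) items per task for an as-even-as-possible split."""
    return ceil(n_items / n_tasks), n_items // n_tasks


N_QP = 120     # 60 k-points x 2 bands, from QPkrange in this thread
N_BANDS = 400  # upper limit of GbndRnge in this thread
N_MPI = 14     # assumption: 1 x 7 x 2 MPI tasks, as in SE_CPU= "1 7 2"

# Try every (qp, b) split of the MPI tasks, keeping the q role at 1.
for n_qp in range(1, N_MPI + 1):
    if N_MPI % n_qp:
        continue  # n_qp must divide the number of MPI tasks
    n_b = N_MPI // n_qp
    qp_max, _ = load_per_task(N_QP, n_qp)
    b_max, _ = load_per_task(N_BANDS, n_b)
    balance = "even" if N_QP % n_qp == 0 else "uneven"
    print(f'SE_CPU= "1 {n_qp} {n_b}": up to {qp_max} QPs per task ({balance}), '
          f'up to {b_max} bands per task')
```

With 7 tasks on qp the sketch gives 18 QPs for the busiest task, which is consistent with the "18/120" reported in the log above. Of the splits that multiply to 14, only qp = 1 and qp = 2 divide 120 exactly.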
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- Davide Sangalli
- Posts: 640
- Joined: Tue May 29, 2012 4:49 pm
- Location: Via Salaria Km 29.3, CP 10, 00016, Monterotondo Stazione, Italy
- Contact:
Re: Advice on parameters
Dear Bjoern,
I tend to agree with Daniele: it is reasonable.
The growing estimate may be related to the symmetry operations. This is my guess.
The estimate is indeed made from the time needed to perform a single iteration of a loop and the total number of iterations needed.
In the first steps, operations that do not require rotating the wave-functions are performed. Later, the code starts to rotate the wave-functions, so the time for a single iteration increases, and the estimate increases accordingly.
Anyway, whatever the reason, it is something I usually experience in GW runs, so I would not say it is strange.
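As an illustration of why the estimate grows, the following Python sketch applies a plain linear extrapolation (elapsed time divided by the fraction completed) to the elapsed times in Bjoern's log. This is only a guess at how such an estimate behaves, not yambo's actual timing routine. It also assumes that each tick of the progress bar corresponds to 2.5%, since the log prints truncated percentages. The helper names are invented for this example.

```python
def to_seconds(d=0, h=0, m=0, s=0):
    """Convert days/hours/minutes/seconds to seconds."""
    return ((d * 24 + h) * 60 + m) * 60 + s


def fmt(seconds):
    """Format seconds in the dd-hh-mm-ss style used by the log."""
    seconds = round(seconds)
    d, rem = divmod(seconds, 86400)
    h, rem = divmod(rem, 3600)
    m, s = divmod(rem, 60)
    return f"{d:02d}d-{h:02d}h-{m:02d}m-{s:02d}s"


def extrapolate(elapsed_s, fraction_done):
    """Total time if the remaining work costs as much per step as the work so far."""
    return elapsed_s / fraction_done


# (assumed fraction done, elapsed time "(E)" copied from the log)
samples = [
    (0.025, to_seconds(m=29, s=0)),
    (0.050, to_seconds(h=1, m=0, s=13)),
    (0.075, to_seconds(h=2, m=46, s=39)),
    (0.100, to_seconds(h=4, m=6, s=24)),
]

for frac, elapsed in samples:
    total = extrapolate(elapsed, frac)
    print(f"{frac:6.1%} done, elapsed {fmt(elapsed)} -> estimated total {fmt(total)}")
```

Under these assumptions the extrapolated totals land within about half a minute of the "(X)" values in the log. The jump from roughly 20 hours to more than 1.5 days therefore reflects the later steps being slower than the first ones, not a change in the amount of work.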
Instead, if you want to improve the memory distribution, try:
Code: Select all
SE_CPU= "1 2 7" # [PARALLEL] CPUs for each role
SE_ROLEs= "q qp b" # [PARALLEL] CPUs roles (q,qp,b)
D.
Davide Sangalli, PhD
CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX Centre
https://sites.google.com/view/davidesangalli
http://www.max-centre.eu/
-
- Posts: 42
- Joined: Wed Aug 04, 2010 8:39 am
- Location: Eindhoven, The Netherlands
- Contact:
Re: Advice on parameters
Hi Daniele and Davide,
Thanks for all your help. I have finally determined all the QP corrections I needed, and now I want to determine the band structure by interpolation in ypp.
However, something seems to go wrong even without QP corrections. Let me tell you what I did:
- DFT nscf calculation with pw.x for Silicon
- p2y, followed by yambo initialization
- used ypp -n to remove time reversal symmetry
- yambo init in FixSymm folder
- ypp -s b to interpolate band structure between Gamma and X using 10 points along the line
Code: Select all
# |k| b1 b2 b3 b4 b5 b6 b7 b8 kx ky kz
#
0.00000 -12.28288 -0.49402 -0.21131 -0.31651 2.43757 2.38482 1.93734 3.23343 0.00000 0.00000 0.00000
I am at a loss as to what I have done wrong. Any suggestions?
Dr. Bjoern Baumeier
Eindhoven University of Technology
Eindhoven, The Netherlands
- Davide Sangalli
- Posts: 640
- Joined: Tue May 29, 2012 4:49 pm
- Location: Via Salaria Km 29.3, CP 10, 00016, Monterotondo Stazione, Italy
- Contact:
Re: Advice on parameters
Dear Bjoern,
Is the Gamma point part of the original nscf grid?
If it is not, and the degeneracy exists only at the Gamma point,
it is not guaranteed that it will be respected.
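Whether Gamma belongs to the grid depends on how the grid was generated. The sketch below is only an illustration: it assumes a regular grid in reduced coordinates built as k_i = (j_i + shift_i/2) / n_i, as in the automatic k-points convention of Quantum ESPRESSO. The 8x8x8 size is hypothetical, since the actual grid is not stated in this thread.

```python
from itertools import product


def fold(x):
    """Fold a reduced coordinate into the interval [-0.5, 0.5)."""
    return (x + 0.5) % 1.0 - 0.5


def regular_grid(nk, shift=(0, 0, 0)):
    """Regular k-grid in reduced coordinates; shift entries are 0 or 1."""
    return [
        tuple(fold((j + s / 2) / n) for j, n, s in zip(idx, nk, shift))
        for idx in product(*(range(n) for n in nk))
    ]


def contains_gamma(kpoints, tol=1e-8):
    """True if any k-point is (0, 0, 0) within the tolerance."""
    return any(all(abs(c) < tol for c in k) for k in kpoints)


# Hypothetical 8x8x8 grids: unshifted versus shifted by half a step.
print("unshifted grid contains Gamma:", contains_gamma(regular_grid((8, 8, 8))))
print("shifted grid contains Gamma:  ", contains_gamma(regular_grid((8, 8, 8), shift=(1, 1, 1))))
```

A grid shifted by half a step in every direction never samples Gamma, so in that case the energies at Gamma come only from the interpolation.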
If you are using a recent version of ypp, you should also have an output file called "o-*_built_in",
which lists the energies of the nscf-grid k-points that lie on the path.
You can check that file.
Please note that the k-points there are written in internal k-units, while the k-points in the
file "o-*_interpolated" are written according to the "cooIn=" input variable.
So you cannot compare the coordinates in the two files directly. We are going to fix this in future releases.
D.
Davide Sangalli, PhD
CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX Centre
https://sites.google.com/view/davidesangalli
http://www.max-centre.eu/
-
- Posts: 42
- Joined: Wed Aug 04, 2010 8:39 am
- Location: Eindhoven, The Netherlands
- Contact:
Re: Advice on parameters
Hi Davide,
Thanks. Looking at the "o-*_built_in" file, it looks like the degeneracy at the nscf k-points belonging to the path is ok. But indeed, Gamma is not part of the original grid, so I guess that's just the issue.
Having suspected something like that, I have also been considering calculating the QP corrections at k-points along the high-symmetry lines explicitly. I have tried using ypp -k k to generate a BZ k-point grid as in the documentation here http://www.yambo-code.org/input_file/ypp/ypp_kk.php
What I noticed is that for some points, ypp did not give any index for the actual user point, such as in the last line here:
Code: Select all
0.4333334 -0.4666666 0.0000000 0.1025641
0.1000000 0.2000000 0.0000000 0.1025641 1
Does that indicate that the user k-point is in fact part of the original grid somehow (probably translated by some RL vectors)? Also, how many user k-points can I supply in one go? Or would I have to determine each user k-point individually?
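One way to test by hand whether a user k-point coincides with a grid point up to a reciprocal lattice vector is sketched below in Python. It assumes all points are given in reduced (rlu) coordinates, so that two points are equivalent when their difference is an integer vector. It does not consider equivalence through symmetry operations, and it is not how ypp itself decides. The function names and the test points are hypothetical.

```python
def equivalent_mod_G(k1, k2, tol=1e-5):
    """True if k1 - k2 is an integer vector in reduced (rlu) coordinates."""
    return all(abs((a - b) - round(a - b)) < tol for a, b in zip(k1, k2))


def find_in_grid(k_user, grid, tol=1e-5):
    """Return the 1-based indices of grid points equivalent to k_user."""
    return [i for i, k in enumerate(grid, start=1)
            if equivalent_mod_G(k_user, k, tol)]


# Hypothetical two-point "grid" built from the coordinates listed above.
grid = [(0.1, 0.2, 0.0), (0.4333334, -0.4666666, 0.0)]

# Differs from the first grid point by the RL vector (1, 0, 0).
print(find_in_grid((1.1, 0.2, 0.0), grid))

# Not equivalent to any grid point.
print(find_in_grid((0.25, 0.25, 0.0), grid))
```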
Cheers,
Bjoern
Dr. Bjoern Baumeier
Eindhoven University of Technology
Eindhoven, The Netherlands
-
- Posts: 42
- Joined: Wed Aug 04, 2010 8:39 am
- Location: Eindhoven, The Netherlands
- Contact:
Re: Advice on parameters
As an addition, I'm not sure if I'm doing this right. I wanted to add the point 0.0 1.0 0.0 and ran ypp with the attached output. After re-running the Quantum ESPRESSO nscf calculation with the generated 145 k-points, I tried initializing yambo again and set IkXLim= 60 because the original grid had 60 k-points.
However, when initializing, yambo complains about the grid not being uniform:
Code: Select all
<---> [01] CPU structure, Files & I/O Directories
<---> CPU-Threads:1(CPU)-4(threads)
<---> [02] CORE Variables Setup
<---> [02.01] Unit cells
<---> [02.02] Symmetries
<---> [02.03] RL shells
<---> Shells finder |########################################| [100%] --(E) --(X)
<---> [02.04] K-grid lattice
<---> [02.05] Energies [ev] & Occupations
<---> [03] Transferred momenta grid
<---> [WARNING][RL indx] 2 equivalent points in the rlu grid found
<01s> [RL indx] X grid is not uniform. Gamma point only.
<01s> [04] External corrections
<01s> [05] Game Over & Game summary
I guess that's not okay?
Bjoern
Dr. Bjoern Baumeier
Eindhoven University of Technology
Eindhoven, The Netherlands