Advice on parameters

Post by **Davide Sangalli** » Tue Dec 06, 2016 10:30 am

Dear Bjoern,
I confirm the input variable for the number of CPUs to be used for the linear algebra has been renamed.
We are still working on the linear algebra implementation.

A more efficient interface to ScaLAPACK will likely be released soon with yambo 4.2.

D.

Post by **bob** » Thu Dec 08, 2016 5:18 pm

Hey Daniele,

I'm still struggling to get my head around the G0W0 step. I had to increase the k-point sampling a bit. As you suggested, I split up the QPkrange, and used the MPI/OpenMP setup as we discussed:

Code: Select all

#                                                           
#  __  __   ________   ___ __ __    _______   ______        
# /_/\/_/\ /_______/\ /__//_//_/\ /_______/\ /_____/\       
# \ \ \ \ \\::: _  \ \\::\| \| \ \\::: _  \ \\:::_ \ \      
#  \:\_\ \ \\::(_)  \ \\:.      \ \\::(_)  \/_\:\ \ \ \     
#   \::::_\/ \:: __  \ \\:.\-/\  \ \\::  _  \ \\:\ \ \ \    
#     \::\ \  \:.\ \  \ \\. \  \  \ \\::(_)  \ \\:\_\ \ \   
#      \__\/   \__\/\__\/ \__\/ \__\/ \_______\/ \_____\/   
#                                                           
#                                                           
#             GPL Version 4.1.1 Revision 112                
#                    MPI+OpenMP Build                       
#                http://www.yambo-code.org                  
#
gw0                          # [R GW] GoWo Quasiparticle energy levels
ppa                          # [R Xp] Plasmon Pole Approximation
HF_and_locXC                 # [R XX] Hartree-Fock Self-energy and Vxc
em1d                         # [R Xd] Dynamical Inverse Dielectric Matrix
NLogCPUs=0                   # [PARALLEL] Live-timing CPU`s (0 for all)
X_all_q_CPU= "1 7 2 1"              # [PARALLEL] CPUs for each role
X_all_q_ROLEs= "q k c v"            # [PARALLEL] CPUs roles (q,k,c,v)
X_all_q_nCPU_LinAlg_INV= 1   # [PARALLEL] CPUs for Linear Algebra
X_Threads= 2                # [OPENMP/X] Number of threads for response functions
DIP_Threads= 2              # [OPENMP/X] Number of threads for dipoles
SE_CPU= "1 7 2"                   # [PARALLEL] CPUs for each role
SE_ROLEs= "q qp b"                 # [PARALLEL] CPUs roles (q,qp,b)
SE_Threads= 2               # [OPENMP/GW] Number of threads for self-energy
EXXRLvcs=  47591       RL    # [XX] Exchange RL components
Chimod= ""                   # [X] IP/Hartree/ALDA/LRC/BSfxc
% BndsRnXp
    1 |  200 |               # [Xp] Polarization function bands
%
NGsBlkXp= 7            Ry    # [Xp] Response block size
% LongDrXp
 1.000000 | 0.000000 | 0.000000 |        # [Xp] [cc] Electric Field
%
PPAPntXp= 27.21138     eV    # [Xp] PPA imaginary energy
% GbndRnge
    1 |  400 |               # [GW] G[W] bands range
%
GDamping=  0.10000     eV    # [GW] G[W] damping
dScStep=  0.10000      eV    # [GW] Energy step to evaluate Z factors
DysSolver= "n"               # [GW] Dyson Equation solver ("n","s","g")
%QPkrange                    # [GW] QP generalized Kpoint/Band indices
  1| 60| 31|32|
%

It takes about 20 hours until it enters the G0W0 step, which appears fine to me. But then it starts doing the strange thing again.

Code: Select all

<19h-42m-43s> P0001: [07] Dyson equation: Newton solver
 <19h-42m-43s> P0001: [07.01] G0W0 (W PPA)
 <19h-42m-44s> P0001: [PARALLEL Self_Energy for QPs on 7 CPU] Loaded/Total (Percentual):18/120(15%)
 <19h-42m-44s> P0001: [PARALLEL Self_Energy for Q(ibz) on 1 CPU] Loaded/Total (Percentual):85/85(100%)
 <19h-42m-44s> P0001: [PARALLEL Self_Energy for G bands on 2 CPU] Loaded/Total (Percentual):200/400(50%)
 <19h-42m-44s> P0001: [M  1.854 Gb] Alloc WF ( 1.760)
 <19h-42m-44s> P0001: [PARALLEL distribution for Wave-Function states] Loaded/Total(Percentual):12000/24000(50%)
 <19h-42m-45s> P0001: [WF] Performing Wave-Functions I/O from ./SAVE
 <19h-42m-45s> P0001: [FFT-GW] Mesh size: 27  27  27
 <19h-42m-45s> P0001: [M  1.872 Gb] Alloc wf_disk ( 0.018)
 <19h-42m-45s> P0001: Reading wf_fragments_1_1
 <19h-42m-45s> P0001: Reading wf_fragments_2_1
 <19h-42m-49s> P0001: Reading wf_fragments_3_1
 <19h-42m-49s> P0001: Reading wf_fragments_4_1
 <19h-42m-49s> P0001: Reading wf_fragments_5_1
 <19h-42m-50s> P0001: Reading wf_fragments_6_1
 <19h-42m-50s> P0001: Reading wf_fragments_7_1
 <19h-42m-51s> P0001: Reading wf_fragments_8_1
 <19h-42m-51s> P0001: Reading wf_fragments_9_1
 <19h-42m-51s> P0001: Reading wf_fragments_10_1
 <19h-42m-52s> P0001: Reading wf_fragments_11_1
 <19h-42m-52s> P0001: Reading wf_fragments_12_1
 <19h-42m-52s> P0001: Reading wf_fragments_13_1
 <19h-42m-52s> P0001: Reading wf_fragments_14_1
 <19h-42m-52s> P0001: Reading wf_fragments_15_1
 <19h-42m-53s> P0001: Reading wf_fragments_16_1
 <19h-42m-53s> P0001: Reading wf_fragments_17_1
 <19h-42m-53s> P0001: Reading wf_fragments_18_1
 <19h-42m-53s> P0001: Reading wf_fragments_19_1
 <19h-42m-54s> P0001: Reading wf_fragments_20_1
 <19h-42m-54s> P0001: Reading wf_fragments_21_1
 <19h-42m-54s> P0001: Reading wf_fragments_22_1
 <19h-42m-54s> P0001: Reading wf_fragments_23_1
 <19h-42m-55s> P0001: Reading wf_fragments_24_1
 <19h-42m-55s> P0001: Reading wf_fragments_25_1
 <19h-42m-55s> P0001: Reading wf_fragments_26_1
 <19h-42m-55s> P0001: Reading wf_fragments_27_1
 <19h-42m-56s> P0001: Reading wf_fragments_28_1
 <19h-42m-56s> P0001: Reading wf_fragments_29_1
 <19h-42m-56s> P0001: Reading wf_fragments_30_1
 <19h-42m-56s> P0001: Reading wf_fragments_31_1
 <19h-42m-56s> P0001: Reading wf_fragments_32_1
 <19h-42m-56s> P0001: Reading wf_fragments_33_1
 <19h-42m-57s> P0001: Reading wf_fragments_34_1
 <19h-42m-57s> P0001: Reading wf_fragments_35_1
 <19h-42m-57s> P0001: Reading wf_fragments_36_1
 <19h-42m-57s> P0001: Reading wf_fragments_37_1
 <19h-42m-57s> P0001: Reading wf_fragments_38_1
 <19h-42m-58s> P0001: Reading wf_fragments_39_1
 <19h-42m-58s> P0001: Reading wf_fragments_40_1
 <19h-42m-58s> P0001: Reading wf_fragments_41_1
 <19h-42m-58s> P0001: Reading wf_fragments_42_1
 <19h-42m-59s> P0001: Reading wf_fragments_43_1
 <19h-42m-59s> P0001: Reading wf_fragments_44_1
 <19h-42m-59s> P0001: Reading wf_fragments_45_1
 <19h-42m-59s> P0001: Reading wf_fragments_46_1
 <19h-42m-59s> P0001: Reading wf_fragments_47_1
 <19h-43m-00s> P0001: Reading wf_fragments_48_1
 <19h-43m-00s> P0001: Reading wf_fragments_49_1
 <19h-43m-00s> P0001: Reading wf_fragments_50_1
 <19h-43m-00s> P0001: Reading wf_fragments_51_1
 <19h-43m-01s> P0001: Reading wf_fragments_52_1
 <19h-43m-01s> P0001: Reading wf_fragments_53_1
 <19h-43m-01s> P0001: Reading wf_fragments_54_1
 <19h-43m-01s> P0001: Reading wf_fragments_55_1
 <19h-43m-02s> P0001: Reading wf_fragments_56_1
 <19h-43m-02s> P0001: Reading wf_fragments_57_1
 <19h-43m-02s> P0001: Reading wf_fragments_58_1
 <19h-43m-02s> P0001: Reading wf_fragments_59_1
 <19h-43m-03s> P0001: Reading wf_fragments_60_1
 <19h-43m-03s> P0001: [M  1.854 Gb] Free wf_disk ( 0.018)
 <19h-43m-03s> P0001: Reading pp_fragment_1
 <19h-43m-03s> P0001: G0W0 (W PPA) |                                        | [000%] --(E) --(X)
 <19h-43m-03s> P0001: Reading pp_fragment_1
 <19h-45m-15s> P0001: Reading pp_fragment_2
 <20h-11m-47s> P0001: Reading pp_fragment_3
 <20h-12m-03s> P0001: G0W0 (W PPA) |#                                       | [002%] 29m-00s(E) 19h-20m-22s(X)
 <20h-29m-28s> P0001: Reading pp_fragment_4
 <20h-42m-43s> P0001: Reading pp_fragment_5
 <20h-43m-17s> P0001: G0W0 (W PPA) |##                                      | [005%] 01h-00m-13s(E) 20h-04m-34s(X)
 <21h-35m-48s> P0001: Reading pp_fragment_6
 <22h-28m-53s> P0001: Reading pp_fragment_7
 <22h-29m-43s> P0001: G0W0 (W PPA) |###                                     | [007%] 02h-46m-39s(E) 01d-13h-02m-10s(X)
 <22h-55m-23s> P0001: Reading pp_fragment_8
 <23h-48m-21s> P0001: Reading pp_fragment_9
 <23h-49m-27s> P0001: G0W0 (W PPA) |####                                    | [010%] 04h-06m-24s(E) 01d-17h-04m-05s(X)
 <01d-00h-06m-01s> P0001: Reading pp_fragment_10
 <01d-01h-52m-16s> P0001: Reading pp_fragment_11

In the end, the estimated time for the 60*2=120 QP corrections approaches 2 days. From your experience, is that run time realistic for the set of parameters I chose? I'm somewhat concerned that there is a slowdown due to memory allocation or something similar.

Cheers,
Bjoern

Post by **Daniele Varsano** » Thu Dec 08, 2016 5:35 pm

Dear Bjoern,
I cannot advise you much. If you need a dense k-point sampling, the QP calculation can be quite time consuming, so the timing is not unreasonable.
It does not look like you have a swapping problem due to allocation, as you seem to allocate less than 2 Gb, which is not much, but I do not know how much memory per core you have.
You can try to play with the SE_CPU variable: assign to the qp role a number of CPUs that is a divisor of 120, in order to distribute the workload better among the CPUs, and move more processors to the bands.

Best,

Daniele

Post by **Davide Sangalli** » Thu Dec 08, 2016 8:36 pm

Dear Bjoern,
I tend to agree with Daniele, it is reasonable.

The growing estimate may be related to symmetry operations. This is my guess.
The estimate is obtained from the time needed to perform a single iteration of the loop and the total number of iterations needed.
In the first steps, operations that do not require rotating the wave-functions are done. Later the code starts to rotate the wave-functions, so the time for a single iteration of the loop increases, and the estimate increases accordingly.

Anyway, whatever the reason, it is something I usually experience in GW runs, so I would not say it is strange.
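
To make the point about the estimate concrete, here is a minimal Python sketch that takes the elapsed (E) and expected (X) times from the progress lines in the log above and works out the average time per loop step. It assumes the loop has 120 steps in total (one per QP correction, which is an inference from the E/X ratios and not something the log states), and `to_seconds` is a helper written only for this illustration.

```python
import re

def to_seconds(stamp):
    """Convert a yambo-style duration such as '01d-13h-02m-10s' to seconds."""
    unit = {"d": 86400, "h": 3600, "m": 60, "s": 1}
    return sum(int(n) * unit[u] for n, u in re.findall(r"(\d+)([dhms])", stamp))

# (elapsed, expected) pairs copied from the G0W0 progress lines in the log
progress = [
    ("29m-00s", "19h-20m-22s"),
    ("01h-00m-13s", "20h-04m-34s"),
    ("02h-46m-39s", "01d-13h-02m-10s"),
    ("04h-06m-24s", "01d-17h-04m-05s"),
]

TOTAL_STEPS = 120  # assumption: one loop step per QP correction

for elapsed, expected in progress:
    e, x = to_seconds(elapsed), to_seconds(expected)
    done = round(TOTAL_STEPS * e / x)  # completed steps, inferred from E/X
    print(f"{done:3d} steps done, {e / done:7.1f} s per step on average")
```

Under that assumption the average step time roughly doubles between the first and the last progress line, which is what pushes the extrapolated total from about 19 hours to more than 1 day and 17 hours.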

Instead, if you want to try to improve the memory distribution try:

Code: Select all

SE_CPU= "1 2 7"                   # [PARALLEL] CPUs for each role
SE_ROLEs= "q qp b"                 # [PARALLEL] CPUs roles (q,qp,b)
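
As a rough illustration of what swapping the qp and b roles changes, the following Python sketch compares the two SE_CPU layouts discussed in this thread. It assumes a simple block split (ceiling division) of QPs and bands over the tasks, which is consistent with the 18/120 and 200/400 figures in the log but is not taken from the yambo source; `per_task_load` is a hypothetical helper.

```python
from math import ceil

N_QP = 120     # 60 k-points x 2 bands (QPkrange 1|60|31|32)
N_BANDS = 400  # upper limit of GbndRnge

def per_task_load(n_qp_tasks, n_band_tasks):
    """Largest share a single MPI task receives, assuming a block (ceil) split."""
    qp_share = ceil(N_QP / n_qp_tasks)
    band_share = ceil(N_BANDS / n_band_tasks)
    return qp_share, band_share

layouts = [('SE_CPU= "1 7 2"', 7, 2), ('SE_CPU= "1 2 7"', 2, 7)]
for label, qp_tasks, band_tasks in layouts:
    qp_share, band_share = per_task_load(qp_tasks, band_tasks)
    ideal = N_QP * N_BANDS / (qp_tasks * band_tasks)
    worst = qp_share * band_share
    print(f"{label}: {qp_share} QPs x {band_share} bands per task, "
          f"worst/ideal work = {worst / ideal:.3f}")
```

With seven tasks on the bands each task holds about 58 bands instead of 200, so it presumably needs to keep fewer wave-function states in memory, and since 2 divides 120 the QPs are split evenly.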

Best,
D.

Post by **bob** » Fri Dec 30, 2016 9:32 pm

Hi Daniele and Davide,

Thanks for all your help. I have finally determined all the QP corrections I needed. Next, I wanted to determine the band structure by interpolation in ypp.

However, something seems to go wrong even without QP corrections. Let me tell you what I did:

DFT nscf calculation with pw.x for Silicon
p2y, followed by yambo initialization
used ypp -n to remove time reversal symmetry
yambo init in FixSymm folder
ypp -s b to interpolate band structure between Gamma and X using 10 points along the line

Looking at the result, the output at Gamma is already strange: the states at the VBM, which should be degenerate, are split by a large amount:

Code: Select all

#   |k|        b1         b2         b3         b4         b5         b6         b7         b8         kx         ky         kz
#
     0.00000  -12.28288   -0.49402   -0.21131   -0.31651    2.43757    2.38482    1.93734    3.23343    0.00000    0.00000    0.00000

I am at a loss as to what I have done wrong. Any suggestions?
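
The size of the splitting can be read off the row above with a few lines of Python. This is only a sketch: it assumes that b2..b4 are the three states forming the valence-band top of bulk silicon at Gamma (degenerate without spin-orbit coupling) and that the energies are in eV.

```python
# Row at Gamma copied from the ypp output above: |k|, b1..b8, kx, ky, kz
row = ("0.00000  -12.28288   -0.49402   -0.21131   -0.31651    2.43757"
       "    2.38482    1.93734    3.23343    0.00000    0.00000    0.00000")

values = [float(x) for x in row.split()]
bands = values[1:9]  # b1..b8

def spread(energies):
    """Energy spread of a set of levels that should coincide."""
    return max(energies) - min(energies)

# Assumption: b2..b4 are the threefold-degenerate valence-band top of Si
vbm_spread = spread(bands[1:4])
print(f"Splitting among b2..b4 at Gamma: {vbm_spread:.5f} eV")
```

The same `spread` helper can be applied to the rows of the other output files to see whether the splitting appears only at interpolated points.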

Post by **Davide Sangalli** » Tue Jan 03, 2017 9:55 am

Dear Bjoern,
is the gamma point part of the original nscf grid?
If it is not, and the degeneracy exists only at the gamma point,
it is not guaranteed that it will be respected.

If you are using a recent version of ypp you should also have an output file called "o-*_built_in",
where the energies of the k-points of the nscf grid that lie on the path are printed.
You can check the output of that file.

Please note that the k-points there are written in internal k-units, while the k-points in the
file "o-*_interpolated" are written according to the "cooIn=" input variable.
So you cannot compare the two directly. We are going to fix this in future releases.

D.

Post by **bob** » Tue Jan 03, 2017 10:21 am

Hi Davide,

Thanks. Looking at the "o-*_built-in", it looks like the degeneracy at the nscf k-points belonging to the path is ok. But indeed, Gamma is not part of the original grid, so I guess that's just the issue.

Having suspected something like that, I have also been considering calculating the QP corrections at k-points along the high-symmetry lines explicitly. I have tried using ypp -k k to generate a BZ k-point grid as in the documentation here http://www.yambo-code.org/input_file/ypp/ypp_kk.php

What I noticed is that for some points, ypp did not give any index for the actual user point such as in the last line here

Code: Select all

 0.4333334  -0.4666666   0.0000000   0.1025641
   0.1000000   0.2000000   0.0000000   0.1025641  1

Does that indicate that the user k-point is in fact part of the original grid somehow (probably translated by some RL vectors)? Also, how many user k-points can I supply in one go? Or would I have to determine each user k-point individually?

Cheers,
Bjoern

Post by **bob** » Tue Jan 03, 2017 10:55 am

As an addition, I'm not sure if I'm doing this right. I wanted to add the point 0.0 1.0 0.0 and ran ypp (output attached). After re-running the Quantum ESPRESSO nscf calculation with the generated 145 k-points, I tried initializing yambo again, and set IkXLim= 60 because the original grid had 60 k-points.

However, when initializing, yambo complains about the grid not being uniform:

Code: Select all

 <---> [01] CPU structure, Files & I/O Directories
 <---> CPU-Threads:1(CPU)-4(threads)
 <---> [02] CORE Variables Setup
 <---> [02.01] Unit cells
 <---> [02.02] Symmetries
 <---> [02.03] RL shells
 <---> Shells finder |########################################| [100%] --(E) --(X)
 <---> [02.04] K-grid lattice
 <---> [02.05] Energies [ev] & Occupations
 <---> [03] Transferred momenta grid
 <---> [WARNING][RL indx] 2 equivalent points in the rlu grid found
 <01s> [RL indx] X grid is not uniform.  Gamma point only.
 <01s> [04] External corrections
 <01s> [05] Game Over & Game summary

I guess that's not okay?

Bjoern
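
One way to picture the "grid is not uniform" message is through the transferred momenta q = k - k'. The Python sketch below is a toy model, not yambo's actual test: it uses a hypothetical 4x4 full-BZ grid in two dimensions with fractional coordinates, and relies on the fact that a periodic point set is a regular (possibly shifted) grid exactly when it has as many distinct differences as points.

```python
from fractions import Fraction
from itertools import product

def difference_set(kpts):
    """All k - k' folded into [0, 1) in fractional coordinates."""
    return {tuple((a - b) % 1 for a, b in zip(k, kp))
            for k in kpts for kp in kpts}

def is_uniform(kpts):
    """Toy criterion: regular grid iff #distinct differences == #points."""
    return len(difference_set(kpts)) == len(set(kpts))

# Hypothetical 4x4 grid covering the full BZ (not the Si grid of this thread)
grid = [(Fraction(i, 4), Fraction(j, 4)) for i, j in product(range(4), repeat=2)]
print(is_uniform(grid))      # True: the differences reproduce the same 16 points

# One extra off-grid "user" point generates many new k - k' vectors
extended = grid + [(Fraction(1, 10), Fraction(0))]
print(is_uniform(extended))  # False
```

In this picture, any k-point added outside the original mesh creates difference vectors that the mesh does not contain, so the combined set no longer closes into a regular q grid.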

Yambo Community Forum
