Different results with same input in parallel yambo 4.0.1.
Posted: Wed Oct 07, 2015 11:15 am
by Michael.Friedrich
Hey,
I was testing different MPI parallelization strategies in yambo 4.0.1 (rev. 88) to produce the dielectric function of lithium niobate. Here is my input file:
optics # [R OPT] Optics
chi # [R CHI] Dyson equation for Chi.
tddft # [R K] Use TDDFT kernel
X_q_0_CPU= "1 4 2" # [PARALLEL] CPUs for each role
X_q_0_ROLEs= "k v c" # [PARALLEL] CPUs roles (k,c,v)
X_q_0_nCPU_invert=0 # [PARALLEL] CPUs for matrix inversion
X_finite_q_CPU= "1 1 4 2" # [PARALLEL] CPUs for each role
X_finite_q_ROLEs= "q k c v" # [PARALLEL] CPUs roles (q,k,c,v)
X_finite_q_nCPU_invert=0 # [PARALLEL] CPUs for matrix inversion
X_Threads= 24 # [OPENMP/X] Number of threads for response functions
DIP_Threads= 24 # [OPENMP/X] Number of threads for dipoles
Chimod= "LRC" # [X] IP/Hartree/ALDA/LRC/BSfxc
LRC_alpha= -0.450000 # [TDDFT] LRC alpha factor
NGsBlkXd= 1000 RL # [Xd] Response block size
% QpntsRXd
1 | 1 | # [Xd] Transferred momenta
%
% BndsRnXd
1 | 60 | # [Xd] Polarization function bands
%
% EnRngeXd
0.00000 | 12.00000 | eV # [Xd] Energy range
%
% DmRngeXd
0.15000 | 0.15000 | eV # [Xd] Damping range
%
ETStpsXd= 300 # [Xd] Total Energy steps
% LongDrXd
1.000000 | 0.000000 | 0.000000 | # [Xd] [cc] Electric Field
%
% XfnQP_E
1.4000 | 1.000000 | 1.000000 | # [EXTQP Xd] E parameters (c/v)
%
I also tried this:
X_q_0_CPU= "8 1 1" # [PARALLEL] CPUs for each role
X_q_0_ROLEs= "k v c" # [PARALLEL] CPUs roles (k,c,v)
X_q_0_nCPU_invert=0 # [PARALLEL] CPUs for matrix inversion
X_finite_q_CPU= "1 8 1 1" # [PARALLEL] CPUs for each role
X_finite_q_ROLEs= "q k c v" # [PARALLEL] CPUs roles (q,k,c,v)
X_finite_q_nCPU_invert=0 # [PARALLEL] CPUs for matrix inversion
The resulting dielectric function was completely different from what I expected in the first case, and roughly a factor of 1/8 too low in the second case. Nothing other than the parallelization scheme was changed.
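For reference, a rough sketch of how the two spectra could be compared to estimate the scaling factor (the file names are placeholders, and the usual o.eps column layout of energy, Im eps, Re eps is assumed):
Code:
# Rough comparison of two yambo optics outputs (sketch, not part of the original runs).
# Assumptions: both files are plain-text o.eps-type outputs with '#' comment lines and
# columns energy [eV], Im(eps), Re(eps); the paths below are placeholders.
import numpy as np

def load_eps(path):
    data = np.loadtxt(path, comments="#")   # skip header/comment lines
    return data[:, 0], data[:, 1]           # energy grid, Im(eps)

e1, im1 = load_eps("run_1-4-2/o.eps_q1")    # placeholder file names
e2, im2 = load_eps("run_8-1-1/o.eps_q1")

assert np.allclose(e1, e2), "the two runs use different energy grids"
safe = np.abs(im2) > 1e-12                  # avoid dividing by (near-)zero values
ratio = im1[safe] / im2[safe]
print("mean Im(eps) ratio (run 1 / run 2):", ratio.mean())
print("max |Im(eps) difference|:", np.max(np.abs(im1 - im2)))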
Best,
Michael
Re: Different results with same input in parallel yambo 4.0.
Posted: Wed Oct 07, 2015 11:23 am
by Daniele Varsano
Dear Michael,
thank you very much for reporting.
4.0.1 is distributed as a devel version, and we are working to stabilize it. A new release with many bug fixes is expected in one or two months.
This is to say that this kind of feedback is precious to us.
Could you please post your complete input and report files, together with the scf/nscf Quantum ESPRESSO (or Abinit) input files used to generate the databases?
We need to reproduce the problem in order to spot it.
Thank you very much,
Daniele
Re: Different results with same input in parallel yambo 4.0.
Posted: Wed Oct 07, 2015 12:03 pm
by Michael.Friedrich
Dear Daniele,
Thank you for your reply. I have attached all the files needed for the runs, and the yambo reports where still present.
Best,
Michael
Re: Different results with same input in parallel yambo 4.0.
Posted: Wed Oct 07, 2015 4:18 pm
by Davide Sangalli
Dear Michael,
thank you very much for the feedback.
We suspect the problem is due to a bug in the devel parallelization that we have recently detected.
See also here for further info:
viewtopic.php?f=15&t=1049&p=4759
The stable version should not be affected.
Regards,
D.
Re: Different results with same input in parallel yambo 4.0.
Posted: Mon Oct 26, 2015 11:38 am
by Michael.Friedrich
Thank you for your help; I will try my calculations with the coming version.
Unfortunately, there's yet another problem...
I wanted to test whether my actual calculation generally runs stably, but it stops with a NetCDF error:
[yambo ASCII-art banner]
<03s> P0024: [01] CPU structure, Files & I/O Directories
<04s> P0024: CPU-Threads:64(CPU)-1(threads)-1(threads@X)-1(threads@DIP)-1(threads@SE)-1(threads@RT)-1(threads@K)
<04s> P0024: CPU-Threads:X_q_0(environment)-1 8 8(CPUs)-k v c(ROLEs)
<04s> P0024: CPU-Threads:X_finite_q(environment)-1 1 8 8(CPUs)-q k v c(ROLEs)
<04s> P0024: [02] CORE Variables Setup
<04s> P0024: [02.01] Unit cells
<04s> P0024: [02.02] Symmetries
<05s> P0024: [02.03] RL shells
<05s> P0024: [02.04] K-grid lattice
<06s> P0024: [02.05] Energies [ev] & Occupations
<08s> P0024: [03] Transferred momenta grid
<08s> P0024: Reading kindx_fragment_1
<08s> P0024: Reading kindx_fragment_2
<08s> P0024: [M 0.072 Gb] Alloc bare_qpg ( 0.056)
<08s> P0024: [04] External corrections
<09s> P0024: [05] External QP corrections (X)
<10s> P0024: [06] Optics
<10s> P0024: [PARALLEL Response_G_space_Zero_Momentum for CON bands on 8 CPU] Loaded/Total (Percentual): 26/205 (12%)
<11s> P0024: [PARALLEL Response_G_space_Zero_Momentum for VAL bands on 8 CPU] Loaded/Total (Percentual): 34/275 (12%)
<11s> P0024: Matrix Inversion uses 64 CPUs
<11s> P0024: Matrix Diagonalization uses 1 CPUs
<11s> P0024: [M 0.285 Gb] Alloc X ( 0.213)
<12s> P0024: [M 0.368 Gb] Alloc DIP_iR ( 0.083)
<12s> P0024: [M 1.277 Gb] Alloc WF ( 0.909)
<12s> P0024: [PARALLEL distribution for Wave-Function states] Loaded/Total (Percentual): 1680/13440 (12%)
<12s> P0024: [WF] Performing Wave-Functions I/O
<12s> P0024: [M 1.374 Gb] Alloc wf_disk ( 0.096)
<12s> P0024: Reading wf_fragments_1_3
<13s> P0024: Reading wf_fragments_1_4
<14s> P0024: Reading wf_fragments_2_3
<15s> P0024: Reading wf_fragments_2_4
<16s> P0024: Reading wf_fragments_3_3
<16s> P0024: Reading wf_fragments_3_4
<17s> P0024: Reading wf_fragments_4_3
<18s> P0024: Reading wf_fragments_4_4
<18s> P0024: Reading wf_fragments_5_3
<19s> P0024: Reading wf_fragments_5_4
<20s> P0024: Reading wf_fragments_6_3
<20s> P0024: Reading wf_fragments_6_4
<21s> P0024: Reading wf_fragments_7_3
<22s> P0024: Reading wf_fragments_7_4
<23s> P0024: Reading wf_fragments_8_3
<25s> P0024: Reading wf_fragments_8_4
<26s> P0024: Reading wf_fragments_9_3
<27s> P0024: Reading wf_fragments_9_4
<28s> P0024: Reading wf_fragments_10_3
<29s> P0024: Reading wf_fragments_10_4
<29s> P0024: Reading wf_fragments_11_3
<30s> P0024: Reading wf_fragments_11_4
<31s> P0024: Reading wf_fragments_12_3
<31s> P0024: Reading wf_fragments_12_4
<32s> P0024: Reading wf_fragments_13_3
<33s> P0024: Reading wf_fragments_13_4
<33s> P0024: Reading wf_fragments_14_3
<34s> P0024: Reading wf_fragments_14_4
<35s> P0024: [M 1.278 Gb] Free wf_disk ( 0.096)
<36s> P0024: [x,Vnl] computed using 1050 projectors
<36s> P0024: [WARNING] [x,Vnl] slows the Dipoles computation. To neglect it rename the ns.kb_pp file
<36s> P0024: [M 3.550 Gb] Alloc KBV ( 2.273)
<36s> P0024: Dipole (T): | | [000%] --(E) --(X)
<47s> P0024: [M 3.562 Gb] Alloc pp_kbs pp_kb pp_kbd ( 0.012)
<47s> P0024: Reading kb_pp_pwscf_fragment_1
<29m-54s> P0024: Dipole (T): |# | [002%] 29m-18s(E) 19h-31m-53s(X)
<58m-51s> P0024: Dipole (T): |## | [005%] 58m-15s(E) 19h-24m-44s(X)
<01h-23m-38s> P0024: Reading kb_pp_pwscf_fragment_2
<01h-27m-51s> P0024: Dipole (T): |### | [007%] 01h-27m-15s(E) 19h-23m-05s(X)
<01h-56m-44s> P0024: Dipole (T): |#### | [010%] 01h-56m-07s(E) 19h-20m-55s(X)
<02h-25m-37s> P0024: Dipole (T): |##### | [012%] 02h-25m-00s(E) 19h-20m-04s(X)
<02h-46m-17s> P0024: Reading kb_pp_pwscf_fragment_3
<02h-54m-38s> P0024: Dipole (T): |###### | [015%] 02h-54m-02s(E) 19h-20m-11s(X)
<03h-23m-32s> P0024: Dipole (T): |####### | [017%] 03h-22m-55s(E) 19h-19m-30s(X)
<03h-52m-29s> P0024: Dipole (T): |######## | [020%] 03h-51m-52s(E) 19h-19m-15s(X)
<04h-09m-00s> P0024: Reading kb_pp_pwscf_fragment_4
<04h-21m-32s> P0024: Dipole (T): |######### | [022%] 04h-20m-56s(E) 19h-19m-33s(X)
<04h-50m-24s> P0024: Dipole (T): |########## | [025%] 04h-49m-48s(E) 19h-19m-12s(X)
<05h-19m-24s> P0024: Dipole (T): |########### | [027%] 05h-18m-47s(E) 19h-19m-13s(X)
<05h-31m-47s> P0024: Reading kb_pp_pwscf_fragment_5
<05h-48m-24s> P0024: Dipole (T): |############ | [030%] 05h-47m-48s(E) 19h-19m-16s(X)
<06h-17m-23s> P0024: Dipole (T): |############# | [032%] 06h-16m-47s(E) 19h-19m-15s(X)
<06h-46m-13s> P0024: Dipole (T): |############## | [035%] 06h-45m-36s(E) 19h-18m-47s(X)
<06h-54m-27s> P0024: Reading kb_pp_pwscf_fragment_6
<07h-15m-11s> P0024: Dipole (T): |############### | [037%] 07h-14m-35s(E) 19h-18m-54s(X)
<07h-44m-10s> P0024: Dipole (T): |################ | [040%] 07h-43m-34s(E) 19h-18m-54s(X)
<08h-13m-00s> P0024: Dipole (T): |################# | [042%] 08h-12m-24s(E) 19h-18m-33s(X)
<08h-17m-07s> P0024: Reading kb_pp_pwscf_fragment_7
<08h-42m-02s> P0024: Dipole (T): |################## | [045%] 08h-41m-25s(E) 19h-18m-40s(X)
<09h-11m-09s> P0024: Dipole (T): |################### | [047%] 09h-10m-33s(E) 19h-18m-58s(X)
<09h-39m-58s> P0024: Dipole (T): |#################### | [050%] 09h-39m-21s(E) 19h-18m-43s(X)
<09h-39m-58s> P0024: Reading kb_pp_pwscf_fragment_8
<10h-09m-00s> P0024: Dipole (T): |##################### | [052%] 10h-08m-24s(E) 19h-18m-51s(X)
<10h-38m-00s> P0024: Dipole (T): |###################### | [055%] 10h-37m-24s(E) 19h-18m-53s(X)
<11h-02m-43s> P0024: Reading kb_pp_pwscf_fragment_9
<11h-07m-02s> P0024: Dipole (T): |####################### | [057%] 11h-06m-26s(E) 19h-18m-58s(X)
<11h-35m-55s> P0024: Dipole (T): |######################## | [060%] 11h-35m-18s(E) 19h-18m-47s(X)
<12h-04m-51s> P0024: Dipole (T): |######################### | [062%] 12h-04m-15s(E) 19h-18m-48s(X)
<12h-25m-30s> P0024: Reading kb_pp_pwscf_fragment_10
<12h-33m-54s> P0024: Dipole (T): |########################## | [065%] 12h-33m-18s(E) 19h-18m-54s(X)
<13h-02m-48s> P0024: Dipole (T): |########################### | [067%] 13h-02m-12s(E) 19h-18m-47s(X)
<13h-31m-45s> P0024: Dipole (T): |############################ | [070%] 13h-31m-09s(E) 19h-18m-44s(X)
<13h-48m-14s> P0024: Reading kb_pp_pwscf_fragment_11
<14h-00m-44s> P0024: Dipole (T): |############################# | [072%] 14h-00m-08s(E) 19h-18m-45s(X)
<14h-29m-36s> P0024: Dipole (T): |############################## | [075%] 14h-29m-00s(E) 19h-18m-40s(X)
<14h-58m-34s> P0024: Dipole (T): |############################### | [077%] 14h-57m-58s(E) 19h-18m-39s(X)
<15h-10m-59s> P0024: Reading kb_pp_pwscf_fragment_12
<15h-27m-34s> P0024: Dipole (T): |################################ | [080%] 15h-26m-58s(E) 19h-18m-41s(X)
<15h-56m-42s> P0024: Dipole (T): |################################# | [082%] 15h-56m-05s(E) 19h-18m-52s(X)
<16h-25m-02s> P0024: Dipole (T): |################################## | [085%] 16h-24m-26s(E) 19h-18m-07s(X)
<16h-33m-16s> P0024: Reading kb_pp_pwscf_fragment_13
<16h-53m-56s> P0024: Dipole (T): |################################### | [087%] 16h-53m-19s(E) 19h-18m-05s(X)
<17h-23m-00s> P0024: Dipole (T): |#################################### | [090%] 17h-22m-23s(E) 19h-18m-12s(X)
<17h-51m-46s> P0024: Dipole (T): |##################################### | [092%] 17h-51m-10s(E) 19h-17m-59s(X)
<17h-55m-53s> P0024: Reading kb_pp_pwscf_fragment_14
<18h-20m-41s> P0024: Dipole (T): |###################################### | [095%] 18h-20m-05s(E) 19h-17m-57s(X)
<18h-49m-47s> P0024: Dipole (T): |####################################### | [097%] 18h-49m-11s(E) 19h-18m-06s(X)
<19h-18m-33s> P0024: Dipole (T): |########################################| [100%] 19h-17m-57s(E) 19h-17m-57s(X)
<19h-18m-34s> P0024: [M 3.550 Gb] Free pp_kbs pp_kb pp_kbd ( 0.012)
<19h-18m-34s> P0024: [M 1.277 Gb] Free KBV ( 2.273)
<01d-00h-49m-26s> P0024: [M 0.368 Gb] Free WF ( 0.909)
<01d-00h-49m-26s> P0024: Writing dipoles_fragment_1
P0024: [ERROR] STOP signal received while in :[06] Optics
P0024: [ERROR][NetCDF] NetCDF: String match to name in use
I used the input below, with the response block size reduced from 2400 to 300 reciprocal lattice vectors.
Code:
# __ __ ________ ___ __ __ _______ ______
# /_/\/_/\ /_______/\ /__//_//_/\ /_______/\ /_____/\
# \ \ \ \ \\::: _ \ \\::\| \| \ \\::: _ \ \\:::_ \ \
# \:\_\ \ \\::(_) \ \\:. \ \\::(_) \/_\:\ \ \ \
# \::::_\/ \:: __ \ \\:.\-/\ \ \\:: _ \ \\:\ \ \ \
# \::\ \ \:.\ \ \ \\. \ \ \ \\::(_) \ \\:\_\ \ \
# \__\/ \__\/\__\/ \__\/ \__\/ \_______\/ \_____\/
#
# GPL Version 4.0.1 Revision 88
# http://www.yambo-code.org
#
optics # [R OPT] Optics
chi # [R CHI] Dyson equation for Chi.
tddft # [R K] Use TDDFT kernel
X_q_0_CPU= "1 8 8" # [PARALLEL] CPUs for each role
X_q_0_ROLEs= "k v c" # [PARALLEL] CPUs roles (k,c,v)
X_q_0_nCPU_invert=0 # [PARALLEL] CPUs for matrix inversion
X_finite_q_CPU= "1 1 8 8" # [PARALLEL] CPUs for each role
X_finite_q_ROLEs= "q k v c" # [PARALLEL] CPUs roles (q,k,c,v)
X_finite_q_nCPU_invert=0 # [PARALLEL] CPUs for matrix inversion
Chimod= "LRC" # [X] IP/Hartree/ALDA/LRC/BSfxc
LRC_alpha=-0.480000 # [TDDFT] LRC alpha factor
NGsBlkXd= 300 RL # [Xd] Response block size 2400
% QpntsRXd
1 | 1 | # [Xd] Transferred momenta
%
% BndsRnXd
1 | 480 | # [Xd] Polarization function bands
%
% EnRngeXd
0.0000 | 12.0000 | eV # [Xd] Energy range
%
% DmRngeXd
0.15 | 0.15 | eV # [Xd] Damping range
%
ETStpsXd= 300 # [Xd] Total Energy steps
% LongDrXd
1.000000 | 0.000000 | 0.000000 | # [Xd] [cc] Electric Field
%
% XfnQP_E
1.4000 | 1.000000 | 1.000000 | # [EXTQP Xd] E parameters (c/v)
%
Is this a problem with the yambo code or with my NetCDF installation?
NetCDF was installed with large-file support, and yambo was started with yambo -S.
The error does not occur for a unit cell with 10 atoms, but it does for a supercell with 80 atoms.
In case the problem is not yet known and you want to reproduce the error, I have attached the Quantum ESPRESSO files.
Best,
Michael
Re: Different results with same input in parallel yambo 4.0.
Posted: Mon Oct 26, 2015 11:46 am
by Daniele Varsano
Dear Michael,
thanks for waiting; the next version with some bug fixes will be released very soon, hopefully in the next few days.
The problem you are facing is not strictly NetCDF related, and it should be fixed in the next release.
In the meantime, a workaround is to avoid writing the dipoles to disk by adding the following to the input:
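For example, assuming the standard yambo variable for switching off database I/O is what was meant here:
DBsIOoff= "DIP" # [IO] databases with I/O switched off (here: the dipoles)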
In any case, I can see that the calculation of the dipoles is rather cumbersome; I would try to avoid including the non-local commutator by renaming the kb_pp_pwscf file.
Best,
Daniele
Re: Different results with same input in parallel yambo 4.0.
Posted: Mon Oct 26, 2015 12:10 pm
by Michael.Friedrich
Dear Daniele,
Thanks for the information. It is nice to know that the next release is coming soon.
I tried excluding the non-local commutator, but the resulting changes seemed quite noticeable to me, so I decided to leave it in.
Best,
Michael
Re: Different results with same input in parallel yambo 4.0.
Posted: Mon Oct 26, 2015 2:56 pm
by Daniele Varsano
Dear Michael,
this is to inform you that a new version of yambo containing many bug fixes has been released.
For the moment the new version is in the SVN repository only; in the next days we will also prepare the tarball and the release notes.
Please note that this is still a devel version, and we would be very glad if you could report any problems you encounter.
Best,
Daniele
Re: Different results with same input in parallel yambo 4.0.
Posted: Mon Nov 02, 2015 12:58 pm
by Michael.Friedrich
Thank you; the new version works perfectly fine

... almost...
The scissor shift that I inserted in the yambo input file is recognized for the unit cell but not for the supercell. Apart from the parameters related to the bigger cell, nothing in the input file is different.
yambo.in for the unit cell
Code:
optics # [R OPT] Optics
chi # [R CHI] Dyson equation for Chi.
tddft # [R K] Use TDDFT kernel
X_q_0_CPU= "1 4 4" # [PARALLEL] CPUs for each role
X_q_0_ROLEs= "k v c" # [PARALLEL] CPUs roles (k,c,v)
X_q_0_nCPU_invert=0 # [PARALLEL] CPUs for matrix inversion
X_finite_q_CPU= "1 1 4 4" # [PARALLEL] CPUs for each role
X_finite_q_ROLEs= "q k c v" # [PARALLEL] CPUs roles (q,k,c,v)
X_finite_q_nCPU_invert=0 # [PARALLEL] CPUs for matrix inversion
X_Threads= 1 # [OPENMP/X] Number of threads for response functions
DIP_Threads= 1 # [OPENMP/X] Number of threads for dipoles
Chimod= "LRC" # [X] IP/Hartree/ALDA/LRC/BSfxc
LRC_alpha= -0.460000 # [TDDFT] LRC alpha factor
NGsBlkXd= 300 RL # [Xd] Response block size
% QpntsRXd
1 | 1 | # [Xd] Transferred momenta
%
% BndsRnXd
1 | 60 | # [Xd] Polarization function bands
%
% EnRngeXd
0.00000 | 12.00000 | eV # [Xd] Energy range
%
% DmRngeXd
0.15000 | 0.15000 | eV # [Xd] Damping range
%
ETStpsXd= 300 # [Xd] Total Energy steps
% LongDrXd
1.000000 | 0.000000 | 0.000000 | # [Xd] [cc] Electric Field
%
% XfnQP_E
1.39500 | 1.000000 | 1.000000 | # [EXTQP Xd] E parameters (c/v)
%
for the (2x2x2) supercell
Code:
optics # [R OPT] Optics
chi # [R CHI] Dyson equation for Chi.
tddft # [R K] Use TDDFT kernel
X_q_0_CPU= "1 8 8" # [PARALLEL] CPUs for each role
X_q_0_ROLEs= "k v c" # [PARALLEL] CPUs roles (k,c,v)
X_q_0_nCPU_invert=0 # [PARALLEL] CPUs for matrix inversion
X_finite_q_CPU= "1 1 8 8" # [PARALLEL] CPUs for each role
X_finite_q_ROLEs= "q k v c" # [PARALLEL] CPUs roles (q,k,c,v)
X_finite_q_nCPU_invert=0 # [PARALLEL] CPUs for matrix inversion
Chimod= "LRC" # [X] IP/Hartree/ALDA/LRC/BSfxc
LRC_alpha=-0.460000 # [TDDFT] LRC alpha factor
NGsBlkXd= 2400 RL # [Xd] Response block size
% QpntsRXd
1 | 1 | # [Xd] Transferred momenta
%
% BndsRnXd
1 | 480 | # [Xd] Polarization function bands
%
% EnRngeXd
0.0000 | 12.0000 | eV # [Xd] Energy range
%
% DmRngeXd
0.15 | 0.15 | eV # [Xd] Damping range
%
ETStpsXd= 300 # [Xd] Total Energy steps
% LongDrXd
1.000000 | 0.000000 | 0.000000 | # [Xd] [cc] Electric Field
%
% XfnQP_E
1.39500 | 1.000000 | 1.000000 | # [EXTQP Xd] E parameters (c/v)
%
For the supercell,
Code:
% XfnQP_E
1.39500 | 1.000000 | 1.000000 | # [EXTQP Xd] E parameters (c/v)
%
is not echoed in the o.eps* output files and is not included in the r_optics* report either.
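As a rough way to check whether the correction is picked up at all, one could search the reports for any trace of the external QP block (a sketch; the file pattern and search strings are guesses and may need adjusting to the actual report wording):
Code:
# Rough check: does any r_optics report mention the external QP / scissor correction?
# Assumptions: the report file pattern and the search strings below are guesses.
import glob

patterns = ("XfnQP_E", "External QP", "1.39500")
for report in sorted(glob.glob("r_optics*")):
    with open(report) as fh:
        text = fh.read()
    hits = [p for p in patterns if p in text]
    print(report, "->", hits if hits else "no scissor-related lines found")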
Best,
Michael
Re: Different results with same input in parallel yambo 4.0.
Posted: Mon Nov 02, 2015 1:50 pm
by Daniele Varsano
Dear Michael,
could you please also post the two report files?
Best,
Daniele