
parallel problem with a full frequency approach

Posted: Mon Apr 29, 2019 5:42 am
by young
Dear developers,

When I perform a GW calculation with the full-frequency approach, I find that yambo-4.3.2 (or 4.3.0) does not work if the number of cores exceeds 64. It reports an error:

Code:

[ERROR] STOP signal received while in :[04] Dynamical Dielectric Matrix
[ERROR] clock_find: too many clocks
However, it still works with the plasmon-pole approximation when the same parallel parameters are used. The input and compilation parameters are reported below.

Input (from the bulk BN example):

Code:

X_all_q_CPU= "1 1 64 2"  # [PARALLEL] CPUs for each role
X_all_q_ROLEs= "q k c v"    # [PARALLEL] CPUs roles (q,k,c,v)
X_all_q_nCPU_LinAlg_INV=64  # [PARALLEL] CPUs for Linear Algebra
SE_CPU= " 4  4  16"        # [PARALLEL] CPUs for each role
SE_ROLEs= "q qp b"          # [PARALLEL] CPUs roles (q,qp,b)
gw0                            # [R GW] GoWo Quasiparticle energy levels
HF_and_locXC                   # [R XX] Hartree-Fock Self-energy and Vxc
em1d                           # [R Xd] Dynamical Inverse Dielectric Matrix
EXXRLvcs=  3187        RL      # [XX] Exchange    RL components
Chimod= ""                     # [X] IP/Hartree/ALDA/LRC/PF/BSfxc
% GbndRnge
   1 | 100 |                   # [GW] G[W] bands range
%
GDamping=  0.10000     eV      # [GW] G[W] damping
dScStep=  0.10000      eV      # [GW] Energy step to evaluate Z factors
% BndsRnXd
   1 | 100 |                   # [Xd] Polarization function bands
%
NGsBlkXd= 1            RL      # [Xd] Response block size
% DmRngeXd
  0.10000 |  0.10000 | eV      # [Xd] Damping range
%
ETStpsXd= 500                  # [Xd] Total Energy steps
% LongDrXd
 1.000000 | 0.000000 | 0.000000 |        # [Xd] [cc] Electric Field
%
DysSolver= "n"                 # [GW] Dyson Equation solver ("n","s","g")
%QPkrange                      # [GW] QP generalized Kpoint/Band indices
  1| 14|  1|20|
%
Compilation parameters:

Code:

####################################################################
# - COMPILERS -
#
# FC kind = intel mpiifort for the Intel(R) MPI Library 2019 Technical Preview for Linux*
# MPI kind= Intel(R) MPI Library 2018 Update 2 for Linux* OS
#
# [ CPP ] gcc -E -P  -D_MPI -D_FFTW       -D_TIMING   
# [ FPP ] fpp -free -P  -D_MPI -D_FFTW       -D_TIMING  
# [ CC  ] mpiicc -O2 -D_C_US -D_FORTRAN_US
# [ FC  ] mpiifort -assume bscc -O3 -g -ip     
# [ FCUF] -assume bscc -O0 -g  
# [ F77 ] mpiifort -assume bscc -O3 -g -ip  
# [ F77U] -assume bscc -O0 -g  
# [Cmain] -nofor_main
################################################################################################
Could you help me parallelize over hundreds of cores with the full-frequency approach (real-axis integration)?
Thanks in advance!

Best
Ke Yang
PhD student
Rensselaer Polytechnic Institute, Troy, NY, US
Hunan University, Changsha, Hunan, China

Re: parallel problem with a full frequency approach

Posted: Mon Apr 29, 2019 10:54 am
by Daniele Varsano
Dear Ke Yang,
thanks for reporting this; the parallelization strategy of this part of the code has probably not been extensively tested so far.
For completeness, could you please also post your log and report files?
Best,
Daniele

Re: parallel problem with a full frequency approach

Posted: Mon Apr 29, 2019 11:30 am
by Davide Sangalli
Dear Ke Yang,
one easy solution should be to recompile yambo with the --disable-time-profile option passed to configure.
Please remember to run make clean_all before doing so, as in the sketch below.
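In practice, something like this (a sketch; add back whatever other configure options you normally use):

Code:

make clean_all                        # remove the previous build completely
./configure --disable-time-profile    # plus your usual configure options
make yambo                            # rebuild the executable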

Best,
D.

Re: parallel problem with a full frequency approach

Posted: Tue Apr 30, 2019 10:35 pm
by young
Dear Daniele,

Thanks so much. I can run it successfully.

Best
Ke
PhD student
Rensselaer Polytechnic Institute, Troy, NY, US
Hunan University, Changsha, Hunan, China

Re: parallel problem with a full frequency approach

Posted: Sun Feb 16, 2020 1:31 pm
by haseebphysics1
Dear Yambors,

I would like to know the best parallelization scheme for RPA (without local fields) calculations in G-space. I have started the calculation with 18 cores or so, but during the dipole calculation some processes are lagging behind (very slow) while others have already reached 100%. Is this normal?

The relevant files are attached!


Regards,
Haseeb Ahmad,
LUMS - Pakistan

Re: parallel problem with a full frequency approach

Posted: Sun Feb 16, 2020 3:56 pm
by Daniele Varsano
Dear Haseeb,
you are using the default parallelization, which is unlikely to be optimal.
You can use the command yambo -o c -V par (or -V all) to activate additional variables and choose a different parallelization strategy.
Something like:

Code:

DIP_CPU= "1 9 2"        # [PARALLEL] CPUs for each role
DIP_ROLEs= "k c v"      # [PARALLEL] CPUs roles (k,c,v)
X_CPU= "1 1 1 9 2"      # [PARALLEL] CPUs for each role
X_ROLEs= "q g k c v"    # [PARALLEL] CPUs roles (q,g,k,c,v)

In this way the parallelism is over bands only, which is the best strategy for distributing the memory, even if some load imbalance can remain.

Please note that "ETStpsXd= 1001" is very large: it will make your calculation extremely long, and it is not necessary to sample your spectrum at 1001 frequencies; I would lower this parameter by one order of magnitude. The same holds for EnRngeXd: do you really need to look at the spectrum up to 50 eV?
At such high energies you are most probably in the continuum region, which is not very meaningful.
Similarly, you probably do not need such a large number of bands to describe the IP spectrum in the region of interest.
In order to speed up the calculation of the dipoles you can also consider reducing the FFTGvecs value; see the sketch below.
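Just as an indication, something along these lines (the numbers below are purely illustrative and must be checked against your own convergence tests):

Code:

ETStpsXd= 100                  # total energy steps (instead of 1001)
% EnRngeXd
  0.00000 | 10.00000 | eV      # energy range (instead of going up to 50 eV)
%
FFTGvecs= 2000         RL      # reduced FFT cutoff to speed up the dipole calculation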

Finally, as already mentioned in a previous post, the IP calculation is faster in transition space:
yambo -o b -k ip
You may need to change BSKmod from "Hartree" to:
BSKmod="IP"
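Putting the two together, a minimal sketch (only the command and the kernel variable are shown; everything else in the generated input can stay as it is):

Code:

# generate the input for the spectrum in transition space
yambo -o b -k ip

# then, in the generated input, make sure the kernel is set to IP
BSKmod= "IP"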

Best,
Daniele

Re: parallel problem with a full frequency approach

Posted: Sun Feb 16, 2020 4:41 pm
by haseebphysics1
Dear Daniele, thank you very much for your answers and suggestions! It is the support and help from people like you on this forum that made me decide to use Yambo in my research work!
"do you really need to look at the spectrum up to 50 eV?"
Actually, in the literature I found that someone has done absorption calculations up to 80 eV, even though for a semiconductor. And you rightly said that in this high-energy regime we are most likely in the continuum, where the concept of discrete energy levels does not make much sense. But I just wanted to capture the downward trend of the absorption in my material as well, and to see where it absorbs most strongly (the resonance). Otherwise, my region of interest is only 0-6 eV!

Re: parallel problem with a full frequency approach

Posted: Wed Mar 18, 2020 10:47 pm
by haseebphysics1
Dear Daniele,

I am doing BSE calculations on three compute nodes. There is no error as such, but one thing that has always worried me, even when I run on a single node without the Slurm scheduler, is that some of yambo's MPI processes lag far behind: they do not start until the other MPI processes are close to finishing, so some CPUs sit idle and performance drops a lot.

I have attached the log and report (r-) files of one such ongoing calculation; there you can see that CPU 1 and CPU 2 sat idle for 3 hours and only started to compute while the other four CPUs were about to finish their kernel computation.

I have plenty of CPUs in the cluster and I launched only 6 MPI processes, so why are all 6 not computing at the same time? Am I missing some parallel parameters?


Thanks,

Re: parallel problem with a full frequency approach

Posted: Thu Mar 19, 2020 8:59 am
by claudio
Dear Haseeb Ahmad

I advise you to use the parallelization over k in the BSE, as it is much more efficient.
Just add "-V par" to the input generation command and set, for your case:

Code:

BS_CPU= "6 1 1"                     # [PARALLEL] CPUs for each role
BS_ROLEs= "k eh t"                   # [PARALLEL] CPUs roles (k,eh,t)
I also advise you to use fewer threads, 2 or 4, if that works; see the sketch below.
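If your build uses OpenMP, the number of threads per MPI task is typically set through the environment before launching yambo; a sketch (the input file and job names are just placeholders, and the launcher line should be adapted to your scheduler):

Code:

export OMP_NUM_THREADS=4              # 2 or 4 threads per MPI task, as suggested above
mpirun -np 6 yambo -F yambo.in -J BSE # 6 MPI tasks, as in your run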

best regards
Claudio

Re: parallel problem with a full frequency approach

Posted: Thu Mar 19, 2020 1:58 pm
by haseebphysics1
Dear claudio,

Thank you for the useful info, but my problem persists: even after using the above strategy, I do not know why some MPI tasks lag behind the others. It is evident from the file attached to my last post.



Thank you,