
Parallel efficiency of yambo

Posted: Wed Apr 10, 2013 3:35 pm
by matdisor
Hi,

First post here, and I'm a newbie to yambo. I wonder if anyone has benchmarks of yambo's efficiency vs. the number of CPUs.
I have tested it on a Cray (using libsci) and found that the scaling is not that good.

For example, for testing purposes, I ran optics_bse_bss for a MoS2 single layer (15 x 15 x 1 k-mesh, 16 bands, MaxGvecs= 20000, NGsBlkXs= 20). Here is what I got:
1 CPU     01h-25m-44s
12 CPUs   16m-20s
24 CPUs   13m-50s
48 CPUs   13m-25s
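
For reference, here is the speedup and parallel efficiency implied by these timings (a quick sketch, with the walltimes above converted to seconds):

Code: Select all

# Speedup and parallel efficiency implied by the timings above
# (the 1-CPU walltime is the baseline).
timings = {1: 1 * 3600 + 25 * 60 + 44,  # 01h-25m-44s
           12: 16 * 60 + 20,            # 16m-20s
           24: 13 * 60 + 50,            # 13m-50s
           48: 13 * 60 + 25}            # 13m-25s
t1 = timings[1]
for n, t in sorted(timings.items()):
    s = t1 / t
    print(f"{n:2d} CPUs: speedup {s:5.2f}, efficiency {s / n:6.1%}")

That is roughly 44% efficiency on 12 CPUs, dropping to about 13% on 48; the nearly flat walltime between 24 and 48 CPUs points to a dominant serial fraction (Amdahl's law).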

If you can point out how to get good scaling, I would appreciate it very much, as I need to treat clusters of a few tens to a few hundred atoms.

Re: Parallel efficiency of yambo

Posted: Wed Apr 10, 2013 3:53 pm
by Daniele Varsano
Dear Duy Le,
are you performing full diagonalization in these tests:

Code: Select all

BSSmod= "d"
or the Haydock algorithm? In the latter case the scaling should be better. In any case, we are working on a totally new parallelization strategy to let yambo run on BlueGene machines (thousands of CPUs). These MPI/OpenMP features will be available in one of the next releases.
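
For reference, the Haydock solver is selected with the same input flag, as in the 03.in file quoted later in this thread:

Code: Select all

BSSmod= "h"                  # [BSS] Solvers `h/d/i/t`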
Best,

Daniele

Re: Parallel efficiency of yambo

Posted: Wed Apr 10, 2013 4:16 pm
by matdisor
I used the Haydock method. The issue does not seem to be related to the linked libraries, as I tried many different ways of linking. I got some (small) improvement in walltime, but no improvement in scaling.

Looking forward to the new release. I am still curious whether anyone has benchmarks for the scaling.

Best,

Re: Parallel efficiency of yambo

Posted: Wed Apr 10, 2013 5:12 pm
by Conor Hogan
Dear Duy Le,
Thanks for your comments and welcome to Yambo. I'm a little surprised by your reported scaling, as we found that the Haydock solver scales pretty well - I attach results obtained on MareNostrum in Barcelona.
Can you post the first few lines of your config.log and your whole input file (yambo.in or whatever), and tell us what size the BS kernel is?
Regards,
Conor

Re: Parallel efficiency of yambo

Posted: Wed Apr 10, 2013 5:39 pm
by matdisor
Thanks, Conor!

I was surprised too. I would like to see a benchmark from elsewhere to confirm whether I did something incorrectly. The following is the end of the configure message:

Code: Select all

#
# [VER] 3.3.0 r.1887
#
# [SYS] linux@x86_64
# [SRC] /lustre/scratch/proj/dle-proj/yambo-3.3.0-rev36_lib4
# [BIN] /lustre/scratch/proj/dle-proj/yambo-3.3.0-rev36_lib4/bin
# [FFT]
#
# [ ] Double precision
# [X] Redundant compilation
# [X] MPI
# [X] PW (5.0) support
# [ ] ETSF I/O support
# [X] SCALAPACK
# [X  ] NETCDF/HDF5/Large Files
# [   ] Built-in BLAS/LAPACK/LOCAL
#
# [ CPP ] gcc -E -P
# [  C  ] gcc -g -O2 -D_C_US -D_FORTRAN_US
# [MPICC] mpicc -g -O2 -D_C_US -D_FORTRAN_US
# [ F90 ] pgf90 -O2 -fast -Munroll -Mnoframe -Mdalign -Mbackslash
# [MPIF ] mpif90 -O2 -fast -Munroll -Mnoframe -Mdalign -Mbackslash
# [ F77 ] pgf90 -O2 -fast -Munroll -Mnoframe -Mdalign -Mbackslash
# [Cmain] -Mnomain
# [NoOpt] -O0 -Mbackslash
#
# [ MAKE ] make
# [EDITOR] vim
#
Three input files:
01.in

Code: Select all

setup                        # [R INI] Initialization
MaxGvecs=  20000          RL 
02.in

Code: Select all

em1s                         # [R Xs] Static Inverse Dielectric Matrix
% QpntsRXs
   1 |  64 |                 # [Xs] Transferred momenta
%
% BndsRnXs
  1 | 16 |                   # [Xs] Polarization function bands
%
NGsBlkXs= 20            RL    # [Xs] Response block size
% LongDrXs
 1.000000 | 0.000000 | 0.000000 |        # [Xs] [cc] Electric Field
%
03.in

Code: Select all

optics                       # [R OPT] Optics
bse                          # [R BSK] Bethe Salpeter Equation.
bss                          # [R BSS] Bethe Salpeter Equation solver
BSresKmod= "xc"              # [BSK] Resonant Kernel mode. (`x`;`c`;`d`)
BScplKmod= "none"            # [BSK] Coupling Kernel mode. (`x`;`c`;`d`)
% BSEBands
  1 | 16 |                   # [BSK] Bands range
%
BSENGBlk= 21           RL    # [BSK] Screened interaction block size
BSENGexx= 19993        RL    # [BSK] Exchange components
BSSmod= "h"                  # [BSS] Solvers `h/d/i/t`
% BEnRange
  0.00000 | 5.00000 | eV    # [BSS] Energy range
%
% BDmRange
  0.10000 |  0.10000 | eV    # [BSS] Damping range
%
BEnSteps= 300                # [BSS] Energy steps
% BLongDir
 1.000000 | 0.000000 | 0.000000 |        # [BSS] [cc] Electric Field
%
Attached are config.log and the optics_bse_bss report. The size of the BSE kernel is 14175.
config.log
r_optics_bse_bss.log
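For scale, here is a back-of-the-envelope estimate of the memory needed just to store a kernel of that size (assuming one single-precision complex number per matrix element, consistent with the unchecked "[ ] Double precision" line in the configure summary above):

Code: Select all

# Rough memory footprint of the 14175 x 14175 BSE kernel reported above.
n = 14175             # kernel dimension
bytes_per_elem = 8    # complex single precision; use 16 for double
print(f"{n ** 2 * bytes_per_elem / 1024 ** 3:.2f} GiB")  # -> ~1.50 GiB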
You may see weird configure options; I had to use them to make ftn and libsci work.

Sorry if you find any nonsense; I am new to the theory and the code.

Duy

Re: Parallel efficiency of yambo

Posted: Fri Apr 12, 2013 10:44 am
by Davide Sangalli
Hi again Duy,
I'm not completely sure what is going on in your compilation - can you also upload the config/setup file? For instance, I'm not 100% sure which BLAS/LAPACK/SCALAPACK libraries you are linking against in the end.
Thanks
Conor (using Davide's account!)
PS: we are all away at a school these days, so we cannot really help until we return to work.
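
One way to check which BLAS/LAPACK libraries the binary actually picked up, if it was linked dynamically (a sketch; a statically linked Cray build will not show anything useful here):

Code: Select all

ldd bin/yambo | grep -Ei 'sci|blas|lapack'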

Re: Parallel efficiency of yambo

Posted: Mon Apr 29, 2013 4:39 am
by matdisor
Sorry for the late reply; I have been quite busy. As you can see, I use libsci, which is optimized for Cray. I also compiled against other libraries but did not see much of an effect.
I will investigate this issue more closely later; right now I am short on time. For now, I am using yambo for small systems, so the poor scaling does not hurt that much.