Parallel and efficiency of yambo
Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano, Conor Hogan, Nicola Spallanzani
- Duy Le
- Posts: 12
- Joined: Wed Apr 10, 2013 2:40 pm
- Contact:
Parallel and efficiency of yambo
Hi,
First post here, and I'm a newbie to yambo. I wonder if anyone has a benchmark of the efficiency of yambo vs. the number of CPUs.
I have tested it on a Cray (using libsci) and found that the scaling is not very good.
For example, for testing purposes, I ran optics_bse_bss for a MoS2 single layer (15 x 15 x 1 k-mesh, 16 bands, MaxGvecs= 20000, NGsBlkXs= 20). Here is what I got:
1 cpu 01h-25m-44s
12 cpus 16m-20s
24 cpus 13m-50s
48 cpus 13m-25s
If you can point out how to get good scaling, I would appreciate it very much, as I need to treat clusters of a few tens to a few hundred atoms.
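For reference, the speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p implied by these timings can be computed with a few lines of Python (a minimal sketch; the only inputs are the timings quoted above):
Code: Select all
# Speedup and parallel efficiency for the timings reported above.
# S(p) = T(1)/T(p), E(p) = S(p)/p.
timings_s = {
    1: 1 * 3600 + 25 * 60 + 44,   # 01h-25m-44s
    12: 16 * 60 + 20,             # 16m-20s
    24: 13 * 60 + 50,             # 13m-50s
    48: 13 * 60 + 25,             # 13m-25s
}
t1 = timings_s[1]
for p, t in sorted(timings_s.items()):
    print(f"{p:3d} cpus: speedup {t1 / t:5.2f}, efficiency {t1 / (t * p):6.1%}")
This gives about 5.2x at 12 CPUs (44% efficiency) but only 6.4x at 48 CPUs (13%); a rough Amdahl's-law fit, T(p) ≈ T(1)(s + (1 - s)/p), suggests a serial fraction s of roughly 0.12-0.14 for this run.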
Duy Le
Postdoctoral Associate
Department of Physics
University of Central Florida.
Website: http://www.physics.ucf.edu/~dle
- Daniele Varsano
- Posts: 4198
- Joined: Tue Mar 17, 2009 2:23 pm
- Contact:
Re: Parallel and efficiency of yambo
Dear Duy Le,
are you performing full diagonalization in these tests:
Code: Select all
BSSmod=d
or using the Haydock algorithm? In the latter case the scaling should be better. In any case, we are working on a totally new parallelization strategy to let yambo run on BlueGene machines (thousands of CPUs). These MPI/OpenMP features will be available soon in one of the next releases.
Best,
Daniele
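For context on why the solver choice matters for scaling: full diagonalization of the N x N excitonic Hamiltonian costs O(N^3) and is communication-heavy, while the Haydock (Lanczos) recursion needs only repeated matrix-vector products and reconstructs the spectrum from a continued fraction. Below is a minimal NumPy sketch of the recursion, with a random Hermitian matrix standing in for the BSE Hamiltonian; it illustrates the idea only and is not yambo's implementation.
Code: Select all
import numpy as np

def haydock_coeffs(H, v0, niter):
    # Lanczos/Haydock recursion: tridiagonal coefficients a, b for Hermitian H.
    a = np.zeros(niter)
    b = np.zeros(niter)                 # b[0] is unused in the continued fraction
    v = v0 / np.linalg.norm(v0)
    v_prev = np.zeros_like(v)
    beta = 0.0
    for i in range(niter):
        w = H @ v                       # dominant cost: one mat-vec per iteration
        a[i] = np.vdot(v, w).real
        w = w - a[i] * v - beta * v_prev
        beta = np.linalg.norm(w)
        if i + 1 < niter:
            b[i + 1] = beta
        v_prev, v = v, w / beta
    return a, b

def green_cf(a, b, omega, eta):
    # <v0|(omega + i*eta - H)^-1|v0> evaluated as a continued fraction.
    z = omega + 1j * eta
    g = np.zeros_like(z)
    for i in range(len(a) - 1, 0, -1):
        g = b[i] ** 2 / (z - a[i] - g)
    return 1.0 / (z - a[0] - g)

# Toy demo: an absorption-like spectrum ~ -Im G(omega).
rng = np.random.default_rng(0)
N = 400
M = rng.standard_normal((N, N))
H = (M + M.T) / 2                       # stand-in for the resonant BSE Hamiltonian
dip = rng.standard_normal(N)            # stand-in for the dipole (residual) vector
a, b = haydock_coeffs(H, dip, niter=80)
omega = np.linspace(-40.0, 40.0, 400)
spectrum = -green_cf(a, b, omega, eta=0.5).imag
Since each iteration is a single H|v>, the parallel scaling is essentially that of a distributed matrix-vector product, and a few tens to hundreds of iterations usually converge the spectrum.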
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- Duy Le
- Posts: 12
- Joined: Wed Apr 10, 2013 2:40 pm
- Contact:
Re: Parallel and efficiency of yambo
I used the Haydock method. The problem does not seem to be related to the linked libraries, as I tried many different ways of linking; I got some (small) improvement in walltime, but not in the scaling factor.
Looking forward to the new release. I am still curious whether anyone has a benchmark of the scaling factor.
Best,
Duy Le
Postdoctoral Associate
Department of Physics
University of Central Florida.
Website: http://www.physics.ucf.edu/~dle
- Conor Hogan
- Posts: 111
- Joined: Tue Mar 17, 2009 12:17 pm
- Contact:
Re: Parallel and efficiency of yambo
Dear Duy Le,
Thanks for your comments and welcome to Yambo. I'm a little surprised by your reported scaling, as we found that Haydock scales pretty well; I attach results obtained on Marenostrum in Barcelona.
Can you post the first few lines of your config.log and your whole input file (yambo.in or whatever), and tell us the size of the BS kernel?
Regards,
Conor
Dr. Conor Hogan
CNR-ISM, via Fosso del Cavaliere, 00133 Roma, Italy;
Department of Physics and European Theoretical Spectroscopy Facility (ETSF),
University of Rome "Tor Vergata".
- Duy Le
- Posts: 12
- Joined: Wed Apr 10, 2013 2:40 pm
- Contact:
Re: Parallel and efficiency of yambo
Conor Hogan wrote:
Dear Duy Le,
Thanks for your comments and welcome to Yambo. I'm a little surprised by your reported scaling, as we found that Haydock scales pretty well; I attach results obtained on Marenostrum in Barcelona.
Can you post the first few lines of your config.log and your whole input file (yambo.in or whatever), and tell us the size of the BS kernel?
Regards,
Conor
Thanks, Conor. I was surprised too; I would like to see a benchmark from elsewhere to confirm whether I did something incorrectly. The following is the end of the configure message:
Code: Select all
#
# [VER] 3.3.0 r.1887
#
# [SYS] linux@x86_64
# [SRC] /lustre/scratch/proj/dle-proj/yambo-3.3.0-rev36_lib4
# [BIN] /lustre/scratch/proj/dle-proj/yambo-3.3.0-rev36_lib4/bin
# [FFT]
#
# [ ] Double precision
# [X] Redundant compilation
# [X] MPI
# [X] PW (5.0) support
# [ ] ETSF I/O support
# [X] SCALAPACK
# [X] NETCDF/HDF5/Large Files
# [ ] Built-in BLAS/LAPACK/LOCAL
#
# [ CPP ] gcc -E -P
# [ C ] gcc -g -O2 -D_C_US -D_FORTRAN_US
# [MPICC] mpicc -g -O2 -D_C_US -D_FORTRAN_US
# [ F90 ] pgf90 -O2 -fast -Munroll -Mnoframe -Mdalign -Mbackslash
# [MPIF ] mpif90 -O2 -fast -Munroll -Mnoframe -Mdalign -Mbackslash
# [ F77 ] pgf90 -O2 -fast -Munroll -Mnoframe -Mdalign -Mbackslash
# [Cmain] -Mnomain
# [NoOpt] -O0 -Mbackslash
#
# [ MAKE ] make
# [EDITOR] vim
#
01.in
Code: Select all
setup # [R INI] Initialization
MaxGvecs= 20000 RL
Code: Select all
em1s # [R Xs] Static Inverse Dielectric Matrix
% QpntsRXs
1 | 64 | # [Xs] Transferred momenta
%
% BndsRnXs
1 | 16 | # [Xs] Polarization function bands
%
NGsBlkXs= 20 RL # [Xs] Response block size
% LongDrXs
1.000000 | 0.000000 | 0.000000 | # [Xs] [cc] Electric Field
%
Code: Select all
optics # [R OPT] Optics
bse # [R BSK] Bethe Salpeter Equation.
bss # [R BSS] Bethe Salpeter Equation solver
BSresKmod= "xc" # [BSK] Resonant Kernel mode. (`x`;`c`;`d`)
BScplKmod= "none" # [BSK] Coupling Kernel mode. (`x`;`c`;`d`)
% BSEBands
1 | 16 | # [BSK] Bands range
%
BSENGBlk= 21 RL # [BSK] Screened interaction block size
BSENGexx= 19993 RL # [BSK] Exchange components
BSSmod= "h" # [BSS] Solvers `h/d/i/t`
% BEnRange
0.00000 | 5.00000 | eV # [BSS] Energy range
%
% BDmRange
0.10000 | 0.10000 | eV # [BSS] Damping range
%
BEnSteps= 300 # [BSS] Energy steps
% BLongDir
1.000000 | 0.000000 | 0.000000 | # [BSS] [cc] Electric Field
%
Sorry if you find any nonsense; I am new to the theory and the code.
Duy
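Regarding Conor's question about the kernel size: the dimension of the (resonant) BSE Hamiltonian is N = Nk x Nv x Nc, so the kernel is an N x N matrix. A rough estimate for the input above; note that the 9/7 valence/conduction split of the 16 bands below is a guess for illustration only:
Code: Select all
# Rough BSE kernel size estimate for the input above.
# N = Nk * Nv * Nc; the split of the 16 BSEBands into valence/conduction
# bands below is hypothetical -- adjust to the actual occupations.
nk = 15 * 15 * 1          # k-points in the full BZ (15 x 15 x 1 mesh)
nv, nc = 9, 7             # assumed valence/conduction split of the 16 bands
n = nk * nv * nc
bytes_per = 16            # one double-precision complex number
print(f"N = {n}, kernel ~ {n**2 * bytes_per / 1e9:.1f} GB")
With these (assumed) numbers, N = 14175 and the full kernel occupies about 3.2 GB, which also sets a floor on the memory needed per node.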
Duy Le
Postdoctoral Associate
Department of Physics
University of Central Florida.
Website: http://www.physics.ucf.edu/~dle
- Davide Sangalli
- Posts: 640
- Joined: Tue May 29, 2012 4:49 pm
- Location: Via Salaria Km 29.3, CP 10, 00016, Monterotondo Stazione, Italy
- Contact:
Re: Parallel and efficiency of yambo
Hi again Duy,
Not completely sure what is going on in your compilation - can you also upload the config/setup file? For instance, I'm not 100% sure which BLAS/LAPACK/SCALAPACK libraries you are linking against in the end.
Thanks
Conor (using Davide's account!)
PS: we are all away at a school these days, so we cannot really help until we return to work
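One quick way to see which BLAS/LAPACK libraries a dynamically linked yambo binary actually picked up is to inspect it with ldd, for instance from Python (a small sketch: the binary path is an assumption, and statically linked Cray binaries will show nothing here):
Code: Select all
import subprocess

# Hypothetical path to the yambo executable -- adjust to your build tree.
binary = "bin/yambo"
out = subprocess.run(["ldd", binary], capture_output=True, text=True).stdout
for line in out.splitlines():
    # Keep only lines mentioning linear-algebra libraries (libsci shows as "sci").
    if any(k in line.lower() for k in ("blas", "lapack", "scalapack", "sci")):
        print(line.strip())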
Davide Sangalli, PhD
CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX Centre
https://sites.google.com/view/davidesangalli
http://www.max-centre.eu/
- Duy Le
- Posts: 12
- Joined: Wed Apr 10, 2013 2:40 pm
- Contact:
Re: Parallel and efficiency of yambo
Davide Sangalli wrote:
Hi again Duy,
Not completely sure what is going on in your compilation - can you also upload the config/setup file? For instance, I'm not 100% sure which BLAS/LAPACK/SCALAPACK libraries you are linking against in the end.
Thanks
Conor (using Davide's account!)
PS: we are all away at a school these days, so we cannot really help until we return to work
Sorry for the late reply; I have been quite busy. As you can see, I use libsci, which is optimized for Cray. I also compiled against other libraries but did not see much of an effect.
I will investigate this issue more closely later, mainly for lack of time right now. For the moment, I am using yambo for small systems, so it does not hurt that much.
Duy Le
Postdoctoral Associate
Department of Physics
University of Central Florida.
Website: http://www.physics.ucf.edu/~dle