slow performance after a X0 is finished

Concerns issues with computing quasiparticle corrections to the DFT eigenvalues - i.e., the self-energy within the GW approximation (-g n), or considering the Hartree-Fock exchange only (-x)

Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano

xyf
Posts: 3
Joined: Mon Apr 24, 2023 8:15 am

slow performance after a X0 is finished

Post by xyf » Sat Feb 17, 2024 6:34 am

Dear developers,

I am using yambo 5.1.1 to compute quasiparticle energies with the PPA. In the dynamical dielectric matrix stage, however, once the X0 calculation for a q-point is finished, yambo seems to stall. Only after several hours does it continue and calculate X, as in the log below:

Code:

 <04h-03m> P1-cnode399: Xo@q[2] |                                        | [000%] --(E) --(X)
 <04h-03m> P1-cnode399: [MEMORY] Alloc Xo_res( 136.6050 [Mb]) TOTAL:  10.39174 [Gb] (traced)  117.5200 [Mb] (memstat)
 <04h-05m> P1-cnode399: Xo@q[2] |#                                       | [002%] 02m-13s(E) 01h-28m(X)
 <04h-07m> P1-cnode399: Xo@q[2] |##                                      | [005%] 04m-25s(E) 01h-28m(X)
 <04h-09m> P1-cnode399: Xo@q[2] |###                                     | [007%] 06m-37s(E) 01h-28m(X)
 <04h-11m> P1-cnode399: Xo@q[2] |####                                    | [010%] 08m-49s(E) 01h-28m(X)
 <04h-14m> P1-cnode399: Xo@q[2] |#####                                   | [012%] 11m-01s(E) 01h-28m(X)
 <04h-16m> P1-cnode399: Xo@q[2] |######                                  | [015%] 13m-12s(E) 01h-28m(X)
 <04h-18m> P1-cnode399: Xo@q[2] |#######                                 | [017%] 15m-24s(E) 01h-28m(X)
 <04h-20m> P1-cnode399: Xo@q[2] |########                                | [020%] 17m-36s(E) 01h-28m(X)
 <04h-22m> P1-cnode399: Xo@q[2] |#########                               | [022%] 19m-48s(E) 01h-28m(X)
 <04h-25m> P1-cnode399: Xo@q[2] |##########                              | [025%] 21m-59s(E) 01h-27m(X)
 <04h-27m> P1-cnode399: Xo@q[2] |###########                             | [027%] 24m-11s(E) 01h-27m(X)
 <04h-29m> P1-cnode399: Xo@q[2] |############                            | [030%] 26m-22s(E) 01h-27m(X)
 <04h-31m> P1-cnode399: Xo@q[2] |#############                           | [032%] 28m-34s(E) 01h-27m(X)
 <04h-33m> P1-cnode399: Xo@q[2] |##############                          | [035%] 30m-47s(E) 01h-27m(X)
 <04h-36m> P1-cnode399: Xo@q[2] |###############                         | [037%] 32m-58s(E) 01h-27m(X)
 <04h-38m> P1-cnode399: Xo@q[2] |################                        | [040%] 35m-10s(E) 01h-27m(X)
 <04h-40m> P1-cnode399: Xo@q[2] |#################                       | [042%] 37m-23s(E) 01h-27m(X)
 <04h-42m> P1-cnode399: Xo@q[2] |##################                      | [045%] 39m-35s(E) 01h-27m(X)
 <04h-44m> P1-cnode399: Xo@q[2] |###################                     | [047%] 41m-46s(E) 01h-27m(X)
 <04h-47m> P1-cnode399: Xo@q[2] |####################                    | [050%] 43m-58s(E) 01h-27m(X)
 <04h-49m> P1-cnode399: Xo@q[2] |#####################                   | [052%] 46m-09s(E) 01h-27m(X)
 <04h-51m> P1-cnode399: Xo@q[2] |######################                  | [055%] 48m-20s(E) 01h-27m(X)
 <04h-53m> P1-cnode399: Xo@q[2] |#######################                 | [057%] 50m-32s(E) 01h-27m(X)
 <04h-55m> P1-cnode399: Xo@q[2] |########################                | [060%] 52m-44s(E) 01h-27m(X)
 <04h-57m> P1-cnode399: Xo@q[2] |#########################               | [062%] 54m-56s(E) 01h-27m(X)
 <05h-00m> P1-cnode399: Xo@q[2] |##########################              | [065%] 57m-08s(E) 01h-27m(X)
 <05h-02m> P1-cnode399: Xo@q[2] |###########################             | [067%] 59m-21s(E) 01h-27m(X)
 <05h-04m> P1-cnode399: Xo@q[2] |############################            | [070%] 01h-01m(E) 01h-27m(X)
 <05h-06m> P1-cnode399: Xo@q[2] |#############################           | [072%] 01h-03m(E) 01h-27m(X)
 <05h-08m> P1-cnode399: Xo@q[2] |##############################          | [075%] 01h-05m(E) 01h-27m(X)
 <05h-11m> P1-cnode399: Xo@q[2] |###############################         | [077%] 01h-08m(E) 01h-27m(X)
 <05h-13m> P1-cnode399: Xo@q[2] |################################        | [080%] 01h-10m(E) 01h-27m(X)
 <05h-15m> P1-cnode399: Xo@q[2] |#################################       | [082%] 01h-12m(E) 01h-28m(X)
 <05h-18m> P1-cnode399: Xo@q[2] |##################################      | [085%] 01h-15m(E) 01h-28m(X)
 <05h-20m> P1-cnode399: Xo@q[2] |###################################     | [087%] 01h-17m(E) 01h-28m(X)
 <05h-22m> P1-cnode399: Xo@q[2] |####################################    | [090%] 01h-19m(E) 01h-28m(X)
 <05h-26m> P1-cnode399: Xo@q[2] |#####################################   | [092%] 01h-23m(E) 01h-29m(X)
 <05h-29m> P1-cnode399: Xo@q[2] |######################################  | [095%] 01h-26m(E) 01h-31m(X)
 <05h-48m> P1-cnode399: Xo@q[2] |####################################### | [097%] 01h-45m(E) 01h-47m(X)
 <06h-21m> P1-cnode399: Xo@q[2] |########################################| [100%] 02h-18m(E) 02h-18m(X)
 <06h-21m> P1-cnode399: [MEMORY]  Free Xo_res( 136.6050 [Mb]) TOTAL:  10.26575 [Gb] (traced)  117.5200 [Mb] (memstat)
 <12h-01m> P1-cnode399: [PARALLEL distribution for X Frequencies on 256 CPU] Loaded/Total (Percentual):1/2(50%)
 <12h-01m> P1-cnode399: X@q[2] |                                        | [000%] --(E) --(X)
 <12h-01m> P1-cnode399: [MEMORY] Alloc KERNEL%blc( 1.019056 [Gb]) TOTAL:  11.27420 [Gb] (traced)  117.5200 [Mb] (memstat)
 <12h-01m> P1-cnode399: [MEMORY] Alloc Xo%blc( 1.019056 [Gb]) TOTAL:  12.29326 [Gb] (traced)  117.5200 [Mb] (memstat)
 <12h-02m> P1-cnode399: [MEMORY] Alloc BUFFER%blc( 1.019056 [Gb]) TOTAL:  13.31231 [Gb] (traced)  117.5200 [Mb] (memstat)
 <12h-04m> P1-cnode399: X@q[2] |########################################| [100%] 02m-03s(E) 02m-03s(X)
 <12h-04m> P1-cnode399: [MEMORY]  Free M_par%blc( 1.019056 [Gb]) TOTAL:  12.29326 [Gb] (traced)  117.5200 [Mb] (memstat)
 <12h-04m> P1-cnode399: [MEMORY]  Free M_par%blc( 1.019056 [Gb]) TOTAL:  11.27420 [Gb] (traced)  117.5200 [Mb] (memstat)
 <12h-04m> P1-cnode399: [MEMORY]  Free M_par%blc( 273.2110 [Mb]) TOTAL:  11.00099 [Gb] (traced)  117.5200 [Mb] (memstat)
 <12h-04m> P1-cnode399: [MEMORY] Alloc X_par%blc( 509.4830 [Mb]) TOTAL:  11.51047 [Gb] (traced)  117.5200 [Mb] (memstat)
 <12h-04m> P1-cnode399: [PARALLEL distribution for RL vectors(X) on 4 CPU] Loaded/Total (Percentual):32606955/******(25%)
 <12h-04m> P1-cnode399: [MEMORY]  Free M_par%blc( 1.019056 [Gb]) TOTAL:  10.49142 [Gb] (traced)  117.5200 [Mb] (memstat)
 <12h-04m> P1-cnode399: [MEMORY]  Free M_par%blc( 509.4830 [Mb]) TOTAL:  9.979949 [Gb] (traced)  117.5200 [Mb] (memstat)
 <12h-04m> P1-cnode399: [MEMORY]  Free X_par_lower_triangle%blc( 273.2110 [Mb]) TOTAL:  9.706738 [Gb] (traced)  117.5200 [Mb] (memstat)
 <12h-04m> P1-cnode399: [MEMORY] Alloc X_par_lower_triangle%blc( 273.2110 [Mb]) TOTAL:  9.979949 [Gb] (traced)  117.5200 [Mb] (memstat)
 <12h-04m> P1-cnode399: [MEMORY] Alloc X_par%blc( 273.2110 [Mb]) TOTAL:  10.25316 [Gb] (traced)  117.5200 [Mb] (memstat)
 <12h-04m> P1-cnode399: [PARALLEL distribution for RL vectors(X) on 4 CPU] Loaded/Total (Percentual):17485551/******(13%)
 <12h-04m> P1-cnode399: [X-CG] R(p) Tot o/o(of R):   10998   81000     100
 <12h-04m> P1-cnode399: Xo@q[3] |                                        | [000%] --(E) --(X)

Here the X0 calculation for q2 takes about 2 hours, but then yambo stalls for about 6 hours (06h-21m to 12h-01m) before X for q2 starts. This seems strange. I wonder what yambo is doing during those 6 hours, and whether I can do something to improve the performance. Thank you very much.
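
The silent interval described above can be located automatically in a long log file. The following is a minimal Python sketch, not part of yambo: it assumes every log line starts with an elapsed-time stamp such as `<06h-21m>` (the format seen in the log above), and the helper names `elapsed_seconds` and `largest_gaps` are hypothetical.

```python
import re

# Seconds per unit for stamp tokens such as "06h" or "21m".
# The stamp format is assumed from the log excerpt in this thread.
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}
STAMP = re.compile(r"^\s*<([^>]+)>")
TOKEN = re.compile(r"(\d+)([dhms])")


def elapsed_seconds(line):
    """Return the elapsed time stamped at the start of a log line, or None."""
    match = STAMP.match(line)
    if match is None:
        return None
    tokens = TOKEN.findall(match.group(1))
    if not tokens:
        return None
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in tokens)


def largest_gaps(lines, min_gap_seconds=3600):
    """Return (gap_seconds, line_before, line_after) tuples, longest gap first."""
    gaps = []
    previous_time, previous_line = None, None
    for line in lines:
        current = elapsed_seconds(line)
        if current is None:
            continue  # skip lines without a time stamp
        if previous_time is not None and current - previous_time >= min_gap_seconds:
            gaps.append((current - previous_time, previous_line.strip(), line.strip()))
        previous_time, previous_line = current, line
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)


# Three consecutive lines copied from the log above.
sample = [
    " <06h-21m> P1-cnode399: Xo@q[2] |########################################| [100%] 02h-18m(E) 02h-18m(X)",
    " <06h-21m> P1-cnode399: [MEMORY]  Free Xo_res( 136.6050 [Mb]) TOTAL:  10.26575 [Gb] (traced)  117.5200 [Mb] (memstat)",
    " <12h-01m> P1-cnode399: [PARALLEL distribution for X Frequencies on 256 CPU] Loaded/Total (Percentual):1/2(50%)",
]

for gap, before, after in largest_gaps(sample):
    print(f"{gap / 3600:.1f} h of silence after: {before}")
```

Running the same function on the logs of several MPI tasks (for example the CPU_1 and CPU_129 files) would show whether all tasks are silent at the same point or only some of them, which may help to tell a load imbalance from a global wait.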

(I am using 512 cores for a spin-polarized system, with a 10 Ry cutoff for epsilon and 300 bands in the summation. The system has 110 electrons.)

Best,
Yuanfan Xiong
Ph.D. student at USTC, China

Daniele Varsano
Posts: 3816
Joined: Tue Mar 17, 2009 2:23 pm

Re: slow performance after a X0 is finished

Post by Daniele Varsano » Sun Feb 18, 2024 9:38 am

Dear Yuanfan,

Please sign your posts with your name and affiliation; this is a rule of the forum. You can do it once and for all by filling in the signature in your user profile.

It is possible that there is some imbalance in the parallel structure. Unfortunately, I cannot see the task distribution from the snapshot of the log file. If you post your input file or the entire log file, we will have a look and can suggest how to tune the parallel strategy to avoid such an imbalance.

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

xyf
Posts: 3
Joined: Mon Apr 24, 2023 8:15 am

Re: slow performance after a X0 is finished

Post by xyf » Sun Feb 18, 2024 10:52 am

Dear Daniele,

Thank you for your reply. I have now added my name and affiliation. Here are the input file and the logs.

Attachments:
input.txt
l-gw_HF_and_locXC_gw0_dyson_rim_cut_em1d_ppa_CPU_1.txt
l-gw_HF_and_locXC_gw0_dyson_rim_cut_em1d_ppa_CPU_129.txt

Best,
Yuanfan Xiong
Ph.D. student at USTC, China

Daniele Varsano
Posts: 3816
Joined: Tue Mar 17, 2009 2:23 pm

Re: slow performance after a X0 is finished

Post by Daniele Varsano » Mon Feb 19, 2024 8:21 am

Dear Yuanfan,

You can try to set the parallel strategy manually to improve the balance and the memory distribution, by inserting the following in your input file:

Code:

X_and_IO_CPU= "1 1 1 32 16"                 # [PARALLEL] CPUs for each role
X_and_IO_ROLEs= "q g k c v"               # [PARALLEL] CPUs roles (q,g,k,c,v)

I also strongly suggest updating to a more recent version of the code (5.2).
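
The suggested `X_and_IO_CPU` values can be sanity-checked before submitting the job: the entries multiply to 1 x 1 x 1 x 32 x 16 = 512, which matches the 512 cores mentioned earlier in the thread if one MPI task runs per core (an assumption, since the thread does not state the number of threads per task). To my understanding, yambo expects the product of the entries to equal the number of MPI tasks. A minimal Python sketch of that check, with the hypothetical helper name `check_parallel_layout`:

```python
from math import prod


def check_parallel_layout(cpu_string, roles_string, mpi_tasks):
    """Pair each role with its CPU count and test the product against mpi_tasks.

    cpu_string and roles_string are the quoted values of X_and_IO_CPU and
    X_and_IO_ROLEs; mpi_tasks is the number of MPI tasks of the job (assumed
    here to be one per core).
    """
    cpus = [int(number) for number in cpu_string.split()]
    roles = roles_string.split()
    if len(cpus) != len(roles):
        raise ValueError("X_and_IO_CPU and X_and_IO_ROLEs must have the same number of entries")
    return dict(zip(roles, cpus)), prod(cpus) == mpi_tasks


layout, consistent = check_parallel_layout("1 1 1 32 16", "q g k c v", 512)
print(layout)
print("product matches MPI tasks:", consistent)
```

With this layout all tasks are distributed over the conduction (c) and valence (v) band roles, and none over q, g, or k.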

Best,
Daniele

xyf
Posts: 3
Joined: Mon Apr 24, 2023 8:15 am

Re: slow performance after a X0 is finished

Post by xyf » Wed Feb 21, 2024 2:52 am

Dear Daniele,

Thank you very much. I'll try the new version and the parallel strategy.

Best,
Yuanfan Xiong
Ph.D. student at USTC, China
