Low efficiency of GPU version

Concerns issues with computing quasiparticle corrections to the DFT eigenvalues - i.e., the self-energy within the GW approximation (-g n), or considering the Hartree-Fock exchange only (-x)

Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano

Dean
Posts: 106
Joined: Thu Oct 10, 2019 7:03 am

Low efficiency of GPU version

Post by Dean » Mon Apr 24, 2023 11:10 am

Dear all,
I compiled the GPU version of yambo-5.1.2 with the NVHPC SDK.
However, in my tests the GPU version of yambo is much less efficient than the CPU version.
For the same test run, the CPU version takes 1.5 minutes, while the GPU version takes 6.5 minutes, i.e. more than 4 times longer. This result is puzzling.
The GPU device I used is an RTX 3090.
Details can be found in the attachments.
Are there any tips for setting the variables in the input file?
Thanks in advance.
Dr. Yimin Ding
Soochow University, China.

andrea.ferretti
Posts: 208
Joined: Fri Jan 31, 2014 11:13 am

Re: low efficiency of GPU version

Post by andrea.ferretti » Wed Apr 26, 2023 9:00 am

Dear Dean,

thanks for writing.
Concerning the GPU/CPU performance of yambo, we typically expect good acceleration when running on state-of-the-art HPC GPUs (e.g. NVIDIA A100 or similar).
If the acceleration is lacking, multiple reasons could trigger this behaviour:
  * the system size is too small (too much data communication with respect to computation), which is especially critical on GPUs with lower bandwidth to memory;
  * the run uses GPU oversubscription (more MPI tasks on a single GPU);
  * general miscompilation issues.
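The oversubscription condition in the list above is just a ratio between MPI tasks and GPUs on a node. A minimal Python sketch, assuming every MPI task offloads to a GPU; the helper names and the 2-GPU node used in the demo are hypothetical, chosen for illustration only:

```python
def ranks_per_gpu(mpi_ranks_per_node: int, gpus_per_node: int) -> float:
    """Average number of MPI ranks sharing each GPU on one node."""
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be a positive integer")
    return mpi_ranks_per_node / gpus_per_node


def is_oversubscribed(mpi_ranks_per_node: int, gpus_per_node: int) -> bool:
    """True when at least one GPU must be shared by more than one MPI rank."""
    return ranks_per_gpu(mpi_ranks_per_node, gpus_per_node) > 1


if __name__ == "__main__":
    # Hypothetical node with 2 GPUs; rank counts are illustrative only.
    for ranks in (2, 4, 20):
        status = "oversubscribed" if is_oversubscribed(ranks, 2) else "ok"
        print(f"{ranks:2d} ranks on 2 GPUs: "
              f"{ranks_per_gpu(ranks, 2):.1f} per GPU, {status}")
```

With one rank per GPU the ratio is 1.0 and no GPU is shared; any higher ratio means several tasks compete for the same device.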
According to your input/output files, you have no oversubscription (good!) and everything looks OK.
I notice that most of the time is spent in the GW (PPA) routine, and that you use the terminator algorithm (BG).
The terminator is not ported to GPUs in the public version (it has been developed but not released yet, and will probably come with the next major release).

This could be one of the reasons.
Moreover, your system looks rather small...
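To check how much of the slowdown comes from the terminator, one can rerun with it switched off. A minimal Python sketch that edits the input text, assuming the terminator is selected by the `GTermKind` variable (`"BG"` for Bruneval-Gonze, `"none"` to disable it), as in inputs generated by yambo 5.x; please verify the variable name against the input file yambo generates for your own run. The function name `set_terminator` is hypothetical:

```python
import re

# Assumed input syntax: a line such as  GTermKind= "BG"   # [GW] GW terminator
_GTERM = re.compile(r'^(\s*GTermKind\s*=\s*)"[^"]*"', re.MULTILINE)


def set_terminator(input_text: str, kind: str = "none") -> str:
    """Return input_text with the (assumed) GTermKind value replaced by kind."""
    # A function replacement avoids escaping issues with backreferences.
    new_text, count = _GTERM.subn(lambda m: f'{m.group(1)}"{kind}"', input_text)
    if count == 0:
        raise KeyError("GTermKind not found in the input text")
    return new_text


if __name__ == "__main__":
    sample = 'ppa\ngw0\nGTermKind= "BG"   # [GW] GW terminator\n'
    print(set_terminator(sample))
```

Only the quoted value is touched, so trailing comments and the rest of the input are preserved.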

BTW: are the CPU and GPU nodes the same in terms of hardware? (I see you are running with 20 MPI tasks in one case and 2 MPI tasks with 4 threads each in the other.)
If the hardware is the same, using 2 MPI tasks + 10 threads in the GPU case would probably work better.
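The suggestion above follows a common rule of thumb for hybrid runs: one MPI task per GPU, with the node's remaining CPU cores given to OpenMP threads. A minimal Python sketch, assuming 20 cores and 2 GPUs per node (the thread does not state either number; both are inferred from the 20-task CPU run and the 2-task GPU run). The helper names and the input file name are hypothetical, and the launch line assumes an `mpirun`-style launcher with `OMP_NUM_THREADS` controlling the thread count:

```python
def hybrid_layout(cores_per_node: int, gpus_per_node: int) -> tuple[int, int]:
    """Return (MPI ranks, OpenMP threads per rank): one rank per GPU,
    with the node's CPU cores split evenly among those ranks."""
    if gpus_per_node <= 0 or cores_per_node < gpus_per_node:
        raise ValueError("need at least one GPU and one core per GPU")
    ranks = gpus_per_node
    threads = cores_per_node // ranks  # leftover cores, if any, stay idle
    return ranks, threads


def launch_line(cores_per_node: int, gpus_per_node: int, input_file: str) -> str:
    """Build an illustrative launch command for the layout above."""
    ranks, threads = hybrid_layout(cores_per_node, gpus_per_node)
    return f"OMP_NUM_THREADS={threads} mpirun -np {ranks} yambo -F {input_file}"


if __name__ == "__main__":
    # Assumed hardware: 20 cores and 2 GPUs per node.
    print(hybrid_layout(20, 2))             # (2, 10)
    print(launch_line(20, 2, "gw_ppa.in"))  # gw_ppa.in is a hypothetical name
```

Keeping exactly one rank per GPU avoids oversubscription, while the extra threads let the CPU-side parts of the run use the cores that the 2 × 4 layout leaves idle.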

Andrea
Andrea Ferretti, PhD
CNR-NANO-S3 and MaX Centre
via Campi 213/A, 41125, Modena, Italy
Tel: +39 059 2055322; Skype: andrea_ferretti
URL: http://www.nano.cnr.it

Dean
Posts: 106
Joined: Thu Oct 10, 2019 7:03 am

Re: low efficiency of GPU version

Post by Dean » Wed Apr 26, 2023 11:37 am

Dear andrea.ferretti,
Thanks for your reply.
I found a 4x speedup without the terminator, so it is the main factor. I look forward to the terminator being ported to GPUs in the public version.
Best,
Dr. Yimin Ding
Soochow University, China.
