Dear all,
I compiled the GPU version of yambo-5.1.2 with the NVHPC SDK.
However, in my tests the GPU version of yambo is much less efficient than the CPU version.
For the same test run, the CPU version takes 1.5 minutes, whereas the GPU version takes 6.5 minutes. This is a confusing result.
The GPU I used is an RTX 3090.
Details can be found in the attachments.
Are there any tips for setting the variables in the input file?
Thanks in advance.
low efficiency of GPU version
-
- Posts: 106
- Joined: Thu Oct 10, 2019 7:03 am
Dr. Yimin Ding
Soochow University, China.
-
- Posts: 208
- Joined: Fri Jan 31, 2014 11:13 am
Re: low efficiency of GPU version
Dear Dean,
thanks for writing.
Concerning the GPU/CPU performance of yambo, we typically expect good acceleration when running on state-of-the-art HPC GPUs (e.g. NVIDIA A100 or similar).
If the acceleration is lacking, there can be several reasons:
* the system size is too small (too much data communication with respect to computation), which is especially critical on GPUs with lower memory bandwidth
* one runs with GPU oversubscription (more than one MPI task on a single GPU)
* general miscompilation issues
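To make the oversubscription point concrete, here is a minimal Python sketch that checks whether a given job layout puts more than one MPI task on a GPU. It is not part of yambo; the function names and the example task and GPU counts are hypothetical and chosen only for illustration.

```python
def tasks_per_gpu(n_mpi_tasks: int, n_gpus: int) -> float:
    """Average number of MPI tasks that share one GPU."""
    if n_gpus <= 0:
        raise ValueError("at least one GPU is required")
    return n_mpi_tasks / n_gpus


def is_oversubscribed(n_mpi_tasks: int, n_gpus: int) -> bool:
    """True when more than one MPI task would land on a single GPU."""
    return tasks_per_gpu(n_mpi_tasks, n_gpus) > 1


if __name__ == "__main__":
    # Hypothetical layouts: (MPI tasks, GPUs on the node).
    for tasks, gpus in [(2, 1), (2, 2), (4, 2)]:
        state = "oversubscribed" if is_oversubscribed(tasks, gpus) else "ok"
        print(f"{tasks} MPI tasks on {gpus} GPU(s): {state}")
```

If the check reports oversubscription, reducing the number of MPI tasks to the number of GPUs and giving the remaining cores to threads is the usual first thing to try.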
I notice that most of the time is spent in the GW(PPA) routine, and that you use the terminator algorithm (BG).
The terminator is not ported to GPUs in the public version (it has been developed but not released yet; it will probably come with the next major release).
This could be one of the reasons.
Moreover, your system looks rather small.
BTW: are the CPU and GPU nodes the same in terms of hardware? (I see you are running with 20 MPI tasks in one case and 2 MPI tasks * 4 threads in the other.)
If the hardware is the same, using 2 MPI tasks + 10 threads in the GPU case would probably work better.
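The 2 MPI tasks + 10 threads layout follows from dividing the cores of the node among the MPI tasks. The Python sketch below shows that arithmetic and assembles a launch command without running it. It assumes a 20-core node (inferred from the 20 MPI tasks of the CPU run), that threads are controlled through OMP_NUM_THREADS, and an mpirun-style launcher; the input file name gw.in and the job name gpu_test are hypothetical.

```python
def threads_per_task(total_cores: int, n_mpi_tasks: int) -> int:
    """Split the cores evenly among MPI tasks (at least one thread each)."""
    if n_mpi_tasks <= 0:
        raise ValueError("at least one MPI task is required")
    return max(1, total_cores // n_mpi_tasks)


def launch_plan(total_cores: int, n_mpi_tasks: int,
                input_file: str, job_name: str):
    """Return (environment, command) for a hybrid MPI + OpenMP run.

    Nothing is executed here; the command is only assembled.
    """
    threads = threads_per_task(total_cores, n_mpi_tasks)
    env = {"OMP_NUM_THREADS": str(threads)}  # assumed thread control
    cmd = ["mpirun", "-np", str(n_mpi_tasks),
           "yambo", "-F", input_file, "-J", job_name]
    return env, cmd


if __name__ == "__main__":
    # Assumed 20-core node, 2 MPI tasks; file and job names are placeholders.
    env, cmd = launch_plan(20, 2, "gw.in", "gpu_test")
    print(env)
    print(" ".join(cmd))
```

The exact launcher and its options depend on the MPI installation and the scheduler of the machine, so the command above should be adapted rather than copied.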
Andrea
Andrea Ferretti, PhD
CNR-NANO-S3 and MaX Centre
via Campi 213/A, 41125, Modena, Italy
Tel: +39 059 2055322; Skype: andrea_ferretti
URL: http://www.nano.cnr.it
-
- Posts: 106
- Joined: Thu Oct 10, 2019 7:03 am
Re: low efficiency of GPU version
Dear andrea.ferretti,
Thanks for your reply.
Without the terminator I found a 4x speed-up, so it is the main factor. It would therefore be good to have the terminator ported to GPUs in the public version.
Best,
Dr. Yimin Ding
Soochow University, China.