Low efficiency of GPU version

Post by **Dean** » Mon Apr 24, 2023 11:10 am

Dear all,
I compiled the GPU version of yambo-5.1.2 with the NVHPC SDK.
However, in my tests the GPU version of yambo is much slower than the CPU version.
For the same test run, the CPU version takes 1.5 minutes, while the GPU version takes 6.5 minutes. This result is confusing.
The GPU devices I used are RTX 3090.
Details can be found in the attachments.
Are there any tips for the variables in the input file?
Thanks in advance.

Post by **andrea.ferretti** » Wed Apr 26, 2023 9:00 am

Dear Dean,

thanks for writing.
Concerning the GPU/CPU performance of yambo, we typically expect good acceleration when running on state-of-the-art HPC GPUs (e.g. NVIDIA A100 or similar).
If the acceleration is lacking, there could be several reasons for this behaviour:

* the system size is too small (too much data communication relative to computation), which is especially critical on GPUs with lower memory bandwidth
* the run oversubscribes the GPUs (more than one MPI task per GPU)
* general miscompilation issues
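
The oversubscription condition in the list above comes down to simple arithmetic. The following is a minimal sketch, not part of yambo: the function names are hypothetical, and the GPU count used in the example is an assumption chosen only for illustration.

```python
def tasks_per_gpu(mpi_tasks: int, gpus: int) -> float:
    """Average number of MPI tasks sharing each GPU."""
    if mpi_tasks < 1 or gpus < 1:
        raise ValueError("mpi_tasks and gpus must be positive integers")
    return mpi_tasks / gpus


def is_oversubscribed(mpi_tasks: int, gpus: int) -> bool:
    """True when more than one MPI task would share a single GPU."""
    return tasks_per_gpu(mpi_tasks, gpus) > 1.0


if __name__ == "__main__":
    # Hypothetical layouts; a node with 2 GPUs is assumed for illustration.
    for tasks, gpus in [(2, 2), (20, 2)]:
        status = "oversubscribed" if is_oversubscribed(tasks, gpus) else "ok"
        print(f"{tasks} MPI tasks on {gpus} GPUs: {status}")
```

A layout with one MPI task per GPU passes this check, while reusing a CPU-style layout with many MPI tasks on the same GPUs does not.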

According to your input/output files, you have no oversubscription (good!) and everything looks OK.
I notice that most of the time is spent in the GW(PPA) routine, and that you use the terminator algorithm (BG).
The terminator is not ported to GPUs in the public version (it has been developed but not released yet; it will probably come with the next major release)...

This could be one of the reasons.
Moreover, your system looks rather small...

BTW: are the CPU and GPU nodes the same in terms of hardware? (I see you are running with 20 MPI tasks in one case and 2 MPI tasks * 4 threads in the other.)
If the hardware is the same, using 2 MPI tasks + 10 threads in the GPU case would probably work better.
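
The suggested layout follows from dividing the cores of the node among the MPI tasks. Below is a rough sketch of that arithmetic, assuming the node has 20 cores (inferred from the 20 MPI tasks of the CPU run, not confirmed in this thread). The helper name is hypothetical; `OMP_NUM_THREADS` is the standard OpenMP environment variable for the thread count.

```python
def threads_per_task(total_cores: int, mpi_tasks: int) -> int:
    """Threads each MPI task can use without exceeding the core count."""
    if total_cores < 1 or mpi_tasks < 1:
        raise ValueError("total_cores and mpi_tasks must be positive integers")
    if mpi_tasks > total_cores:
        raise ValueError("more MPI tasks than cores")
    return total_cores // mpi_tasks


if __name__ == "__main__":
    cores = 20  # assumption: core count of the node, not stated in the thread
    tasks = 2   # MPI tasks used in the GPU run
    threads = threads_per_task(cores, tasks)
    print(f"MPI tasks: {tasks}, OMP_NUM_THREADS={threads}")
```

Under this assumption, 2 MPI tasks with 4 threads each use only 8 of the 20 cores, whereas 2 MPI tasks with 10 threads each use all of them.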

Andrea

Post by **Dean** » Wed Apr 26, 2023 11:37 am

Dear andrea.ferretti,
Thanks for your reply.
I found a 4x speedup without the terminator, so it is the main factor. I look forward to the terminator being ported to GPUs in the public version.
Best,

Yambo Community Forum
