I ran the same GW calculation on a CPU HPC cluster and on a GPU machine, and found that the time spent in each part is completely different. Below I list the configuration of each machine and the time spent in each step.
- CPU HPC: 32 MPI tasks x 8 OpenMP threads (256 cores total)
  - Dipoles: 19s
  - Dynamic Dielectric Matrix (PPA): 01h-04m
  - Local Exchange-Correlation + Non-Local Fock: 03h-20m
  - Dyson equation: 02d-09h-00m
- GPU HPC: 8 CPU cores + 8 GPUs (Tesla A100)
  - Dipoles: 38s
  - Dynamic Dielectric Matrix (PPA): 19h-21m
  - Local Exchange-Correlation + Non-Local Fock: 13m-42s
  - Dyson equation: 01d-19h-34m
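
To make the comparison easier to read, here is a small Python sketch that converts the timings listed above to seconds and prints the GPU/CPU wall-time ratio for each step. The helper name `to_seconds` and the dictionary layout are my own choices for illustration and are not part of the GW code; the sketch assumes every timing uses only the `d`, `h`, `m` and `s` units joined by `-`, as in the report above.

```python
import re

# Seconds per unit for the "02d-09h-00m" style timings reported above.
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def to_seconds(stamp: str) -> int:
    """Convert a timing such as '01h-04m' or '19s' to seconds (hypothetical helper)."""
    total = 0
    for token in stamp.split("-"):
        match = re.fullmatch(r"(\d+)([dhms])", token)
        if match is None:
            raise ValueError(f"unrecognised timing token: {token!r}")
        total += int(match.group(1)) * UNIT_SECONDS[match.group(2)]
    return total


# Timings copied from the two runs listed above.
cpu = {
    "Dipoles": "19s",
    "Dynamic Dielectric Matrix (PPA)": "01h-04m",
    "Local Exchange-Correlation + Non-Local Fock": "03h-20m",
    "Dyson equation": "02d-09h-00m",
}
gpu = {
    "Dipoles": "38s",
    "Dynamic Dielectric Matrix (PPA)": "19h-21m",
    "Local Exchange-Correlation + Non-Local Fock": "13m-42s",
    "Dyson equation": "01d-19h-34m",
}

# Ratio > 1 means the GPU run was slower for that step, < 1 means faster.
ratios = {step: to_seconds(gpu[step]) / to_seconds(cpu[step]) for step in cpu}

for step, ratio in ratios.items():
    print(f"{step}: GPU/CPU wall-time ratio = {ratio:.2f}")
```

These ratios compare wall time only. The two runs use very different resources, so they should not be read as a per-core or per-device efficiency.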
Best,
Jason