Dear YAMBO developers and users,
I'm trying to run a G0W0 calculation with YAMBO 5.2.2 and/or 5.2.3, specifically installed on the LEONARDO supercomputer @CINECA.
When the nscf calculation is run with few empty bands, say 500, the G0W0 calculation completes without any problem and the output is reasonable.
When the nscf calculation is run with more bands, e.g. 3000, and I then run a G0W0 calculation with a number of bands <= 3000, I get this error message while yambo is computing chi:
[ERROR] LINEAR ALGEBRA driver [PARALLEL_lin_system]performing P(Z/C)GESV
This happens when in the input file I leave the default
X_and_IO_nCPU_LinAlg_INV=-1
If I change the value of this keyword and run the linear algebra on the GPU, the code proceeds and finishes. However, in this case I get NaN results in the output file:
#  K-point    Band    Eo [eV]     E-Eo [eV]   Sc|Eo [eV]
#
   1          93      0.000000    NaN         NaN
   1          94      1.681785    NaN         NaN
I suspect the two issues are related. I've tried with different yambo versions (5.2.2 and the newer 5.2.3) without luck.
I've attached the input, output and log files.
Thank you very much for your help!
Best,
Michele
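For anyone wanting to check their own quasiparticle output for the same symptom, here is a minimal sketch that scans the text of an output file for rows containing NaN or Infinity. It assumes only what the excerpt above shows: comment lines start with `#` and data rows are whitespace-separated numbers. The function name `find_nonfinite_rows` is made up for this example and is not part of YAMBO.

```python
import math


def find_nonfinite_rows(text):
    """Return (line_number, tokens) for every numeric row holding NaN or Inf."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and '#' comment/header lines.
        if not stripped or stripped.startswith("#"):
            continue
        tokens = stripped.split()
        try:
            values = [float(t) for t in tokens]
        except ValueError:
            continue  # not a purely numeric row, ignore it
        if any(not math.isfinite(v) for v in values):
            bad.append((lineno, tokens))
    return bad


# The rows quoted in the post above, used here as sample input.
sample = """\
# K-point Band Eo [eV] E-Eo [eV] Sc|Eo [eV]
#
1 93 0.000000 NaN NaN
1 94 1.681785 NaN NaN
"""

for lineno, tokens in find_nonfinite_rows(sample):
    print(f"line {lineno}: {' '.join(tokens)}")
```

To use it on a real file, read the file contents into a string first and pass that string to the function.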
"[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation
---
Michele Re Fiorentin, PhD
Politecnico di Torino
-
- Posts: 11
- Joined: Thu Sep 12, 2019 6:55 pm
Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation
Dear all,
I can add more details to the problem.
By inspecting the ns.wf_fragment_* databases in the SAVE directory, it appears that some fragments contain -Infinity values. This happens ONLY when I run a Quantum ESPRESSO nscf calculation with a larger number of bands (in my case, 3000). When the nscf is performed with fewer bands (500 in my case), all ns.wf_fragment_* are fine and indeed in this case all YAMBO calculations are ok.
Keeping in mind that I'm running on the GPU-accelerated LEONARDO cluster, I've tried to produce the SAVE with different builds of p2y, including a non-GPU one, but the problem persists. I've also tried p2y -b #bands, as suggested on the YAMBO GitHub page, without success. In contrast, I've always used GPU-accelerated QE versions.
Now, I'm wondering if this is a p2y problem or rather a QE issue when a "larger" number of bands is considered.
Thank you for your help!
Michele
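The fragment inspection described in this post can also be scripted. The sketch below is one possible way to do it, not necessarily how it was done here: it assumes the ns.wf_fragment_* files are netCDF files readable with the third-party netCDF4 Python package, and that NumPy is available. The helper names `count_nonfinite` and `scan_fragment` are made up for this example.

```python
import numpy as np


def count_nonfinite(values):
    """Count NaN, +Inf and -Inf entries; non-floating data counts as 0."""
    data = np.ma.getdata(values)  # ignore any mask, look at the raw numbers
    if not np.issubdtype(data.dtype, np.inexact):
        return 0  # integers, strings, etc. cannot hold NaN or Inf
    return int(np.count_nonzero(~np.isfinite(data)))


def scan_fragment(path):
    """Return {variable_name: non-finite count} for one netCDF file.

    Assumption: the netCDF4 package is installed and `path` is a netCDF file.
    """
    from netCDF4 import Dataset  # imported lazily, optional dependency

    with Dataset(path, "r") as ds:
        return {name: count_nonfinite(var[...])
                for name, var in ds.variables.items()}


# In-memory demonstration that does not need any file.
demo = np.array([1.0, -np.inf, np.nan])
print("non-finite entries in demo:", count_nonfinite(demo))
```

Calling `scan_fragment` on each file matching SAVE/ns.wf_fragment_* would then show which fragments, and which variables inside them, are affected.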
- Daniele Varsano
- Posts: 4053
- Joined: Tue Mar 17, 2009 2:23 pm
Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation
Dear Michele,
You are not the first to report this issue.
Can you check whether the NaN (Infinity) values are already present in the raw wavefunctions generated by QE, or only in the ns.wf* files?
If they are present only in the ns.wf* files, we can look into what is disturbing p2y during the format conversion.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
-
- Posts: 11
- Joined: Thu Sep 12, 2019 6:55 pm
Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation
Dear Daniele,
Thank you for your reply!
I did want to investigate that, but unfortunately I don't know how to read QE's binary wfc#.dat files.
However, I can add that if I run the QE calculation with a NON-GPU-accelerated version, the problem vanishes. Therefore, I strongly suspect the problem lies in QE-GPU.
Michele
- csk
- Posts: 9
- Joined: Wed Aug 28, 2024 9:54 pm
Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation
Hi!
I can confirm the error: with the GPU build of QE (v7.2, also on the Leonardo cluster), the wave functions contain NaN values when running with more bands. The error goes away when running with a CPU-only version of QE.
To check whether your QE wave functions contain NaN values, you might find the following code snippet useful:
Code:
import numpy as np

def read_wavefunction_k_qe_dat(dat_file):
    # Credits: https://mattermodeling.stackexchange.com/questions/9149/how-to-read-qes-wfc-dat-files-with-python
    with open(dat_file, 'rb') as f:
        # Skip the 4-byte marker that opens the first Fortran record
        f.seek(4)
        ik = np.fromfile(f, dtype='int32', count=1)[0]
        xk = np.fromfile(f, dtype='float64', count=3)
        ispin = np.fromfile(f, dtype='int32', count=1)[0]
        gamma_only = bool(np.fromfile(f, dtype='int32', count=1)[0])
        scalef = np.fromfile(f, dtype='float64', count=1)[0]
        # Skip 8 bytes: the end marker of this record and the start marker of the next
        f.seek(8, 1)
        ngw = np.fromfile(f, dtype='int32', count=1)[0]
        igwx = np.fromfile(f, dtype='int32', count=1)[0]
        npol = np.fromfile(f, dtype='int32', count=1)[0]
        nbnd = np.fromfile(f, dtype='int32', count=1)[0]
        f.seek(8, 1)
        # Reciprocal lattice vectors
        b1 = np.fromfile(f, dtype='float64', count=3)
        b2 = np.fromfile(f, dtype='float64', count=3)
        b3 = np.fromfile(f, dtype='float64', count=3)
        f.seek(8, 1)
        # Miller indices of the plane waves
        mill = np.fromfile(f, dtype='int32', count=3*igwx)
        mill = mill.reshape((igwx, 3))
        evc = np.zeros((nbnd, npol*igwx), dtype='complex128')
        f.seek(8, 1)
        # One record of plane-wave coefficients per band
        for i in range(nbnd):
            evc[i, :] = np.fromfile(f, dtype='complex128', count=npol*igwx)
            f.seek(8, 1)
    return evc

kpoint = 1
wf = read_wavefunction_k_qe_dat('wfc' + str(kpoint) + '.dat')
print('WF contains NaN values: ', np.isnan(wf).any())
# Infinities were also reported earlier in this thread, so check for them as well
print('WF contains Inf values: ', np.isinf(wf).any())
Do you know if this has been reported to the Quantum ESPRESSO developers, or would you assume that this is a compilation problem?
Cheers,
Christian
Christian Kern, University of Graz, Austria
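Since the check in the post above looks at a single k-point, a small wrapper that loops over every wavefunction file in a directory may be handy. This is only a sketch: the helper name `scan_wavefunction_files` and the default glob pattern `wfc*.dat` are assumptions made for this example, and the reader is passed in as an argument (for instance the `read_wavefunction_k_qe_dat` function from the post above) so that the block stays self-contained.

```python
import glob
import os

import numpy as np


def scan_wavefunction_files(directory, reader, pattern="wfc*.dat"):
    """Map each matching file name to its number of non-finite coefficients.

    `reader` is any callable taking a file path and returning an array of
    wavefunction coefficients.
    """
    report = {}
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        evc = np.asarray(reader(path))
        # np.isfinite is False for NaN, +Inf and -Inf (real or imaginary part).
        report[os.path.basename(path)] = int(np.count_nonzero(~np.isfinite(evc)))
    return report


def print_report(report):
    """Print one line per file, flagging those with non-finite values."""
    for name, n_bad in report.items():
        status = "OK" if n_bad == 0 else f"{n_bad} non-finite values"
        print(f"{name}: {status}")
```

A call such as `print_report(scan_wavefunction_files(save_dir, read_wavefunction_k_qe_dat))`, with `save_dir` pointing at the directory holding the wfc files, would then list the affected k-points.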
- Daniele Varsano
- Posts: 4053
- Joined: Tue Mar 17, 2009 2:23 pm
Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation
Dear Christian,
This problem has been reported to the QE community and I know that it has been (at least partially) fixed by the QE developers. As far as I know, you need to avoid the -npools option in the QE run, but I do not know whether a specific patched version of QE is also needed. You could ask the Leonardo user support or the QE mailing list.
Best,
Daniele
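For job scripts that assemble the QE command line programmatically, the advice above about avoiding the pool option could be applied with a small guard like the one below. This is a rough sketch under stated assumptions: the helper name `strip_pool_flags` and the input file name `nscf.in` are made up, and treating `-nk` and `-npool` as alternative spellings of `-npools` is an assumption to verify against the QE documentation for your version.

```python
# Spellings treated as the k-point pool option. Only "-npools" comes from the
# post above; "-nk" and "-npool" are assumed aliases, check them for your QE.
POOL_FLAGS = {"-nk", "-npool", "-npools"}


def strip_pool_flags(argv):
    """Return a copy of argv without the pool flag and its value."""
    cleaned = []
    skip_next = False
    for arg in argv:
        if skip_next:
            skip_next = False  # this is the value that followed the flag
            continue
        if arg in POOL_FLAGS:
            skip_next = True
            continue
        cleaned.append(arg)
    return cleaned


# Hypothetical command line, for illustration only.
command = ["mpirun", "-np", "4", "pw.x", "-npools", "4", "-i", "nscf.in"]
print(" ".join(strip_pool_flags(command)))
```

Note that the MPI launcher's own `-np` option is left untouched, since only exact matches in `POOL_FLAGS` are removed.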