"[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation
Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano
- mrefiore
- Posts: 14
- Joined: Thu Sep 12, 2019 6:55 pm
Dear YAMBO developers and users,
I'm trying to run a G0W0 calculation with YAMBO 5.2.2 and/or 5.2.3, specifically installed on the LEONARDO supercomputer @CINECA.
When the nscf is run with few empty bands, say 500, the G0W0 calculation is successful without any problem and the output is reasonable.
When the nscf is run with more bands, e.g. 3000, and I then run a G0W0 calculation with a number of bands <= 3000, I get this error message while yambo is computing chi:
[ERROR] LINEAR ALGEBRA driver [PARALLEL_lin_system]performing P(Z/C)GESV
This happens when in the input file I leave the default
X_and_IO_nCPU_LinAlg_INV=-1
If I change the value of this keyword and run the linear algebra on the GPU, the code proceeds and finishes. However, in this case I get NaN results in the output file:
# K-point Band Eo [eV] E-Eo [eV] Sc|Eo [eV]
#
1 93 0.000000 NaN NaN
1 94 1.681785 NaN NaN
I suspect the two issues are related. I've tried with different yambo versions (5.2.2 and the newer 5.2.3) without luck.
I've attached the input, output and log files.
Thank you very much for your help!
Best,
Michele
---
Michele Re Fiorentin, PhD
Politecnico di Torino
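A quick way to see how many quasiparticle rows are affected is to scan the output file for non-finite entries. The following is only a minimal sketch: it assumes the whitespace-separated layout shown in the post (k-point, band, then numeric columns), and the function name `find_nan_rows` is hypothetical, not part of Yambo.

```python
import math

def find_nan_rows(lines):
    """Return (kpoint, band) pairs whose numeric columns contain NaN or Inf."""
    bad = []
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and the '#' header lines of the report.
        if not stripped or stripped.startswith('#'):
            continue
        fields = stripped.split()
        try:
            # Indices may be printed as "1" or "1.000"; accept both.
            kpoint, band = int(float(fields[0])), int(float(fields[1]))
            values = [float(x) for x in fields[2:]]
        except (ValueError, IndexError):
            continue  # not a data row
        if any(not math.isfinite(v) for v in values):
            bad.append((kpoint, band))
    return bad

# Rows copied from the post above.
sample = [
    "# K-point Band Eo [eV] E-Eo [eV] Sc|Eo [eV]",
    "#",
    "1 93 0.000000 NaN NaN",
    "1 94 1.681785 NaN NaN",
]
print(find_nan_rows(sample))  # [(1, 93), (1, 94)]
```

On a real run, the lines could come from `open(path).readlines()` on the quasiparticle output file.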
- mrefiore
- Posts: 14
- Joined: Thu Sep 12, 2019 6:55 pm
Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation
Dear all,
I can add more details to the problem.
By inspecting the ns.wf_fragment_* databases in the SAVE directory, it appears that some fragments contain -Infinity values. This happens ONLY when I run a Quantum ESPRESSO nscf calculation with a larger number of bands (in my case, 3000). When the nscf is performed with fewer bands (500 in my case), all ns.wf_fragment_* are fine and indeed in this case all YAMBO calculations are ok.
Keeping in mind that I'm running on the GPU-accelerated LEONARDO cluster, I've tried to produce the SAVE with different compilations of p2y, including a non-GPU one, but the problem is always there. I've also tried p2y -b #bands, as suggested on the YAMBO GitHub page, without success. In contrast, I've always used GPU-accelerated QE versions.
Now, I'm wondering if this is a p2y problem or rather a QE issue when a "larger" number of bands is considered.
Thank you for your help!
Michele
---
Michele Re Fiorentin, PhD
Politecnico di Torino
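The inspection of the fragment databases described above could be automated roughly as follows. This is a sketch under stated assumptions: it assumes the fragments are netCDF files whose variables can be read as arrays, and it works on any mapping from names to array-like objects indexable with `[...]`. The function name `nonfinite_report` and the variable names in the demo are hypothetical.

```python
import numpy as np

def nonfinite_report(variables):
    """Map variable name -> (n_nan, n_inf) for floating-point or complex
    variables that contain at least one NaN or +/-Inf value."""
    report = {}
    for name, var in variables.items():
        data = np.asarray(var[...])
        # Only real or complex floating-point data can hold NaN/Inf.
        if data.dtype.kind not in "fc":
            continue
        n_nan = int(np.isnan(data).sum())
        n_inf = int(np.isinf(data).sum())
        if n_nan or n_inf:
            report[name] = (n_nan, n_inf)
    return report

# Stand-in data; the names are made up for illustration.
demo = {
    "WF_good": np.ones((2, 3)),
    "WF_bad": np.array([1.0, -np.inf, np.nan]),
    "indices": np.arange(4),
}
print(nonfinite_report(demo))  # {'WF_bad': (1, 1)}
```

If the netCDF4 Python package is available, passing `netCDF4.Dataset(path).variables` for each ns.wf_fragment_* file should work in the same way, assuming the files open as ordinary netCDF.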
- Daniele Varsano
- Posts: 4298
- Joined: Tue Mar 17, 2009 2:23 pm
Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation
Dear Michele,
You are not the first to report this issue.
Can you check whether the NaN (Infinity) values are already present in the raw wavefunctions generated by QE, or only in the ns.wf* files?
If they are only present in the ns.wf* files, we can investigate what is going wrong in p2y during the format conversion.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- mrefiore
- Posts: 14
- Joined: Thu Sep 12, 2019 6:55 pm
Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation
Dear Daniele,
Thank you for your reply!
I did want to investigate that, but unfortunately I don't know how to read QE's binary wfc#.dat files.
However, I can add that if I run the QE calculation with a NON-GPU-accelerated version, the problem vanishes. Therefore, I strongly suspect the problem lies in QE-GPU.
Michele
---
Michele Re Fiorentin, PhD
Politecnico di Torino
- csk
- Posts: 15
- Joined: Wed Aug 28, 2024 9:54 pm
Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation
Hi!
I can confirm the error: with the GPU build of QE (v7.2, also on the Leonardo cluster), the wavefunctions contain NaN values when running with more bands. The error goes away when running with a CPU-only version of QE.
To check whether your QE wavefunctions contain NaN values, you might find the following code snippet useful:

```python
import numpy as np

def read_wavefunction_k_qe_dat(dat_file):
    """Read the plane-wave coefficients from a QE wfc#.dat file.

    Returns a complex array of shape (nbnd, npol*igwx).
    """
    # Credits: https://mattermodeling.stackexchange.com/questions/9149/how-to-read-qes-wfc-dat-files-with-python
    with open(dat_file, 'rb') as f:
        # Move the cursor 4 bytes to the right (leading Fortran record marker)
        f.seek(4)

        ik = np.fromfile(f, dtype='int32', count=1)[0]
        xk = np.fromfile(f, dtype='float64', count=3)
        ispin = np.fromfile(f, dtype='int32', count=1)[0]
        gamma_only = bool(np.fromfile(f, dtype='int32', count=1)[0])
        scalef = np.fromfile(f, dtype='float64', count=1)[0]

        # Move the cursor 8 bytes to the right (markers between records)
        f.seek(8, 1)

        ngw = np.fromfile(f, dtype='int32', count=1)[0]
        igwx = np.fromfile(f, dtype='int32', count=1)[0]
        npol = np.fromfile(f, dtype='int32', count=1)[0]
        nbnd = np.fromfile(f, dtype='int32', count=1)[0]

        # Move the cursor 8 bytes to the right
        f.seek(8, 1)

        # Reciprocal lattice vectors
        b1 = np.fromfile(f, dtype='float64', count=3)
        b2 = np.fromfile(f, dtype='float64', count=3)
        b3 = np.fromfile(f, dtype='float64', count=3)

        f.seek(8, 1)

        # Miller indices of the plane waves
        mill = np.fromfile(f, dtype='int32', count=3*igwx)
        mill = mill.reshape((igwx, 3))

        evc = np.zeros((nbnd, npol*igwx), dtype="complex128")

        f.seek(8, 1)
        # One record per band
        for i in range(nbnd):
            evc[i, :] = np.fromfile(f, dtype='complex128', count=npol*igwx)
            f.seek(8, 1)

    return evc

kpoint = 1
wf = read_wavefunction_k_qe_dat('wfc' + str(kpoint) + '.dat')
print('WF contains NaN values: ', np.isnan(wf).any())
print('WF contains Inf values: ', np.isinf(wf).any())
```

Do you know if this has been reported to the Quantum ESPRESSO developers, or would you assume that this is a problem with the compilation?
Cheers,
Christian
Christian Kern, University of Graz, Austria
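To see which bands are affected rather than only whether any value is bad, the array returned by the reader in the snippet above could be summarized per band. This is a small sketch; the helper name `bad_bands` is hypothetical, and it also counts the -Infinity values mentioned earlier in the thread, which `np.isnan` alone would miss.

```python
import numpy as np

def bad_bands(evc):
    """Return the zero-based indices of bands (rows of evc) that contain
    at least one NaN or +/-Inf coefficient."""
    evc = np.asarray(evc)
    # A band is fine only if every coefficient in its row is finite.
    return np.flatnonzero(~np.isfinite(evc).all(axis=1)).tolist()

# Synthetic stand-in for the (nbnd, npol*igwx) coefficient array.
demo = np.ones((4, 5), dtype="complex128")
demo[1, 2] = np.nan
demo[3, 0] = -np.inf
print(bad_bands(demo))  # [1, 3]
```

If the corrupted bands turn out to lie only above a certain index, that would fit the observation that runs with fewer bands are unaffected.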
- Daniele Varsano
- Posts: 4298
- Joined: Tue Mar 17, 2009 2:23 pm
Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation
Dear Christian,
This problem has been reported to the QE community, and I know that it has been (at least partially) fixed by the QE developers. As far as I know, you need to avoid the -npools option in the QE run, but I do not know whether a specific patched version of QE is also required. You could ask the Leonardo user support or the QE mailing list.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- Aolei Wang
- Posts: 2
- Joined: Sat Dec 06, 2025 6:12 am
Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation
Dear all,
I am testing k-point convergence for a GW calculation (Yambo v5.3.0) and I encountered the same error:
[ERROR] STOP signal received while in[07] Dynamic Dielectric Matrix (PPA)
[ERROR] LINEAR ALGEBRA driver [PARALLEL_lin_system] performing P(Z/C)GESVXXX
The error does not appear for the coarser k-grids 6×6×1, 9×9×1, 12×12×1, and 15×15×1, but it does appear starting at 18×18×1. If I set X_and_IO_nCPU_LinAlg_INV=1, the run proceeds without that error, but then all GW energy corrections become NaN.
I also checked the QE wavefunctions (wfc*) using the script provided by Christian and the ns.wf_fragments_* files produced by p2y after conversion — none of these files contain NaN values.
Could you help me diagnose what might be causing (a) the GESVXXX linear algebra error at higher k-point density, and (b) the NaN GW corrections when forcing X_and_IO_nCPU_LinAlg_INV=1? Any suggestions would be greatly appreciated.
I have attached the relevant input and output files for reference. Thank you very much!
Best regards,
Aolei Wang
Department of Physics & Astronomy
California State University, Northridge
Dr. Aolei Wang
Department of Physics & Astronomy
California State University, Northridge
- Daniele Varsano
- Posts: 4298
- Joined: Tue Mar 17, 2009 2:23 pm
Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation
Dear Aolei,
actually, it is not straightforward to spot the problem.
Regarding the second run (serial linear algebra), I notice from the report that yambo is reading some previously calculated plasmon-pole databases. You can try to remove all the ./bse/ndb.pp* databases and rerun your calculation. Hopefully this will solve the problem.
About the failure when using parallel linear algebra, at the moment I do not have a clue.
Best,
Daniele
PS: Note that setting FFTGvecs to a low value (10 Ry) could be a source of problems.
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
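Before deleting the old plasmon-pole databases, it may help to list what would be removed. The sketch below does a dry run by default. The function name `stale_pp_databases` is hypothetical, and the file names in the demo are made up for illustration; only the ndb.pp* pattern comes from the post above.

```python
import tempfile
from pathlib import Path

def stale_pp_databases(job_dir, remove=False):
    """List (and optionally delete) the ndb.pp* files found in job_dir.

    Returns the sorted file names that matched the pattern.
    """
    found = sorted(p for p in Path(job_dir).glob("ndb.pp*") if p.is_file())
    if remove:
        for p in found:
            p.unlink()
    return [p.name for p in found]

# Demo in a throwaway directory with invented file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("ndb.pp", "ndb.pp_fragment_1", "ndb.QP"):
        (Path(tmp) / name).touch()
    print(stale_pp_databases(tmp))               # dry run, nothing deleted
    print(stale_pp_databases(tmp, remove=True))  # now delete them
    print(sorted(p.name for p in Path(tmp).iterdir()))
```

For the run discussed here the directory would be ./bse, so the first call would be `stale_pp_databases("./bse")`.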
- Aolei Wang
- Posts: 2
- Joined: Sat Dec 06, 2025 6:12 am
Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation
Dear Daniele,
Thank you very much for the quick reply. I agree it's not straightforward. Following your suggestions, I made the following changes and re-ran the job: I increased FFTGvecs, EXXRLvcs, and VXCRLvcs from 10 Ry to 30 Ry and removed all ./bse/ndb.pp* databases. After this the run completes, but the GW energy corrections are still NaN. I attach the modified input and the output/log files from this attempt for your reference.
Could you please advise which other diagnostics or experiments I should try? I'm happy to run any targeted tests you suggest and can provide any additional logs. Thanks again for looking; any pointers on the next steps would be greatly appreciated.
Best,
Aolei
Dr. Aolei Wang
Department of Physics & Astronomy
California State University, Northridge
- Daniele Varsano
- Posts: 4298
- Joined: Tue Mar 17, 2009 2:23 pm
Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation
Dear Aolei,
actually, it is not easy to spot the problem.
We would probably need to reproduce your error to investigate it in depth.
Before doing that, I suggest you set up a more manageable calculation as follows.
1) Consider calculating just one or two QPs by setting e.g.

```
%QPkrange # [GW] QP generalized Kpoint/Band indices
1|1|83|84|
%
```

This will speed up the calculation a lot while still reproducing the error, as you get NaN for all the self-energy calculations.
2) Exploit symmetries by slightly modifying your atomic positions, placing them at symmetric points, e.g.:

```
In1 : 3.8602 2.22870 24.33920
In2 : 7.72048 4.4574 32.29734
In3 : 3.8602 2.22870 42.89197
In4 : 0 0 50.8602
Se1 : 0 0 21.9889
Se2 : 3.8602 2.22870 29.13587
Se3 : 0 0 34.8073
Se4 : 7.72048 4.4574 40.55669
Se5 : 3.8602 2.2287 47.68916
Se6 : 7.72048 4.4574 53.3784
```

This will require repeating your scf/nscf calculations, but the symmetries will then be detected by QE and Yambo, reducing the grid of q-points in the IBZ.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/