"[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation

Concerns issues with computing quasiparticle corrections to the DFT eigenvalues - i.e., the self-energy within the GW approximation (-g n), or considering the Hartree-Fock exchange only (-x)

Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano

mrefiore
Posts: 11
Joined: Thu Sep 12, 2019 6:55 pm

"[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation

Post by mrefiore » Thu Jul 11, 2024 4:51 pm

Dear YAMBO developers and users,

I'm trying to run a G0W0 calculation with YAMBO 5.2.2 and/or 5.2.3, specifically installed on the LEONARDO supercomputer @CINECA.
When the nscf is run with few empty bands, say 500, the G0W0 calculation is successful without any problem and the output is reasonable.

When the nscf is run with more bands, e.g. 3000, and I then run a G0W0 calculation with any number of bands <= 3000, I get this error message while yambo is computing chi:

[ERROR] LINEAR ALGEBRA driver [PARALLEL_lin_system]performing P(Z/C)GESV

This happens when in the input file I leave the default

X_and_IO_nCPU_LinAlg_INV=-1

If I change the value of this keyword and run the linear algebra on the GPU, the calculation runs to completion. However, in this case I get NaN results in the output file:

# K-point Band Eo [eV] E-Eo [eV] Sc|Eo [eV]
#
1 93 0.000000 NaN NaN
1 94 1.681785 NaN NaN
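
A quick way to spot such rows in a long quasiparticle report is to scan it for non-finite numbers. The following is only a minimal sketch: it assumes the column layout shown above (k-point, band, then numeric columns), the function name is hypothetical, and the third sample row is made up for illustration.

```python
import math

def find_nonfinite_qp_rows(lines):
    """Return (k-point, band) pairs whose numeric columns contain NaN or Inf."""
    bad = []
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and the '#' header/comment lines of the report.
        if not stripped or stripped.startswith('#'):
            continue
        fields = stripped.split()
        try:
            # int(float(...)) also accepts indices printed as e.g. "1.000".
            kpt, band = int(float(fields[0])), int(float(fields[1]))
            values = [float(x) for x in fields[2:]]
        except (ValueError, IndexError):
            continue  # not a data row
        if not all(math.isfinite(v) for v in values):
            bad.append((kpt, band))
    return bad

# First four lines are taken from the post; the last row is hypothetical.
sample = """\
# K-point Band Eo [eV] E-Eo [eV] Sc|Eo [eV]
#
1 93 0.000000 NaN NaN
1 94 1.681785 NaN NaN
1 95 2.500000 0.310000 -0.120000
""".splitlines()

print(find_nonfinite_qp_rows(sample))  # [(1, 93), (1, 94)]
```

For a real run, the lines would come from the report file instead, e.g. `find_nonfinite_qp_rows(open(path))`, with `path` pointing at the relevant output file.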

I suspect the two issues are related. I've tried with different yambo versions (5.2.2 and the newer 5.2.3) without luck.
I've attached the input, output and log files.
Thank you very much for your help!
Best,


Michele
---
Michele Re Fiorentin, PhD
Politecnico di Torino

mrefiore
Posts: 11
Joined: Thu Sep 12, 2019 6:55 pm

Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation

Post by mrefiore » Thu Jul 11, 2024 6:45 pm

Dear all,

I can add more details to the problem.
By inspecting the ns.wf_fragment_* databases in the SAVE directory, it appears that some fragments contain -Infinity values. This happens ONLY when I run a Quantum ESPRESSO nscf calculation with a larger number of bands (in my case, 3000). When the nscf is performed with fewer bands (500 in my case), all ns.wf_fragment_* are fine and indeed in this case all YAMBO calculations are ok.
Keeping in mind that I'm running on the GPU-accelerated LEONARDO cluster, I've tried to produce the SAVE with different builds of p2y, including a non-GPU one, but the problem is always there. I've also tried p2y -b #bands as suggested on the YAMBO GitHub page, without success. In contrast, I've always used GPU-accelerated QE versions.
Now, I'm wondering if this is a p2y problem or rather a QE issue when a "larger" number of bands is considered.
Thank you for your help!


Michele
---
Michele Re Fiorentin, PhD
Politecnico di Torino

Daniele Varsano
Posts: 4053
Joined: Tue Mar 17, 2009 2:23 pm

Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation

Post by Daniele Varsano » Mon Jul 15, 2024 9:29 am

Dear Michele,

you are not the first to report this issue.
Can you check whether the NaN (or Infinity) values are already present in the raw wave functions generated by QE, or only in the ns.wf* files?
If they are present only in the ns.wf* files, we can look into what is disturbing p2y during the format conversion.
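
The ns.wf* side of this comparison can be checked with a short script. This is a sketch under stated assumptions, not an official tool: it assumes the fragments are netCDF files readable with the third-party netCDF4 Python package, that they live in SAVE/ under the name pattern mentioned in this thread, and it scans every numeric variable because the variable names are not known here. The function names are hypothetical.

```python
import glob
import numpy as np

def count_nonfinite(named_arrays):
    """Map each array name to its number of NaN/Inf entries (only if > 0)."""
    report = {}
    for name, arr in named_arrays:
        arr = np.asarray(arr)
        # Skip character or other non-numeric variables.
        if not np.issubdtype(arr.dtype, np.number):
            continue
        n_bad = int(np.count_nonzero(~np.isfinite(arr)))
        if n_bad:
            report[name] = n_bad
    return report

def scan_netcdf(path):
    """Scan every variable of one netCDF file; requires the netCDF4 package."""
    import netCDF4  # imported lazily so count_nonfinite works without it
    with netCDF4.Dataset(path, 'r') as ds:
        ds.set_auto_mask(False)  # return plain arrays, not masked ones
        return count_nonfinite((name, var[:]) for name, var in ds.variables.items())

if __name__ == '__main__':
    # Assumed location and naming of the fragments, following the thread.
    for path in sorted(glob.glob('SAVE/ns.wf_fragment*')):
        bad = scan_netcdf(path)
        print(path, bad if bad else 'all finite')
```

Each variable is read fully into memory, so for very large fragments it may be necessary to read the data in slices instead.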

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

mrefiore
Posts: 11
Joined: Thu Sep 12, 2019 6:55 pm

Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation

Post by mrefiore » Mon Jul 15, 2024 10:24 am

Dear Daniele,

Thank you for your reply!
I did want to investigate that, but unfortunately I don't know how to read QE's binary wfc#.dat files.
However, I can add that if I run the QE calculation with a NON-GPU-accelerated version, the problem vanishes. Therefore, I strongly suspect the problem lies in QE-GPU.

Michele
---
Michele Re Fiorentin, PhD
Politecnico di Torino

csk
Posts: 9
Joined: Wed Aug 28, 2024 9:54 pm

Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation

Post by csk » Fri Oct 04, 2024 12:51 pm

Hi!

I can confirm the error with the GPU build of QE (v7.2, also on the Leonardo cluster): when running with more bands, the wave functions contain NaN values. The error goes away when running with a CPU-only version of QE.

To check whether your QE wave functions contain NaN values, you might find the following code snippet useful:

Code:

import numpy as np

def read_wavefunction_k_qe_dat(dat_file):
    """Read the wave functions of one k-point from a QE wfc#.dat file."""
    # Credits: https://mattermodeling.stackexchange.com/questions/9149/how-to-read-qes-wfc-dat-files-with-python

    with open(dat_file, 'rb') as f:
        # Skip the leading 4-byte record marker
        f.seek(4)

        ik = np.fromfile(f, dtype='int32', count=1)[0]
        xk = np.fromfile(f, dtype='float64', count=3)
        ispin = np.fromfile(f, dtype='int32', count=1)[0]
        gamma_only = bool(np.fromfile(f, dtype='int32', count=1)[0])
        scalef = np.fromfile(f, dtype='float64', count=1)[0]

        # Move the cursor 8 bytes forward
        f.seek(8, 1)

        ngw = np.fromfile(f, dtype='int32', count=1)[0]
        igwx = np.fromfile(f, dtype='int32', count=1)[0]
        npol = np.fromfile(f, dtype='int32', count=1)[0]
        nbnd = np.fromfile(f, dtype='int32', count=1)[0]

        # Move the cursor 8 bytes forward
        f.seek(8, 1)

        b1 = np.fromfile(f, dtype='float64', count=3)
        b2 = np.fromfile(f, dtype='float64', count=3)
        b3 = np.fromfile(f, dtype='float64', count=3)

        f.seek(8,1)

        mill = np.fromfile(f, dtype='int32', count=3*igwx)
        mill = mill.reshape( (igwx, 3) )

        evc = np.zeros( (nbnd, npol*igwx), dtype="complex128")

        f.seek(8,1)
        for i in range(nbnd):
            evc[i,:] = np.fromfile(f, dtype='complex128', count=npol*igwx)
            f.seek(8, 1)

    return evc

kpoint = 1
wf = read_wavefunction_k_qe_dat('wfc' + str(kpoint) + '.dat')
print('WF contains NaN values: ', np.isnan(wf).any())
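
Since -Infinity values were also reported earlier in this thread, it may help to test for any non-finite entry and to see which bands are affected. A minimal sketch follows, assuming an array shaped (nbnd, npol*igwx) like the one returned by the reader above; the helper name is hypothetical and the demo array is synthetic.

```python
import numpy as np

def nonfinite_bands(evc):
    """Return the 0-based indices of bands (rows) containing NaN or Inf."""
    evc = np.asarray(evc)
    bad_rows = ~np.isfinite(evc).all(axis=1)
    return np.flatnonzero(bad_rows).tolist()

# Small synthetic example standing in for the array returned by the reader.
demo = np.ones((4, 3), dtype='complex128')
demo[1, 0] = np.nan
demo[3, 2] = complex(-np.inf, 0.0)
print(nonfinite_bands(demo))  # [1, 3]
```

With the reader above, this would be called as `nonfinite_bands(read_wavefunction_k_qe_dat('wfc1.dat'))`.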

Do you know whether this has been reported to the Quantum ESPRESSO developers, or do you think it is a compilation problem?

Cheers,
Christian
Christian Kern, University of Graz, Austria

Daniele Varsano
Posts: 4053
Joined: Tue Mar 17, 2009 2:23 pm

Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation

Post by Daniele Varsano » Fri Oct 04, 2024 4:48 pm

Dear Christian,

this problem has been reported to the QE community, and I know that it has been (at least partially) fixed by the QE developers. As far as I know, you need to avoid the -npools option in the QE run, but I do not know whether a specific patched version of QE is also required. You could ask the Leonardo user support or the QE mailing list.

Best,

Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
