
"[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation

Posted: Thu Jul 11, 2024 4:51 pm
by mrefiore
Dear YAMBO developers and users,

I'm trying to run a G0W0 calculation with YAMBO 5.2.2 and 5.2.3, as installed on the LEONARDO supercomputer at CINECA.
When the nscf is run with few empty bands, say 500, the G0W0 calculation completes without any problem and the output is reasonable.

When the nscf is run with more bands, e.g. 3000, and I then run a G0W0 calculation with a number of bands <=3000, I get this error message while yambo is computing chi:

[ERROR] LINEAR ALGEBRA driver [PARALLEL_lin_system]performing P(Z/C)GESV

This happens when in the input file I leave the default

X_and_IO_nCPU_LinAlg_INV=-1

If I change the value of this keyword and run the linear algebra on the GPU, the code proceeds to the end. However, in this case I get NaN results in the output file:

#  K-point    Band    Eo [eV]     E-Eo [eV]    Sc|Eo [eV]
#
   1          93      0.000000    NaN          NaN
   1          94      1.681785    NaN          NaN

I suspect the two issues are related. I've tried with different yambo versions (5.2.2 and the newer 5.2.3) without luck.
I've attached the input, output and log files.
Thank you very much for your help!
Best,


Michele

Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation

Posted: Thu Jul 11, 2024 6:45 pm
by mrefiore
Dear all,

I can add more details to the problem.
By inspecting the ns.wf_fragment_* databases in the SAVE directory, it appears that some fragments contain -Infinity values. This happens ONLY when I run a Quantum ESPRESSO nscf calculation with a larger number of bands (in my case, 3000). When the nscf is performed with fewer bands (500 in my case), all ns.wf_fragment_* are fine and indeed in this case all YAMBO calculations are ok.
Keeping in mind that I'm running on the GPU-accelerated LEONARDO cluster, I've tried to produce the SAVE with different builds of p2y, including a non-GPU one, but the problem is always there. I've also tried p2y -b #bands as suggested on the YAMBO GitHub page, without success. In all these tests, however, the QE run itself was done with a GPU-accelerated QE version.
Now, I'm wondering if this is a p2y problem or rather a QE issue when a "larger" number of bands is considered.
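
The fragment inspection described above could be automated along these lines. This is only a sketch: it assumes the ns.wf_fragment_* databases are NetCDF files readable with the third-party netCDF4 Python package and that they sit in ./SAVE. The helper names count_nonfinite and scan_fragment are made up for illustration and are not part of YAMBO or p2y. It makes no assumption about variable names and simply scans every variable in each file.

```python
import glob

import numpy as np


def count_nonfinite(values):
    """Return (n_nan, n_inf) for an array; non-floating data counts as clean."""
    arr = np.asarray(values)
    # np.inexact covers both real and complex floating-point dtypes
    if not np.issubdtype(arr.dtype, np.inexact):
        return 0, 0
    return int(np.isnan(arr).sum()), int(np.isinf(arr).sum())


def scan_fragment(path):
    """Scan all variables of one NetCDF file.

    Returns {variable_name: (n_nan, n_inf)} for variables with bad values.
    """
    import netCDF4  # assumption: the netCDF4 package is installed

    bad = {}
    with netCDF4.Dataset(path, "r") as ds:
        ds.set_auto_mask(False)  # return plain arrays instead of masked ones
        for name, var in ds.variables.items():
            n_nan, n_inf = count_nonfinite(var[...])
            if n_nan or n_inf:
                bad[name] = (n_nan, n_inf)
    return bad


if __name__ == "__main__":
    # Assumed layout: the fragments are in ./SAVE
    for path in sorted(glob.glob("SAVE/ns.wf_fragment*")):
        for name, (n_nan, n_inf) in scan_fragment(path).items():
            print(f"{path}: {name}: {n_nan} NaN, {n_inf} Inf")
```

Counting NaN and Inf separately is deliberate, since the fragments here show -Infinity while the GW output shows NaN.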
Thank you for your help!


Michele

Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation

Posted: Mon Jul 15, 2024 9:29 am
by Daniele Varsano
Dear Michele,

you are not the first to report this issue.
Can you check whether the NaN (Infinity) values are already present in the raw wavefunctions generated by QE, or only in the ns.wf* files?
If they are present only in the ns.wf* files, we can look into what is disturbing p2y during the format conversion.

Best,
Daniele

Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation

Posted: Mon Jul 15, 2024 10:24 am
by mrefiore
Dear Daniele,

Thank you for your reply!
I did want to investigate that, but unfortunately I don't know how to read QE's binary wfc#.dat files.
However, I can add that if I run the QE calculation with a NON-GPU-accelerated version, the problem vanishes. Therefore, I strongly suspect the problem lies in QE-GPU.

Michele

Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation

Posted: Fri Oct 04, 2024 12:51 pm
by csk
Hi!

I can confirm the error with the GPU build of QE (v7.2, also on the Leonardo cluster): when running with more bands, the wavefunctions contain NaN values. The error goes away when running with a CPU-only version of QE.

To check whether your QE wavefunctions contain NaN values, you might find the following code snippet useful:

Code:

import numpy as np

def read_wavefunction_k_qe_dat(dat_file):
    # Credits: https://mattermodeling.stackexchange.com/questions/9149/how-to-read-qes-wfc-dat-files-with-python
    # The file is Fortran unformatted; this reader assumes every record is
    # wrapped in 4-byte length markers (the common compiler default).

    with open(dat_file, 'rb') as f:
        # Skip the 4-byte marker that opens the first record
        f.seek(4)

        ik = np.fromfile(f, dtype='int32', count=1)[0]
        xk = np.fromfile(f, dtype='float64', count=3)
        ispin = np.fromfile(f, dtype='int32', count=1)[0]
        gamma_only = bool(np.fromfile(f, dtype='int32', count=1)[0])
        scalef = np.fromfile(f, dtype='float64', count=1)[0]

        # Skip 8 bytes: closing marker of this record + opening marker of the next
        f.seek(8, 1)

        ngw = np.fromfile(f, dtype='int32', count=1)[0]
        igwx = np.fromfile(f, dtype='int32', count=1)[0]
        npol = np.fromfile(f, dtype='int32', count=1)[0]
        nbnd = np.fromfile(f, dtype='int32', count=1)[0]

        # Skip the record markers again
        f.seek(8, 1)

        # Reciprocal lattice vectors
        b1 = np.fromfile(f, dtype='float64', count=3)
        b2 = np.fromfile(f, dtype='float64', count=3)
        b3 = np.fromfile(f, dtype='float64', count=3)

        f.seek(8, 1)

        # Miller indices of the plane waves
        mill = np.fromfile(f, dtype='int32', count=3*igwx)
        mill = mill.reshape((igwx, 3))

        # One row of plane-wave coefficients per band
        evc = np.zeros((nbnd, npol*igwx), dtype="complex128")

        f.seek(8, 1)
        for i in range(nbnd):
            evc[i, :] = np.fromfile(f, dtype='complex128', count=npol*igwx)
            f.seek(8, 1)

    return evc

kpoint = 1
wf = read_wavefunction_k_qe_dat('wfc' + str(kpoint) + '.dat')
# Also flag +/-Infinity, which np.isnan alone would miss
print('WF contains NaN or Inf values: ', not np.isfinite(wf).all())
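
To see which bands are affected, rather than only whether any value is bad, the array returned by the reader above can be checked row by row. The following is a minimal sketch, not something provided by QE or YAMBO: the name nonfinite_bands is made up for illustration, and the demo array is synthetic data standing in for a real wfc file.

```python
import numpy as np


def nonfinite_bands(evc):
    """Return 0-based indices of bands (rows) holding any NaN or +/-Inf."""
    evc = np.asarray(evc)
    # A band is bad if at least one of its coefficients is not finite
    bad_rows = ~np.isfinite(evc).all(axis=1)
    return np.flatnonzero(bad_rows)


if __name__ == "__main__":
    # Synthetic stand-in for the reader's output: 6 bands, 4 plane waves
    demo = np.ones((6, 4), dtype="complex128")
    demo[2, 1] = complex(np.nan, 0.0)
    demo[5, 0] = complex(-np.inf, 0.0)
    print("Bands with non-finite coefficients:", nonfinite_bands(demo))
```

With a real file this would be called as nonfinite_bands(wf). The indices are 0-based, so add 1 to compare them with band numbers counted from 1.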

Do you know whether this has been reported to the Quantum ESPRESSO developers, or would you assume that it is a problem with the compilation?

Cheers,
Christian

Re: "[ERROR] LINEAR ALGEBRA driver" or NaN results in GW calculation

Posted: Fri Oct 04, 2024 4:48 pm
by Daniele Varsano
Dear Christian,

this problem has been reported to the QE community, and I know that it has been (at least partially) fixed by the QE developers. As far as I know, the workaround is to avoid the -npools option in the QE run, but I do not know whether a specific patched version of QE is also required. You could ask the Leonardo user support or the QE mailing list.

Best,

Daniele