Haydock error
- malwi
- Posts: 42
- Joined: Mon Feb 29, 2016 1:00 pm
Haydock error
Dear Yambo Team,
I got an error during the BSE Haydock step:
[03.01] Haydock Solver for abs @q1, scheme hermitian
====================================================
Accuracy (requested) : -0.020000 [o/o]
[ERROR] STOP signal received while in[03.01] Haydock Solver for abs @q1, scheme hermitian
[ERROR]Bf=NaN likely because some eigenvalue of the BSE is negative.
Yambo 5.0 was compiled with Intel 6.7 on Prometheus (Cyfronet Centre).
Best regards,
Gosia
dr hab. Małgorzata Wierzbowska, Prof. IHPP PAS
Institute of High Pressure Physics Polish Academy of Sciences
Warsaw, Poland
- Daniele Varsano
- Posts: 4278
- Joined: Tue Mar 17, 2009 2:23 pm
Re: Haydock error
Dear Gosia,
Can you please post your input and report files?
You can upload them as attachments by renaming them with an allowed suffix (e.g. .txt).
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- malwi
- Posts: 42
- Joined: Mon Feb 29, 2016 1:00 pm
Re: Haydock error
Dear Daniele,
thank you again.
I attached the files.
Best regards,
Gosia
dr hab. Małgorzata Wierzbowska, Prof. IHPP PAS
Institute of High Pressure Physics Polish Academy of Sciences
Warsaw, Poland
- Daniele Varsano
- Posts: 4278
- Joined: Tue Mar 17, 2009 2:23 pm
Re: Haydock error
Dear Gosia,
I can't see anything wrong in your input file.
It seems you end up with a negative eigenvalue. I do not know whether something weird happened in the calculation of the BSE kernel you performed in a previous run, but the matrix seems to be read correctly, and I assume your scissor value is correct.
Maybe someone expert in the Haydock algorithm and its parallel implementation can give you some hint on how to spot the problem.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- malwi
- Posts: 42
- Joined: Mon Feb 29, 2016 1:00 pm
Re: Haydock error
Dear Daniele,
thank you.
I also tried full diagonalization, and now it stops with a memory problem.
I changed the number of tasks and nodes to get more memory; I will ask Maciej Czuchry what could be wrong with that.
My question now is: can I use a different number of CPUs for the diagonalization or Haydock than I used for the kernel?
This series of runs reads files from the previous calculations.
The case I am calculating is actually the same as in the previously posted error, but now I use k-meshes of 12 and 16 in the nscf step;
before, it was 4. Is there any physical or numerical reason that could cause the problem with Haydock? I mean, for example, taking bands
in the BSE which exchange their symmetry ordering at some k-points?
Best,
Gosia
dr hab. Małgorzata Wierzbowska, Prof. IHPP PAS
Institute of High Pressure Physics Polish Academy of Sciences
Warsaw, Poland
- Daniele Varsano
- Posts: 4278
- Joined: Tue Mar 17, 2009 2:23 pm
Re: Haydock error
Dear Gosia,
The matrix is far too large for a full diagonalisation.
I cannot say much about the parallelism of the Haydock procedure. Other developers who are expert on that will answer.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- Davide Sangalli
- Posts: 649
- Joined: Tue May 29, 2012 4:49 pm
- Location: Via Salaria Km 29.3, CP 10, 00016, Monterotondo Stazione, Italy
Re: Haydock error
Dear all,
It is not easy to say what the problem could be.
A simple test might be to re-run the Haydock solver with a higher QP correction in the input:
% KfnQP_E
1.400000 | 1.000000 | 1.000000 | # [EXTQP BSK BSS] E parameters (c/v) eV|adim|adim
%
It's probably unphysical in this system, but it is just to check whether there is some pole with E < 1 eV, which would, in turn, give a negative eigenvalue with 0.4 eV of QP corrections.
Best,
D
Davide Sangalli, PhD
CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX Centre
https://sites.google.com/view/davidesangalli
http://www.max-centre.eu/
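The reasoning behind the test suggested above can be sketched with a toy model. The log reports the hermitian scheme, and the stretching factors in KfnQP_E are 1.0, so a change of scissor adds the same constant to every diagonal transition energy. If the kernel is unchanged (an assumption of this sketch), every eigenvalue then shifts by exactly that constant. The 2x2 matrix, the function names and the numbers below are hypothetical and only illustrate the arithmetic; they are not taken from the system discussed in this thread and do not use any Yambo API.

```python
import math

def lowest_eigenvalue_2x2(e1, e2, coupling):
    """Lowest eigenvalue of the symmetric matrix [[e1, coupling], [coupling, e2]]."""
    mean = 0.5 * (e1 + e2)
    half_diff = 0.5 * (e1 - e2)
    return mean - math.hypot(half_diff, coupling)

def shift_pole(pole_ev, scissor_used_ev, scissor_target_ev):
    """Pole expected at scissor_target_ev, given the pole found at scissor_used_ev.

    Assumes a rigid shift: stretching factors equal to 1 and an unchanged kernel.
    """
    return pole_ev + (scissor_target_ev - scissor_used_ev)

# Toy numbers (eV), purely illustrative: two transition energies before any
# scissor, coupled by one kernel matrix element.
e1, e2, coupling = 0.3, 0.9, 1.2

low_test = lowest_eigenvalue_2x2(e1 + 1.4, e2 + 1.4, coupling)  # test scissor
low_orig = lowest_eigenvalue_2x2(e1 + 0.4, e2 + 0.4, coupling)  # original scissor

print(f"lowest pole with 1.4 eV scissor: {low_test:.3f} eV")
print(f"lowest pole with 0.4 eV scissor: {low_orig:.3f} eV")
print(f"predicted from the test run:     {shift_pole(low_test, 1.4, 0.4):.3f} eV")
```

In this toy case the lowest pole of the 1.4 eV run lies below 1 eV, so the same pole is negative with the 0.4 eV scissor, which is the situation the test is meant to detect.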
- malwi
- Posts: 42
- Joined: Mon Feb 29, 2016 1:00 pm
Re: Haydock error
Dear Davide and Daniele,
I am back to this problem after a break for writing a proposal.
By chance, I managed to get a Haydock result for more k-points:
Maciej Czuchry (from Cyfronet) suggested using fewer CPUs, and the calculations went through.
But I do not understand it.
When I use an nscf k-mesh of 8 8 8 (35 IBZ points) for BSE, it runs on 432 CPUs.
When I use an nscf k-mesh of 10 10 10 (56 IBZ points), or more k-points (12 or 16), for BSE,
it does not run on 432 CPUs, but it does run on 144 CPUs.
Why? How is it parallelized?
I do not specify anything about the parallel structure in the input, leaving it at the defaults.
Back to Davide's advice: maybe I really set KfnQP_E too small.
My DFT+SOC gap is 0.3 eV and the GW correction is 0.4 eV.
What should be given for KfnQP_E: just the GW correction, 0.4, or DFT+GW, 0.7?
A second surprise for me is that the energy cutoffs for exchange and correlation
converge much faster in BSE than in GW
(for example, BSENGexx= 10 Ry and EXXRLvcs= 30 Ry are converged,
as are BSENGBlk= 4 Ry and VXCRLvcs= 6 Ry), and similarly NGsBlkXs= 4 Ry is enough in BSE.
In the same way, the number of bands in the polarization converges 10 times faster in BSE!
% BndsRnXp
1 | 1000 | # [Xp] Polarization function bands
% BndsRnXs
1 | 108 | # [Xs] Polarization function bands
Using 108 bands for Xs in BSE gives no difference in the result with respect to 1000. Why is that so?
On the other hand, I am not surprised that k-points nscf 4 4 4 are ok for GW, while k-points nscf 16 16 16 are
not enough for BSE. This is because the excitations in this system are not from VBM to CBM but much higher,
which is in agreement with the experiment for the optical pumping.
Best regards,
Gosia
dr hab. Małgorzata Wierzbowska, Prof. IHPP PAS
Institute of High Pressure Physics Polish Academy of Sciences
Warsaw, Poland
- Daniele Varsano
- Posts: 4278
- Joined: Tue Mar 17, 2009 2:23 pm
Re: Haydock error
Dear Gosia,
> Why? How it is parallelized?
> I do not say any thing about the parallel structure in the input, letting it go by default.
In yambo there is a default parallelisation that may fail; my advice is to explicitly assign CPUs to the different roles in the input. I suggest you put as many CPUs as possible on the k role:
BS_CPU= "nk neh nt" # [PARALLEL] CPUs for each role
BS_ROLEs= "k eh t" # [PARALLEL] CPUs roles (k,eh,t)
The product nk*neh*nt has to equal the number of MPI tasks you are using.
> My DFT+SOC gives gap 0.3, GW correction is 0.4 eV
> What should be given for KfnQP_E, is it just the GW correction 0.4 or DFT+GW 0.7 ?
It is the correction: 0.4 eV.
> converge in BSE much faster than in GW
> (for example BSENGexx= 10 Ry and EXXRLvcs= 30 Ry is convergent,
> BSENGBlk= 4 Ry and VXCRLvcs= 6 Ry) and similarly NGsBlkXs= 4 Ry is enough in BSE.
This is not surprising: these are different terms. In GW it is a Fock integral; in BSE it is essentially a Hartree term.
> The same way, number of bands in polarization is convergent 10 times faster in BSE!
This is a bit stranger. Anyway, you can use the screening already calculated for GW and stored in ndb.pp for the BSE; yambo will take the static part (use ppa in the input instead of em1s, together with the related variables BndsRnXp and NGsBlkXp).
> On the other hand, I am not surprised that k-points nscf 4 4 4 are ok for GW, while k-points nscf 16 16 16 are
> not enough for BSE. This is because the excitations in this system are not from VBM to CBM but much higher,
> which is in agreement with the experiment for the optical pumping.
As you say, k convergence in BSE can be more problematic: you need a finer discretisation to include the relevant transitions in the BSE matrix.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
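The constraint stated in the reply above, that nk*neh*nt must equal the number of MPI tasks, can be explored with a small helper before writing BS_CPU by hand. This is only a sketch: the function names are hypothetical, the limit nk <= number of IBZ k-points is an assumption of this example and not a documented Yambo rule, and the tie-break (smallest nt first) is arbitrary. The task and k-point counts are the ones quoted earlier in the thread.

```python
def bs_cpu_candidates(n_mpi, n_kpts):
    """List (nk, neh, nt) with nk * neh * nt == n_mpi, largest nk first.

    n_kpts caps nk; this cap is an assumption of the sketch, not a Yambo rule.
    """
    triples = []
    for nk in range(1, min(n_mpi, n_kpts) + 1):
        if n_mpi % nk != 0:
            continue  # nk must divide the total number of MPI tasks
        rest = n_mpi // nk
        for neh in range(1, rest + 1):
            if rest % neh == 0:
                triples.append((nk, neh, rest // neh))
    # Prefer as many CPUs as possible on k; break ties with the smallest nt.
    triples.sort(key=lambda t: (-t[0], t[2]))
    return triples

def format_bs_input(triple):
    """Render the two input lines in the form shown in the thread."""
    nk, neh, nt = triple
    return f'BS_CPU= "{nk} {neh} {nt}"\nBS_ROLEs= "k eh t"'

# 144 MPI tasks and 56 IBZ k-points, as in the 10 10 10 run described above.
best = bs_cpu_candidates(144, 56)[0]
print(format_bs_input(best))
```

The helper does not say whether a given split fits in memory or runs efficiently; it only lists the splits that satisfy the product constraint, so that one of them can be set explicitly instead of relying on the default.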
- malwi
- Posts: 42
- Joined: Mon Feb 29, 2016 1:00 pm
Re: Haydock error
Thank you very much Daniele,
I will continue as you suggested.
Best regards,
Gosia
dr hab. Małgorzata Wierzbowska, Prof. IHPP PAS
Institute of High Pressure Physics Polish Academy of Sciences
Warsaw, Poland