BSE calculation stops with NETCDF error

Moderators: myrta gruning, andrea marini, Daniele Varsano, Conor Hogan

cantele
Posts: 21
Joined: Wed Dec 23, 2009 2:58 pm

BSE calculation stops with NETCDF error

Post by cantele » Fri Jan 22, 2010 4:59 pm

Dear all,

I'm trying to run a BSE calculation following a GW calculation. The yambo executable stops with the error below (I also report some of the preceding lines):
[P 16] Kernel filling [o/o] 6.250000
[FFT-BSK] Mesh size: 20 9 39
[RD./SAVE//ns.wf]-------------------------------------------
Bands in each block : 174
Blocks : 4
- S/N 007895 ---------------------------- v.03.02.01 r.448 -
[WF loader] Normalization (few states) min/max :0.2547E-9 0.9594
[ERROR] STOP signal received while in :[05.01] Screneed interaction header I/O
[ERROR][NetCDF] NetCDF: Start+count exceeds dimension bound


With some debugging I found that the stop occurs in src/bse/K.F, at the call to ioX marked in the loop below:
  do iq_W=1,q%nibz
     !
     isc%qs(2)=iq_W
     call scatterGamp(isc,'c')
     !
     call io_control(ACTION=RD_CL_IF_END,COM=NONE,SEC=(/2*iq_W,2*iq_W+1/),ID=XID)
     ioX_err=ioX(X,Xw,XID)   ! <-- the run stops here
     !
     forall(i2=1:BS_n_g_W) X_mat(i2,i2,1)=X_mat(i2,i2,1)+1.

Yambo was set up using the -o b -y d -N -V 4 options. Some relevant variables:
FFTGvecs= 2999
% BSEBands
165 | 184 | # [BSK] Bands range
BSENGBlk= 253
The DFT calculation was generated using Quantum-ESPRESSO v. 4.0.5. It has: nat=92, 348 electrons, 170 Ry cutoff, 696 bands, 8 k points.

Do you think this might be an issue with my calculation, or something related, for example, to a maximum vector size in the NetCDF libraries?

Any help is appreciated.

Thanks,

Giovanni
Dr. Giovanni Cantele
CNR-SPIN and Univ. di Napoli "Federico II"
Phone: +39 081 676910
E-mail: giovanni.cantele@cnr.it
giovanni.cantele@na.infn.it
Web: http://people.na.infn.it/cantele
Skype: giocan74

andrea marini
Posts: 325
Joined: Mon Mar 16, 2009 4:27 pm

Re: BSE calculation stops with NETCDF error

Post by andrea marini » Fri Jan 22, 2010 7:03 pm

cantele wrote: [ERROR] STOP signal received while in :[05.01] Screneed interaction header I/O
[ERROR][NetCDF] NetCDF: Start+count exceeds dimension bound



This is a strange message. It means that you're trying to read beyond the bounds of the variable. It may be related to an incomplete writing of the screened interaction database. Please attach all input, report and log files so that I can check the dimensions of the variables more closely.
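
To make the rule behind this message concrete, here is a minimal sketch in plain Python. It only mimics the bounds check and does not call the NetCDF library; the function name hyperslab_fits, the 0-based start convention and the lengths used are assumptions made for illustration, not values taken from the actual database.

```python
# Sketch of the bounds rule behind "Start+count exceeds dimension bound".
# Plain Python only: this mimics the check, it does not use NetCDF itself.

def hyperslab_fits(start, count, dim_lengths):
    """Return True if reading `count` elements from `start` stays in bounds.

    All three arguments hold one entry per dimension; `start` is 0-based
    here (an assumption of this sketch).
    """
    if not (len(start) == len(count) == len(dim_lengths)):
        raise ValueError("start, count and dim_lengths must have the same rank")
    return all(
        s >= 0 and c >= 0 and s + c <= n
        for s, c, n in zip(start, count, dim_lengths)
    )


# Hypothetical case: a variable that was written with 100 elements on one axis.
print(hyperslab_fits([0], [100], [100]))  # True: the whole axis is readable
print(hyperslab_fits([0], [253], [100]))  # False: more is requested than exists
```

A read that requests more elements than the stored dimension holds, as in the second call, is the kind of request that would be expected to trigger this message.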
cantele wrote: With some debugging I found that the stop occurs in src/bse/K.F, at the call to ioX marked in the loop below:

  do iq_W=1,q%nibz
     !
     isc%qs(2)=iq_W
     call scatterGamp(isc,'c')
     !
     call io_control(ACTION=RD_CL_IF_END,COM=NONE,SEC=(/2*iq_W,2*iq_W+1/),ID=XID)
     ioX_err=ioX(X,Xw,XID)   ! <-- the run stops here
     !
     forall(i2=1:BS_n_g_W) X_mat(i2,i2,1)=X_mat(i2,i2,1)+1.

At this point you are reading the ndb.em1s database. It really seems to me that the database is not complete. To check whether this is the case, dump the database with ncdump and post the result here:

> ncdump ndb.em1s > ndb.em1s_DUMPFILE
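
Since a full dump can be large, the header printed by ncdump -h is often enough to compare dimensions. The sketch below parses the dimensions block of such a header. The function name parse_cdl_dimensions and the sample header (dimension and variable names included) are hypothetical and are not taken from a real ndb.em1s file.

```python
import re

# One CDL dimension line, e.g. "D_0000000253 = 253 ;" or
# "time = UNLIMITED ; // (5 currently)".
_DIM_LINE = re.compile(
    r"^(\w+)\s*=\s*(UNLIMITED|\d+)\s*;(?:\s*//\s*\((\d+)\s+currently\))?"
)


def parse_cdl_dimensions(cdl_text):
    """Return {dimension name: length} from the text printed by `ncdump -h`."""
    dims = {}
    in_dims = False
    for raw in cdl_text.splitlines():
        line = raw.strip()
        if line == "dimensions:":
            in_dims = True
            continue
        if not in_dims:
            continue
        if line.endswith(":") or line == "}":
            break  # reached the next section ("variables:") or the end
        match = _DIM_LINE.match(line)
        if match:
            name, size, current = match.groups()
            if size == "UNLIMITED":
                # Use the current length if ncdump reports it, else 0.
                dims[name] = int(current) if current else 0
            else:
                dims[name] = int(size)
    return dims


# Hypothetical header, written by hand for this example.
SAMPLE_HEADER = """netcdf ndb {
dimensions:
    D_0000000002 = 2 ;
    D_0000000253 = 253 ;
variables:
    float X_Q_1(D_0000000253, D_0000000253, D_0000000002) ;
}"""

print(parse_cdl_dimensions(SAMPLE_HEADER))
```

The resulting dictionary can then be compared by eye with the sizes requested in the input (for instance BSENGBlk) to spot a dimension that is smaller than expected.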
cantele wrote: Do you think this might be an issue with my calculation, or something related, for example, to a maximum vector size in the NetCDF libraries?

The NetCDF error message that Yambo gives when the number of variables/dimensions is too large is different. Anyway, in that case, it would be enough to recalculate the screened interaction using the -S option (fragmentation).

Andrea

P.S.: Giovanni, please fill in your signature with your complete affiliation.
Andrea MARINI
Istituto di Struttura della Materia, CNR, (Italy)

cantele

Re: BSE calculation stops with NETCDF error

Post by cantele » Sun Jan 24, 2010 10:09 pm

Dear Andrea,

thanks a lot for your prompt reply!
andrea marini wrote:
cantele wrote: [ERROR] STOP signal received while in :[05.01] Screneed interaction header I/O
[ERROR][NetCDF] NetCDF: Start+count exceeds dimension bound


This is a strange message. It means that you're trying to read beyond the bounds of the variable. It may be related to an incomplete writing of the screened interaction database. Please attach all input, report and log files so that I can check the dimensions of the variables more closely.
I attached a tgz file containing my input as well as log/report files.
andrea marini wrote:
cantele wrote: ioX_err=ioX(X,Xw,XID)
At this point you are reading the ndb.em1s database. It really seems to me that the database is not complete. To understand if this is the case do a dump of the database and post it here. To dump it use ncdump
> ncdump ndb.em1s > ndb.em1s_DUMPFILE
Well, this truly means that I did something completely wrong! Indeed, no such database exists in my SAVE directory! Anyway, my guess is that this is due to a "too expensive" calculation.

andrea marini wrote:
P.S.: Giovanni, please fill in your signature with your complete affiliation.
Very sorry! I didn't realize I could define a signature in my profile; I am used to having my signature attached automatically when sending e-mails. It should be there now!

Daniele Varsano
Posts: 3816
Joined: Tue Mar 17, 2009 2:23 pm

Re: BSE calculation stops with NETCDF error

Post by Daniele Varsano » Sun Jan 24, 2010 11:56 pm

Dear Giovanni,
I had a look at your input file.
The runlevel options are:
##############
optics # [R OPT] Optics
bse # [R BSK] Bethe Salpeter Equation.
bss # [R BSS] Bethe Salpeter Equation solver
ppa # [R Xp] Plasmon Pole Approximation
#############

In the present version of Yambo, the kernel of the Bethe-Salpeter equation
is constructed using a static RPA screening, which is missing in your calculation.

You can calculate it by producing an input with the "-b" option. This produces the database ndb.em1s,
that is, the static screening you need.

Of course you can combine this calculation with the other runlevels, so the options:
-o b -b -y d -N -V 4
should produce the input you need.

Anyway, I don't know why you have the plasmon pole calculation in your input. It is needed for the dynamical
screening in the GW approximation. Maybe you have it there from previous GW calculations?

Hope this solves your problem,

Cheers,

Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

cantele

Re: BSE calculation stops with NETCDF error

Post by cantele » Mon Jan 25, 2010 11:24 am

Daniele Varsano wrote:Dear Giovanni,
In the present version of Yambo, the kernel of the Bethe-Salpeter equation
is constructed using a static RPA screening, which is missing in your calculation.
You can calculate it by producing an input with the "-b" option. This produces the database ndb.em1s,
that is, the static screening you need.
Of course you can combine this calculation with the other runlevels, so the options:
-o b -b -y d -N -V 4
should produce the input you need.

I made the change you suggested; unfortunately, the code stops with exactly the same error:
[ERROR] STOP signal received while in :[07.01] Screneed interaction header I/O
[ERROR][NetCDF] NetCDF: Start+count exceeds dimension bound

Daniele Varsano wrote:Anyway, I don't know why you have the plasmon pole calculation in your input. It is needed for the dynamical
screening in the GW approximation. Maybe you have it there from previous GW calculations?

Exactly!

My impression is that there is something more subtle than a missing database (now the em1s database is there). I think that if I now just deleted
all the present databases, yambo would complain with specific error messages saying that those files are not there.

Maybe there is also something wrong with my cluster, related to memory issues, but it is not that easy to debug the error. Actually, another person
(whom you may know, A. Iacomino), running this kind of calculation on exactly the same cluster (but on a completely different system), found the same
error, but only when the calculation gets "too" heavy.

Do you have any suggestions on how I could at least debug this, or where I should look (for example, where are the calls to the NetCDF routines that give the error)?

Thanks,

Giovanni

andrea marini

Re: BSE calculation stops with NETCDF error

Post by andrea marini » Mon Jan 25, 2010 11:42 am

cantele wrote:
Daniele Varsano wrote:Dear Giovanni,
In the present version of Yambo, the kernel of the Bethe-Salpeter equation
is constructed using a static RPA screening, which is missing in your calculation.
You can calculate it by producing an input with the "-b" option. This produces the database ndb.em1s,
that is, the static screening you need.
Of course you can combine this calculation with the other runlevels, so the options:
-o b -b -y d -N -V 4
should produce the input you need.
I made the change you suggested; unfortunately, the code stops with exactly the same error:
[ERROR] STOP signal received while in :[07.01] Screneed interaction header I/O
[ERROR][NetCDF] NetCDF: Start+count exceeds dimension bound

Take it easy, guys. It is not a problem of a missing database. In the log it is clear that Yambo checked the ndb.pp header and moved on to the WF loading before giving the error message. So there is a screened interaction database. Daniele, Yambo can read the plasmon pole database to build W.

cantele wrote: My impression is that there is something more subtle than a missing database (now the em1s database is there). I think that if I now just deleted
all the present databases, yambo would complain with specific error messages saying that those files are not there.

Maybe there is also something wrong with my cluster, related to memory issues, but it is not that easy to debug the error. Actually, another person
(whom you may know, A. Iacomino), running this kind of calculation on exactly the same cluster (but on a completely different system), found the same
error, but only when the calculation gets "too" heavy.

I agree that the error message points to something that is not easy to trace back. Anyway, "too heavy" must always mean an incomplete database, a seg fault because of memory overload, etc. So we need to understand exactly where the problem is, because maybe the problem is not the size but a code bug.

Giovanni, please post the input, report and log of the plasmon pole database calculation, and also the gzipped output of the command

> ncdump ndb.pp >ndb.pp_DUMPED

cantele

Re: BSE calculation stops with NETCDF error

Post by cantele » Mon Jan 25, 2010 11:59 am

andrea marini wrote:I agree that the error message points to something that is not easy to trace back. Anyway, "too heavy" must always mean an incomplete database, a seg fault because of memory overload, etc. So we need to understand exactly where the problem is, because maybe the problem is not the size but a code bug.
Giovanni, please post the input, report and log of the plasmon pole database calculation, and also the gzipped output of the command
> ncdump ndb.pp >ndb.pp_DUMPED
OK, please download the required data from
people.na.infn.it/~cantele/netcdf_error.tgz
(attachment size too large to be posted here).

You will find two directories: one with the GW + RPA calculation, the other with the subsequent BSE.

Giovanni

Daniele Varsano

Re: BSE calculation stops with NETCDF error

Post by Daniele Varsano » Mon Jan 25, 2010 12:12 pm

andrea marini wrote: Take it easy, guys. It is not a problem of a missing database. In the log it is clear that Yambo checked the ndb.pp header and moved on to the WF loading before giving the error message. So there is a screened interaction database. Daniele, Yambo can read the plasmon pole database to build W.

I didn't know that!!
Giovanni, sorry for making you waste your time.

Daniele
