NETCDF error

Problems arising when using old releases of Yambo (< 5.0): parallelization strategy, performance, and other technical issues.

vjhalani
Posts: 17
Joined: Mon Jan 22, 2018 8:23 pm

Post by vjhalani » Mon Apr 29, 2019 6:43 pm

Hi all,

I'd like to run RPA on a 50^3 k-grid in GaAs to check convergence, but I am running into an issue. To get the yambo setup to run, I had to add the DBsFRAGpm= "+QINDX" option. When I then run an RPA calculation, I get the following error:

P0001: [ERROR] File ./SAVE//ndb.kindx; Variable Sindx; NetCDF: Variable not found

This was my second attempt; the first time I got an error related to "Variable Sindx; NetCDF: One or more variable sizes violate format constraints".

Looking in the forum, I saw that this is related to NetCDF databases being larger than 2 GB, and that the solution is either to compile with large file support or, as recommended, to add the -S option to yambo to fragment the database. The -S option doesn't seem to be supported anymore, right?

To clarify: the yambo initialization ran successfully only after I added DBsFRAGpm= "+QINDX". The RPA run then gave the format constraint error. I then tried adding "-S", which obviously didn't work, and when I ran RPA again to reproduce the error I got the "Variable not found" error.

Thanks,
Vatsal
Vatsal A. Jhalani
Postdoctoral Scholar | Department of Applied Physics
California Institute of Technology

Davide Sangalli
Posts: 610
Joined: Tue May 29, 2012 4:49 pm
Location: Via Salaria Km 29.3, CP 10, 00016, Monterotondo Stazione, Italy

Post by Davide Sangalli » Mon Apr 29, 2019 8:34 pm

Ciao Vatsal,
you are really pushing yambo by using such big k-point grids.

First, as you noticed, the -S option does not exist anymore and it was replaced by the option

DBsFRAGpm= "+QINDX"
which is DB specific and thus more flexible.

The QINDX db is needed for every run, so if you put this option in the setup input, you need to keep it in all subsequent input files.
Otherwise yambo checks ndb.kindx, finds that it is fragmented while the input file does not ask for fragmentation, and tries to recompute it.

Accordingly, this is my guess at what is happening:
a) you generate ndb.kindx with fragments
b) you run the RPA calculation without explicitly asking for fragmentation; yambo tries to regenerate ndb.kindx and fails because of the violation of the format constraints
c) you then run RPA again; yambo finds the corrupted ndb.kindx and gives the message "Variable not found"

In summary, just erase all ndb.kindx files, then run first the setup and then the RPA calculation with

DBsFRAGpm= "+QINDX"
in input in both cases.
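
The cleanup step can be sketched as a small Python helper. This is only an illustration, not part of yambo: the function names are hypothetical, and it assumes that the database and its fragments all start with the prefix ndb.kindx inside the SAVE directory and that `#` starts a comment in the input file. It defaults to a dry run so that nothing is deleted by accident.

```python
from pathlib import Path


def remove_kindx_databases(save_dir="SAVE", dry_run=True):
    """List (and optionally delete) ndb.kindx and its fragments.

    Assumption: every fragment file shares the 'ndb.kindx' prefix,
    so a single glob pattern catches the main file and the fragments.
    """
    targets = sorted(p for p in Path(save_dir).glob("ndb.kindx*") if p.is_file())
    for path in targets:
        if dry_run:
            print(f"would remove {path}")
        else:
            path.unlink()
    return targets


def input_requests_qindx_fragmentation(input_file):
    """Return True if the input has an uncommented DBsFRAGpm line with +QINDX."""
    for line in Path(input_file).read_text().splitlines():
        active = line.split("#", 1)[0].strip()  # drop anything after '#'
        if active.startswith("DBsFRAGpm") and "+QINDX" in active:
            return True
    return False
```

Calling remove_kindx_databases("SAVE") first shows what would be deleted; passing dry_run=False actually removes the files. The second helper is a quick way to confirm that both the setup input and the RPA input carry the same DBsFRAGpm line before rerunning.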

An alternative is to recompile yambo with netcdf v4, adding the following option to configure:

--enable-netcdf-hdf5
It has fewer format constraints and should work even without fragmentation.
Yambo should still be able to read all databases in the older format, but will generate all new databases in HDF5 format.
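
To get a feeling for when a database variable runs into these constraints, here is a rough Python sketch. It uses simplified per-variable limits of about 2 GiB for the netCDF classic format and about 4 GiB for the 64-bit offset format; the exact rules in the NetCDF documentation have more cases, and the shapes in the example are purely hypothetical (the real shape of Sindx is not given in this thread).

```python
import math

# Simplified per-variable size limits in bytes (assumption: the full
# NetCDF rules distinguish more cases, e.g. record variables).
CLASSIC_LIMIT = 2**31 - 4    # netCDF classic format
OFFSET64_LIMIT = 2**32 - 4   # netCDF 64-bit offset format


def variable_bytes(shape, bytes_per_element):
    """Size in bytes of a dense array with the given shape."""
    return math.prod(shape) * bytes_per_element


def formats_that_fit(shape, bytes_per_element):
    """Formats whose (simplified) per-variable limit the array satisfies."""
    size = variable_bytes(shape, bytes_per_element)
    fits = []
    if size <= CLASSIC_LIMIT:
        fits.append("classic")
    if size <= OFFSET64_LIMIT:
        fits.append("64-bit offset")
    fits.append("netcdf4/hdf5")  # no comparable fixed limit
    return fits


# Hypothetical square tables of 4-byte integers
for n in (20000, 30000, 40000):
    print(n, formats_that_fit((n, n), 4))
```

Fragmentation attacks the same problem from the other side: it keeps each individual variable small instead of raising the limit.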

Both solutions are fine for your case from what I understand.
If you push yambo even further you may need the netcdf v4 and the fragmentation together.

Best,
D.
Davide Sangalli, PhD
CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX Centre
https://sites.google.com/view/davidesangalli
http://www.max-centre.eu/

vjhalani
Posts: 17
Joined: Mon Jan 22, 2018 8:23 pm

Post by vjhalani » Wed May 01, 2019 7:34 pm

Ciao,

Yeah, I can tell I am pushing yambo, but I know it can do it!!!! :D

Anyway, thanks for the suggestion. I used both fragmentation and a netcdf v4 build, and was able to get past the error.

However... now I get another one. Yambo crashes at the dipoles, and some of the log files end with the error:
P0303: [ERROR] STOP signal received while in :[04] Dipoles
P0303: [ERROR]Allocation attempt of DIP_S of negative size.

I seem to get this type of negative allocation attempt a lot when pushing to big k-point grids for different variables. Note I am using the Covariant dipole approach.

Grazie!
Vatsal

haseebphysics1
Posts: 169
Joined: Sat Aug 17, 2019 2:48 pm

Post by haseebphysics1 » Sun Mar 29, 2020 11:10 pm

Dear developers,

I am facing a NetCDF error while doing BSE! Everything was going well as I increased BSEBands, but suddenly, after 130-220 bands, this error started to appear:

P1: [ERROR] File ./BSEBnds_130-225//ndb.BS_Q1_CPU_0; Variable W_DbGd; NetCDF: Invalid dimension size

Is it that the dimension of the matrix has become too large to handle without the NetCDF libraries? Files are attached.

I then tried to compile Yambo 4.5.1 (in another folder), just adding --enable-netcdf-hdf5 to the previous ./configure command. It compiled, but interestingly it built the serial version, even though ifort and MPI were detected successfully.

My configure command:

./configure  FC=ifort  F77=ifort  CC=icc  PFC=mpiifort  --with-blas-libs="-L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core"  --with-lapack-libs="-L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core"  --with-scalapack-libs="-L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64"  --with-blacs-libs="-L$MKLROOT/lib/intel64 -lmkl_blacs_intelmpi_lp64"  --with-fft-libs="-mkl"  --enable-time-profile  --enable-memory-profile  --enable-open-mp  --enable-msgs-comps --enable-netcdf-hdf5


So, was the error really due to the code being compiled without NetCDF? If so, could you please suggest how it should be compiled with those libraries?

The Yambo wiki pages give some commands for passing the library paths manually to configure. But is there a way to install them on the fly, just as Yambo does with the other packages it needs?

Thanks,
Haseeb Ahmad
MS - Physics,
LUMS - Pakistan

Daniele Varsano
Posts: 3773
Joined: Tue Mar 17, 2009 2:23 pm

Post by Daniele Varsano » Mon Mar 30, 2020 8:07 am

Dear Haseeb,
in Yambo all database I/O is handled by the netcdf library, which is downloaded and compiled when you run make.
What you are now trying to do is add HDF5 support. I do not know whether this will make things work.
Anyway, the problem seems to be related to the double-grid variable (are you using it?), and it is not easy to understand what is going wrong.
Note that your matrix is extremely large, are you sure you need such a large matrix to have your results converged?

As for the compilation, we would need to look at the config.log file.

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

Davide Sangalli
Posts: 610
Joined: Tue May 29, 2012 4:49 pm
Location: Via Salaria Km 29.3, CP 10, 00016, Monterotondo Stazione, Italy

Post by Davide Sangalli » Mon Mar 30, 2020 9:50 am

Dear Haseeb,
let me clarify that by using

--enable-netcdf-hdf5

you switched from netcdf3 to netcdf4, which is based on HDF5.

It may be that this change solved the problem.
In general, for big BSE simulations I suggest using

--enable-hdf5-par-io

If you want to dig deeper into the issue you were having, try recompiling the netcdf library.
You need to add

--enable-logging
to the configure of the netcdf library, i.e. in

lib/netcdf/Makefile.loc
lib/netcdff/Makefile.loc
add this flag to AUXFLAGS:

AUXFLAGS=--prefix=$(LIBPATH) \
         --without-pic --enable-static --disable-shared \
         --disable-dap --enable-logging $(netcdf_opt)
Moreover, you need to add the following lines to src/setup/setup.F

#if defined _HDF5_IO
 ! For I/O debug, see below
 use IO_m,           ONLY:netcdf_call,nf90_set_log_level
#endif
before the include of memory.h, and the following lines

#if defined _HDF5_IO
 ! This is very useful for I/O debug
 ! The NETCDF library needs to be compiled with the --enable-logging flag
 call netcdf_call(nf90_set_log_level(3),1)
#endif
right at the beginning of the routine body (before the call section line).

When you repeat the run, the NETCDF library will be much more verbose, possibly giving useful information on the source of the NETCDF error.

Best,
D.

haseebphysics1
Posts: 169
Joined: Sat Aug 17, 2019 2:48 pm

Post by haseebphysics1 » Mon Mar 30, 2020 2:30 pm

Dear developers, thanks for your kind responses!

Since I have not installed the NetCDF or HDF5 libraries myself, I wanted to use the internal libraries during compilation!

The serial build was caused by the new Intel installation, whose MPI has to be sourced separately from the ifort compiler. The new build is now parallel, as it should be, but using just --enable-netcdf-hdf5 (without --enable-hdf5-par-io), the original error is replaced by:

P1: [ERROR] File ./BSEBnds_130-225//ndb.BS_Q1_CPU_0; Variable BSE_RESONANT; NetCDF: HDF error

With --enable-hdf5-par-io the error no longer seems to appear, but the kernel is taking a long time to even start! Is that okay? I think it is doing heavy disk I/O, since it has created a massive file (ndb.BS_PAR_Q1) of around 180 GB!
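
As a back-of-the-envelope check on a file of that size, the sketch below estimates how large a BSE kernel matrix would have to be to fill it. The assumptions are loud ones: the resonant block is taken as a dense square matrix of dimension n_kpts x n_valence x n_conduction, every element is an 8-byte single-precision complex number, and the file holds nothing else. The function names are hypothetical and nothing here reads an actual yambo database.

```python
import math


def bse_dimension(n_kpts, n_valence, n_conduction):
    """Number of electron-hole transitions, i.e. the side of the resonant block."""
    return n_kpts * n_valence * n_conduction


def full_matrix_bytes(dimension, bytes_per_element=8):
    """Bytes needed to store a dense dimension x dimension matrix."""
    return dimension * dimension * bytes_per_element


def dimension_from_bytes(n_bytes, bytes_per_element=8):
    """Largest dense square matrix that fits into n_bytes."""
    return math.isqrt(n_bytes // bytes_per_element)


# 180 GB taken as 180 * 10**9 bytes
print(dimension_from_bytes(180 * 10**9))  # prints 150000
```

Under these assumptions a 180 GB file corresponds to roughly 150000 transitions. Because storage grows with the square of that number, every extra band included in BSEBands is paid for quadratically in disk space and I/O time.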


"Note that your matrix is extremely large, are you sure you need such a large matrix to have your results converged?"

Dear Daniele, I think I do. I have attached some results of BSEBands convergence testing. As you can see, the results change as I increase the number of bands. I would be happy if you could look at them and comment.

Thank you,

Daniele Varsano
Posts: 3773
Joined: Tue Mar 17, 2009 2:23 pm

Post by Daniele Varsano » Mon Mar 30, 2020 2:44 pm

Dear Haseeb,
convergence really depends on which part of the spectrum you are interested in.
Of course, the more bands you include, the more the high-energy part of the spectrum will change; the question is whether it makes sense to look above the continuum region. Now, I do not know to what system you are looking at, but usually one is interested in excitations below the quasi-particle gap (bound excitons), and the range 130-200 seems to me to provide converged results up to 6 eV.

Best,
Daniele

haseebphysics1
Posts: 169
Joined: Sat Aug 17, 2019 2:48 pm

Post by haseebphysics1 » Mon Mar 30, 2020 4:21 pm

Dear Daniele,

thanks for looking into it.

"I do not know to what system you are looking at"

Actually, I am interested in the visible range (it is a photocatalytic system). But even at low energies, how can I trust the 130-200 result when the real part of the dielectric function is changing so much (even beyond 200 bands)? For the imaginary part, it makes sense that more bands mean more transitions, and hence that the imaginary part should increase, as it does. But how can I be certain about the real part? I need to report the refractive index, which in turn is related to the static dielectric function.

Moreover, I could also stay within the QP band gap, but since I have experimental data up to 6.2 eV, I am calculating the results up to 6.5 eV in order to compare different theoretical models.

Thanking you,

Daniele Varsano
Posts: 3773
Joined: Tue Mar 17, 2009 2:23 pm

Post by Daniele Varsano » Mon Mar 30, 2020 4:42 pm

Dear Haseeb,

I was referring to the excitation energies, i.e. your spectrum seems to be converged quantitatively in the low-energy part and qualitatively at higher energy.
The real and imaginary parts are related by the Kramers-Kronig relations, so a change in the high-energy part of the imaginary part is also reflected in a change of the real part at all energies. In any case, the change you get with the higher number of bands is around 1% of the value of the function.
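
For reference, the standard textbook form of this relation for a dielectric function written as ε = ε1 + iε2 (nothing Yambo-specific; the symbols are introduced here only for illustration) is given below, with the static limit on the second line:

```latex
\varepsilon_1(\omega) = 1 + \frac{2}{\pi}\,\mathcal{P}\!\int_0^{\infty}
  \frac{\omega'\,\varepsilon_2(\omega')}{\omega'^2 - \omega^2}\,d\omega'
\\[6pt]
\varepsilon_1(0) = 1 + \frac{2}{\pi}\int_0^{\infty}
  \frac{\varepsilon_2(\omega')}{\omega'}\,d\omega'
```

The static value integrates ε2 over the whole frequency axis with a 1/ω' weight, so spectral weight added at high energy by extra bands still shifts the real part at low energy, only more weakly the higher in energy it sits.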

Best,
Daniele