Variable BSE_RESONANT; NetCDF: HDF error

Deals with issues related to computation of optical spectra in reciprocal space: RPA, TDDFT, local field effects.


milesj
Posts: 29
Joined: Thu Jan 26, 2023 9:27 pm

Re: Variable BSE_RESONANT; NetCDF: HDF error

Post by milesj » Wed Mar 15, 2023 8:57 am

Hi Davide,

Yes, that run was with the Haydock solver (note BSSmod="h" in the input file). I've attached the config.log file from the same configure run.

I also tried modifying the ./config/setup file, since the report file said I could. I replaced all instances of 'serial' with 'parallel' and changed '--disable-parallel' to '--enable-parallel'. After that, the report said it was using the parallel HDF5 directories, but parallel I/O was still turned off. The compilation finished, but unfortunately my 8x8x8 run failed in exactly the same way again.
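For reference, the config/setup edit described above amounts to two text substitutions. Below is a minimal POSIX shell sketch of it, assuming it is run from the top of the yambo source tree; the function name switch_setup_to_parallel is hypothetical. As reported here, this edit alone left parallel I/O turned off.

```shell
# Hypothetical helper reproducing the config/setup edit described in the post.
# Only the two substitutions come from the thread; the rest is an assumption.
switch_setup_to_parallel() {
  setup_file="${1:-config/setup}"
  # Write to a temporary file and move it back, which avoids the
  # GNU/BSD differences in `sed -i`.
  sed -e 's/--disable-parallel/--enable-parallel/g' \
      -e 's/serial/parallel/g' \
      "$setup_file" > "$setup_file.tmp" && mv "$setup_file.tmp" "$setup_file"
}
```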

The error is the same:

Code:

Writing File ./conv_k//ndb.BS_Q1_CPU_0; Variable BSE_RESONANT; NetCDF: HDF error

It occurs when the main kernel loop starts.

The segmentation fault only occurs when I also use MPI parallelization. Still, the run did seem to get a step further with MPI, so I'll try combining it with the Haydock solver, although I don't think this solves the parallelization issue you found. I'll also look for more instances of 'serial' in the configuration files and see whether changing them helps. Interestingly, when I configure yambo-5.0.4 and yambo-4.5.3, parallel I/O appears to be enabled just fine (although the 'make all' call fails for other reasons that I don't want to figure out right now). Let me know if you have any more ideas.

Thanks again for all the help!
-Miles
Miles Johnson
California Institute of Technology
PhD candidate in Applied Physics

Davide Sangalli
Posts: 610
Joined: Tue May 29, 2012 4:49 pm
Location: Via Salaria Km 29.3, CP 10, 00016, Monterotondo Stazione, Italy

Re: Variable BSE_RESONANT; NetCDF: HDF error

Post by Davide Sangalli » Wed Mar 15, 2023 9:15 am

Ciao,
if you modified config/setup, then you compiled the parallel libraries.

However, you also need to edit config/mk/global/defs.mk, setting the following at the beginning (note the -D_PAR_IO flag):

Code:

cpu         = x86_64
os          = linux
mpi         = -D_MPI
make        = make
netcdf      = -D_HDF5_LIB -D_HDF5_IO -D_PAR_IO
scalapack   = -D_SCALAPACK
slepc       = -D_SLEPC
fft         = -D_FFTW
xcpp        = -D_HDF5_LIB -D_HDF5_IO -D_PAR_IO -D_MPI -D_FFTW -D_SLEPC -D_SCALAPACK   -D_OPENMP -D_TIMING
p2ycpp      = -D_P2Y_QEXSD_HDF5
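A hedged sketch of the defs.mk change as a shell function: it appends -D_PAR_IO to the netcdf and xcpp lines when the flag is missing, mirroring the listing above, and leaves every other line alone. The function name add_par_io_flag is hypothetical, and it assumes the one-line "name = value" layout shown. Since defs.mk appears to be written by configure, the edit may need repeating after reconfiguring (an assumption, not something stated in this thread); recompile afterwards.

```shell
# Hypothetical helper: make sure the netcdf and xcpp lines of defs.mk
# carry -D_PAR_IO, as in the listing above. Lines that already contain
# the flag are left unchanged, so running it twice is harmless.
add_par_io_flag() {
  defs_file="${1:-config/mk/global/defs.mk}"
  awk '
    # Only touch the "netcdf = ..." and "xcpp = ..." assignments.
    /^(netcdf|xcpp)[ \t]*=/ && !/-D_PAR_IO/ { $0 = $0 " -D_PAR_IO" }
    { print }
  ' "$defs_file" > "$defs_file.tmp" && mv "$defs_file.tmp" "$defs_file"
}
```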
As suggested before, try adding this option when running configure:

Code:

--with-extlibs-path="/home/milesj/nips3_yambo_converge/YAMBO/yambo-libs"

This installs all the libraries in the specified folder, so you will not need to recompile them every time.
To clean them up in the future, just remove the folder.
You can also manually move the contents of "/home/milesj/nips3_yambo_converge/YAMBO/yambo-5.1.1/lib/external/" into "/home/milesj/nips3_yambo_converge/YAMBO/yambo-libs" and then use the same --with-extlibs-path option.
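The manual alternative in the last sentence can be sketched as a small shell function. The name reuse_external_libs is hypothetical. It copies rather than moves, so the original lib/external tree keeps working, and it prints the --with-extlibs-path option to pass to configure. The paths in the usage comment are the ones quoted in the post.

```shell
# Hypothetical helper: reuse already-built external libraries in a
# persistent folder, then print the configure option pointing at it.
reuse_external_libs() {
  src="$1"    # e.g. .../yambo-5.1.1/lib/external
  dest="$2"   # e.g. .../yambo-libs
  mkdir -p "$dest" || return 1
  # Copy everything (including hidden files), preserving timestamps.
  cp -pR "$src/." "$dest/" || return 1
  printf '%s\n' "--with-extlibs-path=$dest"
}

# Usage (add your usual configure options):
#   opt=$(reuse_external_libs \
#     /home/milesj/nips3_yambo_converge/YAMBO/yambo-5.1.1/lib/external \
#     /home/milesj/nips3_yambo_converge/YAMBO/yambo-libs)
#   ./configure "$opt"
```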

Best,
D.

P.S.: I do not see the config.log attached in the last message.
Davide Sangalli, PhD
CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX Centre
https://sites.google.com/view/davidesangalli
http://www.max-centre.eu/

milesj

Re: Variable BSE_RESONANT; NetCDF: HDF error

Post by milesj » Thu Mar 16, 2023 8:10 pm

Eureka! Looks like both of these solutions worked. When I also edited the defs.mk file, Yambo writes ndb.BS_PAR_Q1 and the computation finishes without error. Also, even without this edit, if I use the Haydock solver with hybrid parallelization, everything finishes without error as well (though it takes a bit longer).
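A quick way to tell which I/O path a BSE run took, sketched under the assumption that the file names seen in this thread are representative: the failing build wrote ndb.BS_Q1_CPU_0, while the build with -D_PAR_IO wrote ndb.BS_PAR_Q1. The function name bse_db_layout is hypothetical, and the naming pattern may differ in other Yambo versions.

```shell
# Hypothetical check: report which BSE database layout a run directory holds,
# based only on the file names that appear in this thread.
bse_db_layout() {
  dir="${1:-.}"
  if ls "$dir"/ndb.BS_PAR_Q* >/dev/null 2>&1; then
    echo parallel   # single database written with parallel I/O
  elif ls "$dir"/ndb.BS_Q*_CPU_* >/dev/null 2>&1; then
    echo serial     # per-CPU databases, as in the failing runs
  else
    echo none
  fi
}
```

For example, bse_db_layout ./conv_k checks the job directory from the error message above.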

Thanks so much for all the help!

P.S. There is a config.log.txt file attached to my previous message, in case you still want to look into why I/O was serial by default. I've reattached it here.
