mpi_bcast error in NVHPC 23.7 OpenMPI 4.1.4

Having trouble compiling the Yambo source? Using an unusual architecture? Problems with the "configure" script? Problems in GPU architectures? This is the place to look.

Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano, Conor Hogan, Nicola Spallanzani

Forum rules
If you have trouble compiling Yambo, please make sure to list:
(1) the compiler (vendor and release: e.g. intel 10.1)
(2) the architecture (e.g. 64-bit IBM SP5)
(3) if the problems occur compiling in serial/in parallel
(4) the version of Yambo (revision number/major release version)
(5) the relevant compiler error message
ruoshi
Posts: 5
Joined: Fri Feb 04, 2022 9:08 pm

mpi_bcast error in NVHPC 23.7 OpenMPI 4.1.4

Post by ruoshi » Tue Jan 09, 2024 6:57 pm

I am trying to build parallel Yambo 5.2.0 with NVHPC 23.7 and OpenMPI 4.1.4 on our Rocky 8.7 cluster. It always fails with

Code: Select all

NVFORTRAN-S-0155-Could not resolve generic procedure mpi_bcast (/tmp/tmp.IBSNxAyQOW/yambo-5.2.0/src/modules/mod_parallel.F: 723)
  0 inform,   0 warnings,   1 severes, 0 fatal for create_hosts_comm
in compile_yambo.log, regardless of whether I load the preinstalled HDF5, FFTW, NetCDF, and LibXC modules built under this toolchain (when I do not, they are marked as "internal lib to be compiled").
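
For completeness, a minimal standalone test along these lines (the file name test_bcast.f90 is hypothetical, not part of Yambo) could be used to check whether the mpi module from this OpenMPI build resolves the mpi_bcast generic at all; it does not reproduce the exact call in create_hosts_comm, but it isolates the toolchain from the Yambo sources:

Code: Select all

# hypothetical reproducer: compile a trivial mpi_bcast call with the same mpif90
cat > test_bcast.f90 << 'EOF'
program test_bcast
  use mpi
  implicit none
  integer :: ierr, n
  call mpi_init(ierr)
  n = 0
  ! the same generic interface that fails to resolve in mod_parallel.F
  call mpi_bcast(n, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
  call mpi_finalize(ierr)
end program test_bcast
EOF
mpif90 test_bcast.f90 -o test_bcast && mpirun -np 2 ./test_bcast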

Attached are the config.log, report, and compile_yambo.log files.

Additional information that may help:
  • I also tried 5.1.2 but got the same error.
  • I was able to build QE 7.2 under this toolchain with MPI and it could run on multiple GPUs, so our OpenMPI module should be working properly.
  • I had no issue with Yambo 5.0-5.1 with NVHPC 21.9 and OpenMPI 3.1.6 on CentOS 7.9, thanks to the kind assistance of Andrea in an earlier post.
Ruoshi Sun
Lead Scientist of Scientific Computing
Research Computing, University of Virginia

Nicola Spallanzani
Posts: 63
Joined: Thu Nov 21, 2019 10:15 am

Re: mpi_bcast error in NVHPC 23.7 OpenMPI 4.1.4

Post by Nicola Spallanzani » Fri Jan 12, 2024 11:50 am

Dear Ruoshi,
Regarding the failure to use the loaded libraries: I see from config.log that the configure line that was launched is this one:

Code: Select all

./configure CFLAGS=-O2 'FCFLAGS=-O2 -gopt -Mnoframe -Mdalign -Mbackslash -I/apps/software/standard/compiler/nvhpc/23.7/openmpi/4.1.4/lib' 'FFLAGS=-O2 -gopt -Mnoframe -Mdalign -Mbackslash -I/apps/software/standard/compiler/nvhpc/23.7/openmpi/4.1.4/lib' CC=nvc FC=nvfortran F77=nvfortran MPIFC=mpif90 MPIF77=mpif77 MPICC=mpicc MPICXX=mpic++ 'CPP=cpp -E -P' 'FPP=nvfortran -Mpreprocess -E' --enable-cuda=cuda12.2,cc70,cc80 --enable-open-mp --enable-mpi --enable-dp --enable-hdf5-par-io --enable-time-profile --enable-memory-profile --enable-msgs-comps '--with-blas-libs=-L/apps/software/standard/core/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib -lblas' '--with-lapack-libs=-L/apps/software/standard/core/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib -llapack' --with-netcdf-path= --with-netcdff-path= --with-fft-path= --with-hdf5-path= --with-libxc-path=
The paths to the libraries are missing here. I suppose you passed them to configure through environment variables, which means those variables are not properly set (perhaps because of a problem in the modules, or because the wrong variables were passed). Check which variables each module sets with the command

Code: Select all

module show library_name
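For example, with a hypothetical module name (use whatever your site calls the library):

Code: Select all

# hypothetical module name; adapt to your cluster's module tree
module show hdf5
# look for a line such as "setenv HDF5_HOME /path/to/hdf5" in the output, then verify:
echo $HDF5_HOME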
However, this does not explain the compilation failure. So here are some questions that could help me understand what is happening:
1) are you using the internal mpi from the nvhpc sdk?
2) why are you compiling for two GPU architectures (cc70 and cc80)? "--enable-cuda=cuda12.2,cc70,cc80"

Please try from scratch (that means deleting the source directory and unpacking a new one) using this configure line:

Code: Select all

export FC=nvfortran 
export F77=nvfortran
export CC=nvc
export CPP="cpp -E"
export FPP="nvfortran -Mpreprocess -E"
export F90SUFFIX=".f90"

./configure \
  --enable-cuda=cuda11.8,cc80 \
  --enable-mpi --enable-open-mp --enable-dp --enable-hdf5-par-io \
  --enable-msgs-comps --enable-time-profile --enable-memory-profile \
  --with-blas-libs="-lblas" \
  --with-lapack-libs="-llapack" \
  --with-hdf5-path=$HDF5_HOME \
  --with-netcdf-path=$NETCDF_C_HOME \
  --with-netcdff-path=$NETCDF_FORTRAN_HOME \
  --with-libxc-path=$LIBXC_HOME \
  --with-fft-path=$FFTW_HOME 
Remember to load the modules beforehand and to use the proper environment variables.
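For example (the module names below are only placeholders for your site's naming scheme):

Code: Select all

# placeholder module names: load the toolchain and library modules first
module purge
module load nvhpc openmpi hdf5 netcdf-c netcdf-fortran fftw libxc
# the configure line above relies on these variables being set by the modules
echo $HDF5_HOME $NETCDF_C_HOME $NETCDF_FORTRAN_HOME $LIBXC_HOME $FFTW_HOME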

Best,
Nicola
Nicola Spallanzani, PhD
S3 Centre, Istituto Nanoscienze CNR and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu

ruoshi
Posts: 5
Joined: Fri Feb 04, 2022 9:08 pm

Re: mpi_bcast error in NVHPC 23.7 OpenMPI 4.1.4

Post by ruoshi » Thu Feb 15, 2024 7:10 pm

Thank you for looking into this. As it turns out, there may be some problem with our OpenMPI module after all. I prepared a new toolchain with NVHPC 24.1 + OpenMPI 4.1.6 and the error went away.

P.S. I defined two architectures because we have different types of GPUs on our cluster.
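On reasonably recent drivers, the compute capability of each card type can be checked with, for example:

Code: Select all

nvidia-smi --query-gpu=name,compute_cap --format=csv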
Ruoshi Sun
Lead Scientist of Scientific Computing
Research Computing, University of Virginia

Davide Sangalli
Posts: 614
Joined: Tue May 29, 2012 4:49 pm
Location: Via Salaria Km 29.3, CP 10, 00016, Monterotondo Stazione, Italy

Re: mpi_bcast error in NVHPC 23.7 OpenMPI 4.1.4

Post by Davide Sangalli » Sat Apr 06, 2024 11:08 pm

Just for reference.

The error mentioned above with OpenMPI 4.1.4 is a known issue that occurs when the OpenMPI library itself is compiled with nvfortran.
Fix here: https://forums.developer.nvidia.com/t/h ... 4-1/283219
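A quick way to check how a given OpenMPI installation was built (and hence whether it may be affected) is, for example:

Code: Select all

ompi_info | grep -i fort    # which Fortran compiler was used to build the bindings
mpif90 --showme             # the wrapper's underlying compile/link line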

Best,
D.
Davide Sangalli, PhD
CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX Centre
https://sites.google.com/view/davidesangalli
http://www.max-centre.eu/
