
mpi_bcast error in NVHPC 23.7 OpenMPI 4.1.4

Posted: Tue Jan 09, 2024 6:57 pm
by ruoshi
I am trying to build parallel Yambo 5.2.0 with NVHPC 23.7 and OpenMPI 4.1.4 on our Rocky 8.7 cluster. It always fails with

Code:

NVFORTRAN-S-0155-Could not resolve generic procedure mpi_bcast (/tmp/tmp.IBSNxAyQOW/yambo-5.2.0/src/modules/mod_parallel.F: 723)
  0 inform,   0 warnings,   1 severes, 0 fatal for create_hosts_comm
in compile_yambo.log, whether or not I load the preinstalled HDF5, FFTW, NetCDF, and LibXC modules built under this toolchain (in the latter case they are marked as "internal lib to be compiled").

Attached are the config.log, report, and compile_yambo.log files.

Additional information that may help:
  • I also tried 5.1.2 but got the same error.
  • I was able to build QE 7.2 under this toolchain with MPI and it could run on multiple GPUs, so our OpenMPI module should be working properly.
  • I had no issue with Yambo 5.0-5.1 with NVHPC 21.9 and OpenMPI 3.1.6 on CentOS 7.9, thanks to the kind assistance of Andrea in an earlier post.
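
In case it helps isolate things, I could also try compiling a bare generic mpi_bcast call with the same wrapper, outside of Yambo. This is only a sketch; the character-buffer broadcast is my guess at what create_hosts_comm does, and the file name is arbitrary.

Code:

cat > bcast_test.f90 << 'EOF'
program bcast_test
  ! minimal generic mpi_bcast call through "use mpi"
  use mpi
  implicit none
  integer :: ierr, rank
  character(len=64) :: host
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  host = 'dummy'
  call MPI_Bcast(host, len(host), MPI_CHARACTER, 0, MPI_COMM_WORLD, ierr)
  call MPI_Finalize(ierr)
end program bcast_test
EOF

# compile with the same wrapper used for the Yambo build; an
# NVFORTRAN-S-0155 error here would point at the MPI module itself
mpif90 -o bcast_test bcast_test.f90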

Re: mpi_bcast error in NVHPC 23.7 OpenMPI 4.1.4

Posted: Fri Jan 12, 2024 11:50 am
by Nicola Spallanzani
Dear Ruoshi,
regarding the failure to pick up the loaded libraries: I see from config.log that the configure line that was launched is this one:

Code:

./configure CFLAGS=-O2 \
  'FCFLAGS=-O2 -gopt -Mnoframe -Mdalign -Mbackslash -I/apps/software/standard/compiler/nvhpc/23.7/openmpi/4.1.4/lib' \
  'FFLAGS=-O2 -gopt -Mnoframe -Mdalign -Mbackslash -I/apps/software/standard/compiler/nvhpc/23.7/openmpi/4.1.4/lib' \
  CC=nvc FC=nvfortran F77=nvfortran MPIFC=mpif90 MPIF77=mpif77 MPICC=mpicc MPICXX=mpic++ \
  'CPP=cpp -E -P' 'FPP=nvfortran -Mpreprocess -E' \
  --enable-cuda=cuda12.2,cc70,cc80 --enable-open-mp --enable-mpi --enable-dp \
  --enable-hdf5-par-io --enable-time-profile --enable-memory-profile --enable-msgs-comps \
  '--with-blas-libs=-L/apps/software/standard/core/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib -lblas' \
  '--with-lapack-libs=-L/apps/software/standard/core/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib -llapack' \
  --with-netcdf-path= --with-netcdff-path= --with-fft-path= --with-hdf5-path= --with-libxc-path=
The paths to the libraries are missing here. I suppose you passed them to configure through environment variables, which means those variables are not set properly (perhaps because of problems in the modules, or because the wrong variables were passed). Check the variables that each module sets with the command

Code:

module show library_name
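For example (the module names below are only placeholders for the ones installed on your cluster), you can inspect what each module actually sets and, if it does not export the variable you need, point it by hand at the installation prefix that "module show" reports:

Code:

# inspect what each module actually sets (names are illustrative)
module show hdf5
module show netcdf-fortran

# if no *_HOME variable is exported, set it manually to the install
# prefix reported above (paths here are placeholders)
export HDF5_HOME=/path/to/hdf5
export NETCDF_C_HOME=/path/to/netcdf-c
export NETCDF_FORTRAN_HOME=/path/to/netcdf-fortran
export LIBXC_HOME=/path/to/libxc
export FFTW_HOME=/path/to/fftw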
However, this does not explain the compilation failure. So here are some questions that could help me understand what's happening:
1) are you using the internal mpi from the nvhpc sdk?
2) why are you compiling for two GPU architectures (cc70 and cc80)? "--enable-cuda=cuda12.2,cc70,cc80"
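
For point 1, a quick way to check which MPI is actually being picked up, and which Fortran compiler it was built with, is for example:

Code:

which mpif90                 # a path inside the NVHPC SDK tree usually means the bundled MPI
mpif90 --showme:command      # underlying Fortran compiler called by the OpenMPI wrapper
ompi_info | grep -i "fort"   # Fortran compiler used to build this OpenMPI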

Please try from scratch (that is, delete the source directory and unpack a fresh copy) using this configure line:

Code:

export FC=nvfortran 
export F77=nvfortran
export CC=nvc
export CPP="cpp -E"
export FPP="nvfortran -Mpreprocess -E"
export F90SUFFIX=".f90"

./configure \
  --enable-cuda=cuda11.8,cc80 \
  --enable-mpi --enable-open-mp --enable-dp --enable-hdf5-par-io \
  --enable-msgs-comps --enable-time-profile --enable-memory-profile \
  --with-blas-libs="-lblas" \
  --with-lapack-libs="-llapack" \
  --with-hdf5-path=$HDF5_HOME \
  --with-netcdf-path=$NETCDF_C_HOME \
  --with-netcdff-path=$NETCDF_FORTRAN_HOME \
  --with-libxc-path=$LIBXC_HOME \
  --with-fft-path=$FFTW_HOME 
Remember to load the modules beforehand and to use the proper environment variables.
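For example (the module names are only placeholders for those installed on your cluster), something along these lines before running configure:

Code:

module purge
module load nvhpc/23.7 openmpi/4.1.4
module load hdf5 netcdf-c netcdf-fortran libxc fftw

# quick sanity check that the variables used above are defined
echo $HDF5_HOME $NETCDF_C_HOME $NETCDF_FORTRAN_HOME $LIBXC_HOME $FFTW_HOME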

Best,
Nicola

Re: mpi_bcast error in NVHPC 23.7 OpenMPI 4.1.4

Posted: Thu Feb 15, 2024 7:10 pm
by ruoshi
Thank you for looking into this. As it turns out, there may be some problem with our OpenMPI module after all. I prepared a new toolchain with NVHPC 24.1 + OpenMPI 4.1.6 and the error went away.

P.S. I defined two architectures because we have different types of GPUs on our cluster.

Re: mpi_bcast error in NVHPC 23.7 OpenMPI 4.1.4

Posted: Sat Apr 06, 2024 11:08 pm
by Davide Sangalli
Just for reference.

The error mentioned above with OpenMPI 4.1.4 is a known issue that occurs when the OpenMPI library is compiled with nvfortran.
Fix here: https://forums.developer.nvidia.com/t/h ... 4-1/283219

Best,
D.