Multi-node execution failing on Lengau Cluster (CHPC) with Yambo 5.2.0 (MPI+OpenMP)

M. Hordelain · Post by **M. Hordelain** » Fri May 08, 2026 8:52 am

Dear Yambo community,
I am seeking help regarding a parallelization issue on the Lengau Cluster (CHPC South Africa). I am using Yambo 5.2.0, and although the build is correctly identified as MPI+OpenMP, my jobs remain stuck on a single node even when multiple nodes are requested via PBS.

Technical Context:
The log confirms the build is correct:
Version 5.2.0 Revision 22184 Hash 2871b0cee
MPI+OpenMP+SLK+SLEPC+HDF5_MPI_IO Build
The Issue:
When I submit the following script, the calculation only uses one node, leading to crashes due to memory limitations when NGsBlkXd is increased to 3 Ry.

Submission Script:

Code: Select all

#PBS -l select=4:ncpus=24:mpiprocs=2:nodetype=haswell_reg
export OMP_NUM_THREADS=12
export OMP_STACKSIZE=1G
export I_MPI_PIN_DOMAIN=omp

# Path to binaries
module load gcc/9.2.0
module load chpc/yambo/5.2.0/gcc-9.2.0-cpu

MPI_EXEC=$(which mpiexec.hydra || which mpirun)
$MPI_EXEC -np 8 yambo -F RPA_3Ry.in -J RPA_3Ry

Questions for the community:

On Lengau, should I be using a specific MPI wrapper (like mpirun vs mpiexec.hydra) to ensure PBS node propagation?
Has anyone successfully used the chpc/yambo/5.2.0 module across multiple nodes?
Could the issue come from how the select statement interacts with the Intel/GCC environment on this specific cluster?

Any advice on the correct mpiprocs / OMP_NUM_THREADS balance for these Haswell nodes (24 cores per node) would be very helpful.

Please find attached the following files to help diagnose the issue:

RPA_3Ry.in: My input file, including the parallelization strategy (CPU/ROLEs).

Code: Select all

#                                                                     
# :   :::   :::     ::::    ::::  :::::::::   ::::::::                
# :+:   :+: :+: :+:   +:+:+: :+:+:+ :+:    :+: :+:    :+              
#  +:+ +:+ +:+   +:+  +:+ +:+:+ +:+ +:+    +:+ +:+    +:+             
#   +#++: +#++:++#++: +#+  +:+  +#+ +#++:++#+  +#+    +:+             
#    +#+  +#+     +#+ +#+       +#+ +#+    +#+ +#+    +#+             
#    #+#  #+#     #+# #+#       #+# #+#    #+# #+#    #+#             
#    ###  ###     ### ###       ### #########   ########              
#                                                                     
#                                                                     
# Version 5.2.0 Revision 22184 Hash (prev commit) 2871b0cee           
#                      Branch is 5.2                                  
#         MPI+OpenMP+SLK+SLEPC+HDF5_MPI_IO Build                      
#                http://www.yambo-code.org                            
#
optics                           # [R] Linear Response optical properties
infver                           # [R] Input file variables verbosity
kernel                           # [R] Kernel
chi                              # [R][CHI] Dyson equation for Chi.
dipoles                          # [R] Oscillator strenghts (or dipoles)
FFTGvecs= 6                Ry    # [FFT] Plane-waves
DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles
X_Threads=0                      # [OPENMP/X] Number of threads for response functions
Chimod= "HARTREE"                # [X] IP/Hartree/ALDA/LRC/PF/BSfxc
NGsBlkXd= 3                Ry    # [Xd] Response block size
% QpntsRXd
  1 | 1 |                           # [Xd] Transferred momenta
%
% BndsRnXd
    250 | 820 |                       # [Xd] Polarization function bands
%
% EnRngeXd
  0.00000 | 10.00000 |         eV    # [Xd] Energy range
%
% DmRngeXd
 0.100000 | 0.100000 |         eV    # [Xd] Damping range
%
ETStpsXd= 1200                    # [Xd] Total Energy steps
% LongDrXd
 1.000000 | 0.000000 | 0.000000 |        # [Xd] [cc] Electric Field
%
CUTGeo= "slab z"                   # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY..
% CUTBox
 0.000000 | 0.000000 | 10.000000 |        # [CUT] [au] Box sides
%
X_all_q_nCPU_LinAlg_INV= 8
X_and_IO_CPU= "1 1 8"
X_all_q_ROLEs= "q k c v"

I am really struggling to get this to scale beyond a single node and any help from the community or someone familiar with the South African CHPC environment would be a lifesaver.
Best regards,

Thank you for your time and assistance.

Julien H. OKOUEMBE
Master's Degree, Faculté des Sciences et Techniques
Université Marien NGOUABI, Congo

Post by **Daniele Varsano** » Sun May 10, 2026 2:47 pm

Dear Julien,

there could be many reasons for the failure. Can you post the report file of the run that is failing? at the beginning are reported some info that could be useful.

Next, try to assign machinefile as:

Code: Select all

$MPI_EXEC -np 8 -machinefile $PBS_NODEFILE yambo -F RPA_3Ry.in -J RPA_3Ry

and see if the problem is solved.

Next please, even if not related with your problem, in your input file:

1 FFTGvecs= 6 # careful this is a very low value and could provide non accurate results
2. ETStpsXd= 1200 # [Xd] Total Energy steps # This is an high value, are you sure you need so many energy steps. In transition space this is not a problem, but in this calculation in G space it makes the calculation very cumbersome.
3. CUTGeo= "slab z" # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws/slab X/Y/Z/XY.. # If you use the slab you do not need to set the box. Moreover, id you want to use the box gemoetry, the boxZ value is unrealistic

note that (ROLES and CPU) shoudl be consistent, something like e.g.:

X_and_IO_CPU= "1 1 1 8 1"
X_and_IO_ROLEs= "q g k c v"

you are mixing two different variables.

Best,

Daniele

Best,
Daniele

M. Hordelain · Post by **M. Hordelain** » Sun May 10, 2026 5:16 pm

Dear Daniele,

Thank you very much for your detailed reply and for taking the time to look into my issue.

After further investigation on the Lengau cluster, I believe the problem is indeed more related to the MPI/PBS environment and node propagation than to Yambo itself.

I discovered that:

Yambo 5.2.0 was compiled against OpenMPI 4.1.3

however, the cluster environment was initially exposing a different MPI runtime

additionally,
mpiexec.hydra
(Intel MPI launcher) was accidentally being picked up instead of the OpenMPI launcher

Using:

Code: Select all

ldd $(which yambo) | grep libmpi

I confirmed that the executable is linked against OpenMPI 4.1.3.

Therefore I am now switching to the matching OpenMPI runtime and explicitly forcing:

Code: Select all

mpirun -np 8 -machinefile $PBS_NODEFILE ...

instead of relying on automatic PBS node propagation.

I suspect the issue comes from the way OpenMPI and PBS/Torque are integrated on Lengau, since the jobs appear to allocate multiple nodes correctly, but MPI ranks do not spread across them unless the machinefile is explicitly specified.

I will now test:

Code: Select all

mpirun --mca pml ob1 \
       --mca btl tcp,self \
       -np 8 \
       -machinefile $PBS_NODEFILE \
       yambo -F RPA_3Ry.in -J RPA_3Ry

to verify whether the problem is entirely due to MPI node propagation.

Thank you also for pointing out the inconsistency between the ROLEs and CPU variables. You are absolutely right I mistakenly mixed two different parallelization syntaxes. I will correct this and use consistent ROLE/CPU definitions.

I also appreciate the comments regarding:

ETStpsXd=1200

FFTGvecs=6 Ry

and the Coulomb cutoff setup.

I initially increased ETStpsXd to obtain a smoother dielectric spectrum, but I now understand that this becomes extremely expensive in G-space calculations.

For completeness, I have attached a

.zip

archive containing:

the report/log files,

the input file,

and the relevant r- files

so that you can better inspect the MPI behavior and runtime configuration.

Thank you again for your help and guidance.

Best regards,

Julien

Post by **Daniele Varsano** » Mon May 11, 2026 6:42 am

Dear Julien,

from your report file you can see that the problem is not only the multi-node execution, but MPI itself is not working:

Cores-Threads : 1(CPU)-12(threads)
Cores-Threads : X_and_IO(environment)-1 1 8(CPUs)-(ROLEs)
MPI Cores : 1
Threads per core : 12
Threads total : 12
Nodes Computing : 1
Nodes IO : 1

your reports show that you are running with one single task, so as you pointed out there is a mismatch with the MPI wrapper.

Best,

Daniele

Yambo Community Forum

Multi-node execution failing on Lengau Cluster (CHPC) with Yambo 5.2.0 (MPI+OpenMP)

Multi-node execution failing on Lengau Cluster (CHPC) with Yambo 5.2.0 (MPI+OpenMP)

Re: Multi-node execution failing on Lengau Cluster (CHPC) with Yambo 5.2.0 (MPI+OpenMP)

Re: Multi-node execution failing on Lengau Cluster (CHPC) with Yambo 5.2.0 (MPI+OpenMP)

Re: Multi-node execution failing on Lengau Cluster (CHPC) with Yambo 5.2.0 (MPI+OpenMP)