parallel problem with a full frequency approach

Run-time issues concerning Yambo that are not covered in the above forums.

Moderators: myrta gruning, andrea marini, Daniele Varsano, Conor Hogan

Daniele Varsano
Posts: 3816
Joined: Tue Mar 17, 2009 2:23 pm

Re: parallel problem with a full frequency approach

Post by Daniele Varsano » Thu Mar 19, 2020 2:51 pm

Dear Haseeb,
the kernel building seems to be rather balanced: the expected time to complete the task on each CPU is similar (around 3h30m).
The problem here is that the first 2 CPUs start the calculation only after about 3h, which is unexpected, and I actually have no idea what is going on. From the log and report everything seems to be OK.
Can you try repeating the calculation with a different number of CPUs? This could help to spot the problem.
As suggested by Claudio, it is good practice to specify the parallelization strategy explicitly, as the default one may not be optimal.
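For reference, a minimal sketch of what an explicit BSE parallelization strategy can look like in the input file (variable names as in yambo 4.x; the product of the BS_CPU entries has to match the total number of MPI tasks, so adapt the counts to your own run):

Code:

BS_CPU= "2 1 1"       # MPI tasks assigned to each parallel role
BS_ROLEs= "k eh t"    # roles: k-points, electron-hole pairs, transitions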

Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

haseebphysics1
Posts: 169
Joined: Sat Aug 17, 2019 2:48 pm

Re: parallel problem with a full frequency approach

Post by haseebphysics1 » Sun Mar 22, 2020 12:49 pm

Dear Yambo developers,

I was having memory issues when solving the BSE: I could not go beyond 40 BSEBands on my single 125 GB node. So, I decided to add another node to get more memory. But I have found that running the calculation on more than one node does not help either, since I need to run at least 2 MPI processes to spread the calculation over multiple nodes.

Q1: If a single MPI task takes 100 GB of RAM, will 2 MPI tasks take 200 GB of RAM in Yambo? If so, what is the benefit of adding more nodes in terms of RAM, when I could do the same calculation on a single core on a single node?
Here is my SLURM script:

Code:

#!/bin/bash
#
#SBATCH --job-name=H1
#SBATCH --partition=debug
#SBATCH --output=res_bse.txt
#SBATCH -t 999:00:00
#SBATCH --tasks-per-node=1
#SBATCH --nodes=2
#SBATCH --nodelist=compute-0-2,compute-0-3
#SBATCH --mem-per-cpu=122000M
#SBATCH --cpus-per-task=1  ### Number of threads per task (OMP threads)


export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export MKL_NUM_THREADS=$SLURM_CPUS_PER_TASK

nodes=2
tasks_per_node=1
nthreads=1
ncpu=$(( nodes * tasks_per_node ))   # total number of MPI tasks

bindir=/export/installs/Yambo_intel/Yambo4.5.1_OpenMP/yambo-4.5.1/bin
echo "Running on $ncpu MPI, $nthreads OpenMP threads"
srun hostname
source /export/installs/intelcc/parallel_studio_xe_2019.1.053/bin/psxevars.sh
which mpirun
export I_MPI_HYDRA_TOPOLIB=ipl   # work around Intel MPI topology detection
mpirun -np $ncpu $bindir/yambo -F bse.in -C bse
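A variant of the above (a sketch, assuming this yambo build was compiled with OpenMP support, as the installation path suggests, and reusing $bindir from the script) keeps one MPI task per node but gives it several OpenMP threads, since threads share the node's memory while each additional MPI task largely holds its own copy of the data:

Code:

#SBATCH --nodes=2
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=8          # 8 OpenMP threads per MPI task

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun -np 2 $bindir/yambo -F bse.in -C bse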

Thanks,
Haseeb Ahmad
MS - Physics,
LUMS - Pakistan

Daniele Varsano
Posts: 3816
Joined: Tue Mar 17, 2009 2:23 pm

Re: parallel problem with a full frequency approach

Post by Daniele Varsano » Mon Mar 23, 2020 10:21 am

Dear Haseeb,
from your last post it is not clear whether you have a problem in building the BSE matrix or in solving (diagonalizing) it.
If the question is about diagonalization, parallelism enters only if you linked yambo with the ScaLAPACK libraries; in any case,
I'm not sure whether at this stage the memory is distributed among the CPUs.
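For reference, ScaLAPACK support is selected when yambo is compiled; a sketch of the relevant configure flags (names as in the yambo 4.x configure script; the MKL paths below are an assumption for an Intel toolchain, so check ./configure --help and your own library locations):

Code:

./configure --enable-par-linalg \
  --with-blacs-libs="-L$MKLROOT/lib/intel64 -lmkl_blacs_intelmpi_lp64" \
  --with-scalapack-libs="-L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64"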

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
