Dear Yambo community,
I have been running calculations on a very large system and have managed to run them on a single cluster node. When I run the same calculation on 3 nodes of the same type, I get an out-of-memory (OOM) error. Is there a quick way to resolve this?
Please find the log, the input file, and the job launch script attached, and let me know if you have any further questions. Note: I tried running with both MPI (which worked in the first place, on a single node) and OpenMP, with no difference.
Kind regards,
Stefan Velja
Parallelization error when running on multiple nodes
- Daniele Varsano
Re: Parallelization error when running on multiple nodes
Dear Stefan,
please note that you are requesting 144 tasks in the job script, but the input variables assign 288 tasks.
I do not know how many cores you have per node. Of course, you can use fewer of them so that more memory is available per task.
Indeed, the parallel distribution actually used, as reported in the log file, is not the one indicated in the input.
To optimize the memory distribution among tasks, you can try to set:
Code:
X_and_IO_CPU= "1 1 1 32 9" # [PARALLEL] CPUs for each role
X_and_IO_ROLEs= "q g k c v" # [PARALLEL] CPUs roles (q,g,k,c,v)
if you plan to use 288 tasks, or something like:
Code:
X_and_IO_CPU= "1 1 1 47 4" # [PARALLEL] CPUs for each role
X_and_IO_ROLEs= "q g k c v" # [PARALLEL] CPUs roles (q,g,k,c,v)
if you plan to use 188 tasks.
Finally, I'm not sure you gain much by using hyperthreading.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
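The 144 vs 288 mismatch above can be caught before submitting by multiplying the entries of the CPU string and comparing the result with the number of tasks the job requests. Below is a minimal bash sketch, assuming (as the comparison in the reply implies) that the product of the X_and_IO_CPU entries is the total number of MPI tasks the input assigns; check_cpu_product is a hypothetical helper name, not part of Yambo or SLURM.

```shell
#!/usr/bin/env bash
# Sketch: compare the number of MPI tasks implied by a Yambo CPU string
# (assumed to be the product of its entries) with the tasks the job requests.
# check_cpu_product is a hypothetical helper, not a Yambo or SLURM command.
check_cpu_product() {
  local cpu_string="$1"   # e.g. the value of X_and_IO_CPU, "1 1 1 32 9"
  local ntasks="$2"       # e.g. "$SLURM_NTASKS" inside a job
  local product=1 n
  for n in $cpu_string; do          # unquoted on purpose: split on spaces
    product=$(( product * n ))
  done
  if [ "$product" -eq "$ntasks" ]; then
    echo "OK: input assigns $product tasks"
  else
    echo "MISMATCH: input assigns $product tasks, job requests $ntasks"
    return 1
  fi
}
```

For example, `check_cpu_product "1 1 1 32 9" 144` prints "MISMATCH: input assigns 288 tasks, job requests 144" and returns a non-zero status, while `check_cpu_product "1 1 1 47 4" 188` prints "OK: input assigns 188 tasks".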
- Nicola Spallanzani
Re: Parallelization error when running on multiple nodes
Dear Stefan,
as additional information, your job script contains these two lines:
Code:
#SBATCH --cpus-per-task=2
export OMP_NUM_THREADS=6
They have to be set to the same value. To make this automatic, you can write:
Code:
#SBATCH --cpus-per-task=2
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
Best,
Nicola
Nicola Spallanzani, PhD
S3 Centre, Istituto Nanoscienze CNR and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu
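Building on the suggestion above, here is a minimal bash sketch for a job script that takes the OpenMP thread count from the SLURM allocation and adds a hypothetical check_omp_threads helper that reports when the two values disagree. The fallback to 1 thread when SLURM_CPUS_PER_TASK is unset (for instance when the script runs outside a job, or without --cpus-per-task) is an assumption, not something stated in the thread.

```shell
#!/usr/bash/env bash
# Sketch for a SLURM job script: derive the OpenMP thread count from the
# allocation instead of hard-coding it, so the two values cannot diverge.
# Assumption: if SLURM_CPUS_PER_TASK is unset, fall back to 1 thread.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Hypothetical helper (not a SLURM command): report whether the thread count
# matches the CPUs reserved per task. 6 threads on 2 CPUs, as in the
# original script, would typically oversubscribe the cores.
check_omp_threads() {
  local cpus="${SLURM_CPUS_PER_TASK:-1}"
  if [ "${OMP_NUM_THREADS:-unset}" = "$cpus" ]; then
    echo "OK: $cpus thread(s) per task"
  else
    echo "MISMATCH: OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}, cpus-per-task=$cpus"
    return 1
  fi
}
```

Calling check_omp_threads just before the line that launches yambo would print a warning if someone later edits one of the two values by hand.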