MPI calculation stops and does not continue

Posted: Tue May 30, 2023 8:38 am
by peiogargor
Dear all,

I am experiencing a technical issue while running an MPI calculation on my cluster.

On the one hand, I am attaching the run_configure.sh I used to configure the build; I then ran "make core" to install the executables. I am also attaching config.log, ./config/report and ./config/setup. If anything else is needed to check the parallel installation, please ask. As you can see, the installation completes and appears to be a parallel build.

With this installation, on the other hand, I have been working through the GW parallel tutorial: https://www.yambo-code.eu/wiki/index.ph ... strategies. I ran three calculations: MPI 1 + OMP 1 (finishes without problems in 28 min 23 s; log attached), MPI 1 + OMP 16 (also finishes without problems, in 5 min 40 s; log attached), and the problematic one, MPI 16 + OMP 1. The latter does not fail (there is no error message and the job stays in the queue), but it stalls and makes no further progress. Moreover, comparing r-*_1 with any other r-*_X in the generated LOG folder shows that all cores seem to be doing the same calculation. I am attaching the log files for CPUs 1 and 2, as well as the run.sh I have been using to submit the jobs.

I do not quite understand what is going on: whether the problem lies in the installation, in the job submission script (which I took almost verbatim from the wiki), or in some aspect of my cluster's architecture that I have overlooked.

If anything more is needed to trace back the issue, please ask me.

Thank you very much in advance,

Peio.

Re: MPI calculation stops and does not continue

Posted: Tue May 30, 2023 8:53 am
by Daniele Varsano
Dear Peio,

it seems that Yambo is running in parallel. However, the parallelization strategy reported in the log files is not the one you specified in the script, so something went wrong. Moreover, the strategy actually adopted does not seem optimal.
Could you please post the input and report files of your MPI run?

Best,
Daniele

Re: MPI calculation stops and does not continue

Posted: Tue May 30, 2023 9:00 am
by peiogargor
Hello Daniele,

I am attaching the input file of the MPI calculation together with its report file.

Thank you,

Peio.

Re: MPI calculation stops and does not continue

Posted: Tue May 30, 2023 5:52 pm
by Daniele Varsano
Dear Peio,

I'm not totally sure where the problem arises,
but it is possible that some of the variables in that tutorial are obsolete.

Can you try replacing the parallelization keywords in your input with these ones:

Code:

X_and_IO_CPU= ""                 # [PARALLEL] CPUs for each role
X_and_IO_ROLEs= ""               # [PARALLEL] CPUs roles (q,g,k,c,v)
X_and_IO_nCPU_LinAlg_INV=-1      # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)

keeping the same values you set before?

I would also set the number of CPUs for the parallel linear algebra (X_and_IO_nCPU_LinAlg_INV) to 1, as you have a very small matrix to invert.

Best,
Daniele

PS: the variables controlling the parallelism are added to the input when it is generated with the par verbosity (-V par), e.g.

Code:

yambo -gw0 p -g n -V par

This way you can be sure you are using the correct variable names.
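
A quick way to use such a freshly generated input as a reference is to list every variable name in your own input that the reference does not contain. The sketch below is only an illustration: the helper names (assigned_names, names_not_in_reference) are hypothetical, it handles only simple "NAME = value" lines, and the values in the hand-edited example are made up.

```python
import re

# Matches simple "NAME = value" lines; block variables that start with '%'
# and bare runlevel flags are deliberately ignored by this sketch.
_ASSIGNMENT = re.compile(r"^[ \t]*([A-Za-z_][A-Za-z0-9_]*)[ \t]*=")


def assigned_names(text):
    """Return the set of variable names assigned in an input-file text."""
    names = set()
    for line in text.splitlines():
        match = _ASSIGNMENT.match(line)
        if match:
            names.add(match.group(1))
    return names


def names_not_in_reference(my_input, reference_input):
    """Names used in my_input that do not appear in the reference input."""
    return sorted(assigned_names(my_input) - assigned_names(reference_input))


# Reference: the three [PARALLEL] lines quoted earlier in this thread.
reference = '''X_and_IO_CPU= ""                 # [PARALLEL] CPUs for each role
X_and_IO_ROLEs= ""               # [PARALLEL] CPUs roles (q,g,k,c,v)
X_and_IO_nCPU_LinAlg_INV=-1      # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)
'''

# Hypothetical hand-edited input using older names (values are illustrative).
mine = '''X_CPU= "1 1 1 8 2"
X_ROLEs= "q g k c v"
X_and_IO_nCPU_LinAlg_INV= 1
'''

print(names_not_in_reference(mine, reference))
```

Any name printed here is worth checking against the generated input before submitting the job.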

Re: MPI calculation stops and does not continue

Posted: Tue May 30, 2023 7:43 pm
by peiogargor
Dear Daniele,

You hit the mark! The problem was indeed the names of the input variables; once I changed them, it worked perfectly:

X_CPU -> X_and_IO_CPU
X_ROLEs -> X_and_IO_ROLEs
X_nCPU_LinAlg_INV -> X_and_IO_nCPU_LinAlg_INV
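
For anyone with several old inputs to fix, the renaming above can be scripted. This is a minimal sketch, not an official tool: rename_parallel_vars is a hypothetical helper, the mapping is just the three renames from this thread (using the X_and_IO_ROLEs spelling from the snippet quoted earlier), and the sample values are illustrative.

```python
import re

# Old -> new names, as reported in this thread.
RENAMES = {
    "X_CPU": "X_and_IO_CPU",
    "X_ROLEs": "X_and_IO_ROLEs",
    "X_nCPU_LinAlg_INV": "X_and_IO_nCPU_LinAlg_INV",
}

# Match an old name only when it is the whole variable name at the start of
# a line and is followed by '=', so already-renamed lines are left untouched.
_PATTERN = re.compile(
    r"^([ \t]*)(" + "|".join(re.escape(name) for name in RENAMES) + r")(?=[ \t]*=)",
    re.MULTILINE,
)


def rename_parallel_vars(text):
    """Return the input text with the old variable names replaced."""
    return _PATTERN.sub(lambda m: m.group(1) + RENAMES[m.group(2)], text)


# Illustrative input fragment; the values are made up for the example.
old_input = '''X_CPU= "1 1 1 8 2"
X_ROLEs= "q g k c v"
X_nCPU_LinAlg_INV= 1
SE_CPU= "1 2 8"
'''

print(rename_parallel_vars(old_input))
```

Running the function a second time on its own output changes nothing, so it should be safe to apply to inputs that were already partly updated.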

The value of the parallel linear algebra variable does not seem to play a big role: the calculation took the same time with -1, 1 and 16. It took 1 min 54 s, a big difference even compared with the OpenMP calculation.
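
To put the three timings side by side, here is a small sketch that computes speedup and parallel efficiency relative to the serial run. It assumes the three runs are otherwise identical, and the helper names (to_seconds, speedup_table) are just for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a wall time given as minutes and seconds to seconds."""
    return 60 * minutes + seconds


# (label, number of MPI tasks x OpenMP threads, wall time in seconds),
# using the wall times reported in this thread.
runs = [
    ("MPI 1 + OMP 1", 1, to_seconds(28, 23)),
    ("MPI 1 + OMP 16", 16, to_seconds(5, 40)),
    ("MPI 16 + OMP 1", 16, to_seconds(1, 54)),
]


def speedup_table(runs):
    """Return (label, speedup, efficiency) tuples relative to the first run."""
    baseline = runs[0][2]
    table = []
    for label, workers, wall in runs:
        speedup = baseline / wall
        table.append((label, speedup, speedup / workers))
    return table


for label, speedup, efficiency in speedup_table(runs):
    print(f"{label:15s} speedup {speedup:5.2f}  efficiency {efficiency:4.2f}")
```

With these numbers the OpenMP run is roughly 5 times faster than the serial one, while the MPI run is roughly 15 times faster on the same 16 cores.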

Thank you very much!

PS: I would suggest updating the script in the tutorial so that nobody else gets confused by the input variable names.

Re: MPI calculation stops and does not continue

Posted: Tue May 30, 2023 8:32 pm
by Daniele Varsano
Hola Peio,

great that it is solved.
Yes, we need to update the tutorial; I will do it asap! Thanks for spotting it!

Daniele