Parallelization for non-linear response calculations

From The Yambo Project

Latest revision as of 12:41, 23 March 2022

By default yambo_nl is parallelized over frequencies, which is the most efficient way to distribute the calculation among the processors. Two other parallelizations are available in the code:

K-points parallelization
If your system is large and requires more memory, or you have only a few frequencies, you can change the parallelization strategy. Running with the flag "-V par" adds the parallelization options to your input file. You can then turn on the parallelization over k-points, choosing the settings so that the product of the cores in k-space and in frequency-space equals the total number of cores. For example, if you have 16 cores you can set:

.....
NL_CPU= "4 4"                   # [PARALLEL] CPUs for each role
NL_ROLEs= "w k"                 # [PARALLEL] CPUs roles (w,k)
DIP_CPU= "4 2 2"                      # [PARALLEL] CPUs for each role
DIP_ROLEs= "k c v"                    # [PARALLEL] CPUs roles (k,c,v)
.....                           

In this way the code distributes the wave-functions over 4 cores and reduces the amount of memory needed on each one. If this is not enough, you can use the Open-MP parallelization, see below.
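The rule that the per-role core counts must multiply to the total number of cores can be checked before submitting a job. The following is a minimal sketch in Python, not part of Yambo: the helper name check_layout is hypothetical, and it assumes that the same product rule applies to the DIP_CPU line as to the NL_CPU line, which the example input above satisfies (4 x 2 x 2 = 16).

```python
from math import prod


def check_layout(cpu: str, roles: str, mpi_tasks: int) -> dict:
    """Pair each role with its core count and check the product rule.

    cpu       -- value of a *_CPU variable, e.g. "4 4"
    roles     -- value of the matching *_ROLEs variable, e.g. "w k"
    mpi_tasks -- total number of MPI tasks the job will be started with

    Hypothetical helper for illustration only; it is not provided by Yambo.
    """
    counts = [int(n) for n in cpu.split()]
    names = roles.split()
    if len(counts) != len(names):
        raise ValueError(f"{len(counts)} core counts given for {len(names)} roles")
    total = prod(counts)
    if total != mpi_tasks:
        raise ValueError(f"product of {counts} is {total}, expected {mpi_tasks}")
    return dict(zip(names, counts))


# The 16-core example from the input file above.
nl_layout = check_layout("4 4", "w k", 16)
dip_layout = check_layout("4 2 2", "k c v", 16)
print(nl_layout)
print(dip_layout)
```

A layout whose product does not match the number of MPI tasks, for instance "4 4" with 8 tasks, raises a ValueError in this sketch.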

Open-MP
Another possibility is to enable OpenMP at configure time with the flags --enable-open-mp --enable-openmp-int-linalg and recompile the code. You can then use the OpenMP (https://www.openmp.org/) parallelization.
For example, set the number of threads to 2 with the command:

export OMP_NUM_THREADS=2                     

and yambo_nl will automatically use the available threads. In the log file you will find:

.....
<---> P1: MPI Cores-Threads   : 16(CPU)-2(threads)
<---> P1: MPI Cores-Threads   : NL(environment)-4 4(CPUs)-w k(ROLEs)
.....

Note that the way the number of threads is set can depend on the queue system and the configuration of your machine.

Combining all these parallelizations, you can use a large number of cores. Imagine you want to calculate the response for 40 different frequencies; you could set

"4 cores Open-MP" x "10 cores k-points" x "40 cores frequencies" = 1600 cores.

To keep the parallelization efficient, we advise against using more than 4 Open-MP threads unless you have memory problems in your calculations.
Finally, remember that the restart of interrupted calculations works only on frequencies.
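The core count in the 40-frequency example can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name total_cores is hypothetical, and it assumes the usual hybrid layout in which the number of MPI tasks is the product of the k-point and frequency counts and each MPI task runs its Open-MP threads on separate cores.

```python
def total_cores(omp_threads: int, k_tasks: int, w_tasks: int) -> tuple:
    """Return (MPI tasks, total cores) for a hybrid MPI + Open-MP layout.

    Assumption: MPI tasks = k_tasks * w_tasks, and every MPI task
    occupies omp_threads cores (one per thread).
    """
    mpi_tasks = k_tasks * w_tasks
    return mpi_tasks, mpi_tasks * omp_threads


# The example above: 4 Open-MP threads, 10 k-point tasks, 40 frequency tasks.
mpi_tasks, cores = total_cores(omp_threads=4, k_tasks=10, w_tasks=40)
print(f"MPI tasks: {mpi_tasks}, total cores: {cores}")
```

Under the same assumption, the 16-core example with 2 threads shown in the log file corresponds to total_cores(2, 4, 4), that is 16 MPI tasks spread over 32 cores.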