Parallelization for non-linear response calculations

From The Yambo Project

Latest revision as of 12:41, 23 March 2022

By default, yambo_nl is parallelized over frequencies, which is the most efficient way to distribute the calculations among the processors. Two other parallelization strategies are available in the code:

K-points parallelization
If your system is large and requires more memory, or if you have only a few frequencies, you can change the parallelization strategy. Using the flag "-V par" adds the parallelization options to your input file. You can then turn on the parallelization over k-points, choosing the values so that the product of the cores in k-space and in frequency space equals the total number of cores. For example, if you have 16 cores you can set:

.....
NL_CPU= "4 4"                   # [PARALLEL] CPUs for each role
NL_ROLEs= "w k"                 # [PARALLEL] CPUs roles (w,k)
DIP_CPU= "4 2 2"                      # [PARALLEL] CPUs for each role
DIP_ROLEs= "k c v"                    # [PARALLEL] CPUs roles (k,c,v)
.....                           

In this way the code distributes the wave-functions over 4 cores and reduces the amount of memory required. If this is not enough, you can use the OpenMP parallelization, see below.
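The product rule for the k-point and frequency cores can be checked with a short script. This is only an illustrative sketch: the helper names `nl_cpu_splits` and `format_nl_cpu` are hypothetical and not part of yambo, and the script merely lists the pairs whose product equals the total number of cores, printing them in the style of the input lines shown above.

```python
def nl_cpu_splits(total_cores):
    """Return every (w_cores, k_cores) pair whose product is total_cores.

    Hypothetical helper, not part of yambo: w_cores is the number of cores
    assigned to frequencies, k_cores the number assigned to k-points.
    """
    return [
        (w, total_cores // w)
        for w in range(1, total_cores + 1)
        if total_cores % w == 0
    ]


def format_nl_cpu(w_cores, k_cores):
    """Format one split in the style of the NL_CPU / NL_ROLEs input lines."""
    return f'NL_CPU= "{w_cores} {k_cores}"\nNL_ROLEs= "w k"'


if __name__ == "__main__":
    # List all admissible splits for the 16-core example.
    for w, k in nl_cpu_splits(16):
        print(f"frequency cores = {w:2d}, k-point cores = {k:2d}")
    # The split used in the example input.
    print(format_nl_cpu(4, 4))
```

Which split is best depends on the number of frequencies and k-points in the calculation; the script only enumerates the combinations that satisfy the product rule.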

Open-MP
Another possibility is to activate OpenMP at configure time with the flags --enable-open-mp --enable-openmp-int-linalg, recompile the code, and then use the OpenMP parallelization.
For example, set the number of threads to 2 with the command:

export OMP_NUM_THREADS=2                     

and yambo_nl will automatically use the available threads. In the log file you will find:

.....
<---> P1: MPI Cores-Threads   : 16(CPU)-2(threads)
<---> P1: MPI Cores-Threads   : NL(environment)-4 4(CPUs)-w k(ROLEs)
.....

Note that the way the number of threads is set can depend on the queue system and the configuration of your machine.

Combining all these parallelizations lets you use a large number of cores. Imagine you want to calculate the response for 40 different frequencies; you could set

"4 cores Open-MP" x "10 cores k-points" x "40 cores frequencies" = 1600 cores.
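The arithmetic of this layout can be made explicit with a minimal sketch. It assumes the usual hybrid scheme, in which the number of MPI tasks is the product of the k-point and frequency cores and each task runs the given number of OpenMP threads; this is consistent with the log line above, where NL_CPU= "4 4" appears as 16 CPUs with 2 threads. The function name `hybrid_layout` is hypothetical and not part of yambo.

```python
def hybrid_layout(omp_threads, k_cores, w_cores):
    """Return (mpi_tasks, total_cores) for a hybrid MPI/OpenMP layout.

    Assumption: MPI tasks = k_cores * w_cores, and every MPI task
    runs omp_threads OpenMP threads.
    """
    mpi_tasks = k_cores * w_cores
    return mpi_tasks, mpi_tasks * omp_threads


if __name__ == "__main__":
    # The example from the text: 4 threads, 10 k-point cores, 40 frequency cores.
    mpi_tasks, total = hybrid_layout(omp_threads=4, k_cores=10, w_cores=40)
    print(f"MPI tasks: {mpi_tasks}, OpenMP threads per task: 4, total cores: {total}")
```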

To keep the parallelization efficient, we advise against using more than 4 OpenMP threads, unless you have memory problems in your calculations.
Finally, remember that the restart of interrupted calculations works only on frequencies.