Difference between revisions of "Parallelization for non-linear response calculations"

Latest revision as of 12:41, 23 March 2022

By default yambo_nl is parallelized over frequencies, which is the most efficient way to distribute the calculation among the processors. Two other parallelizations are available in the code:

K-points parallelization
If your system is large and requires more memory, or you have few frequencies, you can change the parallelization strategy. Running with the flag "-V par" adds the parallelization options to your input file. You can then turn on the parallelization over k-points, choosing the values so that the product of the cores in k-space and in frequency space equals the total number of cores. For example, if you have 16 cores you can set:

.....
NL_CPU= "4 4"                   # [PARALLEL] CPUs for each role
NL_ROLEs= "w k"                 # [PARALLEL] CPUs roles (w,k)
DIP_CPU= "4 2 2"                      # [PARALLEL] CPUs for each role
DIP_ROLEs= "k c v"                    # [PARALLEL] CPUs roles (k,c,v)
.....

In this way the code distributes the wave-functions over 4 cores and reduces the amount of memory required. If this is not enough you can use the Open-MP parallelization, see below.
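The constraint on the core counts can be checked before submitting a job. The following is a minimal sketch, not part of yambo: the variable mpi_tasks and the helper cpu_product are hypothetical names introduced for illustration. It multiplies the entries of each CPU string from the example above and compares the result with the number of MPI tasks. The text states the constraint for NL_CPU only; applying the same check to DIP_CPU is an assumption, based on the fact that "4 2 2" in the example also multiplies to 16.

```shell
#!/bin/sh
# Sketch: check that each *_CPU string multiplies up to the MPI task count.
# mpi_tasks and cpu_product are illustrative names, not yambo options.
mpi_tasks=16
nl_cpu="4 4"       # NL_CPU  with NL_ROLEs  "w k"
dip_cpu="4 2 2"    # DIP_CPU with DIP_ROLEs "k c v" (same rule assumed)

cpu_product() {
    # Multiply the space-separated integers passed as the first argument.
    result=1
    for n in $1; do
        result=$((result * n))
    done
    echo "$result"
}

nl_total=$(cpu_product "$nl_cpu")
dip_total=$(cpu_product "$dip_cpu")

for total in "$nl_total" "$dip_total"; do
    if [ "$total" -ne "$mpi_tasks" ]; then
        echo "Mismatch: product $total differs from $mpi_tasks MPI tasks" >&2
    fi
done
echo "NL_CPU product: $nl_total, DIP_CPU product: $dip_total"
```

Changing mpi_tasks or one of the strings makes the script print a mismatch message on standard error, which is a quick way to catch an inconsistent input before it reaches the queue.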

Open-MP
Another possibility is to activate OpenMP at configure time with the flags --enable-open-mp --enable-openmp-int-linalg and recompile the code; you can then use the OpenMP parallelization.
For example, set the number of threads to 2 with the command:

export OMP_NUM_THREADS=2

and yambo_nl will automatically use the available threads. In the log file you will find:

.....
<---> P1: MPI Cores-Threads   : 16(CPU)-2(threads)
<---> P1: MPI Cores-Threads   : NL(environment)-4 4(CPUs)-w k(ROLEs)
.....

Note that how the number of threads is set can depend on the queue system and the configuration of your machine.
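As one illustration of a queue-system dependence, here is a hedged sketch that assumes the machine uses Slurm, which is not stated on this page. Slurm exports SLURM_CPUS_PER_TASK when a job requests --cpus-per-task, so a job script can take the thread count from it. The fallback value of 2 is an arbitrary choice matching the example above; other schedulers use different variables.

```shell
# Assumption: a Slurm job script. SLURM_CPUS_PER_TASK is only defined
# when the job requested --cpus-per-task; otherwise fall back to 2.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-2}"
echo "Using $OMP_NUM_THREADS OpenMP threads per MPI task"
```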

Using all these parallelizations together you can use a large number of cores. Imagine you want to calculate the response for 40 different frequencies; you could set

"4 cores Open-MP" x "10 cores k-points" x "40 cores frequencies" = 1600 cores.

We advise against using more than 4 OpenMP threads, unless you have memory problems in your calculations, in order to keep the parallelization efficient.
Finally, remember that the restart of interrupted calculations works only on frequencies.
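The core count in the 40-frequency example can be broken down as follows. This is only a sketch of the arithmetic: the variable names are hypothetical, and the NL_CPU string it prints assumes the "w k" role order used in the 16-core example. It separates the number of MPI tasks, which is what the CPU strings in the input must multiply to, from the OpenMP threads set through OMP_NUM_THREADS.

```shell
# Sketch of the core-count arithmetic; variable names are illustrative.
omp_threads=4    # OpenMP threads per MPI task
k_cores=10       # MPI tasks over k-points
w_cores=40       # MPI tasks over frequencies

mpi_tasks=$((k_cores * w_cores))
total_cores=$((omp_threads * mpi_tasks))

echo "NL_CPU= \"$w_cores $k_cores\" with NL_ROLEs= \"w k\""
echo "MPI tasks: $mpi_tasks, OMP_NUM_THREADS: $omp_threads, total cores: $total_cores"
```

With these values the job runs 400 MPI tasks with 4 threads each, which gives the 1600 cores quoted above.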

@@ Line 2: / Line 2: @@
 other two parallelizations are available in the code:
-'''K-points parallelization'''
+'''K-points parallelization'''<br>
 if your system is large and requires more memory or you have few frequencies you can change the parallelization strategy. By using the flag "-V par" you will get the parallelization options in your input, you can decide to turn on the parallelization on k-points in such a way that the product of cores in k-space and in frequency-space is equal to the total number of cores. For example if have 16 cores you can set:
+ .....
   NL_CPU= "4 4"                   # [PARALLEL] CPUs for each role
   NL_ROLEs= "w k"                 # [PARALLEL] CPUs roles (w,k)
   DIP_CPU= "4 2 2"                      # [PARALLEL] CPUs for each role
   DIP_ROLEs= "k c v"                    # [PARALLEL] CPUs roles (k,c,v)
+ .....
 in this way the code will distribute the wave-function on 4 cores and reduce the amount of memory.
-If this is not enough you can use the Open-MP parallelization, see below.
+If this is not enough you can use the Open-MP parallelization, see below.<br>
-'''Open-MP'''
+'''Open-MP'''<br>
-Another possibility is to compile the code with the -''-enable-open-mp''  flag and then use the OpenMP parallelization. \\
+Another possibility is active openMP in the configure with the flags <span style="color:blue">--enable-open-mp --enable-openmp-int-linalg</span>,
+recompile the code and then you can use the [https://www.openmp.org/ OpenMP] parallelization. <br>
 For example set the number of threads to 2 with the command:
-  export OMP_NUM_THREADS="2"
+  export OMP_NUM_THREADS=2
 and yambo_nl automatically will use the threads available. In the log file will find:
@@ Line 25: / Line 25: @@
   <---> P1: MPI Cores-Threads   : NL(environment)-4 4(CPUs)-w k(ROLEs)
   .....
+Notice the setting of threads number can depend from the queue system and configuration of your machine.<br>
-Using all these parallelization you can use a large number of cores for example
+Using all these parallelization you can use a large number of cores.
-image you want to calculate the response for 40 different frequencies you could set
+Image you want to calculate the response for 40 different frequencies you could set
-"4 cores Open-MP" x "10 cores k-points" x "60 cores frequencies" = 2400 cores.
+"4 cores Open-MP" x "10 cores k-points" x "40 cores frequencies" = 1600 cores.
-we do not advice to use more than 4 open-MP threads at least you need more memory in the calculations.
+We advice not to use more than 4 open-MP threads,  unless you have memory problems  in your calculations, in order to have an efficient parallelization.<br>
-Notice that the restart for interrupted calculations works only on frequencies.
+Finally remember that the restart for interrupted calculations works only on frequencies.
