Dear YAMBO,
Could you give me some tips on parallelizing the BSE kernel? I set the following BSE parallel variables
BS_CPU= "47 1 28" # [PARALLEL] CPUs for each role
BS_ROLEs= "k eh t" # [PARALLEL] CPUs roles (k,eh,t)
BS_nCPU_LinAlg_INV= 8 # [PARALLEL] CPUs for matrix inversion
BS_nCPU_LinAlg_DIAGO= 8 # [PARALLEL] CPUs for matrix diagonalization
but it doesn't seem to work: judging from the output file, these settings were not taken into account.
Another question: which parallelization level distributes the memory? This calculation seems to require a lot of memory.
I have attached the input and output files. Thank you.
Parallelism of BSE kernel
Viet-Anh Ha,
Oden Institute for Computational Engineering and Sciences,
https://www.oden.utexas.edu/
The University of Texas at Austin,
https://www.utexas.edu/
201 E 24th St, Austin, TX 78712, USA.
- Daniele Varsano
Re: Parallelism of BSE kernel
Dear Viet-Anh Ha,
please note that parallel linear algebra will not be active unless you compile the code with the ScaLAPACK libraries:
Code:
BS_nCPU_LinAlg_INV= 8 # [PARALLEL] CPUs for matrix inversion
BS_nCPU_LinAlg_DIAGO= 8 # [PARALLEL] CPUs for matrix diagonalization
In any case, these are needed for the diagonalization of the BSE matrix, whereas your problems arise while building the kernel.
Why do you say that BS_CPU and BS_ROLEs are not taken into account? The parallel distribution of the kernel is reported in the log files.
You have a lot of k-points (1000 in the BZ), so you can easily end up with very large matrices. I strongly suggest reducing the number of bands in the BSE:
Code:
% BSEBands
1 | 18 | # [BSK] Bands range
%
The first band is around 41 eV below the Fermi level, so it will not participate in the first excitations, i.e. in the low-energy part of the spectrum. Also, the last bands (16/17/18) are 40 eV above the Fermi energy. I would drastically reduce the range of bands in the BSE; this will greatly reduce the computational burden.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
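Why the band range matters so much: in the resonant (Tamm-Dancoff) approximation the BSE matrix has dimension N_k × N_v × N_c (k-points × valence bands × conduction bands), and a dense matrix of that size has N² complex entries, so memory grows with the square of the number of transitions. The sketch below estimates this for 1000 k-points. The 8 valence + 10 conduction split of bands 1-18, the reduced 4 + 4 window, and the 8-byte (single-precision complex) entry size are all assumptions; Yambo's real memory use (distributed kernel, diagonalization workspace) will differ.

```python
# Rough size of a dense BSE matrix (a sketch, not Yambo's own accounting).
# ASSUMPTIONS: resonant (Tamm-Dancoff) block only, dimension = n_k * n_val * n_cond;
# complex entries of 8 bytes (single precision) or 16 bytes (double precision).


def bse_matrix_size(n_k: int, n_val: int, n_cond: int, bytes_per_entry: int = 8):
    """Return (matrix dimension, memory of the dense matrix in GB)."""
    dim = n_k * n_val * n_cond
    mem_gb = dim ** 2 * bytes_per_entry / 1e9
    return dim, mem_gb


# 1000 k-points as in this thread. The split of bands 1-18 into
# 8 valence + 10 conduction bands is HYPOTHETICAL (not stated in the thread).
full_dim, full_mem = bse_matrix_size(1000, 8, 10)

# A HYPOTHETICAL reduced window of 4 valence + 4 conduction bands near the gap.
small_dim, small_mem = bse_matrix_size(1000, 4, 4)

print(f"bands 1-18 : dim = {full_dim:>6}, ~{full_mem:6.1f} GB")
print(f"reduced    : dim = {small_dim:>6}, ~{small_mem:6.1f} GB")
```

With these illustrative numbers the dense matrix shrinks from about 51 GB to about 2 GB: cutting the number of transitions by 5 cuts the memory by 25.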
Re: Parallelism of BSE kernel
Thank you Daniele.
I had not looked at the log files and didn't realize that the parallelization was actually applied. I only looked at the beginning of the output file, which contains the following:
* CPU-Threads :1316(CPU)-1(threads)-1(threads@X)-1(threads@DIP)-1(threads@SE)-1(threads@RT)-1(threads@K)-1(threads@NL)
* MPI CPU : 1316
* THREADS (max): 1
* THREADS TOT(max): 1316
* I/O NODES : 1
* Fragmented WFs :yes
so I thought the parallelization had not been applied.
Another question: why does NLogCPUs not work in this run? I set NLogCPUs = 2, but the number of log files written still equals the number of MPI tasks.
Viet-Anh Ha,
Oden Institute for Computational Engineering and Sciences,
https://www.oden.utexas.edu/
The University of Texas at Austin,
https://www.utexas.edu/
201 E 24th St, Austin, TX 78712, USA.
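A quick sanity check on the header quoted in the post above: the BS_CPU string from the first post, "47 1 28", multiplies to 47 × 1 × 28 = 1316, exactly the "MPI CPU : 1316" reported. The helper below is hypothetical (not part of Yambo) and makes this check explicit. It assumes, as we understand the Yambo parallelization documentation, that the product of the counts in a *_CPU variable must equal the total number of MPI tasks, with one count per role listed in the matching *_ROLEs variable; the actual per-role distribution is still what the log files report.

```python
# HYPOTHETICAL helper: check that a Yambo-style "*_CPU" string is consistent
# with the number of MPI tasks reported in the output header ("MPI CPU").
# ASSUMPTION: the product of the per-role CPU counts must equal the total
# number of MPI tasks, and there is one count per role in "*_ROLEs".
from math import prod


def check_parallel_layout(cpu_string: str, roles_string: str, mpi_tasks: int) -> dict:
    counts = [int(n) for n in cpu_string.split()]
    # Roles appear both as "k eh t" and "k,eh,t" in this thread
    roles = roles_string.replace(",", " ").split()
    if len(counts) != len(roles):
        raise ValueError(f"{len(counts)} CPU counts but {len(roles)} roles")
    total = prod(counts)
    return {
        "layout": dict(zip(roles, counts)),
        "product": total,
        "matches_mpi_tasks": total == mpi_tasks,
    }


# Values taken from the first post's input and the output header above
result = check_parallel_layout("47 1 28", "k eh t", mpi_tasks=1316)
print(result)
```

The same check can be applied to every *_CPU / *_ROLEs pair in an input file before submitting a job.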
- Daniele Varsano
Re: Parallelism of BSE kernel
Dear Viet-Anh Ha,
That's strange: are you sure there are no logs left from previous runs? Old log files are not deleted; new ones are given an incremental suffix (*_01, etc.).
Finally, I can see you are using a rather old version of Yambo (GPL Version 4.4.0); I suggest updating to a newer version.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
Re: Parallelism of BSE kernel
I tried deleting the log files in the LOG directory, but I still see the same problem. Yes, I'll update to the newest version.
Viet-Anh Ha,
Oden Institute for Computational Engineering and Sciences,
https://www.oden.utexas.edu/
The University of Texas at Austin,
https://www.utexas.edu/
201 E 24th St, Austin, TX 78712, USA.
Re: Parallelism of BSE kernel
Dear Daniele,
We also ran into problems with our BSE calculations. The final lines of our LOG file are as follows:
Code:
<51m-31s> P1-n7: [LA] SERIAL linear algebra
<51m-31s> P1-n7: [09.01.07] Diago Solver @q1
<51m-31s> P1-n7: Folding BSE Kernel | | [000%] --(E) --(X)
<51m-31s> P1-n7: Folding BSE Kernel |########################################| [100%] --(E) --(X)
I believe there is a problem with how we specified our parallel parameters:
Code:
DIP_CPU= "1 24 3" # [PARALLEL] CPUs for each role
DIP_ROLEs= "k,c,v" # [PARALLEL] CPUs roles (k,c,v)
DIP_Threads=0 # [OPENMP/X] Number of threads for dipoles
X_and_IO_CPU= "1 1 1 24 3" # [PARALLEL] CPUs for each role
X_and_IO_ROLEs= "q,g,k,c,v" # [PARALLEL] CPUs roles (q,g,k,c,v)
X_and_IO_nCPU_LinAlg_INV=-1 # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)
X_Threads=0 # [OPENMP/X] Number of threads for response functions
BS_CPU= "12 1 6" # [PARALLEL] CPUs for each role
BS_ROLEs= "k,eh,t" # [PARALLEL] CPUs roles (k,eh,t)
BS_nCPU_LinAlg_INV=-1 # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)
BS_nCPU_LinAlg_DIAGO=-1 # [PARALLEL] CPUs for Linear Algebra (if -1 it is automatically set)
K_Threads=0 # [OPENMP/BSK] Number of threads for response functions
We would appreciate your advice. In this calculation there are a total of 600 k-points and 36 occupied bands.
Best wishes,
Jingda Guo
Jingda Guo
Beijing Institute of Technology
- Daniele Varsano
Re: Parallelism of BSE kernel
Dear Jingda,
This may be a memory issue, but we would need to inspect your input and report files.
In particular, if you compile the code with the --enable-memory-profile flag, the memory allocations will be traced in the log files.
Can you check the dimension of your BSE matrix? It is reported in the report file.
In any case, you can try running the diagonalization step with fewer CPUs; this way you will have more memory per CPU.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
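To connect the two suggestions above (check the matrix dimension, run the diagonalization on fewer CPUs), the sketch below compares an estimated dense BSE matrix size with the memory each MPI task gets on a node. It assumes, for illustration only, that the serial diagonalization ("[LA] SERIAL linear algebra" in the log) needs the whole resonant matrix of dimension N_k × N_v × N_c in a single task, with 8-byte complex entries, and that node memory is shared evenly among tasks. The 600 k-points and 36 occupied bands come from the post; the 256 GB node, the 4 conduction bands, the reduced 8-band valence window and the tasks-per-node values are hypothetical.

```python
# Sketch: does a dense BSE matrix fit in the memory of one MPI task?
# ASSUMPTIONS (illustrative, not Yambo's actual memory accounting):
#  - resonant (Tamm-Dancoff) matrix of dimension n_k * n_val * n_cond
#  - 8 bytes per complex entry (single precision)
#  - the serial diagonalization needs the whole matrix in a single task
#  - memory on a node is shared evenly among the tasks running on it


def bse_matrix_gb(n_k: int, n_val: int, n_cond: int, bytes_per_entry: int = 8) -> float:
    dim = n_k * n_val * n_cond
    return dim ** 2 * bytes_per_entry / 1e9


def fits_in_task(n_k, n_val, n_cond, node_memory_gb, tasks_per_node):
    """Return (GB needed, GB available per task, whether it fits)."""
    need = bse_matrix_gb(n_k, n_val, n_cond)
    have = node_memory_gb / tasks_per_node
    return need, have, have >= need


NODE_MEMORY_GB = 256  # HYPOTHETICAL node size
N_COND = 4            # HYPOTHETICAL number of conduction bands

# n_val = 36 uses all occupied bands from the post; n_val = 8 is a
# HYPOTHETICAL reduced window near the gap.
for n_val in (36, 8):
    for tasks in (72, 8, 1):
        need, have, ok = fits_in_task(600, n_val, N_COND, NODE_MEMORY_GB, tasks)
        print(f"n_val={n_val:2d}, {tasks:2d} tasks/node: "
              f"need ~{need:6.1f} GB, have {have:6.1f} GB per task, fits={ok}")
```

With these made-up numbers, keeping all 36 occupied bands (about 60 GB) fits only with a single task per node, while an 8-band valence window (about 3 GB) fits even with 72 tasks per node; the point is the trend, not the specific figures.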