CRASH of BSE calculation in parallel?
Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano, Conor Hogan
-
- Posts: 109
- Joined: Thu Oct 10, 2019 7:03 am
CRASH of BSE calculation in parallel?
Dear all,
I am running a BSE calculation in parallel, but it crashes without any clear hint. I have tried using more nodes to ensure enough memory, and changing the BSE-related parallel variables (BS_CPU / BS_ROLEs), but neither worked. The input and output files are attached.
Any suggestion would be appreciated.
Dr. Yimin Ding
Soochow University, China.
- Davide Sangalli
- Posts: 641
- Joined: Tue May 29, 2012 4:49 pm
- Location: Via Salaria Km 29.3, CP 10, 00016, Monterotondo Stazione, Italy
- Contact:
Re: CRASH of BSE calculation in parallel?
Dr. Yimin Ding,
it is very hard to identify the cause from just the input file and the report.
Anyway, it seems that your run correctly performs the "perturbative inversion" part for 142 of the frequency points on your frequency axis and then crashes when trying the "full inversion" for the remaining 259.
The total is what you set in input:
Code: Select all
BEnSteps=401 # [BSS] Energy steps
The first thing you can try is to set in input (the default is "pf"):
Code: Select all
BSSInvMode="p"
and play with the following variable (0.01 is the default; in my experience negative values do not work properly):
Code: Select all
BSEPSInvTrs=0.01 '[BSS EPS] Inversion treshold. Relative[o/o](>0)/Absolute(<0)'
The perturbative-only part will likely not converge for all the frequencies, but at least you can get the solution for some of them.
For the others you will have some zeros. Playing with (i.e. increasing) the smearing may also help.
The next step is to try the full inversion. For that I'd advise checking whether it is a memory problem by first solving the BSE without the double grid in diago mode.
I think the full inversion may require the same amount of memory. Moreover, the memory in that step is not distributed and the inversion operation is serial, unless you use ScaLAPACK.
So you may prefer to do that last step in serial (loading all data from the previous step).
Best,
D.
Davide Sangalli, PhD
CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX Centre
https://sites.google.com/view/davidesangalli
http://www.max-centre.eu/
-
- Posts: 109
- Joined: Thu Oct 10, 2019 7:03 am
Re: CRASH of BSE calculation in parallel?
Dear Davide,
First of all, thanks for your reply.
When I set a small BSEBands range (such as 37 | 49 |), yambo runs and gives correct output; please see the attached file.
But when I set a slightly larger BSEBands range (such as 35 | 50 |), it crashes after producing the solution for some, but not all, frequencies.
I tried using many nodes to get enough RAM, but it did not work, so I guess the memory is not distributed across these nodes.
So, how can I determine whether the run is serial or parallel?
Any suggestion would be appreciated.
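A rough way to see why a slightly larger band range can tip a run over a memory limit: the resonant BSE matrix has one row per (k-point, valence band, conduction band) transition, so its dense size grows with the square of the number of band pairs. The sketch below is only an estimate under stated assumptions. The k-point count and the index of the last occupied band are hypothetical (neither is given in this thread), the matrix is assumed to be dense and stored as double-precision complex numbers, and `bse_matrix_gib` is an illustrative helper, not part of yambo.

```python
def bse_matrix_gib(n_kpts, band_min, band_max, last_valence, bytes_per_element=16):
    """Estimate the size of a dense resonant BSE matrix.

    Returns (number of transitions, matrix size in GiB).
    bytes_per_element=16 assumes double-precision complex storage.
    """
    n_val = last_valence - band_min + 1   # valence bands inside the range
    n_cond = band_max - last_valence      # conduction bands inside the range
    n_trans = n_kpts * n_val * n_cond     # one row per (k, v, c) transition
    return n_trans, n_trans ** 2 * bytes_per_element / 2 ** 30


# Hypothetical values: neither number appears in the thread.
N_KPTS = 1000
LAST_VALENCE = 43

small = bse_matrix_gib(N_KPTS, 37, 49, LAST_VALENCE)
large = bse_matrix_gib(N_KPTS, 35, 50, LAST_VALENCE)

print(f"37|49: {small[0]} transitions, about {small[1]:.1f} GiB")
print(f"35|50: {large[0]} transitions, about {large[1]:.1f} GiB")
print(f"memory ratio: {large[1] / small[1]:.2f}")
```

With these assumed numbers, adding two valence bands and one conduction band more than doubles the matrix size, which is consistent with a run that works for the smaller range and fails for the larger one if the matrix has to fit on a single process.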
Dr. Yimin Ding
Soochow University, China.
- claudio
- Posts: 526
- Joined: Tue Mar 31, 2009 11:33 pm
- Location: Marseille
- Contact:
Re: CRASH of BSE calculation in parallel?
Dr. Yimin Ding,
in order to have more memory for the BSE inversion you can use two strategies:
1) increase the number of threads to 2, 3 or 4;
on some systems you have to set
OMP_NUM_THREADS=2 (or 3, 4)
2) compile yambo with ScaLAPACK and increase the number of processors used in linear algebra (4, 9, 16 or more).
To do this, add the flag -V par when you generate the input and set:
BS_nCPU_LinAlg_INV=4
BS_nCPU_LinAlg_DIAGO=4
for example.
You can combine these two strategies. For example, if you have 32 cores you can set 2 threads and 16 processors in linear algebra.
best
Claudio
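The 32-core example above can be worked out with a few lines of arithmetic. This is a minimal sketch, not a yambo tool: `plan_layout` is a hypothetical helper, and the choice of the largest perfect square for the linear-algebra count is an assumption inferred from the suggested values (4, 9, 16), not a rule stated in this thread.

```python
import math


def plan_layout(total_cores, threads_per_task):
    """Split cores into MPI tasks and pick a linear-algebra CPU count.

    Returns (mpi_tasks, linalg_cpus), where linalg_cpus is the largest
    perfect square not exceeding mpi_tasks (assumed square process grid).
    """
    if total_cores % threads_per_task != 0:
        raise ValueError("threads_per_task must divide total_cores")
    mpi_tasks = total_cores // threads_per_task
    side = math.isqrt(mpi_tasks)  # side of the largest square grid that fits
    return mpi_tasks, side * side


threads = 2
mpi_tasks, linalg_cpus = plan_layout(32, threads)

print(f"OMP_NUM_THREADS={threads}")
print(f"MPI tasks: {mpi_tasks}")
print(f"BS_nCPU_LinAlg_INV={linalg_cpus}")
print(f"BS_nCPU_LinAlg_DIAGO={linalg_cpus}")
```

For 32 cores and 2 threads this gives 16 MPI tasks and 16 processors in linear algebra, matching the example in the post.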
Claudio Attaccalite
CNRS / Aix-Marseille Université / CINaM laboratory / TSN department
Campus de Luminy – Case 913
13288 MARSEILLE Cedex 09
web site: http://www.attaccalite.com
-
- Posts: 109
- Joined: Thu Oct 10, 2019 7:03 am
Re: CRASH of BSE calculation in parallel?
Dear Claudio,
Thanks for your reply. I will try it.
Best,
Yimin Ding
Dr. Yimin Ding
Soochow University, China.
-
- Posts: 109
- Joined: Thu Oct 10, 2019 7:03 am
Re: CRASH of BSE calculation in parallel?
Dear Claudio,
Following your suggestion, I compiled yambo with ScaLAPACK and set "BS_nCPU_LinAlg_INV" and "BS_nCPU_LinAlg_DIAGO" in the BSE calculations.
When I use two nodes (24 cores per node) and set "BS_nCPU_LinAlg_INV=4, BS_nCPU_LinAlg_DIAGO=4", the job runs successfully.
But when I use more nodes (3, 4, 5, 6, 7 or 8) and try many values of "BS_nCPU_LinAlg_INV" and "BS_nCPU_LinAlg_DIAGO", the jobs always crash with no output.
As time went on, my frustration boiled over.
Any suggestion would be appreciated.
Dr. Yimin Ding
Soochow University, China.
- Daniele Varsano
- Posts: 4201
- Joined: Tue Mar 17, 2009 2:23 pm
- Contact:
Re: CRASH of BSE calculation in parallel?
Dear Yimin,
can you also show the input files?
My poor man's suggestion here is to reduce the number of CPUs and raise the number of threads; this leaves more memory for each MPI process inside the node. Maybe others who are experts in the inversion procedure can give you a better suggestion.
Best,
Daniele
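To make the trade-off concrete, here is a small sketch of how the memory available to each MPI process changes with the number of threads. The 24 cores per node come from the thread; the 128 GiB of memory per node is a hypothetical value, and `memory_per_task_gib` is an illustrative helper rather than anything provided by yambo.

```python
def memory_per_task_gib(node_memory_gib, cores_per_node, threads_per_task):
    """Return (MPI tasks per node, memory per MPI task in GiB).

    Assumes every core is used and memory is shared evenly among tasks.
    """
    if cores_per_node % threads_per_task != 0:
        raise ValueError("threads_per_task must divide cores_per_node")
    tasks_per_node = cores_per_node // threads_per_task
    return tasks_per_node, node_memory_gib / tasks_per_node


NODE_MEMORY_GIB = 128.0  # hypothetical: not given in the thread
CORES_PER_NODE = 24      # from the thread

for threads in (1, 2, 4):
    tasks, mem = memory_per_task_gib(NODE_MEMORY_GIB, CORES_PER_NODE, threads)
    print(f"{threads} thread(s): {tasks} MPI tasks per node, {mem:.1f} GiB each")
```

Fewer MPI tasks per node means each one can hold a larger share of any data that is not distributed, while the threads keep the remaining cores busy.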
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- claudio
- Posts: 526
- Joined: Tue Mar 31, 2009 11:33 pm
- Location: Marseille
- Contact:
Re: CRASH of BSE calculation in parallel?
Dear Yimin,
my suggestions are:
1) try doubling the number of nodes from 2 to 4 and, at the same time, doubling the number of threads
2) try other solvers such as Haydock; they are much better parallelized
and distributed in memory. With Haydock I have solved BSE matrices up to a size of 200,000.
best
Claudio
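For readers unfamiliar with the Haydock approach, the toy sketch below shows the underlying recursion. It is not yambo's implementation: it uses a small real symmetric matrix in pure Python, whereas a real BSE Hamiltonian is complex and far larger, and the function names are made up for illustration. The point it illustrates is that the recursion only needs matrix-vector products and three vectors at a time, so the rows of the matrix can be spread over many processes, with no full inversion or diagonalization.

```python
import math


def matvec(h, v):
    """Dense matrix-vector product; each row could live on a different process."""
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in h]


def haydock_coefficients(h, v0, n_steps, tol=1e-12):
    """Lanczos/Haydock recursion for a real symmetric matrix h.

    Returns the continued-fraction coefficients (a, b).
    """
    norm = math.sqrt(sum(x * x for x in v0))
    v = [x / norm for x in v0]
    v_prev = [0.0] * len(v)
    a, b = [], []
    beta = 0.0
    for _ in range(n_steps):
        w = matvec(h, v)
        alpha = sum(wi * vi for wi, vi in zip(w, v))
        a.append(alpha)
        # Orthogonalize against the two previous vectors only.
        w = [wi - alpha * vi - beta * pi for wi, vi, pi in zip(w, v, v_prev)]
        beta = math.sqrt(sum(x * x for x in w))
        if beta < tol:  # the Krylov space is exhausted
            break
        b.append(beta)
        v_prev, v = v, [x / beta for x in w]
    return a, b


def continued_fraction(z, a, b):
    """Evaluate <v0|(z - h)^-1|v0> from the coefficients, innermost level first."""
    g = 0.0
    for n in range(len(a) - 1, -1, -1):
        tail = b[n] ** 2 * g if n < len(b) else 0.0
        g = 1.0 / (z - a[n] - tail)
    return g


h = [[2.0, 1.0], [1.0, 2.0]]
a, b = haydock_coefficients(h, [1.0, 0.0], n_steps=10)
z = 0.5 + 0.1j  # complex energy: the imaginary part plays the role of a broadening
approx = continued_fraction(z, a, b)
exact = (z - 2.0) / ((z - 2.0) ** 2 - 1.0)  # analytic result for this 2x2 matrix
print(a, b, abs(approx - exact))
```

For this 2x2 example the recursion terminates after two steps and the continued fraction reproduces the analytic matrix element; for large matrices the recursion is simply truncated once the spectrum stops changing.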
Claudio Attaccalite
CNRS / Aix-Marseille Université / CINaM laboratory / TSN department
Campus de Luminy – Case 913
13288 MARSEILLE Cedex 09
web site: http://www.attaccalite.com
-
- Posts: 109
- Joined: Thu Oct 10, 2019 7:03 am
Re: CRASH of BSE calculation in parallel?
Dear Claudio,
Thanks for your reply.
I will try your first suggestion: "1) try to double the number of nodes from 2 to 4 and at the same moment double the number of threads".
I want to use the double-grid method to speed up the dielectric constant calculations, so I have to use the inversion solver.
Best,
Yimin Ding
Dr. Yimin Ding
Soochow University, China.