CRASH of BSE calculation in parallel?

Deals with issues related to computation of optical spectra in reciprocal space: RPA, TDDFT, local field effects.

Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano, Conor Hogan

Dean
Posts: 103
Joined: Thu Oct 10, 2019 7:03 am

CRASH of BSE calculation in parallel?

Post by Dean » Tue Oct 27, 2020 2:22 am

Dear all,
I am doing a BSE calculation in parallel, but it crashed without any clear hint. I have tried using more nodes to ensure enough memory, and changing the BSE parallelization variables BS_CPU / BS_ROLEs, but it didn't help. The input and output files are attached.
Any suggestion would be appreciated.
Dr. Yimin Ding
Soochow University, China.

Davide Sangalli
Posts: 614
Joined: Tue May 29, 2012 4:49 pm
Location: Via Salaria Km 29.3, CP 10, 00016, Monterotondo Stazione, Italy

Re: CRASH of BSE calculation in parallel?

Post by Davide Sangalli » Tue Oct 27, 2020 6:19 pm

Dr. Yimin Ding,
it is very hard to determine the cause from just the input file and the report.

Anyway, it seems that your run correctly performs the "perturbative inversion" part for 142 of the frequency points in your frequency axis and then crashes when trying the "full inversion" for the remaining 259.
The total (142 + 259 = 401) is what you set in the input:

BEnSteps=401                     # [BSS] Energy steps
The first thing you can try is to set in the input (the default is "pf"):

BSSInvMode="p"  
and play with the variable (0.01 is the default; in my experience negative values do not work properly):

BSEPSInvTrs=0.01   '[BSS EPS] Inversion treshold. Relative[o/o](>0)/Absolute(<0)'
The perturbative-only part will likely not give you all the frequencies, but at least you can get the solution for some of them.
For the others you will get zeros. Playing with (i.e. increasing) the smearing may also help.
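Why increasing the smearing can help is easiest to see with a generic perturbative inversion. The sketch below is hypothetical and is NOT yambo's algorithm: the split of the BSE matrix into a diagonal part D and a coupling V, the Neumann-series expansion, the relative stopping rule (a plain fraction here, an assumption) and all function names are illustrative. A frequency point is accepted only if the series converges; otherwise it is left unsolved, similar to the zeros mentioned above. A larger broadening eta pushes z = omega + i*eta away from the diagonal energies, shrinks G0 = (D - z)^-1, and so helps the series converge.

```python
# Generic perturbative (Neumann-series) inversion: a hypothetical sketch, NOT yambo's algorithm.


def matmul(a, b):
    """Plain-Python product of two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def frobenius(a):
    """Frobenius norm of a matrix given as lists of rows."""
    return sum(abs(x) ** 2 for row in a for x in row) ** 0.5


def perturbative_resolvent(diag, coupling, omega, eta, rel_threshold=0.01, max_order=60):
    """Approximate (D + V - z)^-1, with z = omega + i*eta, as sum_k (-G0 V)^k G0.

    G0 = (D - z)^-1 is diagonal. The series stops when the last term is below
    rel_threshold times the accumulated sum. Returns (matrix, True) on
    convergence and (None, False) otherwise (frequency left unsolved).
    """
    n = len(diag)
    z = complex(omega, eta)
    g0 = [[1.0 / (diag[i] - z) if i == j else 0j for j in range(n)] for i in range(n)]
    step = [[-x for x in row] for row in matmul(g0, coupling)]  # -G0 V
    term = g0
    total = [row[:] for row in g0]
    for _ in range(max_order):
        term = matmul(step, term)  # next power of (-G0 V) applied to G0
        total = [[t + d for t, d in zip(trow, drow)] for trow, drow in zip(total, term)]
        if frobenius(term) <= rel_threshold * frobenius(total):
            return total, True
    return None, False


if __name__ == "__main__":
    diag = [1.0, 1.2]                     # made-up transition energies
    coupling = [[0.0, 0.5], [0.5, 0.0]]   # made-up off-diagonal coupling
    for eta in (0.01, 2.0):
        _, ok = perturbative_resolvent(diag, coupling, omega=1.1, eta=eta)
        print(f"eta={eta}: converged={ok}")
```

With these toy numbers the frequency between the two energies is not reached with a small broadening but is solved once the broadening is large, which is the qualitative behaviour described in the reply.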

The next step is to try the full inversion. For that, I'd advise first checking whether it is a memory problem by solving the BSE without the double grid in diago mode.
I think the full inversion may require the same amount of memory. Moreover, the memory in that step is not distributed and the inversion is done in serial, unless you use ScaLAPACK.
So you may prefer to do that last step in serial (loading all the data from the previous step).
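To see why a slightly wider BSEbands range can exhaust memory in the full inversion, here is a rough, hypothetical estimate of the size of a dense BSE matrix. It assumes a semiconductor in which every valence-conduction pair at every k-point inside BSEbands enters the matrix, double-precision complex storage (16 bytes per element), an invented Fermi level (bands up to 42 occupied) and an invented number of k-points (1000). None of these numbers come from the actual run, and the helper names are hypothetical.

```python
# Rough memory estimate for a dense BSE matrix (a sketch, not yambo's own accounting).
BYTES_PER_COMPLEX = 16  # one double-precision complex number


def bse_rank(n_valence, n_conduction, n_kpoints, coupling=False):
    """Number of electron-hole transitions, i.e. the BSE matrix dimension.

    Assumes a semiconductor: every (v, c, k) pair inside BSEbands is kept.
    With coupling (non-resonant) terms the dimension doubles.
    """
    rank = n_valence * n_conduction * n_kpoints
    return 2 * rank if coupling else rank


def dense_matrix_gib(rank, copies=1):
    """Memory in GiB for `copies` dense complex rank x rank matrices."""
    return copies * rank * rank * BYTES_PER_COMPLEX / 1024**3


if __name__ == "__main__":
    n_k = 1000  # hypothetical number of k-points
    # Hypothetical split: bands up to 42 occupied.
    small = bse_rank(42 - 37 + 1, 49 - 42, n_k)  # BSEbands 37 | 49 -> 6 x 7 pairs
    large = bse_rank(42 - 35 + 1, 50 - 42, n_k)  # BSEbands 35 | 50 -> 8 x 8 pairs
    for label, rank in (("37|49", small), ("35|50", large)):
        print(f"{label}: rank {rank}, {dense_matrix_gib(rank):.1f} GiB per matrix copy")
```

With these made-up numbers the dimension grows from 42,000 to 64,000 and one dense copy grows from about 26 GiB to about 61 GiB, a factor (64/42)^2 ≈ 2.3, because the memory scales with the square of the number of transitions.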

Best,
D.
Davide Sangalli, PhD
CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX Centre
https://sites.google.com/view/davidesangalli
http://www.max-centre.eu/

Dean
Posts: 103
Joined: Thu Oct 10, 2019 7:03 am

Re: CRASH of BSE calculation in parallel?

Post by Dean » Wed Oct 28, 2020 9:37 am

Dear Davide,
First of all, thanks for your reply.
When I set a small BSEbands range (such as 37 | 49 |), yambo runs and gives correct output; please see the attached file.
But when I set a slightly larger BSEbands range (such as 35 | 50 |), it crashes after obtaining the solution for some, but not all, frequencies.
I tried using many nodes to get enough RAM, but it did not work; I guess the memory is not distributed in parallel across these nodes.
So, how can I determine whether the run is serial or parallel?
Any suggestion would be appreciated.

claudio
Posts: 458
Joined: Tue Mar 31, 2009 11:33 pm
Location: Marseille

Re: CRASH of BSE calculation in parallel?

Post by claudio » Mon Nov 02, 2020 9:24 am

Dr. Yimin Ding,

In order to have more memory for the BSE inversion you can use two strategies:

1) increase the number of threads (2, 3 or 4 threads);
on some systems you have to set

OMP_NUM_THREADS=2 (or 3, 4)

2) compile yambo with ScaLAPACK and increase the number of processors used for linear algebra (4, 9, 16 or more).
To do this, add the flag -V par when you generate the input and set:

BS_nCPU_LinAlg_INV=4
BS_nCPU_LinAlg_DIAGO=4

for example.

You can combine these two strategies. For example, if you have 32 cores you can set 2 threads and 16 processors in linear algebra.
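The split in the example above (32 cores as 2 threads x 16 processors) can be enumerated for any core count. The sketch below is hypothetical: it assumes that threads x MPI tasks equals the total number of cores and that the linear-algebra group is a square process grid no larger than the number of MPI tasks (the suggested values 4, 9 and 16 are all perfect squares). It only lists candidate values; it does not check what yambo itself accepts.

```python
import math


def core_layouts(total_cores, max_threads=4):
    """List (threads, mpi_tasks, linalg_cpus) splits of `total_cores`.

    Assumptions: threads * mpi_tasks == total_cores, and the ScaLAPACK
    linear-algebra group is a square process grid (4, 9, 16, ...) that
    cannot use more CPUs than there are MPI tasks.
    """
    layouts = []
    for threads in range(1, max_threads + 1):
        if total_cores % threads:
            continue  # threads must divide the cores evenly
        tasks = total_cores // threads
        side = math.isqrt(tasks)  # side of the largest square grid that fits
        layouts.append((threads, tasks, side * side))
    return layouts


if __name__ == "__main__":
    for threads, tasks, linalg in core_layouts(32):
        print(f"OMP_NUM_THREADS={threads}  MPI tasks={tasks}  BS_nCPU_LinAlg_INV={linalg}")
```

For 32 cores this gives (1, 32, 25), (2, 16, 16) and (4, 8, 4); the second line is the combination proposed above. For 48 cores (two 24-core nodes) it proposes, for instance, 2 threads with 24 MPI tasks and a 16-CPU linear-algebra grid.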

best
Claudio
Claudio Attaccalite
CNRS / Aix-Marseille Université / CINaM laboratory / TSN department
Campus de Luminy – Case 913
13288 MARSEILLE Cedex 09
web site: http://www.attaccalite.com

Dean
Posts: 103
Joined: Thu Oct 10, 2019 7:03 am

Re: CRASH of BSE calculation in parallel?

Post by Dean » Tue Nov 03, 2020 9:05 am

Dear Claudio,
Thanks for your reply. I will try it.
Best,
Yimin Ding

Dean
Posts: 103
Joined: Thu Oct 10, 2019 7:03 am

Re: CRASH of BSE calculation in parallel?

Post by Dean » Fri Dec 11, 2020 2:09 am

Dear Claudio,
Following your suggestion, I compiled yambo with ScaLAPACK and set "BS_nCPU_LinAlg_INV, BS_nCPU_LinAlg_DIAGO" in the BSE calculations.
When I use two nodes (24 cores per node) and set "BS_nCPU_LinAlg_INV=4, BS_nCPU_LinAlg_DIAGO=4", the job runs successfully.
But when I use more nodes (such as 3, 4, 5, 6, 7 or 8) and try many values of "BS_nCPU_LinAlg_INV, BS_nCPU_LinAlg_DIAGO", the jobs always crash with no output.
As time went on, my frustration boiled over.
Any suggestion would be appreciated.

Daniele Varsano
Posts: 3816
Joined: Tue Mar 17, 2009 2:23 pm

Re: CRASH of BSE calculation in parallel?

Post by Daniele Varsano » Fri Dec 11, 2020 10:51 am

Dear Yimin,
can you also show the input files?
My poor man's suggestion here is to reduce the number of CPUs (MPI tasks) and raise the number of threads; this leaves more memory per MPI task inside each node. Maybe others who are experts in the inversion procedure can give you a better suggestion.
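The effect of trading MPI tasks for threads can be checked with a back-of-the-envelope calculation. The sketch below assumes that the memory of a node is shared evenly among the MPI tasks running on it and that OpenMP threads share the memory of their task; the 96 GiB per node is a made-up value, while 24 cores per node is taken from the post above. The function name is hypothetical.

```python
def task_memory(nodes, cores_per_node, threads, node_mem_gib):
    """Return (total MPI tasks, GiB available per MPI task).

    Assumes one MPI task per `threads` cores and the node memory split
    evenly among the tasks on that node (threads share their task's memory).
    """
    tasks_per_node = cores_per_node // threads
    return nodes * tasks_per_node, node_mem_gib / tasks_per_node


if __name__ == "__main__":
    NODE_MEM_GIB = 96.0  # hypothetical memory per node
    for nodes, threads in ((2, 1), (2, 2), (4, 2)):
        tasks, mem = task_memory(nodes, 24, threads, NODE_MEM_GIB)
        print(f"{nodes} nodes x 24 cores, {threads} thread(s): {tasks} MPI tasks, {mem:.0f} GiB each")
```

With these numbers, going from 1 to 2 threads on 2 nodes halves the MPI tasks (48 to 24) and doubles the memory per task (4 to 8 GiB); doubling both nodes and threads keeps 48 MPI tasks while giving each of them 8 GiB.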

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

claudio
Posts: 458
Joined: Tue Mar 31, 2009 11:33 pm
Location: Marseille

Re: CRASH of BSE calculation in parallel?

Post by claudio » Fri Dec 11, 2020 2:30 pm

Dear Yimin,

my suggestions are:

1) try to double the number of nodes from 2 to 4 and, at the same time, double the number of threads

2) try other methods like Haydock, which are much better parallelized
and distributed in memory. With Haydock I have solved BSE matrices up to a size of 200,000.
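The Haydock approach scales better because it only needs repeated matrix-vector products H·v, which can be distributed across MPI tasks, rather than a full inversion or diagonalization of the whole matrix. Below is a serial, textbook sketch of the Haydock (Lanczos) recursion; it is not yambo's implementation, the function names are hypothetical, and it returns a single diagonal element <v0|(z - H)^-1|v0> of the resolvent, evaluated as a continued fraction built from the Lanczos coefficients.

```python
# Haydock (Lanczos) recursion for one diagonal element of the resolvent:
# a serial, hypothetical sketch, NOT yambo's parallel implementation.


def matvec(h, v):
    """Dense matrix-vector product; the only operation that touches H."""
    return [sum(h_ij * v_j for h_ij, v_j in zip(row, v)) for row in h]


def dot(u, v):
    """Hermitian inner product <u|v>."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def haydock_coefficients(h, v0, n_iter):
    """Lanczos coefficients a_n and b_{n+1} for a Hermitian h, starting from v0."""
    norm = abs(dot(v0, v0)) ** 0.5
    v = [x / norm for x in v0]
    v_prev = [0.0] * len(v)
    a, b = [], [0.0]
    for _ in range(n_iter):
        w = matvec(h, v)
        a_n = dot(v, w).real
        w = [wi - a_n * vi - b[-1] * pi for wi, vi, pi in zip(w, v, v_prev)]
        a.append(a_n)
        b_next = abs(dot(w, w)) ** 0.5
        if b_next < 1e-12:
            break  # Krylov space exhausted: the continued fraction is exact
        b.append(b_next)
        v_prev, v = v, [wi / b_next for wi in w]
    return a, b[1:]


def haydock_resolvent(a, b, z):
    """Evaluate 1/(z - a0 - b1^2/(z - a1 - b2^2/(...))) from the bottom up."""
    g = 0j
    for n in reversed(range(len(a))):
        coupling_sq = b[n] ** 2 if n < len(b) else 0.0
        g = 1.0 / (z - a[n] - coupling_sq * g)
    return g
```

In practice the recursion is stopped after far fewer iterations than the matrix dimension, once the spectrum converges; here, on tiny matrices, running it to completion reproduces the exact resolvent.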

best
Claudio

Dean
Posts: 103
Joined: Thu Oct 10, 2019 7:03 am

Re: CRASH of BSE calculation in parallel?

Post by Dean » Mon Dec 14, 2020 3:51 am

Dear Claudio,
Thanks for your reply.
I will try your first suggestion: "1) try to double the number of nodes from 2 to 4 and at the same moment double the number of threads".
I want to use the double-grid method to speed up the dielectric constant calculations, so I have to use the inversion method.
Best,
Yimin Ding
