Crash with SLEPC solver

Here you can find problems arising when using old releases of Yambo (< 5.0): issues with parallelization strategy, performance, and other technical aspects.

Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano, Conor Hogan

Pierre Lechifflart
Posts: 2
Joined: Wed Apr 07, 2021 3:43 pm

Crash with SLEPC solver

Post by Pierre Lechifflart » Thu May 20, 2021 3:58 pm

Hello,

I am trying to run BSE calculations on a system with low symmetry (strained hBN), so the matrix to diagonalize is quite large, and I suspect this is what makes the calculations crash.
The yambo code itself returns no error, but when I run in parallel on a cluster the processes are terminated with MPI KILLED BY SIGNAL: 9. I read that this usually happens when the memory per core is insufficient.
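
For completeness on the job-script side: on a SLURM-managed cluster (that is an assumption on my part, schedulers differ) the memory per MPI task can be raised explicitly, for instance:

Code: Select all

#!/bin/bash
#SBATCH --nodes=2                 # number of nodes
#SBATCH --ntasks-per-node=16      # MPI tasks per node
#SBATCH --mem-per-cpu=4G          # memory per task; raise this if SIGNAL 9 is an out-of-memory kill

module load yambo                 # hypothetical module name, adapt to your cluster
mpirun yambo -F yambo_BSE.in -J BSE   # input file and job name are placeholders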

I am using the SLEPC solver, and even with a low number of eigenvalues the calculation stops. I attach the input and report files and an example of a log file.
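
For context, the SLEPC-related part of the input is of this form (a generic sketch with placeholder values, not a copy of my attached file; the variable names are from the Yambo SLEPC tutorial and may differ between versions):

Code: Select all

BSSmod= "s"              # [BSS] solver: "d" diago, "h" haydock, "s" slepc
BSSNEig= 4               # [SLEPC] number of eigenvalues to compute
BSSEnTarget= 5.0 eV      # [SLEPC] compute the eigenvalues closest to this energy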

Do you have any hints on why the calculations crash? If so, could you advise a potential workaround?

Thank you,
Pierre
Pierre Lechifflart, PhD student
Aix-Marseille Université/ CNRS-CINaM laboratory
Campus de Luminy – Case 913
13288 MARSEILLE Cedex 09

Daniele Varsano
Posts: 4231
Joined: Tue Mar 17, 2009 2:23 pm

Re: Crash with SLEPC solver

Post by Daniele Varsano » Thu May 20, 2021 4:26 pm

Dear Pierre,
your BSE matrix is rather large, but I would say not enormous, and it should still be possible to diagonalize it.
Have you tried to diagonalize it using parallel linear algebra, i.e. ScaLAPACK?
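
Just to make the suggestion concrete, something along these lines (the configure option and the variable name are written from memory, so please double-check them for your Yambo version):

Code: Select all

# build yambo against ScaLAPACK/BLACS (library paths are placeholders)
./configure --enable-par-linalg --with-scalapack-libs="..." --with-blacs-libs="..."

# the distributed diagonalization is then controlled by a parallel variable
# (visible with verbose parallel input, yambo -V par), e.g.
BS_nCPU_LinAlg_DIAGO= 32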

About SLEPc, I'm not an expert on it, so I cannot advise; hopefully someone else will.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

Davide Sangalli
Posts: 643
Joined: Tue May 29, 2012 4:49 pm
Location: Via Salaria Km 29.3, CP 10, 00016, Monterotondo Stazione, Italy

Re: Crash with SLEPC solver

Post by Davide Sangalli » Thu May 20, 2021 7:11 pm

The input file looks ok.

Do you get any specific error message?
Are you using the slepc library internally compiled by yambo or an external one?
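
(For reference, and with the caveat that the option names below are from memory and may differ between versions: the internal SLEPc/PETSc is selected at configure time, roughly as sketched here, while an external installation is pointed to explicitly. The configure summary, or config.log, records which one was used.)

Code: Select all

# internal SLEPc/PETSc, downloaded and compiled together with yambo:
./configure --enable-slepc-linalg

# external installation (paths are placeholders):
./configure --with-petsc-path=/opt/petsc --with-slepc-path=/opt/slepc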

Best,
D.
Davide Sangalli, PhD
CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX Centre
https://sites.google.com/view/davidesangalli
http://www.max-centre.eu/

Pierre Lechifflart
Posts: 2
Joined: Wed Apr 07, 2021 3:43 pm

Re: Crash with SLEPC solver

Post by Pierre Lechifflart » Fri May 21, 2021 9:50 am

Thank you for your replies.

To Daniele: at first I was using the full diagonalization with -y d and it kept crashing; this is why Claudio Attaccalite (my PhD advisor) recommended using the SLEPC solver.

To Davide: the error messages I am getting look like the following:

Code: Select all

/opt/intel/oneapi/mpi/2021.2.0//lib/release/libmpi.so.12(+0x17d690) [0x2af8cb857690]

Note that I am using the Intel compilers with internally compiled libraries.

We have investigated the problem with Claudio and found that the BSE calculations terminate correctly when changing the parallelization from

Code: Select all

BS_CPU= "8 4 1" 
BS_ROLEs= "k eh t"
to

Code: Select all

BS_CPU= "32 1 1"  
BS_ROLEs= "k eh t"
and using only one node.
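
For the record, the working setup corresponds to 32 MPI tasks on a single node, so that the product of the BS_CPU entries (32 x 1 x 1) matches the total number of tasks; the launch line is simply something like (file names are placeholders):

Code: Select all

mpirun -np 32 yambo -F yambo_BSE.in -J BSE_1node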

I will keep you updated.

Thank you again
Best regards

Pierre
Pierre Lechifflart, PhD student
Aix-Marseille Université/ CNRS-CINaM laboratory
Campus de Luminy – Case 913
13288 MARSEILLE Cedex 09

claudio
Posts: 534
Joined: Tue Mar 31, 2009 11:33 pm
Location: Marseille

Re: Crash with SLEPC solver

Post by claudio » Fri May 21, 2021 10:40 am

Dear all

I confirm that the bug is due to the parallelization on 'eh': if I put 4 cores on 'eh' it crashes,
while other combinations are fine.

we have a system with 4 valence bands, 4 conduction bands and 492 k-points,
and I performed the test with the bug-fixes version

the size of the matrix is small:
BS_MAT on HOST compute-8-3.local with size 840.2910 [Mb]

and we have 3 GB for each core.
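
As a rough cross-check (assuming a double-complex resonant block, and reading the reported [Mb] as mebibytes):

Code: Select all

# transitions (upper bound):  N_t = N_v * N_c * N_k = 4 * 4 * 492 = 7872
# full resonant block:        N_t^2 * 16 bytes  ~  0.99 GB
# the reported 840 Mb corresponds to roughly 7400 transitions, i.e. an energy
# window only slightly smaller than the full 7872, so the numbers are
# consistent and the matrix alone fits well within the 3 GB per core.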


Claudio
Claudio Attaccalite
CNRS / Aix-Marseille Université / CINaM laboratory / TSN department
Campus de Luminy – Case 913
13288 MARSEILLE Cedex 09
web site: http://www.attaccalite.com
