Crash with SLEPC solver

Hello,
I am trying to run BSE calculations on a system with low symmetry (strained hBN), so the matrix to diagonalize is quite large, and I suppose this is what makes the calculations crash.
The yambo code returns no error, but when I run in parallel on a cluster the processes are killed with MPI KILLED BY SIGNAL: 9. I read that this usually happens when the memory per core is insufficient.
I am using the SLEPC solver, and even with a low number of eigenvalues the calculations stop. I attach the input and report files and an example of a log file.
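For reference, the solver section of my input follows the usual SLEPC pattern, roughly as below (a sketch; the exact values are in the attached file, and the number of eigenvalues here is just illustrative):

Code:
BSSmod= "s"        # Bethe-Salpeter solver: "s" selects SLEPC
BSSNEig= 10        # number of lowest eigenvalues requested (illustrative value)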
Do you have any hints on why the calculations crash? If so, could you advise a potential workaround?
Thank you,
Pierre
Pierre Lechifflart, PhD student
Aix-Marseille Université/ CNRS-CINaM laboratory
Campus de Luminy – Case 913
13288 MARSEILLE Cedex 09
Re: Crash with SLEPC solver
Dear Pierre,
your BSE matrix is rather large but, I would say, not enormous, and it should still be possible to diagonalize it.
Have you tried to diagonalize it using parallel algebra, i.e. ScaLAPACK?
About SLEPC, I'm not an expert on that so I cannot advise; hopefully someone else will.
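For the parallel diagonalization you need a yambo executable compiled with ScaLAPACK support; the relevant input variable, if I remember correctly, is something like the one below (name from memory, please check the variables yambo generates in your input):

Code:
BS_nCPU_LinAlg_DIAGO= 32     # MPI tasks devoted to the ScaLAPACK diagonalization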
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
Re: Crash with SLEPC solver
The input file looks OK.
Do you get any specific error message?
Are you using the SLEPC library internally compiled by yambo or an external one?
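(The internal one is the copy that yambo downloads and builds when SLEPC support is enabled at configure time; if I remember correctly the flag is the one below, but please check ./configure --help on your version.)

Code:
./configure --enable-slepc-linalg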
Best,
D.
Davide Sangalli, PhD
CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX Centre
https://sites.google.com/view/davidesangalli
http://www.max-centre.eu/
Re: Crash with SLEPC solver
Thank you for your replies.
To Daniele: at first I was using the full diagonalization with -y d and it kept crashing; this is why Claudio Attaccalite (my PhD advisor) recommended using the SLEPC solver.
To Davide: the error messages I am getting look like the following:

Code:
/opt/intel/oneapi/mpi/2021.2.0//lib/release/libmpi.so.12(+0x17d690) [0x2af8cb857690]

Note that I am using the Intel compilers with internally compiled libraries.
Claudio and I have investigated the problem and found that the BSE calculations terminate correctly when changing the parallelization from

Code:
BS_CPU= "8 4 1"
BS_ROLEs= "k eh t"

to

Code:
BS_CPU= "32 1 1"
BS_ROLEs= "k eh t"

and using only one node.
I will keep you updated.
Thank you again,
Best regards,
Pierre
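(For completeness, the working run is launched on a single node with 32 MPI tasks, something like the line below; the input file and job names are just illustrative.)

Code:
mpirun -np 32 yambo -F yambo_BSE.in -J BSE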
Pierre Lechifflart, PhD student
Aix-Marseille Université/ CNRS-CINaM laboratory
Campus de Luminy – Case 913
13288 MARSEILLE Cedex 09
Re: Crash with SLEPC solver
Dear all,
I confirm that the bug is due to the parallelization on 'eh': if I put 4 cores on 'eh' it crashes, while other combinations are fine.
We have a system with 4 valence bands, 4 conduction bands and 492 k-points, and I performed the test with the bug-fixes branch.
The size of the matrix is small:

Code:
BS_MAT on HOST compute-8-3.local with size 840.2910 [Mb]

and we have 3 GB for each core.
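As a rough check, assuming a double-complex resonant block with no transitions filtered out:

Code:
N = Nv * Nc * Nk = 4 * 4 * 492 = 7872      # BSE matrix dimension
N^2 * 16 B = 7872^2 * 16 B ~ 0.99 GB       # full double-complex storage

which is the same order as the reported 840 Mb, so even a full copy of the matrix fits comfortably in the 3 GB available per core.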
Claudio
Claudio Attaccalite
CNRS / Aix-Marseille Université / CINaM laboratory / TSN department
Campus de Luminy – Case 913
13288 MARSEILLE Cedex 09
web site: http://www.attaccalite.com