Crash with SLEPC solver

Posted: Thu May 20, 2021 3:58 pm
by Pierre Lechifflart
Hello,

I am trying to do BSE calculations on a system with low symmetry (strained hBN), so the matrix to diagonalize is quite large, and I suppose this is what makes the calculations crash.
The yambo code returns no error, but the processes are killed by MPI with "KILLED BY SIGNAL: 9" when I run in parallel on a cluster. I read that this usually happens when the memory per core is insufficient.

I am using the SLEPC solver, and even with a low number of eigenvalues the calculations stop. I attach the input and report files and an example of a log file.

Do you have any hints on why the calculations crash? If so, could you advise on a potential workaround?

Thank you,
Pierre

Re: Crash with SLEPC solver

Posted: Thu May 20, 2021 4:26 pm
by Daniele Varsano
Dear Pierre,
your BSE matrix is rather large, but I would say it is not enormous, and it should still be possible to diagonalize it.
Have you tried to diagonalise using parallel linear algebra, i.e. ScaLAPACK?

As for slepc, I am not an expert on it, so I cannot advise; hopefully someone else will.
Best,
Daniele

Re: Crash with SLEPC solver

Posted: Thu May 20, 2021 7:11 pm
by Davide Sangalli
The input file looks ok.

Do you get any specific error message?
Are you using the slepc library internally compiled by yambo or an external one?

Best,
D.

Re: Crash with SLEPC solver

Posted: Fri May 21, 2021 9:50 am
by Pierre Lechifflart
Thank you for your replies.

To Daniele: at first I was using the full diagonalization with -y d and it kept crashing, which is why Claudio Attaccalite (my PhD advisor) recommended using the SLEPC solver.

To Davide: the error messages I am getting look like the following:

/opt/intel/oneapi/mpi/2021.2.0//lib/release/libmpi.so.12(+0x17d690) [0x2af8cb857690]

Note that I am using the Intel compilers with internally compiled libraries.

We have investigated the problem with Claudio and found that the BSE calculations terminate correctly when changing the parallelization from

BS_CPU= "8 4 1"
BS_ROLEs= "k eh t"

to

BS_CPU= "32 1 1"
BS_ROLEs= "k eh t"

and using only one node.

I will keep you updated.

Thank you again
Best regards

Pierre
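
The two layouts quoted above can be compared directly: both multiply out to the same number of MPI tasks and differ only in how many tasks are assigned to the eh role. Below is a minimal Python sketch of that comparison. The helper names parse_layout and total_tasks are hypothetical and are not part of yambo; the sketch only does arithmetic on the strings given in the post.

```python
from math import prod


def parse_layout(cpu: str, roles: str) -> dict[str, int]:
    """Map each role name to the number of MPI tasks assigned to it.

    `cpu` and `roles` are the quoted values of BS_CPU and BS_ROLEs,
    e.g. "8 4 1" and "k eh t". (Hypothetical helper, not a yambo API.)
    """
    counts = [int(tok) for tok in cpu.split()]
    names = roles.split()
    if len(counts) != len(names):
        raise ValueError(f"{len(counts)} CPU fields but {len(names)} roles")
    return dict(zip(names, counts))


def total_tasks(layout: dict[str, int]) -> int:
    """Total number of MPI tasks implied by a layout (product of all fields)."""
    return prod(layout.values())


# The two layouts reported in the post.
crashing = parse_layout("8 4 1", "k eh t")
working = parse_layout("32 1 1", "k eh t")

for label, layout in (("crashing", crashing), ("working", working)):
    print(f"{label}: {layout} -> {total_tasks(layout)} MPI tasks")
```

Both layouts give 32 tasks in total, so the only difference between the crashing and the working run, apart from the number of nodes, is the count on eh (4 versus 1).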

Re: Crash with SLEPC solver

Posted: Fri May 21, 2021 10:40 am
by claudio
Dear all

I confirm that the bug is due to the parallelization on 'eh': if I put 4 cores on 'eh' it crashes,
while other combinations are fine.

We have a system with 4 valence bands, 4 conduction bands and 492 k-points,
and I performed the test with the bug-fixes.

The size of the matrix is small:
BS_MAT on HOST compute-8-3.local with size 840.2910 [Mb]

and we have 3 GB for each core.


Claudio