BSE restart with slepc solver

Here you can find problems arising when using old releases of Yambo (< 5.0): issues such as parallelization strategy, performance, and other technical aspects.

Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano, Conor Hogan

Xiaoming Wang
Posts: 67
Joined: Fri Dec 18, 2020 7:14 am

BSE restart with slepc solver

Post by Xiaoming Wang » Mon May 24, 2021 2:07 am

Hello,

I'm trying to restart my BSE calculation with the slepc solver using a different BSSEnTarget, since my first guess was too high. However, the run always fails, with the log file stopping at "BSK resp. funct":

Code: Select all

 <---> P1: [01] MPI/OPENMP structure, Files & I/O Directories
 <---> P1-r103u03: MPI Cores-Threads   : 72(CPU)-1(threads)
 <---> P1-r103u03: MPI Cores-Threads   : BS(environment)-2 1 36(CPUs)-k eh t(ROLEs)
 <---> P1-r103u03: [02] CORE Variables Setup
 <---> P1-r103u03: [02.01] Unit cells
 <---> P1-r103u03: [02.02] Symmetries
 <---> P1-r103u03: [02.03] Reciprocal space
 <---> P1-r103u03: [02.04] K-grid lattice
 <---> P1-r103u03: Grid dimensions      :   36   36    3
 <---> P1-r103u03: [02.05] Energies & Occupations
 <---> P1-r103u03: [03] Transferred momenta grid and indexing
 <---> P1-r103u03: [04] Dipoles
 <---> P1-r103u03: [DIP] Checking dipoles header
 <---> P1-r103u03: [WARNING] [r,Vnl^pseudo] included in position and velocity dipoles.
 <---> P1-r103u03: [WARNING] In case H contains other non local terms, these are neglected
 <---> P1-r103u03: [05] Bethe Salpeter Equation @q1
 <02s> P1-r103u03: [PARALLEL Response_T_space for K(ibz) on 2 CPU] Loaded/Total (Percentual):127/254(50%)
 <02s> P1-r103u03: [PARALLEL Response_T_space for (e/h) Groups on 1 CPU] Loaded/Total (Percentual):127/254(50%)
 <02s> P1-r103u03: [PARALLEL Response_T_space for (e/h)->(e/h)' Transitions (ordered) on 36 CPU] Loaded/Total (Percentual):451/32385(1%)
 <02s> P1-r103u03: [PARALLEL Response_T_space for CON bands on 1 CPU] Loaded/Total (Percentual):2/2(100%)
 <02s> P1-r103u03: [PARALLEL Response_T_space for VAL bands on 36 CPU] Loaded/Total (Percentual):1/2(50%)
 <02s> P1-r103u03: [05.01] Transition Groups build-up @q1
 <02s> P1-r103u03: [MEMORY] Parallel distribution of BS_MAT on HOST r103u03 with size  245.6667 [Mb]
 <02s> P1-r103u03: [BSK] Size (resonant):  15552
 <02s> P1-r103u03: [BSK]         (total):  15552
 <02s> P1-r103u03: [BSK] Matricies      :  1
 <02s> P1-r103u03: [PARALLEL Response_T_space for Kernel matrix elements] Loaded/Total(Percentual): 0.154906E+7/ 0.120940E+9 (1%)
 <02s> P1-r103u03: [05.02] Independent Particles properties @q1
 <02s> P1-r103u03: [DIP] Checking dipoles header
 <02s> P1-r103u03: [05.03] BSE Kernel @q1 (Resonant CORRRELATION EXCHANGE)
 <02s> P1-r103u03: Complete BSE file found in ./BSE//ndb.BS_PAR_Q1. Loading kernel.
 <02s> P1-r103u03: Loading BSE kernel |                                        | [000%] --(E) --(X)
 <08s> P1-r103u03: Loading BSE kernel |#                                       | [002%] 05s(E) 03m-38s(X)
 <11s> P1-r103u03: Loading BSE kernel |########################################| [100%] 08s(E) 08s(X)
 <11s> P1-r103u03: [06] BSE solver(s) @q1
 <11s> P1-r103u03: [06.01] Slepc Solver @q1
 <11s> P1-r103u03: [SLEPC] Slower alogorithm but BSE matrix distributed over MPI tasks
 <11s> P1-r103u03: BSK resp. funct |                                        | [000%] --(E) --(X)
and the error report:

Code: Select all

[68]PETSC ERROR: ------------------------------------------------------------------------
[68]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[68]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[68]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[68]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[68]PETSC ERROR: likely location of problem given in stack below
[68]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
[68]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[68]PETSC ERROR:       INSTEAD the line number of the start of the function
[68]PETSC ERROR:       is given.
[68]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[68]PETSC ERROR: Signal received
[68]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[68]PETSC ERROR: Petsc Release Version 3.13.4, Aug 01, 2020 
[68]PETSC ERROR: Unknown Name on a  named r103u05 by wxiaom86 Sun May 23 18:43:16 2021
[... identical SEGV reports from ranks 2, 29, 30, 57, 59, 60, 61 and 65 omitted ...]
[68]PETSC ERROR: Configure options --prefix=/home/wxiaom86/soft/yambo-5.0.2/lib/external/intel/mpiifort/single --PETSC_ARCH=yambo_single_complex --with-ssl=0 --with-x=0 --with-cxx=0 --with-shared-libraries=0 --with-blaslapack-lib="-L/nopt/nrel/apps/base/2019-01-02/spack/opt/spack/linux-centos7-x86_64/intel-18.0.3/intel-mkl-2018.3.222-dzfj7xvn6uy7tqmmgzwfcjkucomyxkui/compilers_and_libraries_2018.3.222/linux/mkl -lmkl_intel_lp64  -lmkl_sequential -lmkl_core  -L/nopt/nrel/apps/base/2019-01-02/spack/opt/spack/linux-centos7-x86_64/intel-18.0.3/intel-mkl-2018.3.222-dzfj7xvn6uy7tqmmgzwfcjkucomyxkui/compilers_and_libraries_2018.3.222/linux/mkl -lmkl_intel_lp64  -lmkl_sequential -lmkl_core " --with-scalar-type=complex --with-precision=single --with-cc=mpiicc --with-fc=mpiifort
[68]PETSC ERROR: #1 User provided function() line 0 in  unknown file
application called MPI_Abort(MPI_COMM_WORLD, 50162059) - process 68
In: PMI_Abort(50162059, application called MPI_Abort(MPI_COMM_WORLD, 50162059) - process 68)
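For reference, the slepc-related part of my restart input looks roughly like this (a sketch with illustrative values; BSSNEig, the number of requested eigenvalues, is an extra variable I set and is not shown in the log):

Code: Select all

 BSSmod= "s"                # [BSS] Solver: slepc
 BSSNEig= 10                # [BSS] Number of eigenvalues to compute (illustrative)
 BSSEnTarget= 2.00  eV      # [BSS] Target energy for the eigenvalues (the value I changed)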
The slepc library was built internally by yambo. BSE with the slepc solver works for a brand-new run, and restarting with a different solver also works. Any comments on this situation?

Best,
Xiaoming
Xiaoming Wang
The University of Toledo

Davide Sangalli
Posts: 614
Joined: Tue May 29, 2012 4:49 pm
Location: Via Salaria Km 29.3, CP 10, 00016, Monterotondo Stazione, Italy

Re: BSE restart with slepc solver

Post by Davide Sangalli » Mon May 24, 2021 8:50 am

Dear Xiaoming,
the restart should also work with the SLEPC solver.

Can you try to delete the ndb.BS_diago_Q1 file before running the restart?
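Something like this should do it (a sketch, assuming the databases sit in the ./BSE job directory that appears in your log):

Code: Select all

 # remove only the stale slepc/diagonalization restart database;
 # the kernel database ndb.BS_PAR_Q1 is left in place, so the kernel is still loaded
 rm ./BSE/ndb.BS_diago_Q1
 # then rerun with the updated input, e.g.
 # mpirun -np 72 yambo -F <your_input> -J BSE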

Which yambo version are you using?

Best,
D.
Davide Sangalli, PhD
CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX Centre
https://sites.google.com/view/davidesangalli
http://www.max-centre.eu/

Xiaoming Wang
Posts: 67
Joined: Fri Dec 18, 2020 7:14 am

Re: BSE restart with slepc solver

Post by Xiaoming Wang » Mon May 24, 2021 1:31 pm

Dear Davide,

Thank you for your help. Deleting the ndb.BS_diago_Q1 file worked for me. By the way, I'm using yambo-5.0.2.

Best,
Xiaoming
Xiaoming Wang
The University of Toledo
