
Haydock in BSE - segmentation fault

Posted: Fri Oct 27, 2023 10:00 am
by malwi
Dear Team,

I get a segmentation fault in a BSE calculation with SOC at the last step, the Haydock diagonalization. The Slurm output is listed below,
followed by the end of the LOG file.
It happens with both the "time-profile" and the "no-time-profile" versions.

Best regards,
Malgorzata

==============================
GCCcore/11.3.0 loaded.
zlib/1.2.12 loaded.
binutils/2.38 loaded.
numactl/2.0.14 loaded.
CUDA/11.7.0 loaded.
NVHPC/22.11-CUDA-11.7.0 loaded.
XZ/5.2.5 loaded.
libxml2/2.9.13 loaded.
libpciaccess/0.16 loaded.
hwloc/2.7.1 loaded.
OpenSSL/1.1 loaded.
libevent/2.1.12 loaded.
UCX/1.12.1 loaded.
GDRCopy/2.3 loaded.
UCX-CUDA/1.12.1-CUDA-11.7.0 loaded.
libfabric/1.15.1 loaded.
PMIx/4.1.2 loaded.
UCC/1.0.0 loaded.
NCCL/2.12.12-CUDA-11.7.0 loaded.
UCC-CUDA/1.0.0-CUDA-11.7.0 loaded.
OpenMPI/4.1.4 loaded.
Yambo/5.1.1-991f327-no-time-profile loaded.
[t0024:2940403:0:2940403] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc57dc2d0)
[t0024:2940407:0:2940407] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc54bbcb0)
[t0024:2940401:0:2940401] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc61a92d0)
[t0024:2940402:0:2940402] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc63e1e60)
[t0024:2940408:0:2940408] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc46b8c10)
[t0024:2940404:0:2940404] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc4de0380)
[t0024:2940405:0:2940405] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc47aff00)
[t0024:2940406:0:2940406] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc6e419d0)
==== backtrace (tid:2940403) ====
0 0x0000000000054df0 __GI___sigaction() :0
1 0x0000000000628bf4 sym_init_table() /memfs/462823/Yambo/5.1.1/NVHPC-22.11-CUDA-11.7.0-991f327-no-time-profile/yambo/src/parser/PARSER_symbols.c:44
2 0x0000000000626f5b parse_init() /memfs/462823/Yambo/5.1.1/NVHPC-22.11-CUDA-11.7.0-991f327-no-time-profile/yambo/src/parser/PARSER.c:71
3 0x0000000000626c06 iparse_init_() /memfs/462823/Yambo/5.1.1/NVHPC-22.11-CUDA-11.7.0-991f327-no-time-profile/yambo/src/parser/PARSER_interface.c:31
4 0x00000000006184a3 it_tools_it_reset_() /memfs/462823/Yambo/5.1.1/NVHPC-22.11-CUDA-11.7.0-991f327-no-time-profile/yambo/src/parser/mod_it_tools.f90:61
==============================

LOG file end

<35s> P1: Loading full BSE kernel |###### | [015%] 26s(E) 02m-55s(X)
<44s> P1: Loading full BSE kernel |####### | [017%] 35s(E) 03m-23s(X)
<57s> P1: Loading full BSE kernel |######## | [020%] 48s(E) 04m-04s(X)
<01m-21s> P1: Loading full BSE kernel |######### | [022%] 01m-12s(E) 05m-22s(X)
<01m-37s> P1: Loading full BSE kernel |########## | [025%] 01m-28s(E) 05m-53s(X)
<01m-57s> P1: Loading full BSE kernel |########### | [027%] 01m-48s(E) 06m-34s(X)
<02m-28s> P1: Loading full BSE kernel |############ | [030%] 02m-19s(E) 07m-45s(X)
<03m-13s> P1: Loading full BSE kernel |############# | [032%] 03m-04s(E) 09m-28s(X)
<04m-21s> P1: Loading full BSE kernel |############## | [035%] 04m-12s(E) 12m-00s(X)
<05m-28s> P1: Loading full BSE kernel |############### | [037%] 05m-19s(E) 14m-11s(X)
<06m-06s> P1: Loading full BSE kernel |################ | [040%] 05m-57s(E) 14m-53s(X)
<06m-58s> P1: Loading full BSE kernel |################# | [042%] 06m-49s(E) 16m-04s(X)
<08m-27s> P1: Loading full BSE kernel |################## | [045%] 08m-19s(E) 18m-28s(X)
<09m-17s> P1: Loading full BSE kernel |################### | [047%] 09m-08s(E) 19m-15s(X)
<10m-24s> P1: Loading full BSE kernel |#################### | [050%] 10m-16s(E) 20m-31s(X)
<11m-25s> P1: Loading full BSE kernel |##################### | [052%] 11m-16s(E) 21m-29s(X)
<12m-37s> P1: Loading full BSE kernel |###################### | [055%] 12m-29s(E) 22m-41s(X)
<13m-39s> P1: Loading full BSE kernel |####################### | [057%] 13m-30s(E) 23m-29s(X)
<15m-23s> P1: Loading full BSE kernel |######################## | [060%] 15m-14s(E) 25m-23s(X)
<17m-32s> P1: Loading full BSE kernel |######################### | [062%] 17m-23s(E) 27m-50s(X)
<19m-53s> P1: Loading full BSE kernel |########################## | [065%] 19m-44s(E) 30m-22s(X)
<21m-50s> P1: Loading full BSE kernel |########################### | [067%] 21m-42s(E) 32m-09s(X)
<24m-31s> P1: Loading full BSE kernel |############################ | [070%] 24m-22s(E) 34m-48s(X)
<27m-27s> P1: Loading full BSE kernel |############################# | [072%] 27m-18s(E) 37m-40s(X)
<29m-29s> P1: Loading full BSE kernel |############################## | [075%] 29m-20s(E) 39m-07s(X)
<32m-11s> P1: Loading full BSE kernel |############################### | [077%] 32m-02s(E) 41m-20s(X)
<33m-23s> P1: Loading full BSE kernel |################################ | [080%] 33m-14s(E) 41m-33s(X)
<35m-25s> P1: Loading full BSE kernel |################################# | [082%] 35m-16s(E) 42m-45s(X)
<37m-08s> P1: Loading full BSE kernel |################################## | [085%] 36m-59s(E) 43m-31s(X)
<38m-49s> P1: Loading full BSE kernel |################################### | [087%] 38m-41s(E) 44m-12s(X)
<39m-59s> P1: Loading full BSE kernel |#################################### | [090%] 39m-50s(E) 44m-16s(X)
<41m-32s> P1: Loading full BSE kernel |##################################### | [092%] 41m-23s(E) 44m-45s(X)
<42m-12s> P1: Loading full BSE kernel |###################################### | [095%] 42m-03s(E) 44m-16s(X)
<42m-37s> P1: Loading full BSE kernel |####################################### | [097%] 42m-29s(E) 43m-34s(X)
<42m-45s> P1: Loading full BSE kernel |########################################| [100%] 42m-37s(E) 42m-37s(X)
<46m-39s> P1: [05.02] BSE solver(s) @q1
<46m-39s> P1: [05.03] Haydock Solver in the optics basis @q1 using the hermitian scheme
===================================================

Re: Haydock in BSE - segmentation fault

Posted: Mon Oct 30, 2023 11:26 am
by Daniele Varsano
Dear Gosia,
can you attach your input and report files? You can use the attachments function below the message and add files after renaming the suffix (e.g. input.txt, report.txt).
Best,
Daniele

Re: Haydock in BSE - segmentation fault

Posted: Tue Oct 31, 2023 12:31 am
by malwi
Dear Daniele,
thank you. I attach the files. The job was run with 8 CPUs and 8 GPUs, 1 thread per CPU.
Gosia

Re: Haydock in BSE - segmentation fault

Posted: Thu Nov 02, 2023 10:01 am
by Daniele Varsano
Dear Gosia,
not easy to spot the problem!

The only thing I can see is that your input file does not include any QP correction, neither from a database nor as a scissor operator.
BSE on top of KS can lead to negative excitation energies, and I'm not sure the Haydock solver in the hermitian scheme is able to handle them. To verify whether this is actually the case, can you add a QP scissor correction by hand and see if yambo runs without error?
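
As a rough illustration of such a test (not taken from the attached input): a rigid scissor in the BSE input is usually set through the KfnQP_E variable. The 1.0 eV shift below is only a placeholder, not a value recommended for GaN, and the variable may show up in the generated input only at higher verbosity (e.g. with -V qp).

```
% KfnQP_E
 1.000000 | 1.000000 | 1.000000 |   # scissor shift (eV) | conduction stretching | valence stretching
%
# The first entry is a placeholder shift; the two stretching factors of 1.0 leave the band dispersion unchanged.
```

If the run completes with a positive shift like this, that would point to the negative KS excitation energies rather than to the parallel setup.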

Best,
Daniele

Re: Haydock in BSE - segmentation fault

Posted: Fri Nov 03, 2023 9:47 pm
by malwi
Dear Daniele,

This run is GaN (4 atoms in the cell) with SOC. I know where the first peak is, because this is now my third run, each with a denser k-mesh.
Previous calculations with fewer k-points went well. I got the Haydock results for this system with 131 k-points in the IBZ.
Now it failed with 315 k-points in the IBZ. I have "force_symmorphic = .true." set.

Another run for this system without SOC went well with 627 k-points in the IBZ and failed at Haydock with 1103 k-points.

I am looking at the parallelization and trying to change the CPU distribution, still with only 8 CPUs and 8 GPUs in total.
Maciej Czuchry suggested using "ulimit -s unlimited", but it did not help.

If you have any other ideas... thanks :-)
Gosia