Haydock in BSE - segmentation fault
Posted: Fri Oct 27, 2023 10:00 am
Dear Team,
I get a segmentation fault in a BSE calculation with SOC at the last step, the Haydock solver. The Slurm output is listed below.
The end of the LOG file is also below.
It happens with both the "time-profile" and the "no-time-profile" builds.
Best regards,
Malgorzata
==============================
GCCcore/11.3.0 loaded.
zlib/1.2.12 loaded.
binutils/2.38 loaded.
numactl/2.0.14 loaded.
CUDA/11.7.0 loaded.
NVHPC/22.11-CUDA-11.7.0 loaded.
XZ/5.2.5 loaded.
libxml2/2.9.13 loaded.
libpciaccess/0.16 loaded.
hwloc/2.7.1 loaded.
OpenSSL/1.1 loaded.
libevent/2.1.12 loaded.
UCX/1.12.1 loaded.
GDRCopy/2.3 loaded.
UCX-CUDA/1.12.1-CUDA-11.7.0 loaded.
libfabric/1.15.1 loaded.
PMIx/4.1.2 loaded.
UCC/1.0.0 loaded.
NCCL/2.12.12-CUDA-11.7.0 loaded.
UCC-CUDA/1.0.0-CUDA-11.7.0 loaded.
OpenMPI/4.1.4 loaded.
Yambo/5.1.1-991f327-no-time-profile loaded.
[t0024:2940403:0:2940403] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc57dc2d0)
[t0024:2940407:0:2940407] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc54bbcb0)
[t0024:2940401:0:2940401] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc61a92d0)
[t0024:2940402:0:2940402] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc63e1e60)
[t0024:2940408:0:2940408] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc46b8c10)
[t0024:2940404:0:2940404] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc4de0380)
[t0024:2940405:0:2940405] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc47aff00)
[t0024:2940406:0:2940406] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc6e419d0)
==== backtrace (tid:2940403) ====
0 0x0000000000054df0 __GI___sigaction() :0
1 0x0000000000628bf4 sym_init_table() /memfs/462823/Yambo/5.1.1/NVHPC-22.11-CUDA-11.7.0-991f327-no-time-profile/yambo/src/parser/PARSER_symbols.c:44
2 0x0000000000626f5b parse_init() /memfs/462823/Yambo/5.1.1/NVHPC-22.11-CUDA-11.7.0-991f327-no-time-profile/yambo/src/parser/PARSER.c:71
3 0x0000000000626c06 iparse_init_() /memfs/462823/Yambo/5.1.1/NVHPC-22.11-CUDA-11.7.0-991f327-no-time-profile/yambo/src/parser/PARSER_interface.c:31
4 0x00000000006184a3 it_tools_it_reset_() /memfs/462823/Yambo/5.1.1/NVHPC-22.11-CUDA-11.7.0-991f327-no-time-profile/yambo/src/parser/mod_it_tools.f90:61
==============================
LOG file end
<35s> P1: Loading full BSE kernel |###### | [015%] 26s(E) 02m-55s(X)
<44s> P1: Loading full BSE kernel |####### | [017%] 35s(E) 03m-23s(X)
<57s> P1: Loading full BSE kernel |######## | [020%] 48s(E) 04m-04s(X)
<01m-21s> P1: Loading full BSE kernel |######### | [022%] 01m-12s(E) 05m-22s(X)
<01m-37s> P1: Loading full BSE kernel |########## | [025%] 01m-28s(E) 05m-53s(X)
<01m-57s> P1: Loading full BSE kernel |########### | [027%] 01m-48s(E) 06m-34s(X)
<02m-28s> P1: Loading full BSE kernel |############ | [030%] 02m-19s(E) 07m-45s(X)
<03m-13s> P1: Loading full BSE kernel |############# | [032%] 03m-04s(E) 09m-28s(X)
<04m-21s> P1: Loading full BSE kernel |############## | [035%] 04m-12s(E) 12m-00s(X)
<05m-28s> P1: Loading full BSE kernel |############### | [037%] 05m-19s(E) 14m-11s(X)
<06m-06s> P1: Loading full BSE kernel |################ | [040%] 05m-57s(E) 14m-53s(X)
<06m-58s> P1: Loading full BSE kernel |################# | [042%] 06m-49s(E) 16m-04s(X)
<08m-27s> P1: Loading full BSE kernel |################## | [045%] 08m-19s(E) 18m-28s(X)
<09m-17s> P1: Loading full BSE kernel |################### | [047%] 09m-08s(E) 19m-15s(X)
<10m-24s> P1: Loading full BSE kernel |#################### | [050%] 10m-16s(E) 20m-31s(X)
<11m-25s> P1: Loading full BSE kernel |##################### | [052%] 11m-16s(E) 21m-29s(X)
<12m-37s> P1: Loading full BSE kernel |###################### | [055%] 12m-29s(E) 22m-41s(X)
<13m-39s> P1: Loading full BSE kernel |####################### | [057%] 13m-30s(E) 23m-29s(X)
<15m-23s> P1: Loading full BSE kernel |######################## | [060%] 15m-14s(E) 25m-23s(X)
<17m-32s> P1: Loading full BSE kernel |######################### | [062%] 17m-23s(E) 27m-50s(X)
<19m-53s> P1: Loading full BSE kernel |########################## | [065%] 19m-44s(E) 30m-22s(X)
<21m-50s> P1: Loading full BSE kernel |########################### | [067%] 21m-42s(E) 32m-09s(X)
<24m-31s> P1: Loading full BSE kernel |############################ | [070%] 24m-22s(E) 34m-48s(X)
<27m-27s> P1: Loading full BSE kernel |############################# | [072%] 27m-18s(E) 37m-40s(X)
<29m-29s> P1: Loading full BSE kernel |############################## | [075%] 29m-20s(E) 39m-07s(X)
<32m-11s> P1: Loading full BSE kernel |############################### | [077%] 32m-02s(E) 41m-20s(X)
<33m-23s> P1: Loading full BSE kernel |################################ | [080%] 33m-14s(E) 41m-33s(X)
<35m-25s> P1: Loading full BSE kernel |################################# | [082%] 35m-16s(E) 42m-45s(X)
<37m-08s> P1: Loading full BSE kernel |################################## | [085%] 36m-59s(E) 43m-31s(X)
<38m-49s> P1: Loading full BSE kernel |################################### | [087%] 38m-41s(E) 44m-12s(X)
<39m-59s> P1: Loading full BSE kernel |#################################### | [090%] 39m-50s(E) 44m-16s(X)
<41m-32s> P1: Loading full BSE kernel |##################################### | [092%] 41m-23s(E) 44m-45s(X)
<42m-12s> P1: Loading full BSE kernel |###################################### | [095%] 42m-03s(E) 44m-16s(X)
<42m-37s> P1: Loading full BSE kernel |####################################### | [097%] 42m-29s(E) 43m-34s(X)
<42m-45s> P1: Loading full BSE kernel |########################################| [100%] 42m-37s(E) 42m-37s(X)
<46m-39s> P1: [05.02] BSE solver(s) @q1
<46m-39s> P1: [05.03] Haydock Solver in the optics basis @q1 using the hermitian scheme
===================================================