Dear Team,
I get an error in a BSE calculation with SOC at the last step, the Haydock diagonalization. The Slurm output is listed below,
followed by the end of the LOG file.
It happens with both the "time-profile" and the "no-time-profile" versions.
Best regards,
Malgorzata
==============================
GCCcore/11.3.0 loaded.
zlib/1.2.12 loaded.
binutils/2.38 loaded.
numactl/2.0.14 loaded.
CUDA/11.7.0 loaded.
NVHPC/22.11-CUDA-11.7.0 loaded.
XZ/5.2.5 loaded.
libxml2/2.9.13 loaded.
libpciaccess/0.16 loaded.
hwloc/2.7.1 loaded.
OpenSSL/1.1 loaded.
libevent/2.1.12 loaded.
UCX/1.12.1 loaded.
GDRCopy/2.3 loaded.
UCX-CUDA/1.12.1-CUDA-11.7.0 loaded.
libfabric/1.15.1 loaded.
PMIx/4.1.2 loaded.
UCC/1.0.0 loaded.
NCCL/2.12.12-CUDA-11.7.0 loaded.
UCC-CUDA/1.0.0-CUDA-11.7.0 loaded.
OpenMPI/4.1.4 loaded.
Yambo/5.1.1-991f327-no-time-profile loaded.
[t0024:2940403:0:2940403] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc57dc2d0)
[t0024:2940407:0:2940407] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc54bbcb0)
[t0024:2940401:0:2940401] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc61a92d0)
[t0024:2940402:0:2940402] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc63e1e60)
[t0024:2940408:0:2940408] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc46b8c10)
[t0024:2940404:0:2940404] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc4de0380)
[t0024:2940405:0:2940405] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc47aff00)
[t0024:2940406:0:2940406] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xffffffffc6e419d0)
==== backtrace (tid:2940403) ====
0 0x0000000000054df0 __GI___sigaction() :0
1 0x0000000000628bf4 sym_init_table() /memfs/462823/Yambo/5.1.1/NVHPC-22.11-CUDA-11.7.0-991f327-no-time-profile/yambo/src/parser/PARSER_symbols.c:44
2 0x0000000000626f5b parse_init() /memfs/462823/Yambo/5.1.1/NVHPC-22.11-CUDA-11.7.0-991f327-no-time-profile/yambo/src/parser/PARSER.c:71
3 0x0000000000626c06 iparse_init_() /memfs/462823/Yambo/5.1.1/NVHPC-22.11-CUDA-11.7.0-991f327-no-time-profile/yambo/src/parser/PARSER_interface.c:31
4 0x00000000006184a3 it_tools_it_reset_() /memfs/462823/Yambo/5.1.1/NVHPC-22.11-CUDA-11.7.0-991f327-no-time-profile/yambo/src/parser/mod_it_tools.f90:61
==============================
LOG file end
<35s> P1: Loading full BSE kernel |###### | [015%] 26s(E) 02m-55s(X)
<44s> P1: Loading full BSE kernel |####### | [017%] 35s(E) 03m-23s(X)
<57s> P1: Loading full BSE kernel |######## | [020%] 48s(E) 04m-04s(X)
<01m-21s> P1: Loading full BSE kernel |######### | [022%] 01m-12s(E) 05m-22s(X)
<01m-37s> P1: Loading full BSE kernel |########## | [025%] 01m-28s(E) 05m-53s(X)
<01m-57s> P1: Loading full BSE kernel |########### | [027%] 01m-48s(E) 06m-34s(X)
<02m-28s> P1: Loading full BSE kernel |############ | [030%] 02m-19s(E) 07m-45s(X)
<03m-13s> P1: Loading full BSE kernel |############# | [032%] 03m-04s(E) 09m-28s(X)
<04m-21s> P1: Loading full BSE kernel |############## | [035%] 04m-12s(E) 12m-00s(X)
<05m-28s> P1: Loading full BSE kernel |############### | [037%] 05m-19s(E) 14m-11s(X)
<06m-06s> P1: Loading full BSE kernel |################ | [040%] 05m-57s(E) 14m-53s(X)
<06m-58s> P1: Loading full BSE kernel |################# | [042%] 06m-49s(E) 16m-04s(X)
<08m-27s> P1: Loading full BSE kernel |################## | [045%] 08m-19s(E) 18m-28s(X)
<09m-17s> P1: Loading full BSE kernel |################### | [047%] 09m-08s(E) 19m-15s(X)
<10m-24s> P1: Loading full BSE kernel |#################### | [050%] 10m-16s(E) 20m-31s(X)
<11m-25s> P1: Loading full BSE kernel |##################### | [052%] 11m-16s(E) 21m-29s(X)
<12m-37s> P1: Loading full BSE kernel |###################### | [055%] 12m-29s(E) 22m-41s(X)
<13m-39s> P1: Loading full BSE kernel |####################### | [057%] 13m-30s(E) 23m-29s(X)
<15m-23s> P1: Loading full BSE kernel |######################## | [060%] 15m-14s(E) 25m-23s(X)
<17m-32s> P1: Loading full BSE kernel |######################### | [062%] 17m-23s(E) 27m-50s(X)
<19m-53s> P1: Loading full BSE kernel |########################## | [065%] 19m-44s(E) 30m-22s(X)
<21m-50s> P1: Loading full BSE kernel |########################### | [067%] 21m-42s(E) 32m-09s(X)
<24m-31s> P1: Loading full BSE kernel |############################ | [070%] 24m-22s(E) 34m-48s(X)
<27m-27s> P1: Loading full BSE kernel |############################# | [072%] 27m-18s(E) 37m-40s(X)
<29m-29s> P1: Loading full BSE kernel |############################## | [075%] 29m-20s(E) 39m-07s(X)
<32m-11s> P1: Loading full BSE kernel |############################### | [077%] 32m-02s(E) 41m-20s(X)
<33m-23s> P1: Loading full BSE kernel |################################ | [080%] 33m-14s(E) 41m-33s(X)
<35m-25s> P1: Loading full BSE kernel |################################# | [082%] 35m-16s(E) 42m-45s(X)
<37m-08s> P1: Loading full BSE kernel |################################## | [085%] 36m-59s(E) 43m-31s(X)
<38m-49s> P1: Loading full BSE kernel |################################### | [087%] 38m-41s(E) 44m-12s(X)
<39m-59s> P1: Loading full BSE kernel |#################################### | [090%] 39m-50s(E) 44m-16s(X)
<41m-32s> P1: Loading full BSE kernel |##################################### | [092%] 41m-23s(E) 44m-45s(X)
<42m-12s> P1: Loading full BSE kernel |###################################### | [095%] 42m-03s(E) 44m-16s(X)
<42m-37s> P1: Loading full BSE kernel |####################################### | [097%] 42m-29s(E) 43m-34s(X)
<42m-45s> P1: Loading full BSE kernel |########################################| [100%] 42m-37s(E) 42m-37s(X)
<46m-39s> P1: [05.02] BSE solver(s) @q1
<46m-39s> P1: [05.03] Haydock Solver in the optics basis @q1 using the hermitian scheme
===================================================
Haydock in BSE - segmentation fault
Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano, Conor Hogan, Nicola Spallanzani
- malwi
- Posts: 36
- Joined: Mon Feb 29, 2016 1:00 pm
dr hab. Małgorzata Wierzbowska, Prof. IHPP PAS
Institute of High Pressure Physics Polish Academy of Sciences
Warsaw, Poland
- Daniele Varsano
- Posts: 3979
- Joined: Tue Mar 17, 2009 2:23 pm
Re: Haydock in BSE - segmentation fault
Dear Gosia,
can you attach your input and report files? You can use the attachments function below the message and add files after renaming the suffix (e.g. input.txt, report.txt).
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- malwi
- Posts: 36
- Joined: Mon Feb 29, 2016 1:00 pm
Re: Haydock in BSE - segmentation fault
Dear Daniele,
Thank you. I attach the files. The job was run with 8 CPUs and 8 GPUs, 1 thread per CPU.
Gosia
- Daniele Varsano
- Posts: 3979
- Joined: Tue Mar 17, 2009 2:23 pm
Re: Haydock in BSE - segmentation fault
Dear Gosia,
It is not easy to spot the problem!
The only thing I can see is that your input file does not include any QP correction (neither from a database nor as a scissor operator).
BSE on top of KS can lead to negative excitation energies, and I'm not sure the Haydock solver in the hermitian scheme can handle that. To verify whether this is actually the case, can you add a QP scissor correction by hand and see if yambo runs without error?
Best,
Daniele
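For reference, a rigid scissor for the BSE kernel is set in the yambo input through the `KfnQP_E` variable. The fragment below is only a sketch: the 1 eV shift is a hypothetical placeholder, not a value taken from this thread, and the two stretching factors are left at 1 (no stretching).

```
% KfnQP_E
 1.000000 | 1.000000 | 1.000000 |   # scissor (eV) | conduction stretching | valence stretching
%
```

The shift should be replaced with whatever gap correction is appropriate for the system; the only purpose here is to test whether the solver still crashes once the excitation energies are pushed up.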
- malwi
- Posts: 36
- Joined: Mon Feb 29, 2016 1:00 pm
Re: Haydock in BSE - segmentation fault
Dear Daniele,
This run is GaN (4 atoms in the cell) with SOC. I know where the first peak is, because this is my third run with a progressively denser k-mesh.
Previous calculations with fewer k-points went well. I got Haydock results for this system with 131 k-points in the IBZ.
Now it fails with 315 k-points in the IBZ. I have "force_symmorphic = .true."
Another run for this system without SOC went well with 627 k-points in the IBZ and failed at Haydock with 1103 k-points.
I am looking at the parallelization and trying to change the CPU distribution, still with only 8 CPUs and 8 GPUs in total.
Maciej Czuchry suggested using "ulimit -s unlimited", but it did not help.
If you have any other ideas, thanks.
Gosia
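Since the failure appears only on the denser k-meshes, and earlier with SOC than without, it may help to estimate how the BSE matrix grows. The Python sketch below is a rough back-of-the-envelope estimate, not something yambo reports. It assumes a dense resonant block of dimension Nk(BZ) x Nv x Nc. The k-point and band counts are hypothetical placeholders; the thread quotes IBZ counts, while the matrix is normally built over the full BZ. The default of 8 bytes per element assumes single-precision complex; use 16 for double precision.

```python
def bse_dimension(n_k_bz, n_valence, n_conduction):
    """Number of electron-hole transitions in the resonant block."""
    return n_k_bz * n_valence * n_conduction


def kernel_bytes(dimension, bytes_per_element=8):
    """Memory of a dense dimension x dimension complex matrix.

    8 bytes per element assumes single-precision complex,
    16 bytes double-precision complex.
    """
    return dimension * dimension * bytes_per_element


def gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical numbers for illustration only (not from the thread).
    # With SOC the spinor bands double both Nv and Nc for the same
    # physical band range, so the dimension per k-point grows by 4.
    cases = [("no SOC", 1000, 4, 4), ("SOC", 1000, 8, 8)]
    n_tasks = 8  # MPI tasks, matching the 8 CPUs / 8 GPUs quoted above
    for label, n_k, n_v, n_c in cases:
        dim = bse_dimension(n_k, n_v, n_c)
        total = kernel_bytes(dim)
        print(f"{label}: dimension = {dim}, "
              f"kernel = {gib(total):.1f} GiB, "
              f"per task = {gib(total) / n_tasks:.1f} GiB")
```

Comparing the per-task figure with the memory available on each GPU and node gives a first indication of whether the denser meshes are simply too large for 8 tasks. It does not by itself explain the segmentation fault.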