
Killed by signal 11

Posted: Thu Jun 27, 2013 12:15 pm
by mojtaba
Dear all,

I want to run a BSE calculation, but I am getting the following error:
"rank 5 in job 2 mojtaba_44363 caused collective abort of all ranks exit status of rank 5: killed by signal 11".
I post the report file of my BSE calculation.
___________________________________________________________________________________________
#
# /$$ /$$ /$$$$$$ /$$ /$$ /$$$$$$$ /$$$$$$
# | $$ /$$//$$__ $$| $$$ /$$$| $$__ $$ /$$__ $$
# \ $$ /$$/| $$ \ $$| $$$$ /$$$$| $$ \ $$| $$ \ $$
# \ $$$$/ | $$$$$$$$| $$ $$/$$ $$| $$$$$$$ | $$ | $$
# \ $$/ | $$__ $$| $$ $$$| $$| $$__ $$| $$ | $$
# | $$ | $$ | $$| $$\ $ | $$| $$ \ $$| $$ | $$
# | $$ | $$ | $$| $$ \/ | $$| $$$$$$$/| $$$$$$/
# |__/ |__/ |__/|__/ |__/|_______/ \______/
#
# GPL Version 3.3.0 Revision 1887
# http://www.yambo-code.org
#
em1s # [R Xs] Static Inverse Dielectric Matrix
optics # [R OPT] Optics
bse # [R BSK] Bethe Salpeter Equation.
bss # [R BSS] Bethe Salpeter Equation solver
rim_cut # [R RIM CUT] Coulomb interaction
setup # [R INI] Initialization
StdoHash= 20 # [IO] Live-timing Hashes
Nelectro= 16.00000 # Electrons number
ElecTemp= 0.000000 eV # Electronic Temperature
BoseTemp=-1.000000 eV # Bosonic Temperature
OccTresh=0.1000E-4 # Occupation treshold (metallic bands)
FFTGvecs= 3000 RL # [FFT] Plane-waves
MaxGvecs= 10000 RL # [INI] Max number of G-vectors planned to use
NonPDirs= "none" # [X/BSS] Non periodic chartesian directions (X,Y,Z,XY...)
RandQpts=1000000 # [RIM] Number of random q-points in the BZ
RandGvec=1 RL # [RIM] Coulomb interaction RS components
#QpgFull # [F RIM] Coulomb interaction: Full matrix
% Em1Anys
0.00 | 0.00 | 0.00 | # [RIM] X Y Z Static Inverse dielectric matrix
%
IDEm1Ref=0 # [RIM] Dielectric matrix reference component 1(x)/2(y)/3(z)
CUTGeo= "box z" # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere
% CUTBox
0.00 | 0.00 | 66.00 | # [CUT] [au] Box sides
%
CUTRadius= 0.000000 # [CUT] [au] Sphere/Cylinder radius
CUTCylLen= 0.000000 # [CUT] [au] Cylinder length
#CUTCol_test # [CUT] Perform a cutoff test in R-space
KfnQPdb= "E < ./SAVE/db.QP" # [EXTQP BSK BSS] Database
KfnQP_N= 1 # [EXTQP BSK BSS] Interpolation neighbours
% KfnQP_E
0.000000 | 1.000000 | 1.000000 | # [EXTQP BSK BSS] E parameters (c/v) eV|adim|adim
%
% KfnQP_Wv
0.00 | 0.00 | 0.00 | # [EXTQP BSK BSS] W parameters (valence) eV|adim|eV^-1
%
% KfnQP_Wc
0.00 | 0.00 | 0.00 | # [EXTQP BSK BSS] W parameters (conduction) eV|adim|eV^-1
%
KfnQP_Z= ( 1.000000 , 0.000000 ) # [EXTQP BSK BSS] Z factor (c/v)
BSresKmod= "xc" # [BSK] Resonant Kernel mode. (`x`;`c`;`d`)
BScplKmod= "none" # [BSK] Coupling Kernel mode. (`x`;`c`;`d`)
% BSEBands
1 | 40 | # [BSK] Bands range
%
BSENGBlk= 200 RL # [BSK] Screened interaction block size
BSENGexx= 3000 RL # [BSK] Exchange components
#ALLGexx # [BSS] Force the use use all RL vectors for the exchange part
% BSEEhEny
-1.000000 |-1.000000 | eV # [BSK] Electron-hole energy range
%
XfnQPdb= "E < ./SAVE/db.QP" # [EXTQP Xd] Database
XfnQP_N= 1 # [EXTQP Xd] Interpolation neighbours
% XfnQP_E
0.000000 | 1.000000 | 1.000000 | # [EXTQP Xd] E parameters (c/v) eV|adim|adim
%
% XfnQP_Wv
0.00 | 0.00 | 0.00 | # [EXTQP Xd] W parameters (valence) eV|adim|eV^-1
%
% XfnQP_Wc
0.00 | 0.00 | 0.00 | # [EXTQP Xd] W parameters (conduction) eV|adim|eV^-1
%
XfnQP_Z= ( 1.000000 , 0.000000 ) # [EXTQP Xd] Z factor (c/v)
% QpntsRXs
1 | 73 | # [Xs] Transferred momenta
%
% BndsRnXs
1 | 200 | # [Xs] Polarization function bands
%
NGsBlkXs= 200 RL # [Xs] Response block size
CGrdSpXs= 100.0000 # [Xs] [o/o] Coarse grid controller
% EhEngyXs
-1.000000 |-1.000000 | eV # [Xs] Electron-hole energy range
%
% LongDrXs
1.000000 | 0.000000 | 0.000000 | # [Xs] [cc] Electric Field
%
DrudeWXs= ( 0.00 , 0.00 ) eV # [Xs] Drude plasmon
BoseCut= 0.10000 # [BOSE] Finite T Bose function cutoff
BSSmod= "h" # [BSS] Solvers `h/d/i/t`
% BEnRange
0.00000 | 30.00000 | eV # [BSS] Energy range
%
% BDmRange
0.010000 | 0.10000 | eV # [BSS] Damping range
%
BEnSteps= 1000 # [BSS] Energy steps
% BLongDir
1.000000 | 0.000000 | 0.000000 | # [BSS] [cc] Electric Field
%
BSHayTrs= -0.02000 # [BSS] [o/o] Haydock treshold. Strict(>0)/Average(<0)
#BSHayTer # [BSS] Terminate Haydock continuos fraction
_______________________________________________________________________________

Re: Killed by signal 11

Posted: Thu Jun 27, 2013 12:20 pm
by Daniele Varsano
Dear Mojtaba,
please also post the report file, the standard output, and the output of ls ./SAVE.
With only the input file it is very hard to understand what is going on.

Best,

Daniele

Re: Killed by signal 11

Posted: Thu Jun 27, 2013 2:14 pm
by mojtaba
Dear Daniele,
Thank you for your reply,
The ./SAVE directory is very large, about 16 GB.
I am posting the report file.

Re: Killed by signal 11

Posted: Thu Jun 27, 2013 2:20 pm
by mojtaba
Dear Daniele,
I am also posting the l_setup file.

Re: Killed by signal 11

Posted: Thu Jun 27, 2013 2:50 pm
by Daniele Varsano
Dear Mojtaba,
I was referring to the output of ls ./SAVE, in order to know which databases were produced.
Anyway, it is not needed, as I can see that from the report file. The problem seems to be in the
BSE solver, but it is not easy to identify. Most probably it is a memory issue, since you have
a very large excitonic matrix, but I cannot say for sure. As a general tip, I suggest you perform the calculations
step by step (i.e. initialization, GW runs, BSE runs). As you can see, there are a lot of warnings in the report file
due to some inconsistencies, even if they do not look severe or like the cause of the error.
Could you do a solver-only run (just the bss runlevel in your input) and post the standard output (the l_* file) and the report file together with the input?
From there maybe we can see how much memory you are allocating. Also consider whether you can reduce the size of
your matrix.
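
To get a rough feeling for the memory involved, the size of the excitonic matrix can be estimated by hand. The following is only a minimal sketch, not what the code does internally. It assumes a resonant-only kernel of dimension Nk x Nv x Nc, doubly occupied bands (so 8 valence bands for Nelectro= 16), single-precision complex matrix elements (8 bytes each), and the whole matrix held in memory on one process. The k-point counts are hypothetical placeholders, since the number of k-points in the full BZ is not given in this thread; the function name is also just illustrative.

```python
def bse_matrix_size(n_kpts, n_valence, n_conduction, bytes_per_element=8):
    """Return (dimension, bytes) of a resonant BSE matrix.

    Assumption: dimension = n_kpts * n_valence * n_conduction, with n_kpts
    counted in the full Brillouin zone, and a dense square matrix stored
    with bytes_per_element bytes per entry (8 = single-precision complex).
    """
    dim = n_kpts * n_valence * n_conduction
    return dim, dim * dim * bytes_per_element


if __name__ == "__main__":
    # From the input above: Nelectro= 16 and BSEBands = 1 | 40.
    # Assumption: no spin polarization, so each band holds two electrons.
    n_valence = 16 // 2
    n_conduction = 40 - n_valence

    # Hypothetical full-BZ k-point counts; replace with the real value
    # taken from the report file.
    for n_kpts in (100, 500, 1000):
        dim, nbytes = bse_matrix_size(n_kpts, n_valence, n_conduction)
        gib = nbytes / 1024**3
        print(f"{n_kpts:5d} k-points: dimension {dim:8d}, about {gib:10.1f} GiB")
```

Since the memory grows with the square of Nk x Nv x Nc, reducing the BSEBands range is usually the most direct way to shrink the matrix.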

Cheers,
Daniele

Re: Killed by signal 11

Posted: Fri Jun 28, 2013 6:12 am
by mojtaba
Dear Daniele,
Thank you for your reply,
I don't know the likely cause, but the BSE calculation completed without any problems when I used the runlevel "yambo -y h -c" (just the bss runlevel).

Re: Killed by signal 11

Posted: Fri Jun 28, 2013 7:48 am
by Daniele Varsano
Dear Mojtaba,
glad to know you solved your problem. In the future, it is always better to perform the calculations step by step.
Best,
Daniele