BAD TERMINATION with EXIT CODE: 9

Zhishuo Huang
Posts: 10
Joined: Tue Nov 17, 2015 9:28 am

BAD TERMINATION with EXIT CODE: 9

Post by Zhishuo Huang » Mon Sep 23, 2019 11:00 am

To whom it may concern,

I am using Yambo to do Wannier calculations on a periodic system. However, the calculations always crash with the error shown below:

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 26158 RUNNING AT r12i01n11
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Intel(R) MPI Library troubleshooting guide:
https://software.intel.com/node/561764
===================================================================================

It might be due to insufficient memory for the calculation. The memory assigned to each core is 30 Gb, but the calculation still crashes with this error. The report file "r_em1d_ppa_HF_and_locXC_gw0" contains no error message.
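
A small aside on reading the number in the message above. Assuming the launcher reports the number of the signal that ended the process (this is an assumption; the thread does not confirm how the Intel MPI launcher derives the value), the number can be looked up with Python's standard library. The helper name `describe_exit_code` is made up for this sketch. On Linux, signal 9 is SIGKILL, the signal used when the kernel kills a process that ran out of memory, so the guess about memory is at least consistent with the message.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Return the signal name for `code`, or a note if it is not a signal number."""
    try:
        return signal.Signals(code).name
    except ValueError:
        return f"{code} is not a valid signal number on this platform"


# Assumption: the "EXIT CODE" printed by the launcher is a signal number.
print(describe_exit_code(9))
```

This only names the signal; it does not say which component sent it, so the memory profile is still needed to confirm the cause.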

I am using yambo-4.3.3 compiled with the Intel compiler 2018, and the Yambo calculation is based on a Quantum ESPRESSO database. I have also attached the relevant files (because of the file size limit, I deleted the irrelevant part of the report file).

I would appreciate any hints about the problem.

Best regards
Zhishuo Huang

Daniele Varsano
Posts: 3844
Joined: Tue Mar 17, 2009 2:23 pm

Re: BAD TERMINATION with EXIT CODE: 9

Post by Daniele Varsano » Mon Sep 23, 2019 11:18 am

Dear Zhishuo,
the error reported is just a message coming from the queue system, and it is not useful for diagnosing the problem.
To check the memory used by the code, you can compile it with the following configure option:
--enable-memory-profile
Some suggestions after having a look at your input:

EXXRLvcs= 250463       RL  
You are using many G-vectors in the exchange part of the self-energy. It may be possible to reduce them: this will save a bit of memory without losing precision.

NGsBlkXp= 1            RL    
You are using only 1 G-vector, and this is certainly far from converged.

%QPkrange                    # [GW] QP generalized Kpoint/Band indices
1|1|78|150|
31|31|78|150|
121|121|78|150|
%
You are calculating many QP corrections in a single run (219).
You can try to divide the task into different runs and finally merge the databases by using the ypp utility.
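
To make the splitting concrete, here is a minimal Python sketch. It counts the QP corrections requested by the QPkrange block above (3 k-points times bands 78 to 150, i.e. 73 bands each, which gives 219) and splits the band range into smaller windows, one %QPkrange block per run. The function names and the window size of 25 bands are hypothetical choices for illustration, not Yambo features or recommended values.

```python
def count_qp_states(rows):
    """Count QP corrections for rows of (k_first, k_last, band_first, band_last)."""
    return sum((k2 - k1 + 1) * (b2 - b1 + 1) for k1, k2, b1, b2 in rows)


def split_bands(row, bands_per_run):
    """Split one row into rows covering at most `bands_per_run` bands each."""
    k1, k2, b1, b2 = row
    return [(k1, k2, start, min(start + bands_per_run - 1, b2))
            for start in range(b1, b2 + 1, bands_per_run)]


def plan_runs(rows, bands_per_run):
    """Group the i-th band window of every row into run number i."""
    per_row = [split_bands(r, bands_per_run) for r in rows]
    n_runs = max(len(chunks) for chunks in per_row)
    return [[chunks[i] for chunks in per_row if i < len(chunks)]
            for i in range(n_runs)]


def format_block(rows):
    """Render rows in the %QPkrange layout used in the input file."""
    lines = ["%QPkrange"]
    lines += ["{}|{}|{}|{}|".format(*r) for r in rows]
    lines.append("%")
    return "\n".join(lines)


# The three rows from the input discussed above.
rows = [(1, 1, 78, 150), (31, 31, 78, 150), (121, 121, 78, 150)]
print(count_qp_states(rows))

# Hypothetical window of 25 bands per run.
for run in plan_runs(rows, 25):
    print(format_block(run))
```

Each printed block would go into the input file of a separate run; the resulting QP databases can then be merged with ypp.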

Anyway, it seems the code stops before starting the calculation of the self-energy. You can try to use the following variable in the input:
PAR_def_mode="memory"
Yambo will then adopt the parallelization strategy best suited to saving memory.

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

Zhishuo Huang
Posts: 10
Joined: Tue Nov 17, 2015 9:28 am

Re: BAD TERMINATION with EXIT CODE: 9

Post by Zhishuo Huang » Mon Sep 23, 2019 2:04 pm

Dear Daniele,

Thank you so much for the quick reply.

I will re-compile yambo as you instructed.

However, regarding the parameters "EXXRLvcs" and "NGsBlkXp", the only information I could get from the manual is that these two parameters need to be tested for convergence. If I understand correctly, the test should be done by varying the parameters, which will take a lot of time. So I wonder, from your previous experience, whether there is an empirical setting for these parameters, or a starting point for the convergence test.

Besides, I don't understand the parameter "NGsBlkXp" very well. According to its description, this parameter refers to "The dimension in Reciprocal Space of the response function". Could you please explain it in more detail? How is it related to the GW calculation? And what is calculated when this parameter is set?

Moreover, about dividing the task into different runs: do you mean I could calculate the corrections one by one, by setting in the input file:
%QPkrange # [GW] QP generalized Kpoint/Band indices
1|1|78|78|
%
and run yambo for each divided task, and when all the tasks are done, I just type "ypp" to merge them? If so, should I do it in the same directory, or do I need to run all the calculations in different directories?

Best regards
Zhishuo Huang

Daniele Varsano
Posts: 3844
Joined: Tue Mar 17, 2009 2:23 pm

Re: BAD TERMINATION with EXIT CODE: 9

Post by Daniele Varsano » Mon Sep 23, 2019 2:16 pm

Dear Zhishuo,
I suggest you take a look at the tutorials in order to understand the meaning of the variables and what Yambo computes in each runlevel.
Here you can find tutorials for a step-by-step GW calculation:
http://www.yambo-code.org/wiki/index.ph ... =Tutorials

Zhishuo Huang wrote: So I wonder from your previous experiences, if there is an empirical setting of these parameters, or starting point for the convergence test.
EXXRLvcs: Usually you need many G-vectors, but not all the G-vectors needed to represent the density. I suggest you carry out the convergence test on EXXRLvcs by calculating the exchange self-energy only for a few states.
NGsBlkXp: This is really system-dependent. I suggest you start from 1 Ry and then raise the value to 2 Ry, 3 Ry, and so on until convergence.
It governs the dimension of the screening matrix, i.e. of the screened Coulomb potential W_GG'.
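
The "raise the value until convergence" procedure can be sketched in a few lines of Python. This is only an illustration: the function names, the tolerance, and above all the numbers are hypothetical and do not come from any real calculation. It also assumes that the cutoff is written in the input with the Ry unit, as in the 1 Ry, 2 Ry, 3 Ry sequence suggested above.

```python
def ngsblkxp_line(cutoff_ry):
    """Input line for one step of the scan (assumes the Ry unit is used in the input)."""
    return f"NGsBlkXp= {cutoff_ry} Ry"


def first_converged(cutoffs_ry, values_ev, tol_ev):
    """Return the first cutoff whose result changes by less than tol_ev
    when going to the next cutoff, or None if no step is that small."""
    for i in range(len(values_ev) - 1):
        if abs(values_ev[i + 1] - values_ev[i]) < tol_ev:
            return cutoffs_ry[i]
    return None


# Hypothetical data for illustration only: a QP gap (eV) at each cutoff (Ry).
cutoffs = [1, 2, 3, 4]
qp_gap = [2.50, 2.80, 2.87, 2.88]

for c in cutoffs:
    print(ngsblkxp_line(c))

# Hypothetical tolerance of 0.05 eV.
print(first_converged(cutoffs, qp_gap, tol_ev=0.05))
```

The quantity to monitor and the tolerance depend on the system and on the accuracy needed; the same loop applies to the EXXRLvcs scan.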
You can have a look at the Yambo papers:
Computer Physics Communications 180 (2009) 1392–1403
and
https://iopscience.iop.org/article/10.1 ... 48X/ab15d0
Zhishuo Huang wrote: you mean I could calculate all the correction one-by-one,
Not really one by one, but split the calculations into different runs.
Zhishuo Huang wrote: I just type "ypp" to merge them?

ypp -H
will tell you what you can do with ypp.

ypp -q m
will generate an input file ypp.in to do the job
In the input file, you will specify the name and the path of the files you need to merge.
e.g.
QPDBs # [R] Quasi-particle databases
QPDB_merge # [R] Mergering
%Actions_and_names # [QPDB] Format is "what"|"OP"|"prefactor"|"DB"|. OP can be +/-/x(only for Z)
"E" |"+" |"1" |"./SAVE/ndb.QP_1" |
"E" |"+" |"1" |"./SAVE/ndb.QP_2" |
or
"E" |"+" |"1" |"./QP1/ndb.QP" |
"E" |"+" |"1" |"./QP2/ndb.QP" |
%
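
When there are many databases to merge, the block above can be generated rather than typed by hand. The following Python sketch simply reproduces the row layout shown in the example ("E", "+", prefactor "1", then the database path); the function name `merge_block` is made up for this illustration, and the paths are the ones from the example.

```python
def merge_block(db_paths):
    """Build the %Actions_and_names block with one "E" | "+" | "1" row per database."""
    lines = ["%Actions_and_names"]
    for path in db_paths:
        lines.append('"E" |"+" |"1" |"{}" |'.format(path))
    lines.append("%")
    return "\n".join(lines)


# Paths taken from the second variant of the example above.
print(merge_block(["./QP1/ndb.QP", "./QP2/ndb.QP"]))
```

The output is meant to be pasted into the ypp.in file generated by ypp, in place of the empty block.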

Best,

Daniele


Zhishuo Huang
Posts: 10
Joined: Tue Nov 17, 2015 9:28 am

Re: BAD TERMINATION with EXIT CODE: 9

Post by Zhishuo Huang » Mon Sep 23, 2019 2:24 pm

Dear Daniele,

Thank you so much for the instructions. I will read the materials.

Best regards
Zhishuo Huang

Zhishuo Huang
Posts: 10
Joined: Tue Nov 17, 2015 9:28 am

Re: BAD TERMINATION with EXIT CODE: 9

Post by Zhishuo Huang » Thu Sep 26, 2019 4:30 pm

Dear Daniele,

I have compiled Yambo with --enable-memory-profile and have gone through the tutorial you pointed me to. I am now trying to run the convergence test, computing the correction for just one band. However, the run stopped without any error message or any information about why it stopped.

Please find the relevant files attached (because of the file size limit, I deleted the irrelevant part of the report file).

Best regards
Zhishuo Huang

Daniele Varsano
Posts: 3844
Joined: Tue Mar 17, 2009 2:23 pm

Re: BAD TERMINATION with EXIT CODE: 9

Post by Daniele Varsano » Fri Sep 27, 2019 8:41 am

Dear Zhishuo,
there is no file attached. The memory consumption is shown in the log files in the LOG directory. In order to post them as an attachment, you need to change the suffix (.txt, .zip, .tar, etc.; see the allowed suffixes in the upload description).

Best,
Daniele

Zhishuo Huang
Posts: 10
Joined: Tue Nov 17, 2015 9:28 am

Re: BAD TERMINATION with EXIT CODE: 9

Post by Zhishuo Huang » Fri Sep 27, 2019 10:32 am

Dear Daniele,

Sorry for that. I have now attached the file.

Best regards
Zhishuo Huang

Daniele Varsano
Posts: 3844
Joined: Tue Mar 17, 2009 2:23 pm

Re: BAD TERMINATION with EXIT CODE: 9

Post by Daniele Varsano » Fri Sep 27, 2019 10:46 am

Dear Zhishuo,
it is not easy to spot the problem.
Can you try to define the parallel strategy in the input file?

X_CPU= "1 1 1 4 2"                      # [PARALLEL] CPUs for each role
X_ROLEs= "q g k c v"                    # [PARALLEL] CPUs roles (q,g,k,c,v)
In this way the memory is distributed among the CPUs. Maybe you can also try to use more CPUs: the product of all the numbers in X_CPU (here only those assigned to "c" and "v" differ from 1) should be equal to the total number of CPUs.
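
As a quick sanity check before submitting a job, the rule above can be verified with a few lines of Python. This is a minimal sketch; the function name `check_parallel_layout` is hypothetical, and the total of 8 MPI tasks is simply the product 4 x 2 implied by the example.

```python
from math import prod


def check_parallel_layout(x_cpu: str, x_roles: str, n_mpi_tasks: int) -> bool:
    """Return True if the product of the per-role CPU counts equals n_mpi_tasks."""
    counts = [int(n) for n in x_cpu.split()]
    roles = x_roles.split()
    if len(counts) != len(roles):
        raise ValueError("X_CPU and X_ROLEs must have the same number of entries")
    return prod(counts) == n_mpi_tasks


# Values from the example above: 1 * 1 * 1 * 4 * 2 = 8 tasks.
print(check_parallel_layout("1 1 1 4 2", "q g k c v", 8))
```

To use more CPUs, the numbers for "c" and "v" are raised so that the product matches the new total, for example "1 1 1 8 2" for 16 tasks.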

Best,
Daniele

Zhishuo Huang
Posts: 10
Joined: Tue Nov 17, 2015 9:28 am

Re: BAD TERMINATION with EXIT CODE: 9

Post by Zhishuo Huang » Mon Oct 07, 2019 9:31 pm

Dear Daniele,

As you suggested, I set the parallel strategy as shown below:

X_CPU= "1 1 1 4 2" # [PARALLEL] CPUs for each role
X_ROLEs= "q g k c v" # [PARALLEL] CPUs roles (q,g,k,c,v)

But the calculation still stopped without any error, even though in my yambo.in file I request only a single QP correction:
%QPkrange # [GW] QP generalized Kpoint/Band indices
1|1|80|80|
%

Please find the relevant files attached (because of the file size limit, I deleted the irrelevant part of the report file).

Best regards
Zhishuo Huang
