BSE calculation exited without explicit error messages


yunhailiseu
Posts: 21
Joined: Fri Nov 29, 2013 1:30 pm

BSE calculation exited without explicit error messages

Post by yunhailiseu » Sat Nov 22, 2014 3:31 am

Dear Yambo team,

I ran into a tough problem while calculating doped MgO2. Yambo kept exiting without any explicit error message. The report and log files simply ended abruptly, and the file redirected from stderr was empty. The LSF job scheduler generated some output files, but they only reported that the job had exited, with no further information.

The doped MgO2 was built as a 2x2x2 supercell with 3 Mg atoms substituted by Zn. The kinetic energy cutoff for the wavefunctions was 60 Ry and the k-point mesh was 6x6x6. I suspected that insufficient RAM was causing the program to exit. However, after doubling the number of CPU cores (from 128 to 256) the problem persisted. Adding the -M option to the command line did not help either.

The attachment contains the input and output files of a failed run, together with the configuration files. All suggestions are appreciated.

Best,
Yunhaili
Yunhai Li
Department of Physics, Southeast University
Nanjing, Jiangsu, PRC

Daniele Varsano
Posts: 3816
Joined: Tue Mar 17, 2009 2:23 pm

Re: BSE calculation exited without explicit error messages

Post by Daniele Varsano » Sat Nov 22, 2014 2:53 pm

Dear Yunhai,
indeed, it looks like a memory allocation problem.
Possible solutions:
1) Check how many GB of RAM are available per CPU.
From the standard output, Yambo tries to allocate:

 <01m-07s> [M  8.993 Gb] Alloc WF ( 8.739)

that is, almost 9 GB, which is probably too much for your machine. You can try to reserve more memory per node.
As an example: if you have something like 4 GB per CPU and 16 CPUs per node, you can reserve all 16 CPUs and run the calculation on only 4 of them. This way each of the 4 processes has 16 GB available, and it should fit.

2) Alternatively, try adjusting your calculation parameters, lowering some values without losing accuracy.
Some parameters that affect the memory usage:
a) Lower FFTGvecs. Set FFTGvecs=12294 RL in your input (the same as for the WF); at the moment you are using the one of the charge density, which is more than 80000. This should save memory without losing accuracy.
b) Yambo stops while starting to build the static screening, so you can try to reduce the BndsRnXs and NGsBlkXs sizes.
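
To spot lines like the one quoted in point 1 without reading the whole log by eye, here is a minimal Python sketch. The regular expression is inferred only from that single log line, not from any Yambo format specification, and the function names are hypothetical. It also assumes that the number in square brackets is the running memory total and the number in parentheses is the size of the single allocation.

```python
import re

# Matches the memory marker in a log line such as
#   " <01m-07s> [M  8.993 Gb] Alloc WF ( 8.739)"
# ASSUMPTION: the pattern is guessed from that one line only.
MEM_PATTERN = re.compile(
    r"\[M\s+([0-9.]+)\s+Gb\]\s+Alloc\s+(\S+)\s+\(\s*([0-9.]+)\)"
)


def parse_alloc_line(line):
    """Return (total_gb, label, alloc_gb), or None if the line has no marker."""
    match = MEM_PATTERN.search(line)
    if match is None:
        return None
    return float(match.group(1)), match.group(2), float(match.group(3))


def peak_memory_gb(lines):
    """Largest bracketed memory value found in the lines, or 0.0 if none match."""
    parsed = (parse_alloc_line(line) for line in lines)
    return max((p[0] for p in parsed if p is not None), default=0.0)


sample = " <01m-07s> [M  8.993 Gb] Alloc WF ( 8.739)"
print(parse_alloc_line(sample))  # (8.993, 'WF', 8.739)
```

Passing the lines of a log file to peak_memory_gb gives a quick lower bound on the memory each process needs; the real requirement can be higher if the run died before its largest allocation.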

Hope that some of these solutions will work.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

yunhailiseu
Posts: 21
Joined: Fri Nov 29, 2013 1:30 pm

Re: BSE calculation exited without explicit error messages

Post by yunhailiseu » Mon Nov 24, 2014 1:29 pm

Dear Daniele,

Thank you for the reply.

I contacted the administrator of the supercomputer center, who confirmed that the crashes were caused by insufficient memory. It is also true that each node has 16 cores and 64 GB of RAM. The administrator gave the same suggestion as you: occupy the whole node but run only 4 tasks per node. I will try that.
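
The node figures above can be checked against the roughly 9 GB allocation reported in the log with a short Python sketch. The function names are hypothetical, and the estimate is deliberately crude: it assumes memory is shared evenly among the tasks on a node and ignores the operating system and any allocation that happens later in the run.

```python
def memory_per_task_gb(node_ram_gb, tasks_per_node):
    """RAM available to each task if a node's memory is shared evenly."""
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    return node_ram_gb / tasks_per_node


def max_tasks_per_node(node_ram_gb, cores_per_node, needed_gb_per_task):
    """Upper bound on tasks per node, limited by both cores and memory."""
    by_memory = int(node_ram_gb // needed_gb_per_task)
    return min(cores_per_node, by_memory)


NODE_RAM_GB = 64.0     # per node, as confirmed by the administrator
CORES_PER_NODE = 16    # per node, as confirmed by the administrator
NEEDED_GB = 8.993      # memory value taken from the quoted log line

for tasks in (16, 4):
    share = memory_per_task_gb(NODE_RAM_GB, tasks)
    fits = share >= NEEDED_GB
    print(f"{tasks:2d} tasks/node -> {share:4.1f} GB per task, fits: {fits}")

print(max_tasks_per_node(NODE_RAM_GB, CORES_PER_NODE, NEEDED_GB))
```

With 16 tasks per node each task gets 4 GB, which is below the requirement, while 4 tasks per node gives 16 GB each. The memory-only bound allows a few more tasks than 4, but running 4 leaves headroom for allocations made after the point where the run crashed.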

By the way, how does the -M option work? It works well for GW calculations, but for BSE calculations the program often crashes when diagonalizing the BSE kernel matrix. The error message reads something like "error when calling MPI_ALL_REDUCE, XXXX bytes expected, but XXX bytes received".

Best,
Yunhai Li

Daniele Varsano
Posts: 3816
Joined: Tue Mar 17, 2009 2:23 pm

Re: BSE calculation exited without explicit error messages

Post by Daniele Varsano » Mon Nov 24, 2014 1:47 pm

Dear Yunhai,
the memory distribution option, as you observed, works for the GW part of the code, but it is not really effective for the BSE part.
Anyway, in the next release Yambo will have a completely new parallelization strategy, designed to scale (in memory as well) up to thousands of CPUs.
The next release will hopefully appear in the spring.

Best,
Daniele
