Calculation stuck at the end

Various technical topics, such as parallelism and efficiency, netCDF problems and the Yambo code structure itself, are posted here.

Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano, Conor Hogan, Nicola Spallanzani

Flex
Posts: 37
Joined: Fri Mar 25, 2016 4:21 pm


Post by Flex » Sat Jun 10, 2017 5:37 pm

Hello,

I am running a medium-to-heavy GW calculation: QP corrections for about 75 k-points over 6 bands, with 300 bands used in total.

Input, log and output files are attached. As you can see, the calculation itself finishes after 18 hours but gets stuck in the writing phase. After idling, it was killed at the 24-hour wall-time limit of the machine I work on.

Is it some kind of netCDF limitation? Is the I/O really that slow?

Note that I use quite a lot of processors.

If I restart the calculation, will it redo the GW part or just retry the writing?

Thanks in advance
Thierry Clette
Student at Université Libre de Bruxelles, Belgium

Daniele Varsano
Posts: 3816
Joined: Tue Mar 17, 2009 2:23 pm

Re: Calculation stuck at the end

Post by Daniele Varsano » Sun Jun 11, 2017 7:44 am

Dear Thierry,
My impression is that not all the CPUs finished their tasks, i.e. the calculation is load-unbalanced.
You have 6 bands and 75 k-points, which means 450 QP corrections. I suggest changing the parallelization strategy so that the number of CPUs assigned to the "qp" role is a divisor of 450. Keeping the number of CPUs you are using, a possible strategy could be:
SE_CPU= "1 6 76"
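The arithmetic behind this suggestion can be sketched in Python. This is a simplified model, not Yambo's actual scheduler: it assumes the n_k × n_bands corrections are dealt out as evenly as possible among the "qp" CPU groups, and that SE_CPU follows the usual SE_ROLEs= "q qp b" ordering. All function names are hypothetical.

```python
def qp_load(n_kpts: int, n_bands: int, n_qp_cpus: int) -> tuple[int, int, int]:
    """Return (total corrections, min per qp group, max per qp group).

    Assumption: corrections are split as evenly as possible across the
    qp groups (a toy model, not Yambo's real distribution scheme).
    """
    total = n_kpts * n_bands
    base, extra = divmod(total, n_qp_cpus)
    max_load = base + (1 if extra else 0)
    return total, base, max_load


def qp_divisors(total: int, max_cpus: int) -> list[int]:
    """qp CPU counts (up to max_cpus) that split `total` corrections exactly."""
    return [d for d in range(1, max_cpus + 1) if total % d == 0]


def total_cpus(se_cpu: str) -> int:
    """Product of the three entries of an SE_CPU string such as '"1 6 76"'."""
    q, qp, b = (int(x) for x in se_cpu.strip('"').split())
    return q * qp * b


if __name__ == "__main__":
    # 75 k-points x 6 bands split over 6 qp groups: every group gets 75.
    print(qp_load(75, 6, 6))
    # With 8 qp groups the split is uneven (some groups get one extra).
    print(qp_load(75, 6, 8))
    # Candidate qp values that divide 450 exactly.
    print(qp_divisors(450, 20))
    # Total CPUs implied by the suggested SE_CPU.
    print(total_cpus('"1 6 76"'))
```

With "1 6 76", the qp role gets 6 CPUs (450 / 6 = 75 corrections each) and the remaining factor goes to the band role, for 1 × 6 × 76 = 456 CPUs in total.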

When restarting, Yambo will read the ndb.pp database, but the QP corrections will be recalculated from scratch, as they have not been written to any database.

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

Flex
Posts: 37
Joined: Fri Mar 25, 2016 4:21 pm

Re: Calculation stuck at the end

Post by Flex » Tue Jun 20, 2017 1:43 pm

Thanks a lot, it worked with these parameters.
Thierry Clette
Student at Université Libre de Bruxelles, Belgium
