
SOC error

Posted: Sun Oct 09, 2016 2:16 pm
by yanghuang
Dear Developers and Users,
I want to perform a GW+SOC calculation. The GW calculation alone runs correctly, but the following error occurs when it is combined with SOC:

[yanghuang@ln1%tianhe2-C a.save]$ yhrun: error: cn15063: task 80: Killed
yhrun: First task exited 60s ago
yhrun: tasks 0,3,5,8,11,15,22-23,26-29,33,35-37,39,41,43-44,47,51-53,56,58-59,69-72,74,78,82-83,85,87,92,94: running
yhrun: tasks 1-2,4,6-7,9-10,12-14,16-21,24-25,30-32,34,38,40,42,45-46,48-50,54-55,57,60-68,73,75-77,79-81,84,86,88-91,93,95: exited abnormally
yhrun: Terminating job step 2761720.0
slurmd[cn10954]: *** STEP 2761720.0 KILLED AT 2016-10-09T20:26:58 WITH SIGNAL 9 ***
yhrun: Job step aborted: Waiting up to 2 seconds for job step to finish.
slurmd[cn10954]: *** STEP 2761720.0 KILLED AT 2016-10-09T20:26:58 WITH SIGNAL 9 ***
yhrun: error: Timed out waiting for job step to complete

I have doubled BndsRnXp and GbndRnge.

Could you give me some advice please?

Thanks in advance.

Re: SOC error

Posted: Sun Oct 09, 2016 3:59 pm
by Daniele Varsano
Dear Yang Huang,
the error is most probably due to a lack of memory:
[M 8.917 Gb] Alloc wf_disk ( 0.119)
since it is allocating around 9 GB.
Such a large amount of memory is required because your input asks for corrections at 2000 points (200 bands and 10 k-points):
%QPkrange # [GW] QP generalized Kpoint/Band indices
1| 10| 1|200|
%
Why do you need to calculate the corrections for bands 1 to 200?
Your last occupied band is number 156. Usually one is interested in band-structure corrections around the Fermi level, so I think you can avoid calculating corrections for such a large number of bands. If you consider something like 10 occupied and 10 empty bands:

%QPkrange                    # [GW] QP generalized Kpoint/Band indices
  1| 10|  146 |167|
%
...or similar, this should solve your memory problem.
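To make the arithmetic explicit, here is a minimal Python sketch that counts how many quasiparticle corrections a QPkrange line requests. The helper name count_qp is hypothetical and not part of yambo; it only assumes the line has the four-field form first k-point | last k-point | first band | last band shown above. How the allocated memory scales with this count is not stated in this thread, so the numbers only show how much smaller the request becomes.

```python
def count_qp(qpkrange_line):
    """Count the QP corrections requested by one QPkrange line 'k1|k2|b1|b2|'.

    Hypothetical helper for illustration only; it is not a yambo tool.
    """
    # Split on '|' and drop the empty field left by the trailing separator.
    fields = [int(f) for f in qpkrange_line.split("|") if f.strip()]
    k_first, k_last, b_first, b_last = fields
    # Both ranges are inclusive.
    return (k_last - k_first + 1) * (b_last - b_first + 1)


full = count_qp("1| 10| 1|200|")         # original request
reduced = count_qp("1| 10|  146 |167|")  # suggested request
print(full, reduced)  # prints "2000 220"
```

The suggested range 146 to 167 covers 22 bands, 11 on each side of the gap given that band 156 is the last occupied one, so the number of requested corrections drops from 2000 to 220.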
If you are instead really interested in deep energy levels and high-energy states (which, by the way, are most probably unbound and do not make much sense), you will need to lower some of the convergence parameters.

I can see you are running in serial mode; why don't you try running in parallel? If you switch to yambo 4.x, you can control the parallelization strategy, and by parallelizing over bands you can reduce the memory needed per core.


Best,
Daniele

Re: SOC error

Posted: Mon Oct 10, 2016 7:50 am
by yanghuang
Daniele,
Many thanks for your detailed reply. I will try again.

Best