
Dynamic Dielectric Matrix calculation is trapped

Posted: Wed Apr 15, 2020 5:46 pm
by el16yz
Dear Yambo users,

I am using Yambo 4.4 to compute the GW correction for my system, a large isolated molecule. The calculation runs in parallel on 28 cores with 64 GB per core, but it cannot finish: the running time exceeds the 48-hour limit and the job is killed by our HPC system. The report file shows that the program was still computing the dynamic dielectric matrix when it was killed, and the log file stops updating about 3 hours after the calculation starts. Does that mean my calculation is stuck in the dynamic dielectric matrix?

I would be very grateful for any suggestions on this problem. I attach the input file, the report file and the last log file; I hope they help. Thank you.

Re: Dynamic Dielectric Matrix calculation is trapped

Posted: Wed Apr 15, 2020 6:02 pm
by Daniele Varsano
Dear Yang Zhou,
from a single log file it is hard to understand what is going on.
Can you also post the final part of the other log files, so we can see whether your run is highly unbalanced?

Here are some recommendations, even if they may not be related to your problem:

1. Since this is an isolated system, you may want to use the Coulomb cutoff potential to isolate it. Here rim_cut is activated but the geometry is not specified.
2. Note that the assignment of SE_CPU is inconsistent with your run.
3. In this run you have more valence bands than conduction bands, so I would distribute more CPUs on the valence bands than on the conduction bands (see the sketch after this list). This can be related to your problem if your calculation is unbalanced.
4. FFTGvecs= 10 Ry is a very low value; be aware that you can lose a lot of precision.
5. Finally, I suggest you upgrade to the latest version of the code (4.5).
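
Regarding point 3, a possible starting point for the response-function parallelization could look like the lines below. The numbers are only illustrative for 28 MPI tasks (and assume a single q-point and k-point, as is usual for an isolated molecule); adapt them to your own number of bands and check the exact variable names that yambo writes in your input:

X_all_q_CPU= "1 1 2 14"       # [PARALLEL] CPUs for each role (illustrative 28-task split)
X_all_q_ROLEs= "q k c v"      # [PARALLEL] roles: q-points, k-points, conduction bands, valence bands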

Best,
Daniele

Re: Dynamic Dielectric Matrix calculation is trapped

Posted: Wed Apr 15, 2020 7:32 pm
by el16yz
Dear Daniele,

Thanks for your advice. All of my log files are in log file.zip.
1. Being an isolated system maybe you want to use cutoff coulomb potential to isolate your system. Here rim_cut is activated but geometry is not specified.
I edited the input file by adding the -r flag. It gives me the following parameters:

CUTGeo= "none" # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws X/Y/Z/XY..
% CUTBox
0.00 | 0.00 | 0.00 | # [CUT] [au] Box sides
%
CUTRadius= 0.000000 # [CUT] [au] Sphere/Cylinder radius
CUTCylLen= 0.000000 # [CUT] [au] Cylinder length
CUTwsGvec= 0.700000 # [CUT] WS cutoff: number of G to be modified

Is it normal that the initial value of CUTBox is 0 0 0? My supercell size is 35x35x35 Angs; what size should the cutoff box have in the input file?
2. Note that assignment for SE_CPU is inconsistent with your run.
Does it have to be exactly the same as the total number of cores I use? I used fewer cores for that because I was not sure that 28 cores for the "b" distribution is feasible.
4. FFTGvecs= 10 Ry is a very low value, be aware you can lose a lot of precision.
Yes, you are right. The reason I use this low value is to shorten the calculation time; I will raise it once my problem is solved. Thanks for the reminder.

Best,
Yang

Re: Dynamic Dielectric Matrix calculation is trapped

Posted: Thu Apr 16, 2020 8:20 am
by Daniele Varsano
Dear Yang,
from the log files I can see a bit of imbalance, but this is acceptable.
It is not clear what is going wrong; you can try to repeat your calculation assigning all cores to the valence bands and see if this solves the problem.

1. CUTBox is 0 0 0 by default; you should set it to 35,35,35, but this is unrelated to your problem. Note that you also need to activate the Random Integration Method by setting something like:
CUTGeo= "box xyz"
RandQpts= 1000000
RandGvec= 1

If your system can be contained in a sphere, it is recommended to use a sphere instead of a box, just by indicating:
CUTGeo= "sphere xyz"
CUTRadius= 17.5
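
Putting it together, a possible cutoff block for the sphere case could look like the sketch below (I am assuming here that you keep the same Random Integration Method settings as in the box case; the radius is simply the value indicated above):

CUTGeo= "sphere xyz"      # [CUT] Coulomb cutoff geometry
CUTRadius= 17.5           # [CUT] sphere radius
RandQpts= 1000000         # [RIM] number of random q-points
RandGvec= 1               # [RIM] number of G-vectors in the integration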

2.
Does it have to be exactly the same as the total number of cores I use?
Yes

3.
I used fewer cores for that because I was not sure that 28 cores for the "b" distribution is feasible.
Yes, it is.
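
Regarding points 2 and 3, with your 28 cores a consistent self-energy parallelization could be something like the lines below (the split between the "qp" and "b" roles is only illustrative; any combination whose product is 28 is formally consistent):

SE_CPU= "1 4 7"        # [PARALLEL] CPUs for each role, product = 28 MPI tasks
SE_ROLEs= "q qp b"     # [PARALLEL] roles: q-points, quasiparticle states, bands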

4. As suggested in the previous post, I recommend upgrading to the latest version of the code (4.5) and compiling it with the flag --enable-memory-profile, in order to have information on the memory usage in case you are hitting a memory issue.
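
In practice this means adding the flag to your usual configure line when you build the new version, something like (the bracketed part stands for whatever options you normally use, e.g. compilers and libraries):

./configure --enable-memory-profile [your usual options]
make yambo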

Best,
Daniele

Re: Dynamic Dielectric Matrix calculation is trapped

Posted: Wed Apr 22, 2020 3:31 pm
by el16yz
Hi Daniele,

I followed your suggestion and ran the job with Yambo 4.5.1, with more cores allocated to the valence bands. But an unexpected error occurred:
ORTE has lost communication with a remote daemon.

HNP daemon : [[29453,0],0] on node d8s0b1
Remote daemon: [[29453,0],10] on node d10s7b4

This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failure, and
therefore will terminate the job.
Does that mean there is something wrong with the processor distribution? I attach the input, report and log files and hope they help. Thank you.

Best,
Yang

Re: Dynamic Dielectric Matrix calculation is trapped

Posted: Thu Apr 23, 2020 9:01 am
by Daniele Varsano
Dear Yang,
it seems to be an MPI problem. It is hard to tell what went wrong; in any case, you are dealing with a very heavy calculation due to the large amount of vacuum and the consequently large number of G-vectors. You can try to add some CPUs in the "g" field.
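
Just to give an idea of what I mean by the "g" field: when the response-function parallelization string exposes a role for the G-vectors, the block can look something like the lines below. The variable name and the numbers are only indicative (they assume a 28-task run with a single q-point and k-point), so please check the exact strings that yambo 4.5.1 writes in your input and keep the product equal to your total number of cores:

X_all_q_CPU= "1 2 1 1 14"      # [PARALLEL] CPUs for each role (illustrative 28-task split)
X_all_q_ROLEs= "q g k c v"     # [PARALLEL] roles: q-points, G-vectors, k-points, conduction, valence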
Moreover, I can see from the report that you set up a spin-polarized calculation, but here the spin-up and spin-down channels are perfectly degenerate, so I think you can safely do a spin-unpolarized calculation without any loss of accuracy. This needs a new ground-state calculation, but it makes the runs lighter by a factor of 2.

Best,
Daniele