em1s too time consuming
Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano, Conor Hogan
- Posts: 62
- Joined: Thu Jan 15, 2015 12:48 pm
em1s too time consuming
Hello,
I am trying to calculate the screening, and afterwards I want to proceed with a BSE calculation.
I'm working on a cluster. Since the calculation needs a lot of memory, I have to use a fat node, on which I cannot use more than 16 processors.
The time limit for such calculations on the cluster is 3 days.
After these 3 days the calculation ends without being completed.
Is there any possibility to interrupt the run and restart it from the interruption point?
Or can anybody tell me how to reduce the computational cost in a sensible way?
In the report file it says that the Drude behaviour of my system is not recognized.
Did I make any mistake in the input file?
Here is my input file:
#
# ::: ::: ::: :::: :::: ::::::::: ::::::::
# :+: :+: :+: :+: +:+:+: :+:+:+ :+: :+: :+: :+:
# +:+ +:+ +:+ +:+ +:+ +:+:+ +:+ +:+ +:+ +:+ +:+
# +#++: +#++:++#++: +#+ +:+ +#+ +#++:++#+ +#+ +:+
# +#+ +#+ +#+ +#+ +#+ +#+ +#+ +#+ +#+
# #+# #+# #+# #+# #+# #+# #+# #+# #+#
# ### ### ### ### ### ######### ########
#
#
# GPL Version 4.0.1 Revision 88
# OpenMPI Build
# http://www.yambo-code.org
#
em1s # [R Xs] Static Inverse Dielectric Matrix
X_all_q_CPU= "4 2 2 1" # [PARALLEL] CPUs for each role
X_all_q_ROLEs= "q k c v" # [PARALLEL] CPUs roles (q,k,c,v)
X_all_q_nCPU_invert=0 # [PARALLEL] CPUs for matrix inversion
Chimod= "hartree" # [X] IP/Hartree/ALDA/LRC/BSfxc
% QpntsRXs
1 | 349 | # [Xs] Transferred momenta
%
% BndsRnXs
1 | 36 | # [Xs] Polarization function bands
%
NGsBlkXs= 300 RL # [Xs] Response block size
DrudeWXs= ( 0.80590 , 0.07439 ) eV # [Xd] Drude plasmon
% LongDrXs
1.000000 | 0.000000 | 0.000000 | # [Xs] [cc] Electric Field
%
Thanks and regards
Stephan
Stephan Ludwig
1st Physical Institute
University of Stuttgart
Germany
- Daniele Varsano
- Posts: 4198
- Joined: Tue Mar 17, 2009 2:23 pm
- Contact:
Re: em1s too time consuming
Dear Stephan,
In order to understand the reason for your problem (load unbalancing, memory, etc.) and to help you, please post your complete report file and LOG files.
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- Posts: 62
- Joined: Thu Jan 15, 2015 12:48 pm
Re: em1s too time consuming
Dear Daniele,
Thank you for your reply.
Here are the report and log files:
Stephan Ludwig
1st Physical Institute
University of Stuttgart
Germany
- Posts: 149
- Joined: Tue Apr 08, 2014 6:05 am
Re: em1s too time consuming
Dear Stephan,
1) I strongly advise you to compile yambo with the Intel MPI compiler and NOT with gfortran, if you have not done so already. This will save you a massive amount of time.
2) If possible, try to run your calculations with yambo 3.4.1, which from my point of view is much faster than yambo 4.0.x, at least on clusters.
Bests
M.
Martin Spenke, PhD Student
Theoretisch-Physikalisches Institut
Universität Hamburg, Germany
- Daniele Varsano
- Posts: 4198
- Joined: Tue Mar 17, 2009 2:23 pm
- Contact:
Re: em1s too time consuming
Dear Stephan,
I had a look at your log and report files, and I noticed something strange: is it a restart calculation?
Did you have ndb.em1s* files previously calculated with different parameters?
A possibility is that you are using too much RAM, which slows down the calculation. A workaround is to parallelize more over bands than over q points, e.g. you can try:

Code:
X_all_q_CPU= "1 1 4 4" # [PARALLEL] CPUs for each role

and see if things go better.
About restarting, this should be automatic.
Best,
Daniele
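As a quick sanity check on such layouts (a sketch, not a yambo tool; `check_roles` is a hypothetical helper), the four per-role counts in X_all_q_CPU must multiply to the total number of MPI tasks, 16 on the fat node discussed above:

```shell
# Sketch: verify that a proposed X_all_q_CPU layout uses exactly the
# 16 MPI tasks available on the node. check_roles is a hypothetical
# helper, not part of yambo.
check_roles() {
    # multiply the whitespace-separated counts together
    echo "$1" | tr ' ' '\n' | awk 'BEGIN{p=1} {p*=$1} END{print p}'
}
check_roles "4 2 2 1"   # original q/k/c/v layout, prints 16
check_roles "1 1 4 4"   # bands-heavy layout suggested above, prints 16
```

Any layout whose product differs from the number of MPI tasks you launch will be rejected or rebalanced, so this check is worth doing before submitting a 3-day job.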
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- Posts: 62
- Joined: Thu Jan 15, 2015 12:48 pm
Re: em1s too time consuming
Hi Daniele,
Thank you for your reply.
I will try to parallelize over bands.
When you say that restarting is automatic, do you mean that if the run is interrupted by the cluster's time limit and I simply run the yambo command again, the calculation will restart from the point where it was abruptly interrupted?
Thanks and regards
Stephan
Stephan Ludwig
1st Physical Institute
University of Stuttgart
Germany
- Daniele Varsano
- Posts: 4198
- Joined: Tue Mar 17, 2009 2:23 pm
- Contact:
Re: em1s too time consuming
Dear Stephan,
Yes, in principle. If you have different q points, yambo reads the already calculated database and continues from that point. If you only have the gamma point, I'm afraid it will not work.
Problems can arise if some database files are corrupted, for instance if the job crashed while writing. In your case you can inspect the written files (ndb.em1s_*) in the SAVE directory (or in dirname, if you used -J dirname); they should all have the same size.
For the restart to work, the input parameters must be the same as in the previous run.
Best,
Daniele
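The size check above can be scripted; this is a sketch under the assumption that the fragments sit directly in the SAVE (or -J) directory and match ndb.em1s_* (`check_fragments` is a hypothetical helper):

```shell
# Sketch: flag a possibly corrupted screening fragment by comparing file
# sizes, following the advice that all ndb.em1s_* files should be the
# same size after a clean run. The directory layout is an assumption.
check_fragments() {
    dir="$1"
    set -- "$dir"/ndb.em1s_*
    [ -e "$1" ] || { echo "no fragments found"; return 1; }
    # count how many distinct byte sizes appear among the fragments
    n_sizes=$(wc -c "$@" | awk '$2 != "total" {print $1}' | sort -u | wc -l)
    if [ "$n_sizes" -eq 1 ]; then
        echo "all fragments have the same size"
    else
        echo "size mismatch: inspect the fragments before restarting"
    fi
}
```

Comparing byte counts is only a cheap heuristic: identical sizes do not guarantee intact files, but a mismatch after a killed job is a strong hint of a fragment truncated mid-write.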
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- Posts: 62
- Joined: Thu Jan 15, 2015 12:48 pm
Re: em1s too time consuming
Thank you very much
Stephan Ludwig
1st Physical Institute
University of Stuttgart
Germany
- Posts: 62
- Joined: Thu Jan 15, 2015 12:48 pm
Re: em1s too time consuming
Hello again,
the calculation of the screening was apparently successful.
Afterwards I started the BSE calculation, and after several restarts it finished. However, no o.eps file was produced, and the report file (see attachment) says that building the kernel was impossible. Can anybody tell me what went wrong?
Thanks and regards
Stephan
Stephan Ludwig
1st Physical Institute
University of Stuttgart
Germany
- Daniele Varsano
- Posts: 4198
- Joined: Tue Mar 17, 2009 2:23 pm
- Contact:
Re: em1s too time consuming
Dear Stephan,
from what I can see, you performed a previous BSE calculation, since yambo reads a previous database:

Code:
[RD./SAVE//ndb.BS_Q1_CPU_0]

but then it complains that the RL vectors in the exchange are not the same as in the current input:

Code:
*ERR* |RL vectors [exchange]: 1633

Was the first calculation successful?
Next, it tries to recalculate it using the input parameters. First it finds an inconsistency in the time ordering of X:

Code:
*ERR* X Time ordering :c

which I do not know where it comes from. It then complains that the screening database could be incomplete: could you verify that your screening database contains the matrices for all the q points?
From the em1s report you posted, it is a restart; maybe something went wrong in the restarting procedure? Could you post the result of

Code:
ls -ltr ./SAVE

If this is not the case, we need to understand why you have a different time ordering in the database and in the input file. You can try to generate the input file with

Code:
yambo -b -o b ...

In this way yambo should read the ndb.em1s database.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/