em1s too time consuming
Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano, Conor Hogan
- Posts: 62
- Joined: Thu Jan 15, 2015 12:48 pm
em1s too time consuming
Hello,
I am trying to calculate the screening, and afterwards I want to proceed with a BSE calculation.
I'm working on a cluster. Since the calculation needs a lot of memory, I have to use a fat node, on which I cannot use more than 16 processors.
The time limit for such calculations on the cluster is 3 days.
After these 3 days the calculation ends without being completed.
Is there any possibility to interrupt the run and restart it from the interruption point?
Or can anybody tell me how to reduce the computational cost in a sensible way?
In the report file it says that the Drude behaviour of my system is not recognized.
Did I make any mistake in the input file?
Here is my input file:
#
# ::: ::: ::: :::: :::: ::::::::: ::::::::
# :+: :+: :+: :+: +:+:+: :+:+:+ :+: :+: :+: :+:
# +:+ +:+ +:+ +:+ +:+ +:+:+ +:+ +:+ +:+ +:+ +:+
# +#++: +#++:++#++: +#+ +:+ +#+ +#++:++#+ +#+ +:+
# +#+ +#+ +#+ +#+ +#+ +#+ +#+ +#+ +#+
# #+# #+# #+# #+# #+# #+# #+# #+# #+#
# ### ### ### ### ### ######### ########
#
#
# GPL Version 4.0.1 Revision 88
# OpenMPI Build
# http://www.yambo-code.org
#
em1s # [R Xs] Static Inverse Dielectric Matrix
X_all_q_CPU= "4 2 2 1" # [PARALLEL] CPUs for each role
X_all_q_ROLEs= "q k c v" # [PARALLEL] CPUs roles (q,k,c,v)
X_all_q_nCPU_invert=0 # [PARALLEL] CPUs for matrix inversion
Chimod= "hartree" # [X] IP/Hartree/ALDA/LRC/BSfxc
% QpntsRXs
1 | 349 | # [Xs] Transferred momenta
%
% BndsRnXs
1 | 36 | # [Xs] Polarization function bands
%
NGsBlkXs= 300 RL # [Xs] Response block size
DrudeWXs= ( 0.80590 , 0.07439 ) eV # [Xd] Drude plasmon
% LongDrXs
1.000000 | 0.000000 | 0.000000 | # [Xs] [cc] Electric Field
%
Thanks and regards
Stephan
Stephan Ludwig
1st Physical Institute
University of Stuttgart
Germany
- Daniele Varsano
- Posts: 4198
- Joined: Tue Mar 17, 2009 2:23 pm
- Contact:
Re: em1s too time consuming
Dear Stephan,
In order to understand the reason for your problem (load unbalancing, memory, etc.) and to help you, please post your complete report file and LOG files.
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- Posts: 62
- Joined: Thu Jan 15, 2015 12:48 pm
Re: em1s too time consuming
Dear Daniele,
Thank you for your reply.
Here are the report and log files:
Stephan Ludwig
1st Physical Institute
University of Stuttgart
Germany
- Posts: 149
- Joined: Tue Apr 08, 2014 6:05 am
Re: em1s too time consuming
Dear Stephan,
1) I strongly advise you to compile yambo with the Intel MPI compiler and NOT with gfortran, if you have not done so already. This will save you a massive amount of time.
2) If possible, try to run your calculations with yambo 3.4.1, which from my point of view is much faster than yambo 4.0.x, at least on clusters.
Bests
M.
Martin Spenke, PhD Student
Theoretisch-Physikalisches Institut
Universität Hamburg, Germany
- Daniele Varsano
- Posts: 4198
- Joined: Tue Mar 17, 2009 2:23 pm
- Contact:
Re: em1s too time consuming
Dear Stephan,
I had a look at your log and report files, and I noticed something strange: is it a restart calculation?
Did you have ndb.em1s* files previously calculated with different parameters?
A possibility is that you are using too much RAM, which slows down the calculation. A workaround is to parallelize more over bands than over q points, e.g. you can try:

Code:
X_all_q_CPU= "1 1 4 4" # [PARALLEL] CPUs for each role

and see if things go better.
About restarting, this should be automatic.
Best,
Daniele
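As a quick sanity check on such layouts (a sketch, not a yambo tool; `check_roles` is a hypothetical helper), the four per-role counts in X_all_q_CPU must multiply to the total number of MPI tasks, 16 on the fat node discussed above:

```shell
# Sketch: verify that a proposed X_all_q_CPU layout uses exactly the
# 16 MPI tasks available on the node. check_roles is a hypothetical
# helper, not part of yambo.
check_roles() {
    # multiply the whitespace-separated counts together
    echo "$1" | tr ' ' '\n' | awk 'BEGIN{p=1} {p*=$1} END{print p}'
}
check_roles "4 2 2 1"   # original q/k/c/v layout, prints 16
check_roles "1 1 4 4"   # bands-heavy layout suggested above, prints 16
```

Any layout whose product differs from the number of MPI tasks you launch will be rejected or rebalanced, so this check is worth doing before submitting a 3-day job.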
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- Posts: 62
- Joined: Thu Jan 15, 2015 12:48 pm
Re: em1s too time consuming
Hi Daniele,
Thank you for your reply.
I will try to parallelize over bands.
When you say that restarting is automatic, do you mean that if the run is interrupted by the cluster's time limit and I simply run the yambo command again, the calculation will restart from the point where it was abruptly interrupted?
Thanks and regards
Stephan
Stephan Ludwig
1st Physical Institute
University of Stuttgart
Germany
- Daniele Varsano
- Posts: 4198
- Joined: Tue Mar 17, 2009 2:23 pm
- Contact:
Re: em1s too time consuming
Dear Stephan,
Yes, in principle. If you have different q points, yambo reads the already calculated database and continues from that point. If you only have the gamma point, I'm afraid it will not work.
Problems can arise if some database files are corrupted, for instance if the job crashed while writing. In your case you can inspect the written files (ndb.em1s_*) in the SAVE directory (or in dirname, if you used -J dirname); they should all have the same size.
For the restart to work, the input parameters must be the same as in the previous run.
Best,
Daniele
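The size check above can be scripted; this is a sketch under the assumption that the fragments sit directly in the SAVE (or -J) directory and match ndb.em1s_* (`check_fragments` is a hypothetical helper):

```shell
# Sketch: flag a possibly corrupted screening fragment by comparing file
# sizes, following the advice that all ndb.em1s_* files should be the
# same size after a clean run. The directory layout is an assumption.
check_fragments() {
    dir="$1"
    set -- "$dir"/ndb.em1s_*
    [ -e "$1" ] || { echo "no fragments found"; return 1; }
    # count how many distinct byte sizes appear among the fragments
    n_sizes=$(wc -c "$@" | awk '$2 != "total" {print $1}' | sort -u | wc -l)
    if [ "$n_sizes" -eq 1 ]; then
        echo "all fragments have the same size"
    else
        echo "size mismatch: inspect the fragments before restarting"
    fi
}
```

Comparing byte counts is only a cheap heuristic: identical sizes do not guarantee intact files, but a mismatch after a killed job is a strong hint of a fragment truncated mid-write.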
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
- Posts: 62
- Joined: Thu Jan 15, 2015 12:48 pm
Re: em1s too time consuming
Thank you very much
Stephan Ludwig
1st Physical Institute
University of Stuttgart
Germany
- Posts: 62
- Joined: Thu Jan 15, 2015 12:48 pm
Re: em1s too time consuming
Hello again,
the calculation of the screening was apparently successful.
Afterwards I started the BSE calculation, and after several restarts it finished. However, no o.eps file was produced, and the report file (see attachment) says that building the kernel was impossible. Can anybody tell me what went wrong?
Thanks and regards
Stephan
Stephan Ludwig
1st Physical Institute
University of Stuttgart
Germany
- Daniele Varsano
- Posts: 4198
- Joined: Tue Mar 17, 2009 2:23 pm
- Contact:
Re: em1s too time consuming
Dear Stephan,
from what I can see, you performed a previous BSE calculation, since yambo reads a previous database:

Code:
[RD./SAVE//ndb.BS_Q1_CPU_0]

but then it complains that the RL vectors in the exchange are not the same as in the current input:

Code:
*ERR* |RL vectors [exchange]: 1633

Was the first calculation successful?
Next, it tries to recalculate it using the input parameters. First it finds an inconsistency in the time ordering of X:

Code:
*ERR* X Time ordering :c

which I do not know where it comes from. It then complains that the screening database could be incomplete: could you verify that your screening database contains the matrices for all the q points?
From the em1s report you posted, it is a restart; maybe something went wrong in the restarting procedure? Could you post the result of

Code:
ls -ltr ./SAVE

If this is not the case, we need to understand why you have a different time ordering in the database and in the input file. You can try to generate the input file with

Code:
yambo -b -o b ...

In this way yambo should read the ndb.em1s database.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/