Nodes, cores and yambo parallelization
Hello,
I finally updated to yambo 4.0.2 and I am trying to use the new parallelization system. I'm working on 16 nodes × 16 cores (so 256 CPUs?) and I plan on using more later. The batch script is attached.
So, I read carefully the tutorials and set the para variables (for a GW calculation) to
X_all_q_CPU= "4 8 8 1" # [PARALLEL] CPUs for each role
X_all_q_ROLEs= "q k c v" # [PARALLEL] CPUs roles (q,k,c,v)
X_all_q_nCPU_invert=0 # [PARALLEL] CPUs for matrix inversion
SE_CPU= "2 32 4" # [PARALLEL] CPUs for each role
SE_ROLEs= "q qp b" # [PARALLEL] CPUs roles (q,qp,b)
(see also input attached)
Those values should all be powers of 2, with the products equal to 256.
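Just as a sanity check, the products do come out right:

```shell
# Each role count multiplies together to the total number of MPI tasks (256).
echo $((4 * 8 * 8 * 1))   # X_all_q_CPU: q * k * c * v
echo $((2 * 32 * 4))      # SE_CPU:      q * qp * b
```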
I still get this error:
<06s> P0001: CPU structure provided for the Response_G_space ENVIRONMENT is incomplete. Switching to defaults
P0001: [ERROR] STOP signal received while in :[05] Dynamic Dielectric Matrix (PPA)
P0001: [ERROR]Impossible to define an appropriate parallel structure
(see also log attached, the other 16 are similar)
Did I miss something in my CPU count? Is Yambo configured the right way?
BTW, is my processor distribution over the different tasks good enough?
Thanks in advance
Thierry Clette
Thierry Clette
Student at Université Libre de Bruxelles, Belgium
- amolina
Re: Nodes, cores and yambo parallelization
Dear Thierry,
I am not sure, but it seems the submission is not done correctly: Yambo is not seeing the 256 processes. Maybe you need to add some options/flags to the mpirun command, e.g.:
mpirun -np 256 yambo -F GW.in
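For reference, in a batch script that might look something like this (the SLURM directives and input file name here are just an assumption about your setup):

```
#!/bin/bash
#SBATCH --nodes=16            # hypothetical scheduler directives:
#SBATCH --ntasks-per-node=16  # 16 nodes x 16 MPI tasks each = 256 tasks

# ask mpirun explicitly for 256 processes, matching the products of the CPU roles
mpirun -np 256 yambo -F GW.in
```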
Best,
Alejandro.
Alejandro Molina-Sánchez
Institute of Materials Science (ICMUV)
University of Valencia, Spain
- Daniele Varsano
Re: Nodes, cores and yambo parallelization
Dear Thierry,
besides that:

Thierry Clette wrote: BTW, is my processor distribution over the different tasks good enough?

I would avoid parallelizing over q points, so I would put 1 for the q role both in X_all_q_CPU and in SE_CPU, as it can strongly unbalance your calculation.
Moreover: are you sure you need to calculate corrections for 40 bands at all k points? These are 4840 corrections, which is a huge number:
Code: Select all
%QPkrange # [GW] QP generalized Kpoint/Band indices
1|121| 1|40|
%
In case the suggestion of Ale does not work, please post again, including your report file.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
Re: Nodes, cores and yambo parallelization
Here is another calculation, this time a BSE run (the first that started), but with the same idea: this time 4 × 4 CPUs, and I added the flag "-np 16" to mpirun.
I still get the same error.
The in, r, l, and submit files are attached (r in next post)
Also, thanks for the tips about roles and numbers of CPUs.
Thierry Clette
Student at Université Libre de Bruxelles, Belgium
Re: Nodes, cores and yambo parallelization
Report file in 2 parts.
Thierry Clette
Student at Université Libre de Bruxelles, Belgium
- amolina
Re: Nodes, cores and yambo parallelization
Dear Thierry,
in your input file I don't see the variables for the parallelization of the dielectric function. You can see at the end of the report file that yambo complains when starting the dielectric function.
If you are running 16 processes, I recommend trying this:
X_all_q_CPU= "1 16 1 1" # [PARALLEL] CPUs for each role
X_all_q_ROLEs= "q k c v" # [PARALLEL] CPUs roles (q,k,c,v)
or this:
X_all_q_CPU= "1 8 2 1" # [PARALLEL] CPUs for each role
X_all_q_ROLEs= "q k c v" # [PARALLEL] CPUs roles (q,k,c,v)
Cheers,
Alejandro.
Alejandro Molina-Sánchez
Institute of Materials Science (ICMUV)
University of Valencia, Spain
Re: Nodes, cores and yambo parallelization
Hello all,
So far, the suggestions from amolina do seem to work, thanks a lot for that.
I still have a question about Q points. Regarding this suggestion:

Daniele Varsano wrote: are you sure you need to calculate corrections for 40 bands and all k points: these are 4840 corrections which is a huge number.

I reduced the band number, but I don't know how to reduce the Q points. Since they come from the QE grid, I can't remove Q points without damaging the structure. I think I would rather have to reduce the grid in the QE calculation instead. Doesn't this reduce the scf and nscf precision?
Thanks again for the answers,
Thierry Clette
Student at Université Libre de Bruxelles, Belgium
- Daniele Varsano
Re: Nodes, cores and yambo parallelization
Dear Flex,
The q points, as you argued, cannot be reduced, otherwise the BZ integration would be wrong, and as you say it would also reduce the precision of the ground state calculations. I did not mean to reduce the q points at all.
There I was talking about:
1) Avoiding parallelization over q points: this does not mean discarding any of them, it is just a tuning of the parallelization strategy.
2) Asking whether you are really interested in calculating the GW corrections for all those bands: please note that QPkrange is not a convergence parameter, but indicates the bands and k points for which you want to calculate the QP corrections. Usually one is interested in the bands around the Fermi energy, or in more bands if needed for a BSE calculation, but the deepest bands are usually not of much interest. Of course this is not a rule, and maybe you are interested in them for some reason. Calculating corrections for that number of points could be quite time consuming.
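For example, if only the bands around the gap were needed, the range could be restricted to something like this (the band window 20|30 is purely illustrative, not a recommendation for your system):
Code: Select all

```
%QPkrange # [GW] QP generalized Kpoint/Band indices
 1|121| 20|30|   # all 121 k points, but only bands 20-30
%
```

This alone would reduce the number of corrections from 4840 to 121 × 11 = 1331.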
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/