Achievable precision with limited RAM

I am trying to calculate the GW band structure of monolayer MoS2. My goal is a 60 Ry cutoff with a 10 Ry block size. I have a 20×20 grid, which gives me 77 Q points, and I need about 20 bands around the gap. I am working on a cluster where I can use up to 1024 cores; usually 256 or 512 are enough.
But my problem here is the RAM usage. As you can see in the batch script, I currently use 32*16*16000, which is quite a lot, and it still isn't enough. I can't really get much more from the cluster, so I am trying to figure out how to reduce this massive RAM usage.
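In SLURM terms the request looks roughly like this (a stripped-down sketch of the attached script; file and module names are illustrative):

Code: Select all

#!/bin/bash
#SBATCH --nodes=32               # 32 nodes
#SBATCH --ntasks-per-node=16     # 16 MPI tasks per node = 512 tasks
#SBATCH --mem-per-cpu=16000      # 16000 MB per task: the 32*16*16000 above

srun yambo -F yambo_gw.in -J GW  # input and job names illustrative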
- Is there a better repartition of the parallel (q,k,v,c) variables? Would it help to use less RAM?
- Would reducing the number of CPUs help reduce the RAM needed, even if it slows the calculation?
- I have seen that doing fewer Q points at a time reduces the resources needed, so I could divide my run into several parts (see the sketch after this list), but wouldn't that take a lot more time?
- Do I inevitably have to reduce the number of Gvecs (or the cutoff)?
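By "dividing the run" I mean restricting the transferred-momenta range of the screening and running it chunk by chunk, something like this (the variable is from the plasmon-pole/em1d input, and the chunk size is only an example):

Code: Select all

% QpntsRXp
  1 | 20 |        # first 20 of the 77 q-points; then 21 | 40, and so on
%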
Inputs, outputs, and batch script attached
Thierry Clette
Student at Université Libre de Bruxelles, Belgium
Re: Achievable precision with limited RAM
Hello,
First of all, can I ask why you want a 10 Ry block size? It seems a bit high, and it will make the simulations very slow.
Thierry Clette wrote: Is there a better repartition of the parallel (q,k,v,c) variables? Would it help to use less RAM?

Anyway, I think it would help if in the parallel scheme (q,k,v,c) you put some processes on the valence and conduction bands. If you are using 128 cores, you could try combinations such as (1,32,1,4) or (1,16,1,8) and see how they perform.
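In the input file this corresponds to something like the following (the role strings are from the yambo parallel environment; the self-energy split is only a suggested starting point):

Code: Select all

X_all_q_ROLEs= "q k v c"     # roles for the response-function parallelization
X_all_q_CPU= "1 32 1 4"      # 1*32*1*4 = 128 MPI tasks
SE_ROLEs= "q qp b"           # roles for the self-energy
SE_CPU= "1 16 8"             # 1*16*8 = 128 MPI tasks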
Thierry Clette wrote: Would reducing the number of CPUs help reduce the RAM needed, even if it slows the calculation?

I think so, but first try the parallel-scheme option and see if it is enough.
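For instance, keeping the same nodes but launching fewer MPI tasks per node leaves more memory per task (a SLURM sketch; the numbers are illustrative):

Code: Select all

#SBATCH --nodes=32
#SBATCH --ntasks-per-node=8     # half the tasks per node...
#SBATCH --mem-per-cpu=32000     # ...so roughly twice the memory per task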
Thierry Clette wrote: Do I inevitably have to reduce the number of Gvecs (or the cutoff)?

It seems that with these values the calculation is over-converged. I would proceed as follows: on a smaller k-grid, try smaller values for FFTGvecs, the block size, and the number of bands (in both the screening and the self-energy). Remember that you also have to converge the number of Q-points. Keeping the current values will make your calculations very memory-demanding.
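The variables I mean are these (plasmon-pole input; the numbers are starting points for a convergence scan, not converged values):

Code: Select all

FFTGvecs= 20 Ry        # reduced FFT components
NGsBlkXp= 4 Ry         # screening block size: scan e.g. 2, 4, 6, 8 Ry
% BndsRnXp
  1 | 200 |            # bands in the screening
%
% GbndRnge
  1 | 200 |            # bands in the self-energy
%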
Cheers,
Alejandro.
Alejandro Molina-Sánchez
Institute of Materials Science (ICMUV)
University of Valencia, Spain
Re: Achievable precision with limited RAM
By the way, I think you should include your full name along with your affiliation.
Alejandro Molina-Sánchez
Institute of Materials Science (ICMUV)
University of Valencia, Spain
Re: Achievable precision with limited RAM
amolina wrote: First of all, can I ask why you want a 10 Ry block size? It seems a bit high, and it will make the simulations very slow.

I am trying to reproduce someone else's work to be sure of my results and then apply the same methods to other materials (Maurizia Palummo, Marco Bernardi, and Jeffrey C. Grossman, "Exciton Radiative Lifetimes in Two-Dimensional Transition Metal Dichalcogenides"). I found that block size quite big too; perhaps about 8 Ry would be better, to have about 1/10 of the FFTGvecs.
amolina wrote: Anyway, I think it would help if in the parallel scheme (q,k,v,c) you put some processes on the valence and conduction bands. If you are using 128 cores, you could try combinations such as (1,32,1,4) or (1,16,1,8).

Would (1,16,2,4) be equally good? Is parallelization better on c or v?
Also, what about the BSE variables? They are not mentioned in the tutorial. Is it better to allocate more on k, eh, or t?
amolina wrote: It seems that with these values the calculation is over-converged. I would proceed as follows: on a smaller k-grid, try smaller values for FFTGvecs, the block size, and the number of bands.

I already tried smaller Gvecs and block sizes, but it didn't really converge well (the corrections were too small). I haven't tried reducing the grid yet, because with less than 20×20 I find the band structure a bit rough. But I guess I should try anyway.
Also, sorry for asking so many questions. I am quite new to computational science.
Thanks for the very helpful answers.
Thierry Clette
Student at Université Libre de Bruxelles, Belgium
Re: Achievable precision with limited RAM
Thierry Clette wrote: Would (1,16,2,4) be equally good? Is parallelization better on c or v?

This I don't know; I haven't checked what works best, but I would be tempted to put more processors on the conduction bands.
Thierry Clette wrote: Also, what about the BSE variables? They are not mentioned in the tutorial. Is it better to allocate more on k, eh, or t?

Very good question. This is a delicate point. You could start by putting all the processors on k. This part is a bit undocumented; it is good that you bring it up. We will include a tutorial in yambopy with examples of the parallelization options.
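In input terms, that starting point would be something like this (role names from the yambo parallel environment; 128 tasks as an example):

Code: Select all

BS_ROLEs= "k eh t"      # BSE kernel parallelization roles
BS_CPU= "128 1 1"       # all tasks on k to start with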
Alejandro Molina-Sánchez
Institute of Materials Science (ICMUV)
University of Valencia, Spain