Achievable precision with limited RAM

I am trying to calculate the GW band structure of monolayer MoS2. My goal is a 60 Ry cutoff with a 10 Ry block size. I have a 20×20 grid, which gives me 77 Q points, and I need about 20 bands around the gap. I am working on a cluster where I can use up to 1024 cores; usually 256 or 512 are enough.
But my problem here is the RAM usage. As you can see in the batch script, I currently use 32*16*16000, which is quite a lot, and it still isn't enough. I can't really get much more from the cluster, so I am trying to figure out how to reduce this massive RAM usage.
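In SLURM terms the request looks roughly like this (a stripped-down sketch of the attached script; file and module names are illustrative):

Code: Select all

#!/bin/bash
#SBATCH --nodes=32               # 32 nodes
#SBATCH --ntasks-per-node=16     # 16 MPI tasks per node = 512 tasks
#SBATCH --mem-per-cpu=16000      # 16000 MB per task: the 32*16*16000 above

srun yambo -F yambo_gw.in -J GW  # input and job names illustrative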
- Is there a better repartition of the parallel (q,k,v,c) variables? Would it help to use less RAM?
- Would reducing the number of CPUs help reduce the RAM needed, even if it slows the calculation?
- I have seen that doing fewer Q points at a time reduces the resources needed, so I could divide my run into several parts (see the sketch after this list), but wouldn't that take a lot more time?
- Do I inevitably have to reduce the number of Gvecs (or the cutoff)?
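By "dividing the run" I mean restricting the transferred-momenta range of the screening and running it chunk by chunk, something like this (the variable is from the plasmon-pole/em1d input, and the chunk size is only an example):

Code: Select all

% QpntsRXp
  1 | 20 |        # first 20 of the 77 q-points; then 21 | 40, and so on
%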
Inputs, outputs, and batch script attached
Thierry Clette
Student at Université Libre de Bruxelles, Belgium
Re: Achievable precision with limited RAM
Hello,
First of all, can I ask why you want a 10 Ry block size? It seems a bit high, and it will make the simulations very slow.
Thierry Clette wrote: Is there a better repartition of the parallel (q,k,v,c) variables? Would it help to use less RAM?

Anyway, I think it would help if in the parallel scheme (q,k,v,c) you put some processes on the valence and conduction bands. If you are using 128 cores, you could try combinations such as (1,32,1,4) or (1,16,1,8) and see how they perform.
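In the input file this corresponds to something like the following (the role strings are from the yambo parallel environment; the self-energy split is only a suggested starting point):

Code: Select all

X_all_q_ROLEs= "q k v c"     # roles for the response-function parallelization
X_all_q_CPU= "1 32 1 4"      # 1*32*1*4 = 128 MPI tasks
SE_ROLEs= "q qp b"           # roles for the self-energy
SE_CPU= "1 16 8"             # 1*16*8 = 128 MPI tasks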
Thierry Clette wrote: Would reducing the number of CPUs help reduce the RAM needed, even if it slows the calculation?

I think so, but first try the parallel-scheme option and see if it is enough.
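For instance, keeping the same nodes but launching fewer MPI tasks per node leaves more memory per task (a SLURM sketch; the numbers are illustrative):

Code: Select all

#SBATCH --nodes=32
#SBATCH --ntasks-per-node=8     # half the tasks per node...
#SBATCH --mem-per-cpu=32000     # ...so roughly twice the memory per task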
Thierry Clette wrote: Do I inevitably have to reduce the number of Gvecs (or the cutoff)?

It seems that with these values the calculation is over-converged. I would proceed as follows: on a smaller k-grid, try smaller values for FFTGvecs, the block size, and the number of bands (in both the screening and the self-energy). Remember that you also have to converge the number of Q-points. Keeping the current values will make your calculations very memory-demanding.
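The variables I mean are these (plasmon-pole input; the numbers are starting points for a convergence scan, not converged values):

Code: Select all

FFTGvecs= 20 Ry        # reduced FFT components
NGsBlkXp= 4 Ry         # screening block size: scan e.g. 2, 4, 6, 8 Ry
% BndsRnXp
  1 | 200 |            # bands in the screening
%
% GbndRnge
  1 | 200 |            # bands in the self-energy
%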
Cheers,
Alejandro.
Alejandro Molina-Sánchez
Institute of Materials Science (ICMUV)
University of Valencia, Spain
Re: Achievable precision with limited RAM
By the way, I think you should include your full name along with your affiliation.
Alejandro Molina-Sánchez
Institute of Materials Science (ICMUV)
University of Valencia, Spain
Re: Achievable precision with limited RAM
amolina wrote: First of all, can I ask why you want a 10 Ry block size? It seems a bit high, and it will make the simulations very slow.

I am trying to reproduce someone else's work to be sure of my results and then apply the same methods to other materials (Maurizia Palummo, Marco Bernardi, and Jeffrey C. Grossman, "Exciton Radiative Lifetimes in Two-Dimensional Transition Metal Dichalcogenides"). I found that block size quite big too; perhaps about 8 Ry would be better, to have about 1/10 of the FFTGvecs.
amolina wrote: Anyway, I think it would help if in the parallel scheme (q,k,v,c) you put some processes on the valence and conduction bands. If you are using 128 cores, you could try combinations such as (1,32,1,4) or (1,16,1,8).

Would (1,16,2,4) be equally good? Is parallelization better on c or v?
Also, what about the BSE variables? They are not mentioned in the tutorial. Is it better to allocate more on k, eh, or t?
amolina wrote: It seems that with these values the calculation is over-converged. I would proceed as follows: on a smaller k-grid, try smaller values for FFTGvecs, the block size, and the number of bands.

I already tried smaller Gvecs and block sizes, but it didn't really converge well (the corrections were too small). I haven't tried reducing the grid yet, because with less than 20×20 I find the band structure a bit rough. But I guess I should try anyway.
Also, sorry for asking so many questions. I am quite new to computational science.
Thanks for the very helpful answers.
Thierry Clette
Student at Université Libre de Bruxelles, Belgium
Re: Achievable precision with limited RAM
Thierry Clette wrote: Would (1,16,2,4) be equally good? Is parallelization better on c or v?

This I don't know; I haven't checked what works best, but I would be tempted to put more processors on the conduction bands.
Thierry Clette wrote: Also, what about the BSE variables? They are not mentioned in the tutorial. Is it better to allocate more on k, eh, or t?

Very good question. This is a delicate point. You could start by putting all the processors on k. This part is a bit undocumented; it is good that you bring it up. We will include a tutorial in yambopy with examples of the parallelization options.
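In input terms, that starting point would be something like this (role names from the yambo parallel environment; 128 tasks as an example):

Code: Select all

BS_ROLEs= "k eh t"      # BSE kernel parallelization roles
BS_CPU= "128 1 1"       # all tasks on k to start with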
Alejandro Molina-Sánchez
Institute of Materials Science (ICMUV)
University of Valencia, Spain