Fragmentation and excessive RAM requirement

Various technical topics, such as parallelism and efficiency, netCDF problems, and the Yambo code structure itself, are posted here.

Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano, Conor Hogan, Nicola Spallanzani

haseebphysics1
Posts: 169
Joined: Sat Aug 17, 2019 2:48 pm

Fragmentation and excessive RAM requirement

Post by haseebphysics1 » Mon Sep 09, 2019 12:19 pm

Dear Developers,

I'm facing a memory (RAM) issue while doing BSE calculations, even for Si. I searched the forum before posting, but could not find an answer that fits my situation.

I have a machine with 48 threads and 125 GB of total RAM, plus more than 10 TB of HDD space. After doing convergence tests for the Si system, I ran the final calculation with the converged parameters, but it stopped, probably due to insufficient RAM! I'm attaching the BSE input and report files.

Q1: Is it possible to write the BS matrix to the HDD (as I have plenty of storage) instead of keeping it in RAM, or to otherwise reduce the RAM requirement?

Moreover, I am not familiar with parallelization in BSE. I do not include these lines in my input:

Code: Select all

BS_CPU= "-- -- --"             # [PARALLEL] CPUs for each role
BS_ROLEs= "k eh t"           # [PARALLEL] CPUs roles (k,eh,t)
BS_nCPU_invert= 4            # [PARALLEL] CPUs for matrix inversion
BS_nCPU_diago= 4             # [PARALLEL] CPUs for matrix diagonalization
Instead, I just run via:

Code: Select all

export OMP_NUM_THREADS=4
nohup mpirun -np 20 yambo -F <input file> -J <prefix> & disown
Q2: Does this way of submitting the job demand more RAM?

I have now also added a new line (added later) to switch on database fragmentation, to see whether it helps:

Code: Select all

DBsFRAGpm= "+BS"
And finally, I want to know about damping (this is not directly relevant to this thread, but otherwise I would have to ask it separately!):

Code: Select all

% BDmRange
  0.10000 |  0.10000 | eV    # [BSS] Damping range
%
Q3: If I increase the damping range, will it make the peaks broader or narrower? And can we apply this broadening only to specific peaks, to mimic the experimental curve more closely?

Regards,
Haseeb Ahmad
MS - Physics
LUMS, Pakistan.

Daniele Varsano
Posts: 3773
Joined: Tue Mar 17, 2009 2:23 pm

Re: Fragmentation and excessive RAM requirement

Post by Daniele Varsano » Mon Sep 09, 2019 12:59 pm

Dear Haseeb,
Q1: Is it possible to write the BS matrix to the HDD (as I have plenty of storage) instead of keeping it in RAM, or to otherwise reduce the RAM requirement?
A1: This will not help; it will just increase the I/O time.
What you can do is reduce the number of MPI tasks: in this way you will have more memory per task. Next, you can also try to set in your input

Code: Select all

PAR_def_mode= "memory"
This keyword optimizes the distribution of the CPU roles, trying to minimize the memory per core.

Please note that in your run you have 20 MPI tasks and 4 OpenMP threads. With a total of 48 cores (if I have understood correctly), this is not optimal, since 20 × 4 = 80 oversubscribes the machine.
I would try 24 MPI tasks with 2 threads each, or even 12 MPI tasks with 4 threads each.
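For instance (the input-file and job names below are placeholders, as in the command above), the 24 × 2 combination would be launched as:

Code: Select all

export OMP_NUM_THREADS=2
nohup mpirun -np 24 yambo -F <input file> -J <prefix> & disown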
Q2: Does this way of submitting the job demand more RAM?
As suggested above:
export OMP_NUM_THREADS=4
nohup mpirun -np 12 yambo -F <input file> -J <prefix> & disown
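For reference, an explicit assignment of the BSE CPU roles consistent with 12 MPI tasks could look like the sketch below (the 3 × 4 × 1 split is only an example; in Yambo the product of the BS_CPU entries should match the number of MPI tasks):

Code: Select all

BS_CPU= "3 4 1"              # [PARALLEL] CPUs for each role (3*4*1 = 12 MPI tasks)
BS_ROLEs= "k eh t"           # [PARALLEL] CPUs roles (k,eh,t)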

Note that fragmentation here will not help either.
Q3: If I increase the damping range, will it make the peaks broader or narrower? And can we apply this broadening only to specific peaks, to mimic the experimental curve more closely?
Larger damping means broader peaks. You cannot apply a peak-specific broadening, but you can set different initial and final values in the damping range: the initial value applies to the low-energy part of the spectrum and the final value to the higher-energy part. Just make some tries.
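For instance (illustrative values, not taken from your input), a range like the following applies a smaller damping at the low-energy end of the spectrum and a larger one at the high-energy end:

Code: Select all

% BDmRange
  0.05000 |  0.20000 | eV    # [BSS] Damping range
%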

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

haseebphysics1
Posts: 169
Joined: Sat Aug 17, 2019 2:48 pm

Re: Fragmentation and excessive RAM requirement

Post by haseebphysics1 » Mon Sep 09, 2019 4:15 pm

Thanks very much, dear Daniele. These are follow-up questions to the previous post.

1:
Larger damping means broader peaks.
In

Code: Select all

% BDmRange
  0.10000 |  0.10000 | eV    # [BSS] Damping range
%
we apply the range, i.e. select the part of the BSE spectrum it acts on. I'm plotting the spectrum from 0 to 6 eV, so maybe I could try to apply the damping only in the 2-3 eV region; but how do I adjust the damping value itself (the absolute value, in eV)?

Maybe the following parameter?

Code: Select all

BDmERef=            eV    # [BSS] Damping energy reference
2: My second question is about the memory issue. Do larger values of the BSE convergence parameters, like BSEBands, BSENGBlk and BndsRnXs, mean a more precise (and better-converged) result? As you might remember, I checked convergence by plotting the optical spectrum for different parameter values. So, for convergence, should I look for the point where the results (spectra) stop changing, or where they mimic the experiment most closely?

I was also thinking: if the calculations even for the simplest system like Si are so time consuming and require 100+ GB of RAM, then for bulkier systems can I make some intelligent guess for these parameters and avoid large calculations?


Many thanks,
Haseeb Ahmad,
LUMS, Pakistan.

Daniele Varsano
Posts: 3773
Joined: Tue Mar 17, 2009 2:23 pm

Re: Fragmentation and excessive RAM requirement

Post by Daniele Varsano » Mon Sep 09, 2019 6:23 pm

Dear Haseeb,
we apply the range, i.e. select the part of the BSE spectrum it acts on. I'm plotting the spectrum from 0 to 6 eV, so maybe I could try to apply the damping only in the 2-3 eV region
This sounds strange to me: why do you want to apply a broadening only in one region of the spectrum? Broadening is meant to mimic e.g. temperature effects, finite lifetimes, etc., so I would apply it to the whole spectrum. It is just a parameter and does not change the quality of your results at all.
Honestly, I have never used the BDmERef variable.
Do larger values of the BSE convergence parameters, like BSEBands, BSENGBlk and BndsRnXs, mean a more precise (and better-converged) result?
Yes, larger values mean better-converged results.
So, for convergence, should I look for the point where the results (spectra) stop changing, or where they mimic the experiment most closely?
Converged results mean that the final outcome does not change beyond a certain threshold, which is the accuracy you want to reach. They can still differ from experiment for many reasons (modeling of the system, approximations used, ...).
if the calculations even for the simplest system like Si are so time consuming and require 100+ GB of RAM,
That sounds strange: as far as I remember, only a few bands are needed in the BSE to obtain a converged Si spectrum.
then for bulkier systems can I make some intelligent guess for these parameters and avoid large calculations?
Convergence needs to be checked for every system. The larger the system, in terms of number of electrons and supercell size, the higher the computational cost. Non-converged results cannot be trusted: essentially you can get any number, and you do not learn much about the physics of the material.

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

haseebphysics1
Posts: 169
Joined: Sat Aug 17, 2019 2:48 pm

Re: Fragmentation and excessive RAM requirement

Post by haseebphysics1 » Tue Sep 17, 2019 12:41 am

Dear Daniele,

After learning Yambo a bit by calculating the optical properties of Si with the BSE, I'm now calculating the same kind of BSE spectrum for my actual system, which contains 36 atoms in the unit cell. My question is about computational resources and the convergence strategy for (BSE-type only) calculations.

I have 160 occupied bands; I calculated 250 bands in DFT and included all of them in the static screening. To do a quick calculation, I selected the following values somewhat arbitrarily and ran the BSE calculation:

Code: Select all

NGsBlkXs= 10           RL 

And similarly I selected

Code: Select all

BSENGBlk= 10           RL  
And

Code: Select all

% BSEBands
 150 | 170 |                   # [BSK] Bands range
%
This calculation took a long time, almost two days! My goal is to dope the system, which requires a supercell and hence many more atoms. How can I make these calculations feasible? It is very tedious to check convergence by making input files manually (which I will have to do!), because I check convergence by computing a full BSE spectrum even when converging the static-screening parameters. Doing such calculations again and again with increasing parameter values seems impossible on my computer (125 GB RAM and 48 cores). So how can I proceed to the supercell and doping calculations?

Can you suggest anything so that I can do these calculations in the most efficient manner, please?

Q2: Is it wise to have an equal number of valence and conduction bands in BSE spectrum calculations (BSEBands)?

And is it wise to use the same value for BSENGBlk and NGsBlkXs (to avoid some convergence tests!)?


Regards,
Haseeb Ahmad,
LUMS, Pakistan

Daniele Varsano
Posts: 3773
Joined: Tue Mar 17, 2009 2:23 pm

Re: Fragmentation and excessive RAM requirement

Post by Daniele Varsano » Tue Sep 17, 2019 8:35 am

Dear Haseeb,
It is very tedious to check convergence by making input files manually
I know, but convergence tests are needed in order to have meaningful results. If you are interested, there are tools (yambopy) to automate convergence studies:
https://yambopy.readthedocs.io/en/latest/
Here a short tutorial on this:
http://www.yambo-code.org/wiki/index.ph ... in_Yambopy
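If you prefer plain shell scripting over yambopy, a loop over parameter values can also generate and run a series of inputs. This is only a sketch: the template file name bse_template.in and the placeholder NGSBLK inside it are hypothetical, and the cutoff values are arbitrary.

Code: Select all

#!/bin/bash
# Hypothetical convergence scan over the screening cutoff.
export OMP_NUM_THREADS=4
for ng in 50 100 150 200; do
  # Replace the placeholder NGSBLK in the template with the current cutoff (in RL vectors).
  sed "s/NGSBLK/${ng}/" bse_template.in > bse_${ng}RL.in
  mpirun -np 12 yambo -F bse_${ng}RL.in -J conv_${ng}RL
done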
Can you suggest anything so that I can do these calculations in the most efficient manner, please?
Yambo has several parallelization strategies, and tuning the parallelization can help a lot, but of course for very large systems you need enough computational resources. Some info on the parallelization of the BSE can be found here:
https://iopscience.iop.org/article/10.1 ... 48X/ab15d0
Sections 8.1-8.5
If you do not want to specify the CPU roles, you can use the variable:

Code: Select all

PAR_def_mode= "balanced"       # [PARALLEL] Default distribution mode ("balanced"/"memory"/"workload")
If you compiled Yambo with OpenMP support, you can benefit from the mixed MPI/OpenMP parallelization.
Q2: Is it wise to have an equal number of valence and conduction bands in BSE spectrum calculations (BSEBands)?
This really depends on your system, i.e. on the character of the bands. My advice is to start with a very small number of bands and increase it gradually.
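For illustration (a hypothetical starting window, using the fact mentioned earlier in the thread that band 160 is the highest occupied one), a first run could include only a few bands around the gap, with the range enlarged in later runs:

Code: Select all

% BSEBands
 157 | 164 |                   # [BSK] Bands range (a few bands around the gap; enlarge later)
%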
And is it wise to use the same value for BSENGBlk and NGsBlkXs (to avoid some convergence tests!)?
That is the way to do it: BSENGBlk, of course, cannot exceed NGsBlkXs. You can compute the screening once and for all with a large NGsBlkXs and, if needed, use a smaller block in BSENGBlk.
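As an illustration (the numbers below are arbitrary, not converged values for any specific system), the screening can be built once with a generous cutoff while the kernel uses a smaller block:

Code: Select all

NGsBlkXs= 200          RL    # cutoff used to build the static screening (computed once)
BSENGBlk= 100          RL    # block of the screening entering the BSE kernel (must not exceed NGsBlkXs)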
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
