Out of memory issues while running IP RPA

Posted: Fri Nov 08, 2024 4:49 pm
by muhammadhasan
Hi Professor,

I am doing a dielectric function calculation using IP RPA (for gold, 3D). I got the following error message:

Code:

slurmstepd-node-161: error: Detected 1 oom-kill event(s) in StepId=48909.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: node-161: task 48: Out Of Memory
slurmstepd-node-160: error: Detected 1 oom-kill event(s) in StepId=48909.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
slurmstepd-node-159: error: Detected 1 oom-kill event(s) in StepId=48909.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
I have attached the necessary files in case you can help me solve the problem. My input file is below. For now I am using only one q point to check convergence; however, I will need to consider nearly 1000 q points later.

Code:

optics                           # [R] Linear Response optical properties
infver                           # [R] Input file variables verbosity
chi                              # [R][CHI] Dyson equation for Chi.
dipoles                          # [R] Oscillator strengths (or dipoles)
Nelectro=  1216.00               # Electrons number
ElecTemp= 0.0388         eV    # Electronic Temperature
BoseTemp=-1.000000         eV    # Bosonic Temperature
OccTresh= 0.100000E-4            # Occupation threshold (metallic bands)
Chimod= "IP"                     # [X] IP/Hartree/ALDA/LRC/PF/BSfxc
% QpntsRXd
    1 |  1 |                       # [Xd] Transferred momenta
%
% BndsRnXd
    1 |  800 |                       # [Xd] Polarization function bands
%
% EnRngeXd
  0.00000 | 10.00000 |         eV    # [Xd] Energy range
%
% DmRngeXd
 0.100000 | 0.100000 |         eV    # [Xd] Damping range
%
ETStpsXd= 1001                    # [Xd] Total Energy steps
% LongDrXd
 1.000000 | 0.000000 | 0.000000 |        # [Xd] [cc] Electric Field
%
And finally, here is the job submission file for the cluster:

Code:

#!/usr/bin/env bash
#SBATCH --job-name=Au_300K
#SBATCH --nodes=3                      # node count
#SBATCH --ntasks-per-node=24         # number of tasks per node
#SBATCH --cpus-per-task=1            # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=5gb                    # Job memory request
#SBATCH --time=60:00:00               # Time limit hrs:min:sec
#SBATCH --output=sdc.txt              # Standard output and error log
#SBATCH --partition=epyc           # MOAB/Torque called these queues

module load yambo
srun yambo -F yambo.in_IP -J Full
Thank you. Please let me know if you need any more information.

Best
Md J Hasan
PhD Student
Mechanical Engineering
University of Maine

Re: Out of memory issues while running IP RPA

Posted: Fri Nov 08, 2024 5:22 pm
by Daniele Varsano
Dear Hasan,

If you have a memory issue, you can try setting a parallelization strategy in your input file that distributes the memory among the cores, for example:

DIP_CPU= "1 6 12" # [PARALLEL] CPUs for each role
DIP_ROLEs= "k c v" # [PARALLEL] CPUs roles (k,c,v)
X_CPU= "1 1 1 6 12" # [PARALLEL] CPUs for each role
X_ROLEs= "q g k c v" # [PARALLEL] CPUs roles (q,g,k,c,v)
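
A quick sanity check before submitting: Yambo's parallel input generally expects one CPU entry per role, and the entries should multiply to the total number of MPI tasks (here 3 nodes × 24 tasks per node = 72). The POSIX shell function below is a hypothetical helper, not part of Yambo, that checks both conditions for a CPU/ROLEs pair; the last call shows how a CPU string with fewer entries than roles is reported.

```shell
#!/bin/sh
# Hypothetical helper (not part of Yambo): check a CPU string against its
# ROLEs string and the total number of MPI tasks.

count_words() {
  # Intentionally unquoted so the string is split on whitespace.
  set -- $1
  echo "$#"
}

check_cpu_layout() {
  local cpus="$1" roles="$2" ntasks="$3"
  local n_cpus n_roles prod n
  n_cpus=$(count_words "$cpus")
  n_roles=$(count_words "$roles")
  # One CPU entry is needed for every role.
  if [ "$n_cpus" -ne "$n_roles" ]; then
    echo "mismatch: $n_cpus CPU entries for $n_roles roles"
    return 1
  fi
  # The entries must multiply to the number of MPI tasks.
  prod=1
  for n in $cpus; do
    prod=$((prod * n))
  done
  if [ "$prod" -ne "$ntasks" ]; then
    echo "product $prod does not match $ntasks MPI tasks"
    return 1
  fi
  echo "ok"
}

ntasks=$((3 * 24))   # --nodes=3 x --ntasks-per-node=24
check_cpu_layout "1 6 12"     "k c v"     "$ntasks"           # ok
check_cpu_layout "1 1 1 6 12" "q g k c v" "$ntasks"           # ok
check_cpu_layout "1 1 6 12"   "q g k c v" "$ntasks" || true   # mismatch: 4 CPU entries for 5 roles
```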

If the problem persists, you can try using fewer CPUs per node so that more memory is available to each process.
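
To see why fewer tasks per node helps: with `--mem-per-cpu`, the memory Slurm grants on a node is roughly tasks per node × CPUs per task × memory per CPU, and the cgroup limit that triggered the oom-kill above typically applies to that node total, shared by all tasks on the node. The small calculation below uses the values from the job script above. The function names are illustrative only.

```shell
#!/bin/sh
# Illustrative arithmetic for a --mem-per-cpu request (all sizes in GB).

# Memory requested per node: tasks/node * cpus/task * mem/cpu.
node_mem_gb() {
  echo $(( $1 * $2 * $3 ))
}

# Average memory per MPI task on that node: cpus/task * mem/cpu.
task_mem_gb() {
  echo $(( $1 * $2 ))
}

# Original job: 24 tasks/node, 1 cpu/task, 5 GB/cpu
echo "24x1: node $(node_mem_gb 24 1 5) GB, task $(task_mem_gb 1 5) GB"
# Half the tasks: 12 tasks/node, 2 cpus/task
# (the second core per task is only used if Yambo runs OpenMP threads)
echo "12x2: node $(node_mem_gb 12 2 5) GB, task $(task_mem_gb 2 5) GB"
```

This prints `24x1: node 120 GB, task 5 GB` and `12x2: node 120 GB, task 10 GB`. The matching Slurm change would be `--ntasks-per-node=12` with `--cpus-per-task=2`, keeping `--mem-per-cpu=5gb`; the CPU strings in the parallel variables then have to multiply to 36 instead of 72.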

Please note that in these (IP) calculations the q points are independent: you get an IP spectrum for each q point, so the number of q points has nothing to do with convergence.
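
Because each q point gives an independent IP spectrum, the roughly 1000 q points mentioned above do not have to run in a single job. Below is a minimal sketch. It assumes the input file is `yambo.in_IP` and that its `% QpntsRXd` block is written as in the input above. The POSIX shell functions write one copy of the input per range of q indices, so each range can be submitted as its own job under its own `-J` name. The function names, output file names and chunk size are hypothetical. Take the real number of q points from Yambo's own setup output.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Yambo): one input copy per block of q indices.

# set_q_range TEMPLATE FIRST LAST OUTPUT
# Copies TEMPLATE to OUTPUT, replacing the content of the
# "% QpntsRXd ... %" block with a single "FIRST | LAST |" line.
set_q_range() {
  awk -v first="$2" -v last="$3" '
    /^% *QpntsRXd/  { print; inblock = 1; next }
    inblock && /^%/ { inblock = 0 }
    inblock {
      if (!done) { printf " %d | %d |   # [Xd] Transferred momenta\n", first, last; done = 1 }
      next
    }
    { print }
  ' "$1" > "$4"
}

# split_q_inputs TEMPLATE NQ CHUNK
# Writes TEMPLATE_q<first>-<last> for consecutive ranges covering 1..NQ
# and prints the name of each file written.
split_q_inputs() {
  local template="$1" nq="$2" chunk="$3" first=1 last
  while [ "$first" -le "$nq" ]; do
    last=$((first + chunk - 1))
    if [ "$last" -gt "$nq" ]; then
      last=$nq
    fi
    set_q_range "$template" "$first" "$last" "${template}_q${first}-${last}"
    echo "${template}_q${first}-${last}"
    first=$((last + 1))
  done
}

# Example (hypothetical sizes):
#   split_q_inputs yambo.in_IP 1000 250
# then, in one job per generated file, e.g.
#   srun yambo -F yambo.in_IP_q1-250 -J Full_q1-250
```

Giving each job a distinct `-J` name keeps the databases and outputs of the different q ranges in separate directories.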

Best,
Daniele

Re: Out of memory issues while running IP RPA

Posted: Fri Nov 08, 2024 5:53 pm
by muhammadhasan
Hi Professor,

Thank you so much as always.

I increased the memory allocated on our cluster, and now the calculation runs without any errors.

Regarding convergence, I am planning to converge only these three parameters (listed below). Are these parameters sufficient for converging an IP RPA calculation, Professor? Could you please advise me on how to proceed with DmRngeXd? Most of the examples I have seen do not consider the convergence of this parameter. How can I find the best value of DmRngeXd?
1) FFTGvecs= 99845
2) % BndsRnXd
1 | 800 | # [Xd] Polarization function bands
%
3) % DmRngeXd
0.100000 | 0.100000 | eV # [Xd] Damping range
%
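
One common way to check convergence in a parameter such as the upper band in `BndsRnXd` is to run the same calculation twice, for example with 600 and 800 bands under different `-J` names, and compare the resulting spectra. The minimal POSIX shell/awk sketch below assumes two plain-text Yambo output files that share the same energy grid and whose comment lines start with `#`. Check the file header to see which column holds Im ε; the sketch assumes column 2. The function name and the file names in the example comment are hypothetical. The same comparison could be applied to runs with different `FFTGvecs`.

```shell
#!/bin/sh
# Hypothetical helper: max_abs_diff FILE_A FILE_B [COLUMN]
# Prints the largest |A - B| over matching data lines in COLUMN (default 2).
# Lines starting with "#" and empty lines are skipped.
max_abs_diff() {
  awk -v col="${3:-2}" '
    BEGIN { max = 0 }
    /^#/ || NF == 0 { next }
    # First file: store the reference column, line by line.
    FNR == NR { ref[++n] = $col; next }
    # Second file: compare against the stored value on the same data line.
    {
      d = $col - ref[++m]
      if (d < 0) d = -d
      if (d > max) max = d
    }
    END {
      if (m != n) print "warning: " n " vs " m " data lines" > "/dev/stderr"
      printf "%.6g\n", max
    }
  ' "$1" "$2"
}

# Example with hypothetical file names from two runs (600 and 800 bands):
#   max_abs_diff B600/o-B600.eps_q1_ip B800/o-B800.eps_q1_ip
```

If the maximum difference is small compared with the peak height of the spectrum, the extra bands probably no longer matter in the chosen energy range.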

Thank you

Best
Md J Hasan