stuck in calculation of static inverse dielectric matrix
Re: stuck in calculation of static inverse dielectric matrix
Dear Daniele
I have fixed the memory issue by setting
"X_all_q_CPU= "1 1 1 32 36" # [PARALLEL] CPUs for each role
X_all_q_ROLEs= "q g k c v" # [PARALLEL] CPUs roles (q,g,k,c,v)
X_all_q_nCPU_LinAlg_INV= 4 # [PARALLEL] CPUs for Linear Algebra
X_Threads=0 # [OPENMP/X] Number of threads for response functions
DIP_Threads=0 # [OPENMP/X] Number of threads for dipoles"
..."
I then proceeded to the BSE calculation using "yambo -o b -k sex -y d -r -V par" and made the following settings:
"optics # [R OPT] Optics
bss # [R BSS] Bethe Salpeter Equation solver
rim_cut # [R RIM CUT] Coulomb potential
bse # [R BSE] Bethe Salpeter Equation.
bsk # [R BSK] Bethe Salpeter Equation kernel
NLogCPUs=8 # [PARALLEL] Live-timing CPU`s (0 for all)
PAR_def_mode= "memory" # [PARALLEL] Default distribution mode ("balanced"/"memory"/"workload")
BS_CPU= "32 1 36" # [PARALLEL] CPUs for each role
BS_ROLEs= "k eh t" # [PARALLEL] CPUs roles (k,eh,t)
BS_nCPU_LinAlg_INV= 1 # [PARALLEL] CPUs for Linear Algebra
BS_nCPU_LinAlg_DIAGO= 1 # [PARALLEL] CPUs for Linear Algebra
X_Threads=0 # [OPENMP/X] Number of threads for response functions
DIP_Threads=0 # [OPENMP/X] Number of threads for dipoles
K_Threads=0 # [OPENMP/BSK] Number of threads for response functions
RandQpts= 3000000 # [RIM] Number of random q-points in the BZ
RandGvec= 123 RL # [RIM] Coulomb interaction RS components
CUTGeo= "box z" # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws X/Y/Z/XY..
% CUTBox
0.00000 | 0.00000 | 55.00000 | # [CUT] [au] Box sides
%
CUTRadius= 0.000000 # [CUT] [au] Sphere/Cylinder radius
CUTCylLen= 0.000000 # [CUT] [au] Cylinder length
CUTwsGvec= 0.700000 # [CUT] WS cutoff: number of G to be modified
BSEmod= "resonant" # [BSE] resonant/retarded/coupling
BSKmod= "SEX" # [BSE] IP/Hartree/HF/ALDA/SEX
BSSmod= "h" # [BSS] (h)aydock/(d)iagonalization/(i)nversion/(t)ddft`
BSENGexx= 68539 RL # [BSK] Exchange components
BSENGBlk= 4 Ry # RL # [BSK] Screened interaction block size
#WehCpl # [BSK] eh interaction included also in coupling
% KfnQP_E
0.8000000 | 1.000000 | 1.000000 | # [EXTQP BSK BSS] E parameters (c/v) eV|adim|adim
% BEnRange
0.00000 | 8.00000 | eV # [BSS] Energy range
%
% BDmRange
0.050000 | 0.050000 | eV # [BSS] Damping range
%
BEnSteps=600 # [BSS] Energy steps
% BLongDir
1.000000 | 0.000000 | 0.000000 | # [BSS] [cc] Electric Field
%
% BSEBands
375 | 396 | # [BSK] Bands range
%
#WRbsWF
BSHayTrs= -0.02000 # [BSS] Relative [o/o] Haydock treshold. Strict(>0)/Average(<0)
(END)
..."
However, the log file shows the following messages:
"<04s> P0001: [MEMORY] Alloc px%element_1D( 2.150000Mb) TOTAL: 43.31600Mb (traced) 37.02000Mb (memstat)
<04s> P0001: [MEMORY] Alloc px%element_1D( 2.150000Mb) TOTAL: 45.47000Mb (traced) 37.02000Mb (memstat)
<04s> P0001: [MEMORY] Alloc px%element_1D( 1.075000Mb) TOTAL: 46.54900Mb (traced) 37.02000Mb (memstat)
<04s> P0001: [MEMORY] Alloc px%element_1D( 1.075000Mb) TOTAL: 47.62800Mb (traced) 37.02000Mb (memstat)
<04s> P0001: [MEMORY] Alloc px%element_1D( 2.150000Mb) TOTAL: 49.78200Mb (traced) 37.02000Mb (memstat)
<04s> P0001: [MEMORY] Alloc px%element_1D( 2.150000Mb) TOTAL: 51.93600Mb (traced) 37.02000Mb (memstat)
<04s> P0001: [PARALLEL Response_T_space for (e/h) Groups on 1080 CPU] Loaded/Total (Percentual):10/10800(0%)
<04s> P0001: [MEMORY] Alloc px%element_2D( 455.6250Mb) TOTAL: 507.6920Mb (traced) 37.02000Mb (memstat)
<05s> P0001: [PARALLEL Response_T_space for (e/h)->(e/h)' Transitions (ordered) on 1 CPU] Loaded/Total (Percentual):54005/58325400(0%)
<05s> P0001: [PARALLEL Response_T_space for CON bands on 1080 CPU] Loaded/Total (Percentual):1/715(0%)
<05s> P0001: [PARALLEL Response_T_space for VAL bands on 1 CPU] Loaded/Total (Percentual):385/385(100%)
<05s> P0001: [06.01] Transition Groups build-up
<05m-34s> P0001: [06.02] CPU-dependent Block structure
<05m-35s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.016000Mb) TOTAL: 1.219194Gb (traced) 37.02000Mb (memstat)
<05m-35s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.016000Mb) TOTAL: 1.220210Gb (traced) 37.02000Mb (memstat)
<05m-35s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.016000Mb) TOTAL: 1.221226Gb (traced) 37.02000Mb (memstat)
<05m-35s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.016000Mb) TOTAL: 1.222242Gb (traced) 37.02000Mb (memstat)
<05m-35s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.016000Mb) TOTAL: 1.223258Gb (traced) 37.02000Mb (memstat)
<05m-35s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.016000Mb) TOTAL: 1.224274Gb (traced) 37.02000Mb (memstat)
<05m-35s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.016000Mb) TOTAL: 1.225290Gb (traced) 37.02000Mb (memstat)
<05m-35s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.016000Mb) TOTAL: 1.226306Gb (traced) 37.02000Mb (memstat)
<05m-35s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.016000Mb) TOTAL: 1.227322Gb (traced) 37.02000Mb (memstat)
..."
The report file terminates at step [06] as follows:
" [K]Fermi Level [ev]: 0.000000
[K]VBM / CBm [ev]: 0.000000 1.990708
[K]Electronic Temp. [ev K]: 0.00 0.00
[K]Bosonic Temp. [ev K]: 0.00 0.00
[K]Finite Temperature mode: no
[K]El. density [cm-3]: 0.108E+24
[K]States summary : Full Metallic Empty
0001-0385 0386-1100
[K]Indirect Gaps [ev]: 1.990708 2.240225
[K]Direct Gaps [ev]: 1.997653 2.240225
[QP apply] Ind. Gap Correction [ev]: 0.800000
[06] Response Functions in Transition space
===========================================
[WARNING]Allocation attempt of W%p of zero size.
[ERROR] STOP signal received while in :[06] Response Functions in Transition space
(END)"
I guess this issue also comes from memory. I have made many attempts, varying the parallelization and reducing the values of "BSENGexx", "BSENGBlk" and "BSEBands"; however, all attempts failed at the step "[06] Response Functions in Transition space".
Do you have any suggestions to address this issue? Attached are the three files: the input (ljbse), the log and the report. I hope this helps. Thanks a lot
Best
Dr. Zhou Liu-Jiang
Fujian Institute of Research on the Structure of Matter
Chinese Academy of Sciences
Fuzhou, Fujian, 350002
Re: stuck in calculation of static inverse dielectric matrix
Dear Zhou Liu-Jiang,
looking at your report, you have a 10 k-point sampling in the IBZ, and this is not compatible with the 32 assignments on k points you set in the parallelization.
I do not know if this is causing problems for the run, but it is surely a waste of resources. Moreover, your BSE matrix (Nc x Nv x Nbz) is rather small (dim ~2000), so I would reduce the total number of CPUs for this run. By the way, if this is a 0D system, why are you using a k-point grid instead of performing a gamma-only calculation?
Best,
Daniele
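For illustration, a minimal sketch of a BSE parallelization consistent with a 10 k-point IBZ sampling, assuming a job run on 90 MPI tasks; the 3 x 3 split of the eh and t roles is just one possible choice, not prescribed in the thread.
Code:
BS_CPU= "10 3 3"      # [PARALLEL] CPUs for each role: 10*3*3 = 90 should equal the MPI task count
BS_ROLEs= "k eh t"    # [PARALLEL] CPUs roles (k,eh,t)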
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
Re: stuck in calculation of static inverse dielectric matrix
Hi Daniele
I have tried that kind of setting, with 10 assignments on k points to match the 10 k-points, but it did not give me any improvement. The system, containing 115 atoms, is a 2D monolayer interfaced with a 0D cluster. I will do the run using a reduced number of CPUs.
Thanks a lot
Dr. Zhou Liu-Jiang
Fujian Institute of Research on the Structure of Matter
Chinese Academy of Sciences
Fuzhou, Fujian, 350002
Re: stuck in calculation of static inverse dielectric matrix
Dear Daniele
"BS_CPU= "10 3 3" # [PARALLEL] CPUs for each role
BS_ROLEs= "k eh t" # [PARALLEL] CPUs roles (k,eh,t)
BS_nCPU_LinAlg_INV= 1 # [PARALLEL] CPUs for Linear Algebra
BS_nCPU_LinAlg_DIAGO= 1 # [PARALLEL] CPUs for Linear Algebra
X_Threads=4 # [OPENMP/X] Number of threads for response functions
DIP_Threads=4 # [OPENMP/X] Number of threads for dipoles
K_Threads=4 # [OPENMP/BSK] Number of threads for response functions"
However, it still got stuck in the step of "[06.02] CPU-dependent Block structure" . I attached the input, report and log files here. Please help me to spot it. THanks a lot.
Daniele Varsano wrote: looking at your report, you have a 10 k-point sampling in the IBZ, and this is not compatible with the 32 assignments on k points you set in the parallelization. I do not know if this is causing problems for the run, but it is surely a waste of resources. Moreover, your BSE matrix (Nc x Nv x Nbz) is rather small (dim ~2000), so I would reduce the total number of CPUs for this run.
I reduced the total number of CPUs and made this kind of parallelization:
"BS_CPU= "10 3 3" # [PARALLEL] CPUs for each role
BS_ROLEs= "k eh t" # [PARALLEL] CPUs roles (k,eh,t)
BS_nCPU_LinAlg_INV= 1 # [PARALLEL] CPUs for Linear Algebra
BS_nCPU_LinAlg_DIAGO= 1 # [PARALLEL] CPUs for Linear Algebra
X_Threads=4 # [OPENMP/X] Number of threads for response functions
DIP_Threads=4 # [OPENMP/X] Number of threads for dipoles
K_Threads=4 # [OPENMP/BSK] Number of threads for response functions"
However, it still gets stuck at the step "[06.02] CPU-dependent Block structure". I have attached the input, report and log files here. Please help me spot the problem. Thanks a lot.
Dr. Zhou Liu-Jiang
Fujian Institute of Research on the Structure of Matter
Chinese Academy of Sciences
Fuzhou, Fujian, 350002
Re: stuck in calculation of static inverse dielectric matrix
Dear Dr. Zhou Liu-Jiang,
you need to reduce the number of CPUs consistently: the BS_CPU setting is now not consistent with the number of CPUs actually used.
In this case, yambo ignores the setting and switches to the default. I have the impression anyway that you are running out of memory;
how much memory per core do you have available?
I suggest you run the calculation with the same input but with 90 CPUs instead of 360. If possible, try to use fewer CPUs per node in order to reserve more
memory per core.
Best,
Daniele
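For illustration, a minimal sketch of a SLURM request along these lines, assuming 90 MPI tasks spread over 15 nodes at 6 tasks per node so that each task sees more of the node's memory; the node count and file paths are illustrative, not from the thread.
Code:
#SBATCH -N 15
#SBATCH --ntasks-per-node=6   # few tasks per node -> more memory per MPI task
#SBATCH --ntasks=90
mpirun -np 90 yambo -F Inputs/ljbse -J ljbse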
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
Re: stuck in calculation of static inverse dielectric matrix
Dear Daniele
Thanks. I will further reduce the number of CPUs in my calculation. The reason I made such a setting is the relation: "number of BS_CPU (90)" times "number of threads (4)" = total number of CPUs used (360). Is this wrong when carrying out the MPI+OpenMP strategy?
Dr. Zhou Liu-Jiang
Fujian Institute of Research on the Structure of Matter
Chinese Academy of Sciences
Fuzhou, Fujian, 350002
Re: stuck in calculation of static inverse dielectric matrix
Dear Zhou Liu-Jiang,
the strategy is right, but this setting depends on the number of cores you have inside a node and on the settings of your submission script.
An example: on a node of e.g. 16 cores you can reserve the entire node, ask for 4 CPUs (MPI) and set 4 threads.
In your previous run you asked for 360 MPI processes, from the report:
Code:
* CPU-Threads :360(CPU)-1(threads)-4(threads@X)-4(threads@DIP)-4(threads@K)
* CPU-Threads :BS(environment)-10 3 3(CPUs)-k eh t(ROLEs)
* MPI CPU : 360
* THREADS (max): 4
* THREADS TOT(max): 1440
Best,
Daniele
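For illustration, a minimal sketch of a hybrid MPI+OpenMP submission that launches 90 MPI tasks with 4 OpenMP threads each (90 x 4 = 360 cores), assuming SLURM and the 36-core nodes mentioned later in this thread; the node count and layout are illustrative.
Code:
#SBATCH -N 10                 # 10 nodes x 36 cores = 360 cores
#SBATCH --ntasks=90           # 90 MPI tasks in total
#SBATCH --ntasks-per-node=9   # 9 tasks per node ...
#SBATCH --cpus-per-task=4     # ... each with 4 cores for OpenMP threads
export OMP_NUM_THREADS=4      # consistent with X_Threads/DIP_Threads/K_Threads=4 in the input
mpirun -np 90 yambo -F Inputs/ljbse -J ljbse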
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
Re: stuck in calculation of static inverse dielectric matrix
Hi Daniele
I have done these settings as you suggested, but the memory issue still persists. The system contains 115 atoms with up to 770 electrons, and the total memory per CPU is 3.56 GB. No matter how many or how few CPU resources I use, the calculation ends at the step "[06.01] Transition Groups build-up" with a prompt like "[MEMORY] Alloc BS_blk(iB)%mat( 164.4510Mb) TOTAL: 2.987147Gb (traced) 36.43600Mb (memstat)". Could you help me to build up a parallelization strategy?
ljzhou86 wrote: Hi Daniele, I have tried that kind of setting, with 10 assignments on k points to match the 10 k-points, but it did not give me any improvement. The system, containing 115 atoms, is a 2D monolayer interfaced with a 0D cluster. I will do the run using a reduced number of CPUs.
By the way, how should I understand the difference between these keywords regarding memory, namely TOTAL, traced and memstat? Does "TOTAL" mean the total memory that one CPU should have? I see many lines showing an increasing TOTAL value rather than a fixed one. And what about "traced" and "memstat"?
Dr. Zhou Liu-Jiang
Fujian Institute of Research on the Structure of Matter
Chinese Academy of Sciences
Fuzhou, Fujian, 350002
Re: stuck in calculation of static inverse dielectric matrix
Dear Zhou,
the Total is the amount of memory allocated at that point of the calculation. As you can see, it keeps increasing until it reaches the maximum memory you have at your disposal; then the run goes out of memory and crashes.
What you can do is:
1) Allow more memory per node. This can be done in your submission script by using fewer cores per node. I do not know what queue system you are using, but usually it is possible by reserving more nodes and assigning fewer tasks per node.
2) Try to reduce the number of bands in the BSE (BSEBands). Start with a small number of bands and increase it to see if you arrive at convergence using fewer bands than you are using now.
Daniele
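For illustration, a minimal sketch of a reduced band window for such a convergence test, assuming the VBM/CBm bands 385/386 reported above; the choice of 5 valence and 5 conduction bands is only an illustrative starting point, not a converged value.
Code:
% BSEBands
 381 | 390 |                   # [BSK] Bands range (5 valence + 5 conduction bands around the gap)
%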
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
Re: stuck in calculation of static inverse dielectric matrix
Dear Daniele
"
#!/bin/bash
##SBATCH -t 16:00:00
#SBATCH -t 0:10:00
#SBATCH -N 32
#SBATCH --partition=standard
#SBATCH --ntasks-per-node=6
#SBATCH --ntasks=192
#SBATCH -A s17_cint
module load intel/17.0.4 intel-mpi/2017.1 mkl/11.4.1 hdf5-serial/1.8.16 netcdf-serial/4.4.0
MPIRUN="mpirun -np 192"
#$MPIRUN yambo -F Inputs/05w
$MPIRUN yambo -F Inputs/ljbse -J ljbse"
The total memory per node is up to 128 G. I use less cores (6 cores) per node (36 cores) so as to get sixfold amount of available meory per node. Are these settings in submission script right?
However, I still suffered from the memory issue as reported in log file
""
...
<12s> P0001: [PARALLEL Response_T_space for (e/h) Groups on 24 CPU] Loaded/Total (Percentual):2/240(1%)
<12s> P0001: [PARALLEL Response_T_space for (e/h)->(e/h)' Transitions (ordered) on 1 CPU] Loaded/Total (Percentual):242/28920(1%)
<12s> P0001: [PARALLEL Response_T_space for CON bands on 24 CPU] Loaded/Total (Percentual):30/715(4%)
<12s> P0001: [PARALLEL Response_T_space for VAL bands on 1 CPU] Loaded/Total (Percentual):385/385(100%)
<12s> P0001: [06.01] Transition Groups build-up
<15s> P0001: [06.02] CPU-dependent Block structure
<15s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.027819Gb) TOTAL: 1.219881Gb (traced) 36.43600Mb (memstat)
""
The TOTAL got decreased but my job still jumped out very fast. Reducing the number of bands in BSE also did not make sense to address this issue.
Do you have any other suggestions? Thanks a lot
Daniele Varsano wrote: the Total is the amount of memory allocated at that point of the calculation. As you can see, it keeps increasing until it reaches the maximum memory you have at your disposal; then the run goes out of memory and crashes. What you can do is: 1) Allow more memory per node. This can be done in your submission script by using fewer cores per node. I do not know what queue system you are using, but usually it is possible by reserving more nodes and assigning fewer tasks per node.
My submission script is as follows:
"
#!/bin/bash
##SBATCH -t 16:00:00
#SBATCH -t 0:10:00
#SBATCH -N 32
#SBATCH --partition=standard
#SBATCH --ntasks-per-node=6
#SBATCH --ntasks=192
#SBATCH -A s17_cint
module load intel/17.0.4 intel-mpi/2017.1 mkl/11.4.1 hdf5-serial/1.8.16 netcdf-serial/4.4.0
MPIRUN="mpirun -np 192"
#$MPIRUN yambo -F Inputs/05w
$MPIRUN yambo -F Inputs/ljbse -J ljbse"
The total memory per node is up to 128 GB. I use fewer cores (6 cores) per node (out of 36 cores) so as to get about six times the available memory per core. Are these settings in the submission script right?
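As a rough check of the numbers above (simple arithmetic, assuming the 128 GB, 36-core nodes just described): with 6 MPI tasks per node each task can use roughly 128/6 ≈ 21 GB, and dropping to, say, 3 tasks per node would leave about 42 GB per task while keeping 90 tasks in total, consistent with BS_CPU= "10 3 3". The sketch below is illustrative, not the script actually used.
Code:
#SBATCH -N 30
#SBATCH --ntasks-per-node=3   # 3 MPI tasks per 128 GB node -> ~42 GB per task
#SBATCH --ntasks=90           # 30*3 = 90 = 10*3*3 (BS_CPU)
mpirun -np 90 yambo -F Inputs/ljbse -J ljbse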
However, I still suffer from the memory issue, as reported in the log file:
""
...
<12s> P0001: [PARALLEL Response_T_space for (e/h) Groups on 24 CPU] Loaded/Total (Percentual):2/240(1%)
<12s> P0001: [PARALLEL Response_T_space for (e/h)->(e/h)' Transitions (ordered) on 1 CPU] Loaded/Total (Percentual):242/28920(1%)
<12s> P0001: [PARALLEL Response_T_space for CON bands on 24 CPU] Loaded/Total (Percentual):30/715(4%)
<12s> P0001: [PARALLEL Response_T_space for VAL bands on 1 CPU] Loaded/Total (Percentual):385/385(100%)
<12s> P0001: [06.01] Transition Groups build-up
<15s> P0001: [06.02] CPU-dependent Block structure
<15s> P0001: [MEMORY] Alloc BS_blk(iB)%mat( 1.027819Gb) TOTAL: 1.219881Gb (traced) 36.43600Mb (memstat)
""
The TOTAL value got smaller, but my job still dies very quickly. Reducing the number of bands in the BSE also did not help to address this issue.
Do you have any other suggestions? Thanks a lot
Dr. Zhou Liu-Jiang
Fujian Institute of Research on the Structure of Matter
Chinese Academy of Sciences
Fuzhou, Fujian, 350002