Re: BSE diagonalization solver error
Posted: Fri Oct 04, 2019 1:44 pm
by sdwang
Dear Daniele,
I am running the version compiled with --enable-memory-profile, but there is no memory information in the log file (attached). In my previous calculation the memory information was there. Does this mean the memory cannot be distributed?
I reran as you suggested, removing the ScaLAPACK distribution and lowering the number of cores to 12, but it still does not work.
Thanks!
Shudong
Re: BSE diagonalization solver error
Posted: Fri Oct 04, 2019 1:48 pm
by Daniele Varsano
Hi Shudong,
It seems strange to me that you do not have information on memory usage. Try typing ./configure -h to see the exact keyword needed to compile with memory monitoring.
Another piece of advice is to move all the parallelization onto the k-points:
Code:
BS_CPU= "N 1 1" # [PARALLEL] CPUs for each role
BS_ROLEs= "k eh t"
Best,
Daniele
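As a minimal sketch of the suggestion above, the following Python helper writes the two input lines for a given number of MPI tasks, with every task assigned to the "k" role. The function name and the idea of generating the lines from a script are illustrative assumptions and are not part of Yambo.

```python
def bs_parallel_lines(n_mpi_tasks: int) -> list[str]:
    """Return BS_CPU / BS_ROLEs input lines with all MPI tasks on the "k" role."""
    if n_mpi_tasks < 1:
        raise ValueError("n_mpi_tasks must be a positive integer")
    return [
        f'BS_CPU= "{n_mpi_tasks} 1 1"',  # roles are listed in the order k, eh, t
        'BS_ROLEs= "k eh t"',
    ]


if __name__ == "__main__":
    # Example: the 12-core run mentioned earlier in the thread.
    for line in bs_parallel_lines(12):
        print(line)
```

The printed lines can be pasted into the Yambo input file in place of the existing BS_CPU and BS_ROLEs entries.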
Re: BSE diagonalization solver error
Posted: Sat Oct 05, 2019 4:38 am
by will
Hi Shudong,
Finally, I solved the problem using a serial run. The memory used in the serial run is rather smaller than the memory per core in the parallel run. Maybe the memory is not correctly distributed in the parallel run.
Best,
Xiaowei
Re: BSE diagonalization solver error
Posted: Sat Oct 05, 2019 7:05 am
by Daniele Varsano
Dear Shudong,
I have also noticed that you compiled Yambo in double precision. Do you really need that?
Having the BSE matrix in double precision does not help in terms of memory usage.
Best,
Daniele
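To make the precision remark concrete, here is a rough back-of-the-envelope sketch. It assumes the BSE matrix is stored as a full dense N x N complex matrix (8 bytes per element in single precision, 16 in double) and takes 1 GB as 10^9 bytes. The function name and the example matrix dimension are hypothetical and are not taken from this calculation.

```python
# Bytes per complex matrix element (assumption: real + imaginary part).
BYTES_PER_COMPLEX = {"single": 8, "double": 16}


def bse_matrix_gb(n_transitions: int, precision: str = "double") -> float:
    """Estimate the size in GB of a dense n x n complex matrix."""
    if precision not in BYTES_PER_COMPLEX:
        raise ValueError("precision must be 'single' or 'double'")
    return n_transitions**2 * BYTES_PER_COMPLEX[precision] / 1e9


if __name__ == "__main__":
    n = 10_000  # hypothetical number of electron-hole transitions
    for precision in ("single", "double"):
        print(f"{precision}: {bse_matrix_gb(n, precision):.2f} GB")
```

Under these assumptions the double-precision matrix always needs twice the memory of the single-precision one, whatever the dimension.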
Re: BSE diagonalization solver error
Posted: Sat Oct 05, 2019 8:45 am
by sdwang
Dear Daniele,
You are right, I used the double-precision version. For my system, if I use single-precision Yambo, the number of G-vectors after initialization is far smaller than in nscf.out; if I switch to the dp version, it increases noticeably but is still smaller than in nscf.out.
Thanks!
Shudong
Re: BSE diagonalization solver error
Posted: Sun Oct 06, 2019 8:57 am
by sdwang
Dear Daniele,
I reran it as you suggested, moving all parallel options to k, but it failed as before. I monitored the memory usage and found that the memory did not reach the limit of the machine (1 TB) before it stopped.
Attached is the log, which shows that the diagonalization stopped.
I noticed that you previously mentioned one can perform a serial calculation without recomputing the BS matrix. My procedure is to run yambo -b, and then yambo -o b -k sex -y d. I think your suggestion is that, if this stops but the kernel is OK, we can continue with yambo -y d in a serial calculation. But when I do this (yambo -y d), the kernel is recomputed...
Thanks!
Best
Shudong
Re: BSE diagonalization solver error
Posted: Sun Oct 06, 2019 9:04 am
by sdwang
will wrote: Hi Shudong,
Finally, I solved the problem using a serial run. The memory used in the serial run is rather smaller than the memory per core in the parallel run. Maybe the memory is not correctly distributed in the parallel run.
Best,
Xiaowei
Dear Xiaowei,
Thanks for your suggestion. I am not sure whether a serial run would work, since it would take a long time. How did you do that?
Thanks!
SD
Re: BSE diagonalization solver error
Posted: Sun Oct 06, 2019 4:10 pm
by Daniele Varsano
Dear Shudong,
sdwang wrote: found the memory did not reach the limit of the machine (1 TB) before it stopped.
OK, it stops when it tries to allocate and there is not enough memory.
sdwang wrote: I noticed that you previously mentioned one can perform a serial calculation without recomputing the BS matrix. But when I do this (yambo -y d), the kernel is recomputed...
Usually, when it reruns, you should find the reason in the report, flagged as an (ERR). Anyway, in your case this is because the structure of the BSE database is CPU dependent, so in order to read the BSE matrix you need the same number of CPUs you used to calculate it.
Considering that building the BSE matrix is not that slow, what I can suggest is to repeat the calculation using fewer CPUs and to activate OpenMP threads (in this case, the variable [...]). In order to do that, you need to compile Yambo with OpenMP support.
Finally please note this warning message in your report:
Code:
[WARNING] n_eh_CPU > 1 in a system with symmetries and k-points is not efficient. Try distributing first "k" and "t"
Best,
Daniele
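The warning above can be checked before submitting a job. The following is a small sketch that reads the BS_CPU and BS_ROLEs strings and reports whether more than one CPU is assigned to the "eh" role. The function names and the example values are hypothetical; only the two variable names and the role labels come from the thread.

```python
def role_counts(bs_cpu: str, bs_roles: str) -> dict[str, int]:
    """Map each role in BS_ROLEs to the CPU count at the same position in BS_CPU."""
    counts = [int(value) for value in bs_cpu.split()]
    roles = bs_roles.split()
    if len(counts) != len(roles):
        raise ValueError("BS_CPU and BS_ROLEs must have the same number of entries")
    return dict(zip(roles, counts))


def eh_is_distributed(bs_cpu: str, bs_roles: str) -> bool:
    """Return True when more than one CPU is assigned to the "eh" role."""
    return role_counts(bs_cpu, bs_roles).get("eh", 1) > 1


if __name__ == "__main__":
    # Hypothetical settings, for illustration only.
    for bs_cpu in ("2 2 2", "12 1 1"):
        flagged = eh_is_distributed(bs_cpu, "k eh t")
        print(f'BS_CPU= "{bs_cpu}": eh distributed = {flagged}')
```

A setting flagged here is the case the warning describes, so the CPUs on "eh" would be moved to "k" and "t" first.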
Re: BSE diagonalization solver error
Posted: Mon Oct 07, 2019 9:44 am
by sdwang
Dear Daniele,
I ran with OpenMP, and my settings are:
PAR_def_mode= "memory" # [PARALLEL] Default distribution mode ("balanced"/"memory"/"workload")
BS_CPU= "1 2 4" # [PARALLEL] CPUs for each role
BS_ROLEs= "k eh t" # [PARALLEL] CPUs roles (k,eh,t)
BS_nCPU_LinAlg_INV= 4 # [PARALLEL] CPUs for Linear Algebra
BS_nCPU_LinAlg_DIAGO= 4 # [PARALLEL] CPUs for Linear Algebra
#X_Threads=2 # [OPENMP/X] Number of threads for response functions
#DIP_Threads=2 # [OPENMP/X] Number of threads for dipoles
K_Threads=8 # [OPENMP/BSK] Number of threads for response functions
but it stopped, with the following log:
...
<13m-35s> P0002: Kernel |################## | [090%] 13m-28s(E) 14m-58s(X)
<14m-22s> P0002: Kernel |################### | [095%] 14m-15s(E) 15m-00s(X)
<15m-12s> P0002: Kernel |####################| [100%] 15m-06s(E) 15m-06s(X)
<15m-32s> P0002: [06] BSE solver(s)
<15m-32s> P0002: [LA@Response_T_space] PARALLEL linear algebra uses a 2x2 SLK grid (4 cpu)
<15m-32s> P0002: [06.01] Diago solver
<15m-32s> P0002: [MEMORY] Alloc BS_mat( 21.23366Gb) TOTAL: 23.20335Gb (traced)
<16m-48s> P0002: BSK diagonalize | | [000%] --(E) --(X)
<16m-48s> P0002: [MEMORY] Alloc M_loc%blc( 5.308416Gb) TOTAL: 27.21647Gb (traced)
<19m-49s> P0002: [MEMORY] Alloc V_loc%blc( 5.308416Gb) TOTAL: 32.52488Gb (traced) 360.6920Mb (memstat)
and in the report file, the error is:
[06.01] Diago solver
====================
[WARNING]Allocation attempt of WS%v_cmplx of zero size.
If I increase BS_nCPU_LinAlg_INV and BS_nCPU_LinAlg_DIAGO from 4 to 8, the calculation becomes very slow...
Thanks!
Best
Shudong
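For reading logs like the one above, the [MEMORY] lines can be pulled out with a short script. This is a sketch only: the regular expression is written against the three lines quoted in this post and may not match other Yambo versions or log layouts, and the function name is hypothetical.

```python
import re

# Matches e.g. "[MEMORY] Alloc BS_mat( 21.23366Gb) TOTAL:  23.20335Gb (traced)"
MEMORY_LINE = re.compile(
    r"\[MEMORY\]\s+Alloc\s+(?P<name>\S+?)\(\s*(?P<size>[0-9.]+)(?P<unit>[KMG]b)\)"
    r"\s+TOTAL:\s*(?P<total>[0-9.]+)(?P<total_unit>[KMG]b)"
)


def parse_memory_lines(log_text: str) -> list[dict]:
    """Extract array name, allocation size and running total from [MEMORY] lines."""
    records = []
    for line in log_text.splitlines():
        match = MEMORY_LINE.search(line)
        if match:
            records.append({
                "name": match["name"],
                "size": float(match["size"]),
                "unit": match["unit"],
                "total": float(match["total"]),
                "total_unit": match["total_unit"],
            })
    return records


# The three [MEMORY] lines quoted in the post above.
SAMPLE_LOG = """\
<15m-32s> P0002: [MEMORY] Alloc BS_mat( 21.23366Gb) TOTAL: 23.20335Gb (traced)
<16m-48s> P0002: [MEMORY] Alloc M_loc%blc( 5.308416Gb) TOTAL: 27.21647Gb (traced)
<19m-49s> P0002: [MEMORY] Alloc V_loc%blc( 5.308416Gb) TOTAL: 32.52488Gb (traced) 360.6920Mb (memstat)
"""

if __name__ == "__main__":
    for record in parse_memory_lines(SAMPLE_LOG):
        print(f'{record["name"]}: {record["size"]} {record["unit"]}, '
              f'total {record["total"]} {record["total_unit"]}')
```

In this log each of the M_loc%blc and V_loc%blc blocks is about one quarter of BS_mat, which is what one would expect if the matrix is split evenly over the 2x2 SLK grid reported a few lines earlier.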
Re: BSE diagonalization solver error
Posted: Mon Oct 07, 2019 3:03 pm
by Daniele Varsano
Dear Shudong,
As I told you before, I would skip the ScaLAPACK parallelization and move all the CPUs to "k".
Try to see whether, even using a few CPUs, the BSE matrix is built in a reasonable time.
Best,
Daniele