Re: BSE diagonalization solver error
Posted: Sun Mar 22, 2020 10:22 pm
by haseebphysics1
Dear Daniele,
May I ask how to calculate roughly the RAM size from the BSE matrix dimension? You have done this in one of the posts as:
You have a matrix of large dimension (Nd~49000). A rough estimation of the memory:
Nd x Nd x 16 / (1024^3) ~ 36 GB per core
My matrix dimension is smaller than 49000 and I have more RAM per core, but I still get the memory issue with the BSE inversion solver!
Is there a big difference in memory consumption between the BSE solvers (inversion, full diagonalization, Haydock, etc.)? But optical spectra should remain the same no matter which solver was used. Correct?
One interesting thing I am seeing right now: I have switched off both MPI and OpenMP and am running a serial calculation of the same BSE matrix that I was not able to run with 125 GB of RAM. It is now consuming only around 20 GB of RAM in serial mode!
Previously I was running on just 2 cores with 122 GB/core and was unsuccessful. So, it might be that, in parallel, the same memory is duplicating? I have changed just one thing, from the inversion solver to the diagonalization solver...
I will send the input and report files if needed.
Thanks,
Re: BSE diagonalization solver error
Posted: Mon Mar 23, 2020 10:15 am
by Daniele Varsano
Dear Haseeb,
May I ask how to calculate roughly the RAM size from the BSE matrix dimension
It is just the number of elements of the matrix multiplied by the number of bytes per element (16 in the estimate above) and converted from bytes to GB.
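A rough sketch of this estimate in Python. The function names are only illustrative, and the default of 16 bytes per element is taken from the formula quoted above; the solver may need extra work space on top of the matrix itself, so treat the result as a lower bound.

```python
import math


def bse_matrix_memory_gib(nd: int, bytes_per_element: int = 16) -> float:
    """Rough memory, in units of 1024^3 bytes, of a dense nd x nd matrix."""
    return nd * nd * bytes_per_element / 1024**3


def max_dimension_for_memory(mem_gib: float, bytes_per_element: int = 16) -> int:
    """Largest nd whose dense nd x nd matrix fits in mem_gib (matrix only)."""
    n_elements = int(mem_gib * 1024**3 / bytes_per_element)
    return math.isqrt(n_elements)


if __name__ == "__main__":
    # Nd ~ 49000, as in the post quoted above
    print(f"{bse_matrix_memory_gib(49000):.1f}")  # prints 35.8
```

The second helper inverts the same formula, which is handy for checking whether a given Nd can fit in the RAM available to a single process.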
But optical spectra should remain the same no matter which solver was used. Correct?
The optical spectra will be the same. A full diagonalization also provides the eigenvectors, while Haydock does not. Haydock is a recursive method that is very powerful for large matrices, in particular when full diagonalization is prohibitive.
So, it might be that, in parallel, the same memory is duplicating?
Yes, it is possible. Note that the diagonalization will be done in serial anyway unless you link the code to the ScaLAPACK libraries and provide the number of CPUs in the input (BS_nCPU_LinAlg_DIAGO). I'm not an expert in the "inversion" technique.
Best,
Daniele
Re: BSE diagonalization solver error
Posted: Mon Mar 23, 2020 9:14 pm
by haseebphysics1
sdwang wrote: ↑Sun Oct 06, 2019 8:57 am
Dear Daniele,
I noticed that you previously mentioned one can perform a serial calculation without rerunning the BS matrix. My procedure is to run yambo -b, and then yambo -o b -k sex -y d. I think your suggestion is that if this stopped but the kernel is OK, we can go on with yambo -y d in serial. But when I do this (yambo -y d), the kernel is rerun again...
Thanks!
Best
Shudong
Dear Daniele, this is from a pretty old post, but I am now facing the same issue. I computed the dielectric screening using yambo -b,
then built the kernel in a separate calculation, yambo -k sex.
All databases were stored in the SAVE folder.
Now, when I run yambo -y i, it complains about the em1s database. So I added the em1s parameters by hand to the BSE input file; now it runs, but it starts the kernel from the very beginning, although the ndb.BS_Q1_CPU_x databases were already present in SAVE.
Any help in this regard will be very useful.
Regards,
Re: BSE diagonalization solver error
Posted: Tue Mar 24, 2020 8:46 am
by Daniele Varsano
Dear Haseeb,
you can't do it with your present calculation, as the databases have a defined CPU structure (ndb.BS_Q1_CPU_x).
I think you can do it if you calculate the BS matrix using parallel I/O. To do that you need to compile the code with HDF5 support. I have never used it, so I cannot help much with this.
Best,
Daniele
Re: BSE diagonalization solver error
Posted: Wed Mar 25, 2020 10:37 am
by Davide Sangalli
Just to add a bit more info.
As Daniele said, you need to compile yambo with parallel I/O support to be able to read an ndb.BS database regardless of the parallel structure used.
To achieve that yambo needs to be compiled with the
flag.
In that case the restart will work even if the previous calculation was interrupted during the construction of the kernel, and thus with a partially filled ndb.BS file.
Note that in the log of the restart you will find that the code goes through the BSK loop again, but it will be very fast.
Otherwise, if you use yambo compiled without parallel I/O, you can restart only
- using exactly the same number of cores and the same parallel structure
- if the ndb.BS_CPU_X files were finalized
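The restart conditions above can be summarized as a small decision function. This is only a sketch that encodes the rules stated in this post; it is not something yambo provides, and the function and argument names are hypothetical.

```python
def kernel_restart_possible(
    parallel_io: bool,
    same_parallel_layout: bool,
    files_finalized: bool,
) -> bool:
    """Encode the restart rules described in the post above.

    parallel_io: yambo was compiled with parallel I/O support.
    same_parallel_layout: same number of cores and same parallel structure
        as in the run that wrote the kernel databases.
    files_finalized: the per-CPU kernel databases were completely written.
    """
    if parallel_io:
        # With parallel I/O the database can be read regardless of the
        # parallel structure, even if it is only partially filled.
        return True
    # Without parallel I/O both conditions must hold.
    return same_parallel_layout and files_finalized
```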
Best,
D.
Re: BSE diagonalization solver error
Posted: Wed Mar 25, 2020 1:23 pm
by haseebphysics1
Davide Sangalli wrote: ↑Wed Mar 25, 2020 10:37 am
loop, but it will be very fast.
Otherwise, if you use yambo compiled without parallel I/O you can restart only
- using exactly the same number of cores and the same parallel structure
- if the ndb.BS_CPU_X files were finalized
Best,
D.
Dear Davide, thank you for the useful suggestion!
I just want to mention that, in my case, I tried to restart with exactly the same number of cores, and the BSE kernel was completely done, which means the ndb.BS_Q1_CPU_x files were fully present. Importantly, I did not change my BSE input file either. I'm checking whether it can be restarted with the diagonalization method...
Thanks,
Re: BSE diagonalization solver error
Posted: Thu Mar 26, 2020 9:25 am
by Davide Sangalli
I just want to mention that, in my case, I tried to restart with exactly the same number of cores, and the BSE kernel was completely done, which means the ndb.BS_Q1_CPU_x files were fully present. Importantly, I did not change my BSE input file either. I'm checking whether it can be restarted with the diagonalization method...
Ok. Then it should have worked.
Can you attach the report of the restart ?
Best,
D.
Re: BSE diagonalization solver error
Posted: Thu Mar 26, 2020 2:30 pm
by haseebphysics1
Dear Davide,
I have run the calculation again, and it seems that the inversion run was able to restart and at least skipped the kernel! But I am stuck on another problem. All the reports, the LOG and my slurm script files are attached.
I thought that it might be a memory issue. But the same calculation (with the same BSEbands and k-points) via the diagonalization method took only 36 GB per core, and here I am allocating ~62 GB per core to stay on the safe side.
The second thing I thought might be the issue is Intel MPI. But then a lot of other calculations were successful with the same ifort and MPI!
Any help would be highly appreciated!
Thanks,
Re: BSE diagonalization solver error
Posted: Thu Mar 26, 2020 3:05 pm
by Davide Sangalli
Yeah, the restart worked fine.
As you can see, the report contains the following lines:
Code:
[06.03] BSE Kernel @q1 (Resonant CORRRELATION EXCHANGE)
=======================================================
[BSE] Exchange components : 1895
[RD./SAVE//ndb.BS_Q1_CPU_0]---------------------------------
Brillouin Zone Q/K grids (IBZ/BZ): 40 72 40 72
RL vectors (WF): 5353
Coulomb cutoff potential :none
Parallel CPUs :4.1
Parallel Roles :eh.k
Fragmentation :no
This means it is reading the ndb from the previous run, and the reported parameters are checked (see the two lines on Parallel CPUs / Roles) and match.
If not, you would have seen something like
Code:
[*ERR] Parallel CPUs :2.2
Parallel Roles :eh.k
Fragmentation :no
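A quick way to check a report for such mismatches is to scan it for the [*ERR] marker. The sketch below assumes only that mismatched parameters are flagged with [*ERR] at the start of the line, as in the excerpt above; the function name and the sample text are illustrative.

```python
def find_db_check_errors(report_text: str) -> list[str]:
    """Return the report lines flagged with the [*ERR] marker."""
    return [
        line.strip()
        for line in report_text.splitlines()
        if "[*ERR]" in line
    ]


if __name__ == "__main__":
    # Sample text modelled on the excerpt above, not a real report.
    sample = "\n".join(
        [
            "[*ERR] Parallel CPUs :2.2",
            "Parallel Roles :eh.k",
            "Fragmentation :no",
        ]
    )
    for line in find_db_check_errors(sample):
        print(line)
```

To use it on a real run, read the report file into a string first and pass that string to the function; an empty list means no parameter was flagged.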
I tend to suspect that the crash is a memory issue as well.
Just one comment: why do you use the inversion solver? It tries to allocate the whole BSE matrix in memory.
You can just use Haydock, which is far more efficient and less memory demanding.
If for any reason you really need the inversion solver it may help setting
It will use ScaLAPACK instead of LAPACK and should distribute the matrix over the cores, and you can possibly run on a single node.
Thus you will see the message
replaced by the ScaLAPACK one
But I fear it will take forever to invert such a huge matrix.
Best,
D.
Re: BSE diagonalization solver error
Posted: Thu Mar 26, 2020 3:33 pm
by haseebphysics1
Dear Davide, thank you for your help.
Why do you use the inversion solver ?
Just because I wanted to use the double-grid method to save k-points! I don't think I can do this with the other solvers?
Thanks,