BSE diagonalization solver error

Deals with issues related to computation of optical spectra in reciprocal space: RPA, TDDFT, local field effects.

haseebphysics1
Posts: 169
Joined: Sat Aug 17, 2019 2:48 pm

Re: BSE diagonalization solver error

Post by haseebphysics1 » Sun Mar 22, 2020 10:22 pm

Dear Daniele,
May I ask how to roughly estimate the required RAM from the BSE matrix dimension? You did this in one of your posts as follows:
You have a matrix of large dimension (Nd~49000). A rough estimation of the memory:
Nd x Nd x 16 / (1024^3) ~ 36 GB per core
My matrix dimension is smaller than 49000 and I have more RAM per core, but I still run into memory problems with the BSE inversion solver!

Is there a big difference in memory consumption between the BSE solvers (inversion, full diagonalization, Haydock, etc.)? And the optical spectra should remain the same no matter which solver is used, correct?

One interesting thing I am seeing right now: I have switched off both MPI and OpenMP and am running a serial calculation of the same BSE matrix that I was not able to run with 125 GB of RAM. It now consumes only around 20 GB of RAM in serial mode!

Previously I was running on just 2 cores with 122 GB/core and was unsuccessful. So, might the same memory be duplicated in parallel? I have changed just one thing: from the inversion solver to the diagonalization solver...

I will send the input and report files if needed.

Thanks,
Haseeb Ahmad
MS - Physics,
LUMS - Pakistan

Daniele Varsano
Posts: 3816
Joined: Tue Mar 17, 2009 2:23 pm

Post by Daniele Varsano » Mon Mar 23, 2020 10:15 am

Dear Haseeb,
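May I ask how to calculate roughly the RAM size from the BSE matrix dimension?
It is just the number of matrix elements multiplied by the size in bytes of each element (the factor 16 in the estimate you quote, i.e. 16 bytes per element, as for a double-precision complex number), converted from bytes to GB.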
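The estimate can be written as a small Python sketch. The function name and the `bytes_per_element` parameter are illustrative assumptions, not part of yambo. The default of 16 bytes corresponds to the factor 16 in the quoted formula; 8 would apply if the elements were stored as single-precision complex numbers.

```python
def bse_matrix_memory_gib(n_dim, bytes_per_element=16):
    """Rough memory (in GiB) needed to hold a dense n_dim x n_dim matrix."""
    n_elements = n_dim * n_dim
    return n_elements * bytes_per_element / 1024**3


# Example with the dimension quoted in the thread (Nd ~ 49000)
print(f"{bse_matrix_memory_gib(49000):.1f} GiB")  # prints "35.8 GiB"
```

This counts a single copy of the full matrix only. Work arrays, or a copy held by each MPI task, would come on top of it.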
But optical spectra should remain the same no matter which solver was used. Correct?
The optical spectra will be the same. A full diagonalization also provides the eigenvectors, while Haydock does not. Haydock is a recursive method that is very powerful for large matrices, in particular when the diagonalization is prohibitive.
So, it might be that, in parallel, the same memory is duplicating?
Yes, it is possible. Note that the diagonalization will be done in serial anyway unless you link the code to the ScaLAPACK libraries and provide the number of CPUs in the input (BS_nCPU_LinAlg_DIAGO). I am not an expert in the "inversion" technique.

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

Post by haseebphysics1 » Mon Mar 23, 2020 9:14 pm

sdwang wrote: Sun Oct 06, 2019 8:57 am
Dear Daniele,

I noticed that you previously mentioned one can perform the serial calculation without rerunning the BS matrix. My procedure is to run yambo -b, and then yambo -o b -k sex -y d. I think your suggestion is that, if this stopped but the kernel is OK, we can continue with yambo -y d in a serial calculation. But when I do this (yambo -y d), the kernel is rerun...

Thanks!

Best

Shudong

Dear Daniele, this is from a fairly old post, but I am now facing the same issue. I computed the dielectric screening using yambo -b
and then built the kernel in a separate calculation, yambo -k sex.
All databases were stored in the SAVE folder.

Now, when I run yambo -y i, it complains about the em1s database. So I added the em1s parameters by hand to the BSE input file. It now starts, but it recomputes the kernel from the very beginning, although the ndb.BS_Q1_CPU_x databases were already present in SAVE.

Any help in this regard will be very useful.

Regards,
Haseeb Ahmad

Post by Daniele Varsano » Tue Mar 24, 2020 8:46 am

Dear Haseeb,
You can't do it with your present calculation, as the databases have a defined CPU structure (ndb.BS_Q1_CPU_x).
I think you can do it if you calculate the BS matrix using parallel I/O. To do that, you need to compile the code with HDF5 support. I have never used it, so I cannot help much with this.

Best,
Daniele

Davide Sangalli
Posts: 614
Joined: Tue May 29, 2012 4:49 pm
Location: Via Salaria Km 29.3, CP 10, 00016, Monterotondo Stazione, Italy

Post by Davide Sangalli » Wed Mar 25, 2020 10:37 am

Just to add a few more details.

As Daniele said, you need to compile yambo with parallel I/O support to be able to read an ndb.BS database regardless of the parallel structure used.
To achieve that, yambo needs to be compiled with the --enable-hdf5-par-io flag.
In that case the restart will work even if the previous calculation was interrupted during the construction of the kernel, and thus with a partially filled ndb.BS file.
Note that in the log of the restart you will find that the code goes through the BSK loop again, but it will be very fast.

Otherwise, if you use yambo compiled without parallel I/O, you can restart only
- using exactly the same number of cores and the same parallel structure
- if the ndb.BS_CPU_X files were finalized
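
Part of these conditions can be checked before launching the restart. The sketch below is a hypothetical helper, not a yambo tool. It assumes the kernel fragments are named ndb.BS_Q1_CPU_<n>, with <n> counted from 0 as in the file names mentioned in this thread, and it only checks that one fragment exists per MPI task. It cannot tell whether the files were finalized.

```python
import os
import re

# Assumed naming of the per-task kernel fragments: ndb.BS_Q1_CPU_<n>
FRAGMENT_RE = re.compile(r"^ndb\.BS_Q1_CPU_(\d+)$")


def missing_fragments(file_names, n_tasks):
    """Return the sorted task indices (0..n_tasks-1) that have no fragment file."""
    found = set()
    for name in file_names:
        match = FRAGMENT_RE.match(name)
        if match:
            found.add(int(match.group(1)))
    return sorted(set(range(n_tasks)) - found)


def check_save_dir(save_dir, n_tasks):
    """Same check, reading the file names from a directory such as ./SAVE."""
    return missing_fragments(os.listdir(save_dir), n_tasks)


# Example: 4 MPI tasks, but the fragment of task 2 is missing
names = ["ndb.BS_Q1_CPU_0", "ndb.BS_Q1_CPU_1", "ndb.BS_Q1_CPU_3", "ndb.em1s"]
print(missing_fragments(names, 4))  # prints [2]
```

An empty list means every task has a fragment, which is necessary but not sufficient for the restart to work.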

Best,
D.
Davide Sangalli, PhD
CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX Centre
https://sites.google.com/view/davidesangalli
http://www.max-centre.eu/

Post by haseebphysics1 » Wed Mar 25, 2020 1:23 pm

Davide Sangalli wrote: Wed Mar 25, 2020 10:37 am
[...] loop, but it will be very fast.

Otherwise, if you use yambo compiled without parallel I/O you can restart only
- using exactly the same number of core and the same parallel structure
- if the ndb.BS_CPU_X files were finalized

Best,
D.

Dear Davide, thank you for the useful suggestion!

I just want to mention that in my case I tried to restart with exactly the same number of cores, and the BSE kernel had been completed, which means the ndb.BS_Q1_CPU_x files were fully present. Importantly, I did not change my BSE input file either. I am now checking whether it can be restarted with the diagonalization method...

Thanks,
Haseeb Ahmad

Post by Davide Sangalli » Thu Mar 26, 2020 9:25 am

I just want to mention, in my case, I tried to restart with exactly the same processor cores and BSE kernel was completely done, which means ndb.BS_Q1_CPU_x files were fully present. [...]
OK. Then it should have worked.
Can you attach the report of the restart?

Best,
D.

Post by haseebphysics1 » Thu Mar 26, 2020 2:30 pm

Dear Davide,

I have run the calculation again, and it seems that the inversion run was able to restart and at least skipped the kernel! But I am now stuck on another problem. All the reports, the LOG files and my Slurm script are attached.

I thought that it might be a memory issue. But the same calculation (with the same BSEbands and k-points) with the diagonalization method took only 36 GB per core, and here I am allocating ~62 GB per core to stay on the safe side.

The second thing I thought might be the issue is Intel MPI. But many other calculations were successful with the same ifort and MPI!

Any help would be highly appreciated!

Thanks,
Haseeb Ahmad

Post by Davide Sangalli » Thu Mar 26, 2020 3:05 pm

Yeah, the restart worked fine.
As you can see, the report contains the following lines:

  [06.03] BSE Kernel @q1 (Resonant CORRRELATION EXCHANGE)
  =======================================================

  [BSE] Exchange components : 1895
  [RD./SAVE//ndb.BS_Q1_CPU_0]---------------------------------
   Brillouin Zone Q/K grids (IBZ/BZ):  40   72   40   72
   RL vectors                   (WF):  5353
   Coulomb cutoff potential         :none
   Parallel CPUs                    :4.1
   Parallel Roles                   :eh.k
   Fragmentation                    :no
It means it is reading the ndb from the previous run, and the parameters reported are checked (see the two lines on Parallel CPUs / Roles) and match.
If they did not match, you would have seen something like:

[*ERR]   Parallel CPUs                    :2.2
   Parallel Roles                   :eh.k
   Fragmentation                    :no
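
A quick way to look for such mismatches is to scan the report for lines flagged with [*ERR]. This is only a minimal sketch: the function name is made up for illustration, and it assumes nothing more than that mismatching parameters carry the [*ERR] marker as in the lines above.

```python
def flagged_lines(report_lines, marker="[*ERR]"):
    """Return the report lines that contain the marker, stripped of whitespace."""
    return [line.strip() for line in report_lines if marker in line]


# Example using the lines shown above
report = [
    "[*ERR]   Parallel CPUs                    :2.2",
    "   Parallel Roles                   :eh.k",
    "   Fragmentation                    :no",
]
for line in flagged_lines(report):
    print(line)
```

For a real run, pass the lines of the report file instead of the example list, e.g. `flagged_lines(open(path))`, where `path` points to your own report.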
I tend to suspect the crash is a memory issue as well.

Just one comment: why do you use the inversion solver? It tries to allocate the whole BSE matrix in memory.
You can just use the Haydock solver, which is far more efficient and less memory demanding.

If for any reason you really need the inversion solver, it may help to set

BS_nCPU_LinAlg_INV=4

It will use ScaLAPACK instead of LAPACK and should distribute the matrix over the cores, so you can possibly run on a single node.
In that case you will see the message

[LA] SERIAL linear algebra

replaced by the ScaLAPACK one.
But I fear it will take forever to invert such a huge matrix...

Best,
D.

Post by haseebphysics1 » Thu Mar 26, 2020 3:33 pm

Dear Davide, thank you for your help.
Why do you use the inversion solver?
Just because I wanted to use the double-grid method to save on k-points! I don't think this is possible with the other solvers, is it?


Thanks,
Haseeb Ahmad