BSE SLEPC - memory efficiency could be improved


malwi
Posts: 38
Joined: Mon Feb 29, 2016 1:00 pm

BSE SLEPC - memory efficiency could be improved

Post by malwi » Fri Mar 28, 2025 10:17 am

Dear Yambo Developers,

The BSE SLEPc solver reads the large ndb.BS_PAR_Q1 database using all CPUs and takes a lot of memory (6 atoms with SOC need 400 GB).
Reading takes 1h-07min for the case I tried, while the calculation itself takes 3 min.
Would it be possible to rewrite this part so that the calculation is done on the fly?

Below is the time report.
Best regards,
Gosia

Code:

 [06] Timing Overview
 ====================

 Clock: global (MAX - min (if any spread is present) clocks)
            (...)
            io_WF :     13.1474s P30 (   607 calls,   0.022 msec avg) [MAX]      0.0208s P3 [min]
DIPOLE_transverse :     40.5606s P37 [MAX]      0.0319s P31 [min]
            (...)
          Dipoles :     42.3740s P2 [MAX]     42.3739s P42 [min]
     Slepc Solver :    208.9028s P1 [MAX]    208.8697s P41 [min]
            io_BS :      01h-07m P40 (109384 calls,   0.037 sec avg) [MAX]       52m-28s P4 (109384 calls,   0.029 sec avg) [min]
dr hab. Małgorzata Wierzbowska, Prof. IHPP PAS
Institute of High Pressure Physics Polish Academy of Sciences
Warsaw, Poland

Daniele Varsano
Posts: 4198
Joined: Tue Mar 17, 2009 2:23 pm
Contact:

Re: BSE SLEPC - memory efficiency could be improved

Post by Daniele Varsano » Fri Mar 28, 2025 10:23 am

Dear Gosia,

can you please also post the input and report file?
Some info, such as the size of the BSE matrix and the percentage of eigenvectors required, would be useful.

Best,

Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

andrea.ferretti
Posts: 214
Joined: Fri Jan 31, 2014 11:13 am

Re: BSE SLEPC - memory efficiency could be improved

Post by andrea.ferretti » Fri Mar 28, 2025 11:56 am

Dear Gosia,

besides Daniele's suggestion (which needs to be considered first),
once the input file is confirmed to be ok, you could consider checking this development branch:

Code:

https://github.com/yambo-code/yambo/tree/tech-ydiago
currently in a pull request; it contains a fix contributed by a user to improve the BSE solver
(memory footprint and solver setup).
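For reference, a minimal sketch of how one might fetch and build that branch from source (an assumed workflow; the configure options are placeholders that depend on your compilers and libraries):

Code:

# fetch the development branch (assumed workflow; adapt paths as needed)
git clone https://github.com/yambo-code/yambo.git
cd yambo
git checkout tech-ydiago
# configure options are machine-specific placeholders; reuse your usual ones
./configure [your usual configure options]
make yambo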

Not sure this helps, but several fixes are included.

take care
Andrea
Andrea Ferretti, PhD
CNR-NANO-S3 and MaX Centre
via Campi 213/A, 41125, Modena, Italy
Tel: +39 059 2055322; Skype: andrea_ferretti
URL: http://www.nano.cnr.it

Davide Sangalli
Posts: 640
Joined: Tue May 29, 2012 4:49 pm
Location: Via Salaria Km 29.3, CP 10, 00016, Monterotondo Stazione, Italy
Contact:

Re: BSE SLEPC - memory efficiency could be improved

Post by Davide Sangalli » Fri Mar 28, 2025 1:01 pm

Dear Gosia,
the issue is that the code is making too many calls to io_BS_PAR, possibly because the BSE kernel was split into many blocks.

You can switch off the I/O of the BSE kernel by adding this line to the input:

Code:

DBsIOoff="BS"
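For context, a minimal sketch of how this line sits in a BSE/SLEPc input (the solver line is just an illustrative placeholder for what is already in your input; note that with the kernel I/O switched off the kernel is not stored, so it cannot be reused for a restart):

Code:

BSSmod= "s"        # [BSS] solver modality already in your input (here SLEPc)
DBsIOoff= "BS"     # switch off I/O of the BSE kernel database (ndb.BS_PAR_Q1 etc.)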
Best,
D.
Davide Sangalli, PhD
CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX Centre
https://sites.google.com/view/davidesangalli
http://www.max-centre.eu/

malwi
Posts: 38
Joined: Mon Feb 29, 2016 1:00 pm

Re: BSE SLEPC - memory efficiency could be improved

Post by malwi » Sat Apr 19, 2025 4:21 pm

Dear Daniele, Andrea and Davide,
Thank you very much for your prompt answers, and forgive our late reply. We set up a test suite at Cyfronet and on LUMI for the memory use in Yambo. I prepared cells with 6, 12, 48 and 96 atoms and 3 levels of accuracy according to k-points, bands and plane waves.
The long job was run by a colleague. My job with the same input on the Ares computer was much shorter. We tested it several times and got running times from 1h-07min down to 29 min, all of them using 48 CPUs on 1 node,
and the running time for 96 CPUs on 2 nodes was only 6 min. It seems to depend strongly on the number of users.
I continue testing the memory use (up to 96 atoms). The inputs for the above case are attached.

Buona Pasqua!
Gosia
dr hab. Małgorzata Wierzbowska, Prof. IHPP PAS
Institute of High Pressure Physics Polish Academy of Sciences
Warsaw, Poland

batman
Posts: 14
Joined: Sun Jun 20, 2021 2:04 pm

Re: BSE SLEPC - memory efficiency could be improved

Post by batman » Fri Apr 25, 2025 7:30 am

Dear Gosia,

Couple of things:

1) When dealing with large files, performing parallel I/O on HPCs requires manual intervention. This can have a significant impact on I/O time.
This is not related to Yambo or its underlying I/O libraries. Moreover, it depends on the specific filesystem your HPC uses.
For instance, according to https://docs.lumi-supercomputer.eu/stor ... ems/lumip/, the scratch storage is equipped with a Lustre file system.
In that case, you need to adjust some parameters (mainly the stripe count for Lustre, based on my experience; see the sketch after this list). Please refer to your HPC documentation.
For LUMI, a quick Google search led me to this: https://lumi-supercomputer.github.io/LU ... 08_Lustre/
Alternatively, you can contact your HPC administrator for help.
In any case, if you don’t need to store the kernel, since it is very fast in your case, you can follow Davide’s suggestion.

2) If you are already aware of (1), read on. As already mentioned, the io_BS routine in Yambo is called over 100K times.
This is exactly what the HDF5/NetCDF libraries advise users to avoid when performing large parallel writes.
However, this was a kind of compromise: writing frequently allows restarting BSE calculations, at the cost of some performance penalty. This penalty can become significant when writing large files in parallel. This is not a solution, but simply an explanation of why certain things are coded this way.
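Regarding the stripe count mentioned in (1), here is a minimal sketch of the kind of commands involved on a Lustre filesystem; the directory path and the stripe count are placeholders, so check your HPC documentation for the recommended values:

Code:

# check how files in the run directory are currently striped (placeholder path)
lfs getstripe /scratch/project_XXXX/yambo_run
# stripe files newly created in this directory across 8 OSTs (illustrative value)
lfs setstripe -c 8 /scratch/project_XXXX/yambo_run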

Best regards,
Murali
Muralidhar Nalabothula
Doctoral student at Department of Physics and Materials Science,
Université du Luxembourg
