Question about checkpoint restart facility in Yambo for a long GW/BSE job

Concerns issues with computing quasiparticle corrections to the DFT eigenvalues - i.e., the self-energy within the GW approximation (-g n), or considering the Hartree-Fock exchange only (-x)

Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano

sabrine
Posts: 21
Joined: Tue Apr 26, 2022 3:05 pm
Location: Paris, France

Question about checkpoint restart facility in Yambo for a long GW/BSE job

Post by sabrine » Wed Mar 08, 2023 1:54 pm

Dear Developers and Users,

I am currently running a long Yambo job for GW/BSE calculations on a cluster, which is expected to take approximately 5 days to complete. I am concerned about the risk of job failure due to hardware or software issues, and I am interested in implementing a checkpointing strategy to mitigate this risk.

I have read about the checkpointing solution for Yambo jobs, and I understand that it involves configuring the SLURM script and modifying the input files. However, I am also wondering if there is a checkpoint restart facility in Yambo that allows me to restart the job from the last checkpoint in case of a failure, without having to rerun the entire job from the beginning.

Can you please confirm if such a checkpoint restart facility exists in Yambo, and if so, how can I enable it for my long job? Also, are there any specific modifications that I need to make to the input file or the SLURM script to use this feature?

In my search for a solution, I came across a thread in the Yambo forum (viewtopic.php?t=320&start=30) that suggests that Yambo should be able to recognize an interrupted calculation and continue from where it left off when the same input is run again. Can you please confirm if this is accurate, and if so, would it be a reliable solution to my problem?

I appreciate any guidance or resources that you can provide to help me implement a reliable checkpointing strategy for my Yambo job.
Thank you for your time and assistance.

Best regards,
Dr. Sabrine Ayari
Laboratoire de Physique de l'École normale supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université Paris-Diderot, Sorbonne Paris Cité, Paris, France

Daniele Varsano
Posts: 3773
Joined: Tue Mar 17, 2009 2:23 pm

Re: Question about checkpoint restart facility in Yambo for a long GW/BSE job

Post by Daniele Varsano » Wed Mar 08, 2023 4:02 pm

Dear Sabrine,

A possible strategy is the following:

Divide your calculation into steps.

1) Calculation of the screening
2) GW calculation
3) BSE calculation

1. In the calculation of the screening (e.g. plasmon pole) it is possible to restart. The calculation is done for each q point (maybe avoid parallelizing over q). All matrix elements eps^-1(q,g,g') are written to disk (ndb.pp_fragment_iq). If the code crashes for any reason, just rerun the same input file and Yambo will restart the calculation from the last q point.
2. In the GW calculation, Yambo reads the previously calculated screening and starts to evaluate the <n| Sigma | n> elements. Here, unfortunately, no restart is possible, so the strategy is to run small calculations, i.e. including only a few bands and k points in the %QPkrange variable. The idea is to split the %QPkrange of interest into several smaller calculations and then merge the databases using the ypp utility.
3. The BSE calculation has a restart when building the kernel (just rerun with the same input). It should also have a restart for the Haydock procedure, while there is no restart for the diagonalization.
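
To illustrate the splitting in step 2, here is a minimal Python sketch that cuts one k-point range into smaller chunks and prints a %QPkrange block for each of them. The function names and the chunk size are hypothetical, and the line layout "first k | last k | first band | last band |" is an assumption about the input format that should be checked against an input file generated by your own Yambo version. Each block would then go into its own input file, run as a separate short job.

```python
def split_qpkrange(k_first, k_last, b_first, b_last, k_chunk):
    """Split the k-point range [k_first, k_last] into consecutive chunks
    of at most k_chunk points; the band range is kept the same in each chunk."""
    if k_chunk < 1:
        raise ValueError("k_chunk must be at least 1")
    if k_first > k_last or b_first > b_last:
        raise ValueError("ranges must satisfy first <= last")
    chunks = []
    start = k_first
    while start <= k_last:
        end = min(start + k_chunk - 1, k_last)
        chunks.append((start, end, b_first, b_last))
        start = end + 1
    return chunks


def qpkrange_block(chunk):
    """Format one chunk as a %QPkrange block (assumed layout:
    first k | last k | first band | last band |)."""
    k1, k2, b1, b2 = chunk
    return f"%QPkrange\n {k1}|{k2}|{b1}|{b2}|\n%"


if __name__ == "__main__":
    # Hypothetical example: k points 1-10, bands 5-12, at most 4 k points per job.
    for i, chunk in enumerate(split_qpkrange(1, 10, 5, 12, k_chunk=4), start=1):
        print(f"# chunk {i}")
        print(qpkrange_block(chunk))
```

If one of the small jobs fails, only that chunk needs to be rerun; the resulting quasiparticle databases are then merged with ypp as described in step 2.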

Hope it helps,

Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

sabrine
Posts: 21
Joined: Tue Apr 26, 2022 3:05 pm
Location: Paris, France

Re: Question about checkpoint restart facility in Yambo for a long GW/BSE job

Post by sabrine » Fri Mar 10, 2023 12:02 pm

Dear Daniele,

Many thanks for the proposed strategy.

Best regards,
Sabrine
Dr. Sabrine Ayari
Laboratoire de Physique de l'École normale supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université Paris-Diderot, Sorbonne Paris Cité, Paris, France
