Restarting yambo -X after interruption: .lock file found in ndb.em1s folder (v5.2)
Posted: Mon Nov 03, 2025 1:18 pm
Dear Yambo developers and users,
I am running a statically screened electron–electron interaction calculation (yambo -X s) for bulk MoS2. The run took about three days and apparently stopped before finishing.
When I checked the LOG directory, I found that the calculation reached approximately q[268], and the last messages were:
<02d-09h-56m> [WARNING] Empty workload for CPU 10
<02d-09h-56m> [PARALLEL distribution for X Frequencies on 40 CPU] Loaded/Total (Percentual):0/1(0%)
<02d-09h-56m> [PARALLEL distribution for RL vectors(X) on 2 CPU] Loaded/Total (Percentual):130305/261121(50%)
<02d-09h-56m> [X-CG] R(p) Tot o/o(of R): 59873 1539200 100
<02d-09h-56m> Xo@q[268] progress up to ~97%
After the job stopped, I found a lock file in the external folder (the calculation was run with the option -J):
ndb.em1s_fragment_267-2686910465-3599.lock
My questions are:
- Can this calculation be restarted from where it stopped (around q[268]) without recomputing all previous q-points?
- If so, what is the proper way to restart: should I remove the lock file, or use a specific restart flag?
- Could the warning about empty workload for CPU 10 and the unbalanced workload distribution be related to the crash?
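To make the restart question more concrete, here is a minimal sketch (plain Python, not part of Yambo) of how the -J folder could be inspected to see which screening fragments are on disk and which lock files remain. It assumes that completed fragments are stored as files named ndb.em1s_fragment_<N>, which I inferred only from the lock file name above. The function names and the folder name in the usage note are hypothetical. The script only lists files and deletes nothing; whether removing the lock file is the right way to restart is exactly what I am asking.

```python
import re
from pathlib import Path

# Assumed naming, inferred from the lock file name quoted in the post:
# data fragments look like "ndb.em1s_fragment_<N>", while lock files
# start the same way, carry extra suffixes, and end in ".lock".
FRAGMENT_RE = re.compile(r"^ndb\.em1s_fragment_(\d+)$")
LOCK_RE = re.compile(r"^ndb\.em1s_fragment_(\d+).*\.lock$")


def inventory(job_dir):
    """Return (sorted fragment indices, sorted lock file names) found in job_dir."""
    fragments, locks = [], []
    for path in Path(job_dir).iterdir():
        if not path.is_file():
            continue
        match = FRAGMENT_RE.match(path.name)
        if match:
            fragments.append(int(match.group(1)))
        elif LOCK_RE.match(path.name):
            locks.append(path.name)
    return sorted(fragments), sorted(locks)


def missing_indices(fragments, expected_total):
    """Return the indices in 1..expected_total that have no fragment file."""
    present = set(fragments)
    return [i for i in range(1, expected_total + 1) if i not in present]
```

For example, calling inventory("my_job_folder") (a placeholder for the folder passed to -J) returns the fragment indices and lock files found there, and missing_indices(fragments, n_q), with n_q the total number of q-points reported in the setup, lists the q-points that have no fragment file yet.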
Command: mpirun -np 40 yambo -X s
Yambo version: 5.2
System: bulk MoS2 (with spin)
Grid dimensions: 10x20x20
Each node: 32 cores, 191 GB RAM, Infiniband network
Time per q-point: ~10.12 min
Total runtime before stop: ~2 days and 10 hours
Any suggestions to correctly restart the calculation or improve the load balancing would be very helpful.
Thank you very much for your time.
I've attached the input and the r-setup files.
Best regards,
Paula Buitrago
PhD Student - Lab. of Surface and Interface Physics
National University of the Litoral (UNL)