Yambo Community Forum

Posted: **Mon Nov 03, 2025 1:18 pm**

Dear Yambo developers and users,

I am running a statically screened electron–electron interaction calculation (yambo -X s) for bulk MoS2. The run took about three days and apparently stopped before finishing.

When I checked the LOG directory, I found that the calculation reached approximately q[268], and the last messages were:

<02d-09h-56m> [WARNING] Empty workload for CPU 10
<02d-09h-56m> [PARALLEL distribution for X Frequencies on 40 CPU] Loaded/Total (Percentual):0/1(0%)
<02d-09h-56m> [PARALLEL distribution for RL vectors(X) on 2 CPU] Loaded/Total (Percentual):130305/261121(50%)
<02d-09h-56m> [X-CG] R(p) Tot o/o(of R): 59873 1539200 100
<02d-09h-56m> Xo@q[268] progress up to ~97%

After the job stopped, I found a lock file in the external folder (the calculation was run with the -J option):

ndb.em1s_fragment_267-2686910465-3599.lock

My questions are:

Can this calculation be restarted from where it stopped (around q[268]) without recomputing all previous q-points?

If so, what is the proper way to restart: should I remove the lock file or use a specific restart flag?

Could the warning about the empty workload for CPU 10 and the unbalanced workload distribution be related to the crash?

System and run details:

Command: mpirun -np 40 yambo -X s

Yambo version: 5.2
System: bulk MoS2 (with spin)
Grid dimensions: 10x20x20
Each node: 32 cores, 191 GB RAM, InfiniBand network
Time per q-point: ~10.12 min
Total runtime before stop: ~2 days and 10 hours

Any suggestions to correctly restart the calculation or improve the load balancing would be very helpful.

Thank you very much for your time.

I have attached the input and r-setup files.
Best regards,

Paula Buitrago
PhD Student - Lab. of Surface and Interface Physics
National University of the Litoral (UNL)

Posted: **Tue Nov 04, 2025 10:04 am**

Dear Paula,

Yes, you can restart your calculation simply by rerunning it with the same input. Yambo will read the files already calculated and compute only the missing ones. And yes, you should probably remove the .lock file.
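The restart recipe above (delete the stale lock, then rerun with the unchanged input) can be scripted. The following is a minimal Python sketch, not a Yambo tool: it assumes the leftover locks are the only files ending in `.lock` under the directory created by -J, and the function name `remove_stale_locks` is hypothetical. It should only be used when no yambo process is still running on that directory.

```python
from pathlib import Path


def remove_stale_locks(job_dir, dry_run=True):
    """Find leftover *.lock files under job_dir and optionally delete them.

    Hypothetical helper, not part of Yambo. Only call it with dry_run=False
    when no yambo process is still using job_dir: a lock held by a live
    run must not be removed.
    """
    locks = sorted(Path(job_dir).rglob("*.lock"))
    for lock in locks:
        if dry_run:
            print(f"would remove: {lock}")
        else:
            lock.unlink()
            print(f"removed: {lock}")
    return locks
```

Calling `remove_stale_locks("my_job")` first (where "my_job" is a placeholder for your -J folder) only lists the candidates; passing `dry_run=False` deletes them, after which the same yambo command is rerun.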

Your calculation is very unbalanced, as you can see from the warning.

To obtain a better balance, you can define the parallelization strategy explicitly.
I do not recall the exact variable names in the version you are using.
In any case, you can do the following:

1) Generate the input file with parallel verbosity (yambo -X s -V par).
2) The input will contain these variables (note that in 5.1 they may have a similar but different name):

Code:

X_and_IO_CPU= ""                 # [PARALLEL] CPUs for each role
X_and_IO_ROLEs= ""               # [PARALLEL] CPUs roles (q,g,k,c,v)

3) Assign the MPI tasks to the conduction (c) and valence (v) bands, e.g.:

Code:

X_and_IO_CPU= "1 1 1 10 4"                 # [PARALLEL] CPUs for each role
X_and_IO_ROLEs= "q g k c v"               # [PARALLEL] CPUs roles (q,g,k,c,v)
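A quick consistency check for step 3: the numbers in the CPU string should multiply to the number of MPI tasks given to mpirun (here 1 x 1 x 1 x 10 x 4 = 40, matching `mpirun -np 40`). The Python sketch below performs that check; the helper name `check_parallel_strategy` is hypothetical, and it only does the arithmetic, it does not validate the strings against Yambo itself.

```python
from math import prod


def check_parallel_strategy(cpu_string, roles_string, n_mpi):
    """Check that a CPU string such as "1 1 1 10 4" uses exactly n_mpi tasks.

    Returns (ok, per_role_counts, product). Hypothetical helper: it only
    checks the rule that the product of the counts must equal the MPI count.
    """
    counts = [int(x) for x in cpu_string.split()]
    roles = roles_string.split()
    if len(counts) != len(roles):
        raise ValueError(f"{len(counts)} CPU numbers but {len(roles)} roles")
    total = prod(counts)
    return total == n_mpi, dict(zip(roles, counts)), total


ok, per_role, total = check_parallel_strategy("1 1 1 10 4", "q g k c v", 40)
print(ok, per_role, total)
```

The same check flags a mismatch such as "1 1 2 4 4" on 64 MPI tasks, whose product is only 32.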

Best,

Daniele

Posted: **Thu Nov 27, 2025 4:55 pm**

Dear Daniele,

Thank you very much for your reply. I hadn’t written back earlier because I wanted to try a few tests based on your suggestions.

1. I removed the .lock file and reran my calculation, and it finished successfully. Do you know why this lock file is generated? It happened several times during my tests.

2. Regarding your comment that “your calculation is very unbalanced as you can see from the warning. In order to have a better balance you can explicitly define the parallel strategy”, I tried several parallel strategies, but I never managed to obtain a good percentage for each distribution: the maximum I get is around 25%. Also, when I achieve a slightly better balance, the calculation takes significantly longer.
So, what would be the best strategy to obtain both a good balance and computational efficiency?

I tried the suggested strategy:

Code:

DIP_Threads=0                    # [OPENMP/X] Number of threads for dipoles
X_Threads=0                      # [OPENMP/X] Number of threads for response functions
K_Threads=0                      # [OPENMP/BSK] Number of threads for response functions
X_all_q_ROLEs= "g q k c v"     # [PARALLEL] CPUs roles (g,q,k,c,v)
X_all_q_CPU= "1 1 1 10 4"      # [PARALLEL] CPUs for each role
X_all_q_nCPU_LinAlg_INV= -1   # [PARALLEL] CPUs for Linear Algebra

but the warning still appears:

Code:

---> P20-n-10: MPI Cores-Threads   : 40(CPU)-1(threads)-1(threads@X)-1(threads@DIP)
<---> P20-n-10: MPI Cores-Threads   : X_and_IO(environment)-1 1 1 10 4(CPUs)-q g k c v(ROLEs)
.
.
.
<09m-59s> P20-n-10: [WARNING] Empty workload for CPU 20
<09m-59s> P20-n-10: [PARALLEL distribution for X Frequencies on 40 CPU] Loaded/Total (Percentual): 0/1 (0%)
<09m-59s> P20-n-10: [PARALLEL distribution for RL vectors(X) on 1 CPU] Loaded/Total (Percentual): 261121/261121 (100%)

I also tested the run on a single core and the calculation finished very quickly:

Code:

DIP_Threads=1                    # [OPENMP/X] Number of threads for dipoles
X_Threads=2                      # [OPENMP/X] Number of threads for response functions
K_Threads=1                      # [OPENMP/BSK] Number of threads for response functions
X_all_q_ROLEs= "g q k c v"     # [PARALLEL] CPUs roles (g,q,k,c,v)
X_all_q_CPU= "1 1 2 4 4"      # [PARALLEL] CPUs for each role
X_all_q_nCPU_LinAlg_INV= -1   # [PARALLEL] CPUs for Linear Algebra

Code:

<---> [01] MPI/OPENMP structure, Files & I/O Directories
<---> MPI Cores-Threads   : 1(CPU)-1(threads)-2(threads@X)-1(threads@DIP)-1(threads@K);

Could it be that the calculation simply performs better without parallelization?

I would be very grateful if you could help me understand this and guide me on how to choose the optimal parallel strategy.

Thank you in advance for your time.

Best regards,

Paula Buitrago
PhD Student - Lab. of Surface and Interface Physics
National University of the Litoral (UNL)

Posted: **Fri Nov 28, 2025 4:27 pm**

Dear Paula,

1. The .lock files are generated to signal that a database is in use, and they are removed once the job is completed. They are still there because the job crashed and did not finish correctly.

2. I am not sure what exactly you mean by maximum distribution. The job is balanced when the percentage of the workload is similar for each task; e.g. 25% for each of 4 MPI tasks is "perfect balancing". Note that the percentages are reported separately for each parallelization field.

* Empty workload: I would need to inspect the complete log to be sure. Possibly this refers to the distribution over frequencies, which is perfectly normal for a static calculation, since there is only one frequency to distribute.

* Could it be that the calculation simply performs better without parallelization?

I would say no. Either there is a communication problem on the machine or the code is miscompiled.
In any case, did you start the serial calculation from scratch or run it on top of a previous calculation? In the latter case, Yambo does not redo the calculation, but just reads the previously computed databases.
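To apply the balancing criterion described above to a log, one can compare each reported share with the ideal 100/N percent for the N CPUs of that field, and flag fields with fewer work items than CPUs (such as the single static frequency, Loaded/Total 0/1 on 40 CPUs), where some CPUs are necessarily left idle. This is a minimal Python sketch assuming the line format shown in the logs quoted in this thread; the regular expression and the name `parse_distribution` are hypothetical and may need adjusting for other Yambo versions.

```python
import re

# Matches lines such as
#   [PARALLEL distribution for RL vectors(X) on 2 CPU] Loaded/Total (Percentual):130305/261121(50%)
# The spacing around the numbers differs between the logs in this thread, hence \s*.
DIST = re.compile(
    r"\[PARALLEL distribution for (?P<field>.+?) on (?P<ncpu>\d+) CPU\]\s*"
    r"Loaded/Total \(Percentual\):\s*(?P<loaded>\d+)/(?P<total>\d+)\s*\(\d+%\)"
)


def parse_distribution(log_text):
    """Return one dict per distribution line found in the given log text."""
    rows = []
    for m in DIST.finditer(log_text):
        ncpu, loaded, total = int(m["ncpu"]), int(m["loaded"]), int(m["total"])
        rows.append({
            "field": m["field"],
            "ncpu": ncpu,
            "share_pct": 100.0 * loaded / total if total else 0.0,
            "ideal_pct": 100.0 / ncpu,  # perfect balance: every task gets 1/N
            # fewer work items than CPUs: some CPUs must stay idle
            "idle_unavoidable": total < ncpu,
        })
    return rows


log = """[PARALLEL distribution for X Frequencies on 40 CPU] Loaded/Total (Percentual):0/1(0%)
[PARALLEL distribution for RL vectors(X) on 2 CPU] Loaded/Total (Percentual):130305/261121(50%)"""
for row in parse_distribution(log):
    print(f"{row['field']}: {row['share_pct']:.1f}% (ideal {row['ideal_pct']:.1f}%), "
          f"idle CPUs unavoidable: {row['idle_unavoidable']}")
```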

If you post the report/log files of your calculation, we will take a look and try to understand what is going wrong.
Please also post the config.log file.

Best,
Daniele

Posted: **Sun Nov 30, 2025 3:14 pm**

Dear Daniele,

Thank you for your reply.

1. About the .lock files:
That makes sense. In my first parallel run with 64 cores the job crashed near 95% and did not complete, which explains why the .lock file remained.

2. Regarding the workload distribution:
Thank you for the explanation. I now understand that the workload is balanced when each MPI task has a similar percentage.

3. To give you some context: my first run was done in parallel using 64 cores. The calculation progressed up to ~95% and then stopped producing output without finishing. Later, I repeated the run using a single core, and in that case the job completed successfully and produced the final output.

I discussed this with the cluster administrators, and they informed me that this specific Yambo case generates extremely intensive I/O. When running in parallel directly from the storage (NFS), file-locking conflicts occur, while running entirely in RAM eventually leads to a loss of communication between MPI processes even though the processes remain alive. For now, they recommended running the calculation on a single core and copying the whole case to /dev/shm before execution, and only writing the results back to storage at the end.
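The administrators' recommendation (copy the case to RAM-backed /dev/shm, run there, and write the results back at the end) could be wrapped as below. This is a hypothetical Python sketch, not something provided by the administrators or by Yambo: `run_staged` and its arguments are illustrative, `cmd` is whatever command line is normally used for the run, and the results are copied back even if the run fails, so that partial databases remain available for a restart.

```python
import shutil
import subprocess
from pathlib import Path


def run_staged(case_dir, scratch_root, cmd):
    """Copy case_dir to scratch_root, run cmd there, then copy everything back.

    Hypothetical helper. scratch_root would be /dev/shm on the cluster;
    the copy-back happens even if cmd fails, so partial results survive.
    """
    case_dir = Path(case_dir).resolve()
    work = Path(scratch_root) / case_dir.name
    shutil.copytree(case_dir, work)  # stage in (fails if work already exists)
    try:
        result = subprocess.run(cmd, cwd=work)
    finally:
        # stage out: overwrite/extend the original case with the new files
        shutil.copytree(work, case_dir, dirs_exist_ok=True)
        shutil.rmtree(work)  # free the RAM-backed scratch space
    return result.returncode
```

For example, `run_staged("MoS2_case", "/dev/shm", cmd)`, where "MoS2_case" is a placeholder for the case directory and `cmd` is the serial yambo command line already in use (as a list of arguments).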

I am currently following this approach, although the serial execution will significantly increase the wall time of the calculations.

As requested, I am attaching the report/log files of both runs (parallel and serial), as well as the config.log.

Best regards,

Paula Buitrago
PhD Student - Lab. of Surface and Interface Physics
National University of the Litoral (UNL)

Posted: **Mon Dec 01, 2025 8:31 am**

Dear Paula,

Please note that the serial run was quick because it was a restart: it read the previously stored files and calculated only the remaining part, which in your case was less than 0.3% of the total elements,
as you can see in the log:

Code:

Kernel loaded percentual :  99.71201 [%]
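The line above is what identifies the run as a restart: the part still to be computed is 100 minus the loaded percentage, here 100 - 99.71201 = 0.28799%, i.e. less than 0.3%. A small Python sketch that extracts this figure from a log, assuming the line is formatted exactly as above; the name `remaining_percent` is hypothetical.

```python
import re

# Matches e.g. "Kernel loaded percentual :  99.71201 [%]" as printed in the log above.
LOADED = re.compile(r"Kernel loaded percentual\s*:\s*([0-9.]+)\s*\[%\]")


def remaining_percent(log_text):
    """Return the percentage of kernel elements still to compute.

    Returns None if the "Kernel loaded percentual" line is not in log_text.
    """
    m = LOADED.search(log_text)
    if m is None:
        return None
    return 100.0 - float(m.group(1))


print(remaining_percent("Kernel loaded percentual :  99.71201 [%]"))
```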

Yes, there is intensive I/O, but in this case it is partly because this is a restart calculation.
Note that the 95% reached by the parallel run is not the actual calculation, but the reading of data from a previous calculation (which indeed seems to be very slow).

Finally, please note that you are using 64 MPI tasks, but your input assigns a parallelization strategy for 32 tasks (1 x 1 x 2 x 4 x 4 = 32):

Code:

X_all_q_CPU= "1 1 2 4 4"

Moreover, this is the strategy for the response function. The variables for the BSE parallelization are not present in your input:

Code:

BS_CPU= "n1, n2, n3"             # [PARALLEL] CPUs for each role
BS_ROLEs= "k eh t"           # [PARALLEL] CPUs roles (k,eh,t)

In this case Yambo uses a default value, which is not guaranteed to be optimal.

Your BSE matrix has a rather large dimension (64000): running it in serial would be very time-consuming.
My suggestion is to redo a clean (parallel) calculation from scratch, i.e. remove the previously calculated ndb.BS databases, assign a larger number of tasks to the role "k" (e.g. 16) and the rest to the other roles (the product of the assigned tasks should match the total number of MPI processes). If the calculation does not end correctly, you can restart it (possibly in serial, if the parallel restart shows problems).
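Following that suggestion for 64 MPI tasks with 16 tasks on "k", the remaining factor 64 / 16 = 4 has to be split between "eh" and "t". The Python sketch below lists the valid splits; it assumes space-separated values in the same order as BS_ROLEs= "k eh t" (as in the X_all_q_CPU examples), and which split is fastest still has to be found by testing. The helper name `bs_cpu_options` is hypothetical.

```python
def bs_cpu_options(n_mpi, k_tasks):
    """List "k eh t" CPU strings whose product equals n_mpi, with k fixed.

    Hypothetical helper: it enumerates valid splits, it does not rank them.
    """
    if n_mpi % k_tasks != 0:
        raise ValueError("k_tasks must divide the number of MPI processes")
    rest = n_mpi // k_tasks
    return [f"{k_tasks} {eh} {rest // eh}" for eh in range(1, rest + 1) if rest % eh == 0]


for option in bs_cpu_options(64, 16):
    print(f'BS_CPU= "{option}"')
```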

Best,

Daniele

Restarting yambo -X after interruption: .lock file found in ndb.em1s folder (v5.2)