
Optics Parallelization ignored

Posted: Wed Mar 25, 2026 1:28 pm
by mhamad
Dear developers,

I am just playing with parallelization to get an intuition for it, so please don't mind the odd numbers for now. However, I have noticed strange behaviour in which Yambo completely ignores my input. Please find attached the input file and the log file.

In the input file, I am using

Code: Select all

X_CPU= "1 1 1 1 1"
and

Code: Select all

X_q_0_CPU= "1 1 1"
(I noticed that for q=0 this might be the relevant variable; the documentation is not very clear about it).

Both variables are present in the input (the second one is not generated by default, even with -V all), yet I still see:

Code: Select all

<05m-19s> P1-nic5-w029: [05] Optics
 <05m-21s> P1-nic5-w029: [PARALLEL Response_G_space for K(bz) on 2 CPU] Loaded/Total (Percentual):108/216(50%)
 <05m-21s> P1-nic5-w029: [PARALLEL Response_G_space for Q(ibz) on 1 CPU] Loaded/Total (Percentual):1/1(100%)
 <05m-21s> P1-nic5-w029: [PARALLEL Response_G_space for G-vectors on 1 CPU]
 <05m-21s> P1-nic5-w029: [PARALLEL Response_G_space for K-q(ibz) on 1 CPU] Loaded/Total (Percentual):12/20(60%)
 <05m-21s> P1-nic5-w029: [LA] SERIAL linear algebra
 <05m-21s> P1-nic5-w029: [PARALLEL Response_G_space for K(ibz) on 1 CPU] Loaded/Total (Percentual):12/20(60%)
 <05m-21s> P1-nic5-w029: [PARALLEL Response_G_space for CON bands on 2 CPU] Loaded/Total (Percentual):2/3(67%)
which indicates that parallelization is taking place even though I have explicitly requested none at all.

I am grateful for any help.
Kind regards.

Re: Optics Parallelization ignored

Posted: Wed Mar 25, 2026 1:53 pm
by Davide Sangalli
Dear mhamad,
first of all, please add a signature to your profile. It is a rule of the forum.

About the parallelization: yambo accepts the user-provided values only if the product of the numbers given in input matches the total number of MPI tasks.

If not, it falls back to an automatic parallelization scheme.

If you do not want to parallelize at all, the only way is to run a serial job.
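To make the check concrete, here is an illustrative shell sketch (done outside yambo, just plain arithmetic): with the posted input the product of the X_CPU entries is 1, while the log shows the job ran on 2 MPI tasks, so the user layout is rejected and the automatic one is used.

```shell
# Illustrative check of the rule above: yambo honours the user layout
# only if the product of the X_CPU entries equals the MPI task count.
X_CPU="1 1 1 1 1"              # values from the posted input
product=1
for n in $X_CPU; do
  product=$((product * n))
done
echo "$product"                # 1, but the log shows 2 MPI tasks,
                               # so yambo uses its automatic layout
```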

Best,
D.

Re: Optics Parallelization ignored

Posted: Wed Mar 25, 2026 2:39 pm
by mhamad
Dear Davide,

If I understand correctly, this means the product must be the same across all the different variables, right? Also, is X_q_0_CPU useful in this case, i.e. does it really do anything special, or is it simply controlled by X_CPU?

Kind regards,

Re: Optics Parallelization ignored

Posted: Wed Mar 25, 2026 3:18 pm
by Davide Sangalli
The product of the numbers inside a single variable must match the number of MPI tasks.

For example, if you run on 16 MPI tasks, this is accepted:

Code: Select all

X_CPU= "2 1 4 2 1" 
I do not remember why there is a different one for q=0, nor what it does...

D.

Re: Optics Parallelization ignored

Posted: Wed Mar 25, 2026 3:50 pm
by mhamad
I see. Thank you for your help.

If I may ask one more thing: is there a general rule of thumb for X parallelization regarding when to use g, k, c and v? I am keeping q at 1, since I have heard that parallelizing over q may lead to bad load balance, and for pure optics with q=0, g is also 1.
I have also noticed that for DIP, k seems to be by far the best choice: whenever I try to use c or v, the code stalls for a long time (sometimes 2 hours) before starting any R V P [g-space] calculation, and the calculation is painfully slow afterwards. This happens whenever MPI_c * MPI_v > 2. Parallelization over k, on the other hand, appears to scale linearly for DIP.
For SE, I know that qp scales nearly linearly; if memory limits become an issue, one may use b.

As a summary of what I found so far, at least for my system:

- DIP: k >> c / v [ c / v only good if MPI_c * MPI_v <= 2 ]
- X: ?
- SE: qp > b > q (q should be 1 in most cases).

And finally, one small question: can the number of threads differ between DIP and X? If I set OMP to 4 but decide, for whatever reason, that X should only use 2, will that be accepted or ignored?

Thank you again,

Re: Optics Parallelization ignored

Posted: Thu Mar 26, 2026 8:24 am
by Daniele Varsano
Dear Mhamad,
1. is there a general rule of thumb, for X parallelization:
Parallelizing over "c" and "v" is usually the best option for memory distribution, and it is also effective for performance. Depending on the number of valence bands, you can assign tasks such that Nv/task_v ≈ Nc/task_c to balance the workload.
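That balancing rule can be sketched in a few lines of shell (the numbers and variable names are illustrative, not from the thread): enumerate the factorizations task_v × task_c of the MPI task count and keep the one minimizing |Nv/task_v - Nc/task_c|.

```shell
# Illustrative search for a balanced (task_v, task_c) split:
# task_v * task_c must equal the MPI task count, and we want
# Nv/task_v to be roughly equal to Nc/task_c.
n_tasks=16; nv=4; nc=20        # example numbers, not from the thread
best_diff=""; best=""
tv=1
while [ "$tv" -le "$n_tasks" ]; do
  if [ $((n_tasks % tv)) -eq 0 ]; then
    tc=$((n_tasks / tv))
    # compare nv/tv with nc/tc by cross-multiplication (no floats)
    d=$((nv * tc - nc * tv))
    [ "$d" -lt 0 ] && d=$((0 - d))
    if [ -z "$best_diff" ] || [ "$d" -lt "$best_diff" ]; then
      best_diff=$d; best="$tv $tc"
    fi
  fi
  tv=$((tv + 1))
done
echo "$best"   # 2 8 -> e.g. 2 tasks on v and 8 on c, the rest on 1
```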
2. DIP:
Also in this case "c" and "v" distribute the memory. I have no explanation for the behaviour you observed; in any case, you can keep parallelizing over "k".
For SE, I know that qp is near linear, if not due to memory limits, then one may use b.
Correct.
can the number of threads differ between DIP and X?

I do not know exactly whether this is allowed; in any case, the number of threads to use depends on your MPI tasks and your node resources. In general, you want to keep it fixed for all the runlevels, and you can set it externally from your submission script via the environment variable OMP_NUM_THREADS.
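A minimal submission-script sketch of that suggestion (the launch line and values are illustrative):

```shell
# Set the thread count once in the submission script; every yambo
# runlevel (dipoles, response, self-energy) then inherits it.
export OMP_NUM_THREADS=4       # illustrative value
echo "$OMP_NUM_THREADS"
# mpirun -np 16 yambo -F yambo.in   # illustrative launch line
```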

Best,
Daniele