
Fail to run IP calculations in parallel

Posted: Thu Nov 23, 2023 8:57 am
by Jessie
Dear all,

I was trying to follow the steps in the tutorial https://www.yambo-code.eu/wiki/index.ph ... n_parallel to run an IP calculation in parallel, but it failed. I also tried to run an RPA calculation and the same error occurred. Here is the error information in the log files:

Code:

P1: [ERROR] STOP signal received while in[04] Dipoles
P1: [ERROR] Writing File ./IP_mpi8//ndb.dipoles; Variable  NOT DEFINED; NetCDF: Parallel operation on file opened for non-parallel access
and here is the error information on my screen:

Code:

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 4 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[localhost.localdomain:42104] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2198
[localhost.localdomain:42104] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[localhost.localdomain:42104] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
My yambo version is MPI+HDF5_MPI_IO - Ver. 5.1.0, installed via conda. I ran the task in the following steps:

1. yambo
(initialization)
2. yambo -o c -V par -F yambo_IP.in
(However, in this step I could not find the variables "X_Threads" and "DIP_Threads" mentioned in the tutorial page above, so I did not change any parallelism-related variables in the input file; see the sketch after these steps.)
3. mpirun -np 8 yambo -F yambo_IP.in &
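For reference, the tutorial's parallelism block should look roughly like the sketch below (variable names taken from the yambo 5.x documentation; the exact set may vary between versions), which is what I expected "-V par" to generate:

Code:

DIP_CPU= "2 2 2"              # [PARALLEL] CPUs for each role (dipoles)
DIP_ROLEs= "k c v"            # [PARALLEL] CPU roles for the dipoles
DIP_Threads= 0                # [OPENMP/X] Number of threads for dipoles
X_and_IO_CPU= "1 1 2 2 2"     # [PARALLEL] CPUs for each role (response)
X_and_IO_ROLEs= "q g k c v"   # [PARALLEL] CPU roles for the response function
X_Threads= 0                  # [OPENMP/X] Number of threads for response functions

The product of the CPU counts in each assignment should match the number of MPI tasks (8 with "mpirun -np 8").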

Could you give me some guidance on how to resolve this error? Thanks.

Best regards,
Jessie

Re: Fail to run IP calculations in parallel

Posted: Thu Nov 23, 2023 9:06 am
by Daniele Varsano
Dear Jessie,

it seems you have an I/O problem. It could be related to a conflict between the MPI library the code was compiled against and the mpirun you are using to launch the job.
In any case, I suggest you update to the latest version of the code and try again. If the problem persists, we will inspect it in more detail.
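A quick way to check for such a conflict is to verify that the yambo binary and the mpirun launcher come from the same conda environment, e.g. (the conda-forge channel and package name below are assumptions, adjust to your setup):

Code:

# check that both binaries resolve to the same conda environment
which yambo
which mpirun

# print the version of the MPI launcher actually picked up
mpirun --version

# update yambo within conda (assuming it comes from conda-forge)
conda update -c conda-forge yambo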
Best,
Daniele