Re: BSE calculation stopped

Posted: Fri Jul 22, 2022 10:03 am
by ezekiel
Dear Daniele,
The Kernel calculation input file is attached.

Ezekiel

Posted: Mon Jul 25, 2022 8:44 am
by Daniele Varsano
Dear Ezekiel,

You can try to set the BS parallelization manually, for example:

BS_CPU= "2 8 4"                        # [PARALLEL] CPUs for each role
BS_ROLEs= "k eh t"                     # [PARALLEL] CPUs roles (k,eh,t)

You can try different combinations.
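As far as I know, Yambo expects the product of the BS_CPU entries to equal the total number of MPI tasks, so "2 8 4" corresponds to a 64-task run. The Python sketch below is a hypothetical helper (not part of Yambo) that lists every k/eh/t split for a given task count, which is one way to enumerate the combinations worth trying.

```python
from itertools import product


def bs_cpu_layouts(n_mpi: int) -> list[tuple[int, int, int]]:
    """Return every (k, eh, t) triple whose product equals n_mpi.

    Assumes the BS_CPU product must match the number of MPI tasks.
    """
    divisors = [d for d in range(1, n_mpi + 1) if n_mpi % d == 0]
    layouts = []
    for k, eh in product(divisors, repeat=2):
        # t is what remains once k and eh are fixed; skip splits that are not integer
        if n_mpi % (k * eh) == 0:
            layouts.append((k, eh, n_mpi // (k * eh)))
    return layouts


def as_input_lines(layout: tuple[int, int, int]) -> str:
    """Format one layout as the two Yambo input variables used in this post."""
    k, eh, t = layout
    return f'BS_CPU= "{k} {eh} {t}"\nBS_ROLEs= "k eh t"'


if __name__ == "__main__":
    # list all candidate layouts for a 64-task run
    for layout in bs_cpu_layouts(64):
        print(as_input_lines(layout))
```

Which layout runs best depends on the system; the list only restricts the search to layouts with the right total.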
Using parallel IO can also help, because it lets you restart the BS calculation.

To activate parallel IO for BS, I think you need to compile the code with the flag --enable-netcdf-par-io, but I am not sure whether other variables must also be set. Let me check, and I will get back to you with more instructions.

Best,
Daniele

Posted: Mon Jul 25, 2022 9:35 am
by Daniele Varsano
Dear Ezekiel,

--enable-netcdf-par-io should be deprecated; the correct flag is --enable-hdf5-par-io.
If you are using a recent version of the code, this is set by default, so your calculation is probably already using parallel IO.
You can verify this by looking at the BS databases: with parallel IO there should be a single database; otherwise there is one database per MPI task.
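The check described above amounts to counting files. The sketch below assumes the BS databases share the prefix `ndb.BS` (an assumption: list the directory that holds your databases to confirm the actual names) and applies the rule of thumb from this post, where one database suggests parallel IO and one per MPI task suggests serial IO. The helper names are hypothetical.

```python
import os
import sys


def count_bs_databases(filenames, prefix="ndb.BS"):
    """Count file names starting with `prefix` (the default prefix is an assumption)."""
    return sum(1 for name in filenames if name.startswith(prefix))


def io_mode(n_db, n_mpi):
    """Classify the IO mode from the database count, following the rule of thumb above."""
    if n_db == 1:
        return "parallel IO"
    if n_db == n_mpi:
        return "one database per MPI task"
    return "unclear"


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python check_io.py <directory with the BS databases> <number of MPI tasks>
    db_dir, n_mpi = sys.argv[1], int(sys.argv[2])
    print(io_mode(count_bs_databases(os.listdir(db_dir)), n_mpi))
```

With a single MPI task both cases give one database, so the check is only informative for multi-task runs.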

If this is the case, you can relaunch your calculation and it should restart from the point where it stopped.

Best,
Daniele

Posted: Fri Aug 12, 2022 7:27 am
by ezekiel
Dear developers,
My BSE calculations stopped. I am really having trouble distributing the CPUs over the parallelization parameters, and I would be grateful for urgent advice on how to proceed. If possible, I would also like a simple and robust tutorial on GW/BSE parallelization. My job script, input, log and report files are attached.

Regards

Oyeniyi Ezekiel

Posted: Fri Aug 12, 2022 7:31 am
by ezekiel
In addition, my report file is attached.

Posted: Mon Aug 15, 2022 7:46 am
by ezekiel
Please assist.

Posted: Tue Aug 16, 2022 9:32 am
by Daniele Varsano
Dear Oyeniyi Ezekiel,

Do you get any error messages from the batch execution?

You have quite a large kernel ([BSK] (total): 145800), and you are probably running into a memory problem (although I am not completely sure about it).
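To get a feel for why the kernel size matters, here is a back-of-the-envelope estimate. It assumes that 145800 is the linear dimension of the kernel (the number of electron-hole pairs) and that the full matrix is stored dense; both are assumptions, and Yambo's actual allocation (distribution, symmetry, precision) may differ.

```python
def dense_kernel_gib(n: int, bytes_per_element: int = 8) -> float:
    """Memory (GiB) of a dense n x n complex matrix.

    bytes_per_element: 8 for single-precision complex, 16 for double-precision complex.
    """
    return n * n * bytes_per_element / 2**30


if __name__ == "__main__":
    n = 145800  # assumed kernel dimension, taken from the [BSK] line quoted above
    for label, nbytes in (("single precision", 8), ("double precision", 16)):
        total = dense_kernel_gib(n, nbytes)
        # the even split over 36 ranks is an idealisation
        print(f"{label}: {total:.1f} GiB total, {total / 36:.1f} GiB per rank over 36 ranks")
```

Under these assumptions the dense matrix alone is about 158 GiB in single precision (about 317 GiB in double), roughly 4.4 GiB per rank if split evenly over 36 ranks, before counting any other arrays.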
You can try reducing the number of MPI processes per node, so that each process has more memory available.
Try setting:
#PBS -l select=3:ncpus=24:mpiprocs=12
mpirun -np 36 ...
BS_CPU= "36 1 1"
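The three settings have to agree: 3 nodes with 12 MPI processes each gives the 36 tasks passed to mpirun, and "36 1 1" multiplies to 36 as well. With ncpus=24 and mpiprocs=12, each node runs 12 MPI processes on 24 reserved cores, so each process can use roughly twice the memory it would get with 24 processes per node. A small consistency check, written as a hypothetical helper (not a Yambo or PBS tool) that is strict by design:

```python
from math import prod


def layout_is_consistent(nodes: int, mpiprocs_per_node: int, np_flag: int, bs_cpu: str) -> bool:
    """True when the PBS request, the mpirun -np value and the BS_CPU product agree."""
    ranks_requested = nodes * mpiprocs_per_node         # from #PBS -l select=...:mpiprocs=...
    cpu_product = prod(int(x) for x in bs_cpu.split())  # e.g. "36 1 1" -> 36
    return ranks_requested == np_flag == cpu_product


if __name__ == "__main__":
    print(layout_is_consistent(3, 12, 36, "36 1 1"))  # True: the suggested setup
    print(layout_is_consistent(3, 24, 36, "36 1 1"))  # False: 72 ranks requested, 36 launched
```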

Also, if you compile the code with --enable-memory-profile, the log files will contain information about the memory in use. Knowing how much memory per node your machine has, you can then tune your input and job script better.
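Once the memory profile gives you a peak memory per MPI task, choosing the number of processes per node is simple arithmetic. A minimal sketch, where the node memory, core count, peak value and the 10% headroom are all hypothetical inputs you would take from your cluster documentation and your own log:

```python
def max_ranks_per_node(node_memory_gib: float, peak_rank_gib: float,
                       cores_per_node: int, headroom: float = 0.9) -> int:
    """Largest number of MPI ranks per node whose combined peak memory fits on one node.

    headroom keeps a fraction of node memory free for the OS and MPI buffers (assumed value).
    Returns 0 if even a single rank does not fit.
    """
    usable = node_memory_gib * headroom
    return min(cores_per_node, int(usable // peak_rank_gib))


if __name__ == "__main__":
    # hypothetical numbers: 128 GiB nodes with 24 cores, 6 GiB peak per rank
    print(max_ranks_per_node(128, 6.0, 24))
```

The result is what you would put in mpiprocs, with mpirun -np and the BS_CPU product adjusted to match.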

Best,
Daniele

Posted: Tue Aug 16, 2022 9:56 am
by ezekiel
Dear Daniele,
Thank you for your response. I don't get any error messages from the batch execution.
I will try to implement your suggestions.

Many thanks

Ezekiel