BSE calculation stopped

ezekiel
Posts: 19
Joined: Fri Oct 02, 2020 11:01 pm

Re: BSE calculation stopped

Post by ezekiel » Fri Jul 22, 2022 10:03 am

Dear Daniele,
The Kernel calculation input file is attached.

Ezekiel
Oyeniyi Ezekiel,
Theoretical Physicist,
Department of Physics,
University of Ibadan, Nigeria

Daniele Varsano
Posts: 3773
Joined: Tue Mar 17, 2009 2:23 pm

Re: BSE calculation stopped

Post by Daniele Varsano » Mon Jul 25, 2022 8:44 am

Dear Ezekiel,

you can try setting the BS parallelization manually, with something like:

BS_CPU= "2 8 4"                       # [PARALLEL] CPUs for each role
BS_ROLEs= "k eh t"                     # [PARALLEL] CPUs roles (k,eh,t)
You can then try different combinations.
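
To see which combinations are available for a given job size, here is a minimal Python sketch. It assumes that the three BS_CPU entries, in the order given by BS_ROLEs, must multiply to the total number of MPI tasks (64 for "2 8 4"). The helper names are hypothetical and not part of Yambo.

```python
def divisors(n):
    """Return all positive divisors of n in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def bs_cpu_combinations(n_tasks):
    """List (k, eh, t) triples whose product equals n_tasks.

    Assumption: the BS_CPU entries, ordered as in BS_ROLEs= "k eh t",
    must multiply to the number of MPI tasks.
    """
    combos = []
    for k in divisors(n_tasks):
        for eh in divisors(n_tasks // k):
            t = n_tasks // (k * eh)  # exact, since eh divides n_tasks // k
            combos.append((k, eh, t))
    return combos


if __name__ == "__main__":
    # Print every candidate line for a 64-task job.
    for k, eh, t in bs_cpu_combinations(64):
        print(f'BS_CPU= "{k} {eh} {t}"')
```

Which of these performs best depends on the system and the machine, so the list is only a starting point for testing.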
Something else that can help is to use parallel IO, because it allows you to restart the BS calculation.

To activate parallel IO for BS, I think you need to compile the code with the flag --enable-netcdf-par-io, but I do not know whether other variables also need to be set. Let me check and I will get back to you with more instructions.

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

Re: BSE calculation stopped

Post by Daniele Varsano » Mon Jul 25, 2022 9:35 am

Dear Ezekiel,

--enable-netcdf-par-io should be deprecated; the correct flag is --enable-hdf5-par-io.
If you are using a recent version of the code, this is set by default, so your calculation is probably already using parallel IO.
You can verify this by looking at the BS database: with parallel IO there should be a single database, whereas otherwise the number of databases equals the number of MPI tasks you are using.

If this is the case, you can try relaunching your calculation, and it should restart from the point where it stopped.
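
The database count can also be checked with a short script. This is only a sketch: the file-name prefix "ndb.BS" and the directory to inspect are assumptions, so adjust both to what your Yambo version actually writes. The function names are hypothetical.

```python
from pathlib import Path


def list_bs_databases(job_dir, prefix="ndb.BS"):
    """Return sorted names of files in job_dir that start with prefix.

    Assumption: the BS kernel databases share the prefix "ndb.BS".
    """
    return sorted(
        p.name
        for p in Path(job_dir).iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )


def describe_layout(db_names, n_mpi_tasks):
    """Classify the database count following the rule of thumb above."""
    if len(db_names) == 1:
        return "single database"
    if len(db_names) == n_mpi_tasks:
        return "one database per MPI task"
    return "unclear"


if __name__ == "__main__":
    # "." and 64 are placeholders for the job directory and MPI task count.
    names = list_bs_databases(".")
    print(names, "->", describe_layout(names, 64))
```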

Best,
Daniele

Re: BSE calculation stopped

Post by ezekiel » Fri Aug 12, 2022 7:27 am

Dear developers,
My BSE calculation stopped. I am really struggling with distributing the CPUs over the parallelization parameters. Please advise urgently on the way forward. If possible, I would also appreciate a simple and robust tutorial on GW/BSE parallelization. My job script, input, log and report files are attached.

Regards

Oyeniyi Ezekiel

Re: BSE calculation stopped

Post by ezekiel » Fri Aug 12, 2022 7:31 am

In addition, my report file is attached.

Re: BSE calculation stopped

Post by ezekiel » Mon Aug 15, 2022 7:46 am

Please assist.

Re: BSE calculation stopped

Post by Daniele Varsano » Tue Aug 16, 2022 9:32 am

Dear Oyeniyi Ezekiel,


Do you get any error messages from the batch execution?

Your kernel is quite large ([BSK] (total): 145800), so you are probably running into a memory problem, although I am not entirely sure.
You can try reducing the number of MPI processes per node so that each process has more memory available.
Try setting:
#PBS -l select=3:ncpus=24:mpiprocs=12
mpirun -np 36 ...
BS_CPU= "36 1 1"
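
As a rough sanity check of these settings, here is a back-of-the-envelope sketch in Python. It assumes the kernel is stored as a dense 145800 x 145800 matrix of single-precision complex numbers (8 bytes each); the real footprint depends on the build precision, on how the kernel is stored and distributed, and on the other arrays in memory. The function names are illustrative only.

```python
from math import prod


def total_mpi_tasks(n_nodes, mpiprocs_per_node):
    """MPI tasks for 'select=n_nodes:...:mpiprocs=mpiprocs_per_node'."""
    return n_nodes * mpiprocs_per_node


def bs_cpu_matches(bs_cpu, n_tasks):
    """True if the product of the BS_CPU entries equals the MPI task count."""
    return prod(int(x) for x in bs_cpu.split()) == n_tasks


def dense_kernel_gib(bsk_size, bytes_per_element=8):
    """Size in GiB of a dense bsk_size x bsk_size matrix (assumed storage)."""
    return bsk_size ** 2 * bytes_per_element / 2 ** 30


if __name__ == "__main__":
    n_tasks = total_mpi_tasks(3, 12)  # select=3, mpiprocs=12
    total = dense_kernel_gib(145800)
    print(f"MPI tasks: {n_tasks}")
    print(f"BS_CPU consistent: {bs_cpu_matches('36 1 1', n_tasks)}")
    print(f"Dense kernel: {total:.1f} GiB in total, "
          f"{total / 3:.1f} GiB per node on 3 nodes")
```

Under these assumptions the kernel alone is on the order of 158 GiB, which is worth comparing with the memory actually installed on the three nodes.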

Also, if you compile the code with --enable-memory-profile, the log files will report information on the memory used. Knowing how much memory per node your machine has, you can then tune your input and job script more effectively.

Best,
Daniele

Re: BSE calculation stopped

Post by ezekiel » Tue Aug 16, 2022 9:56 am

Dear Daniele,
Thank you for your response. I don't have error messages from the batch execution.
I will try to implement your suggestions.

Many thanks

Ezekiel