BSE calculation stopped
Moderators: myrta gruning, andrea marini, Daniele Varsano, Conor Hogan
- ezekiel
- Posts: 27
- Joined: Fri Oct 02, 2020 11:01 pm
Re: BSE calculation stopped
Dear Daniele,
The Kernel calculation input file is attached.
Ezekiel
Oyeniyi Ezekiel,
Theoretical Physicist,
Department of Physics,
University of Ibadan, Nigeria
- Daniele Varsano
- Posts: 4198
- Joined: Tue Mar 17, 2009 2:23 pm
Re: BSE calculation stopped
Dear Ezekiel,
you can try setting the BS parallelization manually, with something like:
BS_CPU= "2 8 4" # [PARALLEL] CPUs for each role
BS_ROLEs= "k eh t" # [PARALLEL] CPUs roles (k,eh,t)
and you can try different combinations.
Something else that can help is parallel IO, because it lets you restart the BS calculation.
To activate parallel IO for BS, I think you need to compile the code with the flag --enable-netcdf-par-io, but I do not know whether other variables need to be set. Let me check and I will be back with more instructions.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
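Note on the BS_CPU / BS_ROLEs suggestion above: "try different combinations" can be made systematic with a small helper. The sketch below is not part of Yambo; the function names are hypothetical, and it assumes that the three numbers in BS_CPU must multiply to the total number of MPI tasks (as 2 x 8 x 4 = 64 does in the example). It only lists candidate splits and does not say which one is fastest or uses the least memory.

```python
from itertools import product
from math import prod


def bs_cpu_layouts(n_tasks, n_roles=3):
    """List every ordered split of n_tasks into n_roles positive factors.

    Assumption: the product of the per-role CPU counts has to equal the
    number of MPI tasks the job is launched with.
    """
    divisors = [d for d in range(1, n_tasks + 1) if n_tasks % d == 0]
    return [combo for combo in product(divisors, repeat=n_roles)
            if prod(combo) == n_tasks]


def format_bs_input(layout, roles=("k", "eh", "t")):
    """Render one layout as the two input lines quoted in the post above."""
    cpus = " ".join(str(n) for n in layout)
    role_names = " ".join(roles)
    return f'BS_CPU= "{cpus}"\nBS_ROLEs= "{role_names}"'


if __name__ == "__main__":
    layouts = bs_cpu_layouts(64)
    print(len(layouts), "candidate layouts for 64 MPI tasks")
    print(format_bs_input((2, 8, 4)))
```

Each layout can be pasted into the input file and tested on a short run before committing to the full calculation.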
- Daniele Varsano
- Posts: 4198
- Joined: Tue Mar 17, 2009 2:23 pm
Re: BSE calculation stopped
Dear Ezekiel,
the flag --enable-netcdf-par-io should be deprecated; the correct flag is --enable-hdf5-par-io.
If you are using a recent version of the code, this is enabled by default, so your calculation is probably already using parallel IO.
You can verify this by looking at the BS database: with parallel IO there should be a single db; otherwise there are as many db files as MPI tasks you are using.
If this is the case, you can relaunch your calculation and it should restart from the point where it stopped.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
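Note on the database check described above: counting the BS database files can be scripted. This is only a sketch with hypothetical function names. The glob pattern "ndb.BS*" is an assumption about how the BS databases are named in your job directory, and it may also match unrelated BS files, so adjust it after listing the directory by hand.

```python
from pathlib import Path


def list_bs_databases(directory, pattern="ndb.BS*"):
    """Return the sorted names of regular files in `directory` matching `pattern`.

    The default pattern is an assumption, not a documented naming rule.
    """
    return sorted(p.name for p in Path(directory).glob(pattern) if p.is_file())


def guess_parallel_io(directory, n_mpi_tasks, pattern="ndb.BS*"):
    """Apply the rule of thumb from the post above.

    Returns True for a single database, False for one database per MPI task,
    and None when the count fits neither case or cannot distinguish them.
    """
    n_files = len(list_bs_databases(directory, pattern))
    if n_mpi_tasks == 1:
        return None  # one file either way, so the count says nothing
    if n_files == 1:
        return True
    if n_files == n_mpi_tasks:
        return False
    return None
```

A result of None means the count matched neither case, and the directory should be inspected manually.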
- ezekiel
- Posts: 27
- Joined: Fri Oct 02, 2020 11:01 pm
Re: BSE calculation stopped
Dear developers,
My BSE calculation stopped. I am having real trouble distributing the CPUs over the parallelization parameters. Please advise urgently on the way forward. If possible, I would also like a simple and robust tutorial on GW/BSE parallelization. My job script, input, log and report files are attached.
Regards
Oyeniyi Ezekiel
Oyeniyi Ezekiel,
Theoretical Physicist,
Department of Physics,
University of Ibadan, Nigeria
- ezekiel
- Posts: 27
- Joined: Fri Oct 02, 2020 11:01 pm
Re: BSE calculation stopped
In addition, my report file is attached.
Oyeniyi Ezekiel,
Theoretical Physicist,
Department of Physics,
University of Ibadan, Nigeria
- ezekiel
- Posts: 27
- Joined: Fri Oct 02, 2020 11:01 pm
Re: BSE calculation stopped
Please assist.
ezekiel wrote: Fri Aug 12, 2022 7:27 am
Dear developers,
My BSE calculation stopped. I am having real trouble distributing the CPUs over the parallelization parameters. Please advise urgently on the way forward. If possible, I would also like a simple and robust tutorial on GW/BSE parallelization. My job script, input, log and report files are attached.
Regards
Oyeniyi Ezekiel
Oyeniyi Ezekiel,
Theoretical Physicist,
Department of Physics,
University of Ibadan, Nigeria
- Daniele Varsano
- Posts: 4198
- Joined: Tue Mar 17, 2009 2:23 pm
Re: BSE calculation stopped
Dear Oyeniyi Ezekiel,
do you have any error messages from the batch execution?
You have a quite large kernel ([BSK] (total): 145800) and you are probably running into a memory problem (though I'm not totally sure about it).
You can try reducing the MPI processes per node so that each process has more memory available.
Try to set:
#PBS -l select=3:ncpus=24:mpiprocs=12
mpirun -np 36 ...
BS_CPU= "36 1 1"
Also, if you compile the code with --enable-memory-profile, the log files will report the memory used; knowing how much memory per node your machine has, you can then better tune your input/script.
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
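Note on the settings suggested above: the arithmetic behind them can be checked with a few lines. This is a rough sketch with hypothetical function names. The 128 GB per node is a made-up figure (the thread does not state the node memory), and the kernel estimate assumes that 145800 is the dimension of a dense matrix stored with 8 bytes per element, which is at best an upper bound since the matrix is also distributed over the MPI tasks.

```python
def mpi_layout(nodes, mpiprocs_per_node, mem_per_node_gb):
    """Return (total MPI tasks, memory available per task in GB)."""
    total_tasks = nodes * mpiprocs_per_node
    mem_per_task_gb = mem_per_node_gb / mpiprocs_per_node
    return total_tasks, mem_per_task_gb


def dense_kernel_gib(dimension, bytes_per_element=8):
    """Size in GiB of a dense dimension x dimension matrix.

    bytes_per_element=8 assumes single-precision complex numbers.
    """
    return dimension ** 2 * bytes_per_element / 1024 ** 3


if __name__ == "__main__":
    # select=3, mpiprocs=12 from the post; 128 GB per node is hypothetical
    tasks, mem = mpi_layout(nodes=3, mpiprocs_per_node=12, mem_per_node_gb=128)
    print(f"{tasks} MPI tasks, {mem:.1f} GB per task")
    print(f"dense kernel estimate: {dense_kernel_gib(145800):.1f} GiB")
```

With select=3 and mpiprocs=12 the total comes out at 36 tasks, matching both mpirun -np 36 and BS_CPU= "36 1 1". Halving mpiprocs from 24 to 12 doubles the memory each task can use.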
- ezekiel
- Posts: 27
- Joined: Fri Oct 02, 2020 11:01 pm
Re: BSE calculation stopped
Dear Daniele,
Thank you for your response. I don't have error messages from the batch execution.
I will try to implement your suggestions.
Many thanks
Ezekiel
Oyeniyi Ezekiel,
Theoretical Physicist,
Department of Physics,
University of Ibadan, Nigeria