
Re: BSE stop with "Allocation of K_slk%blc failed"

Posted: Mon Aug 26, 2019 3:32 pm
by z.hooshmand
Dear Daniele,

Thanks for your reply. I have attached my report and log files to this post for you. Please let me know what I am missing.

For installing Yambo on the machine that I am using, I have followed these steps:
1. ./configure
2. make all

I don't know how to install it in parallel. Your help is very much appreciated.

Best,
Zahra
______________________
Zahra Hooshmand
University of Central Florida

Re: BSE stop with "Allocation of K_slk%blc failed"

Posted: Mon Aug 26, 2019 3:35 pm
by z.hooshmand
My apologies! The files were not attached. Here they are.

Zahra Hooshmand
University of Central Florida

Re: BSE stop with "Allocation of K_slk%blc failed"

Posted: Mon Aug 26, 2019 3:38 pm
by z.hooshmand
Zahra Hooshmand
University of Central Florida

Re: BSE stop with "Allocation of K_slk%blc failed"

Posted: Mon Aug 26, 2019 5:00 pm
by Daniele Varsano
Dear Zahra,
as you can see from the report, you have 16 CPUs doing the same job (all the information that yambo prints is repeated 16 times). This means that yambo has been compiled in serial. There are three possible reasons: the MPI module is not installed on your machine, the MPI module was not loaded before running configure, or configure was not able to recognize the parallel compiler (usually mpif90 or mpiifort, depending on which compiler you have installed on your machine).
To compile in parallel, you probably need to tell configure where to find the MPI compiler. If you post the config.log file, it can help to find the missing piece.
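
As a quick check before re-running configure, something like the following Python sketch can report whether the usual MPI Fortran wrappers are visible in the current environment. The function name find_mpi_wrappers is made up for this illustration; only the wrapper names mpif90 and mpiifort come from the post above. It checks the PATH only and says nothing about what configure will actually pick up.

```python
import shutil


def find_mpi_wrappers(names=("mpif90", "mpiifort"), which=shutil.which):
    """Map each wrapper name to its full path, or None if it is not on PATH.

    The `which` argument is injectable only so the function can be
    exercised without a real MPI installation.
    """
    return {name: which(name) for name in names}


if __name__ == "__main__":
    # Run this in the same shell (same loaded modules) used for ./configure.
    for name, path in find_mpi_wrappers().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If both wrappers are reported as not found, the MPI module was most likely not loaded in that shell.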

Best,
Daniele

Re: BSE stop with "Allocation of K_slk%blc failed"

Posted: Mon Aug 26, 2019 8:36 pm
by z.hooshmand
Dear Daniele,

I compiled yambo again making sure that the mpi module was loaded before configure. Here is the log file.
Please let me know if this is compiled in parallel.

Best,
Zahra Hooshmand
University of Central Florida

Re: BSE stop with "Allocation of K_slk%blc failed"

Posted: Mon Aug 26, 2019 10:55 pm
by z.hooshmand
Dear Daniele,

I believe the code is compiled in parallel. I started running it in parallel to get the absorption spectrum. I have attached my input file.
When examining the log files, I see that the first few steps take only a few seconds, but the step in which the wfs are used takes several minutes, and eventually the calculation stops. I have also attached one of the log files.
I am using 4 nodes each with 16 cpus. So:

1. Why does that step take so long?
2. Why is only one line still printed in the o-eps... file as well as in the o.eel... file? Is this related to 1? I have checked the energy range and energy steps, so I have no idea why this step does not proceed.
3. For a system with the size of my system, what is your recommendation on number of processors? Is 64 cpus enough?

Thanks a lot for your help!
Best,
Zahra
___________________
Zahra Hooshmand
University of Central Florida

Re: BSE stop with "Allocation of K_slk%blc failed"

Posted: Tue Aug 27, 2019 8:34 am
by Daniele Varsano
Dear Zahra,
1. Why does that step take so long?
Because your wavefunctions (wfs) are large and need a lot of memory (more than 8 Gb per core).
From the l* log:

<01m-26s> P0001: [M  8.299 Gb] Alloc wf_disk ( 3.363)
Then it stops because you are running out of memory.
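
To find the last memory figure reported before a crash without scrolling through every log file, a small parser along these lines may help. It is only a sketch: the regular expression is written against the single log line quoted above, and reading the bracketed number as the running total and the number in parentheses as the size of that allocation is an assumption. The function names are hypothetical.

```python
import re

# Pattern modelled on the one line quoted in this thread, e.g.
# "<01m-26s> P0001: [M  8.299 Gb] Alloc wf_disk ( 3.363)"
ALLOC_RE = re.compile(r"\[M\s+([\d.]+)\s+Gb\]\s+Alloc\s+(\S+)\s+\(\s*([\d.]+)\)")


def parse_alloc(line):
    """Return a dict for an 'Alloc' log line, or None if the line does not match."""
    m = ALLOC_RE.search(line)
    if m is None:
        return None
    return {
        "total_gb": float(m.group(1)),  # assumed: memory in use after this allocation
        "name": m.group(2),             # name of the allocated array
        "size_gb": float(m.group(3)),   # assumed: size of this allocation
    }


def peak_memory_gb(lines):
    """Largest bracketed memory value among matching lines, or None if none match."""
    totals = [rec["total_gb"] for rec in map(parse_alloc, lines) if rec]
    return max(totals) if totals else None


if __name__ == "__main__":
    sample = "<01m-26s> P0001: [M  8.299 Gb] Alloc wf_disk ( 3.363)"
    print(parse_alloc(sample))
```

The peak found this way is a lower bound on what each process needs, since the run stopped while still allocating.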
2. Why is only one line still printed in the o-eps... file as well as in the o.eel... file?
Because the run stops before the calculation terminates correctly. In fact, the linear response calculation has not even started.
3. For a system with the size of my system, what is your recommendation on number of processors? Is 64 cpus enough?
You are running out of memory, so you need to reduce the number of CPUs used per node according to the memory available per node.
Also, assigning more CPUs to "v" and "c" instead of "k" will help distribute the memory among the CPUs.
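
The arithmetic behind this advice can be sketched in Python as follows. The 64 GB per node used in the example is a hypothetical value (the thread does not state the node memory), and 8.3 Gb per process is only the lower bound seen in the log. The function names are made up for this illustration, and the sketch deliberately does not show yambo input variables, since their names depend on the version.

```python
import math


def max_procs_per_node(mem_per_node_gb, mem_per_proc_gb, cores_per_node):
    """Largest number of processes per node that fits in memory, capped by the core count."""
    if mem_per_proc_gb <= 0:
        raise ValueError("mem_per_proc_gb must be positive")
    return min(cores_per_node, math.floor(mem_per_node_gb / mem_per_proc_gb))


def cv_splits(n_procs):
    """All (c, v) pairs with c * v == n_procs, i.e. ways to put every process
    on the "c" and "v" roles while leaving "k" at 1."""
    return [(c, n_procs // c) for c in range(1, n_procs + 1) if n_procs % c == 0]


if __name__ == "__main__":
    # Hypothetical node: 64 GB of memory, 16 cores (the core count is from the thread).
    per_node = max_procs_per_node(64, 8.3, 16)
    total = per_node * 4  # 4 nodes, as in the thread
    print(per_node, total, cv_splits(total))
```

Because the memory per process should drop once the work is spread over "c" and "v", the per-node count obtained this way is a conservative starting point that can be raised after a test run.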

Best,
Daniele