mpirun fails

Various technical topics, such as parallelism and efficiency, netCDF problems, and the Yambo code structure itself, are posted here.

Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano, Conor Hogan, Nicola Spallanzani

martinspenke
Posts: 149
Joined: Tue Apr 08, 2014 6:05 am

mpirun fails

Post by martinspenke » Wed Oct 28, 2015 8:02 am

Dear Daniele,

I am trying to run yambo_4.0.2_rev.90 without any specific parallelization strategy, just with mpirun -np .....
but in the log file yambo asks me for a parallelization strategy and otherwise fails to run!

Is plain mpirun no longer possible?

Bests
Martin
Martin Spenke, PhD Student
Theoretisch-Physikalisches Institut
Universität Hamburg, Germany

Daniele Varsano
Posts: 3816
Joined: Tue Mar 17, 2009 2:23 pm
Contact:

Re: mpirun fails

Post by Daniele Varsano » Wed Oct 28, 2015 3:11 pm

Dear Martin,
can you try to add the string:

Code: Select all

 -V par 
when building the input file? This way the parallelization variables should appear in the input file, and even if they are not specified the defaults should be set.
If it does not work, please post your input and output here.

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

Davide Sangalli
Posts: 614
Joined: Tue May 29, 2012 4:49 pm
Location: Via Salaria Km 29.3, CP 10, 00016, Monterotondo Stazione, Italy
Contact:

Re: mpirun fails

Post by Davide Sangalli » Wed Oct 28, 2015 11:01 pm

Dear Martin,
the defaults should work as before. Also, there is no longer the limitation that the total number of processors has to be a power of 2.
As before, you can specify the parallelization scheme manually using -V par, as Daniele suggested.

Can you specify in which case you get the problem?
(i.e., attach the input and details about the system as usual, and specify the total number of processors.)

Best,
D.
Davide Sangalli, PhD
CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX Centre
https://sites.google.com/view/davidesangalli
http://www.max-centre.eu/

martinspenke

Re: mpirun fails

Post by martinspenke » Thu Oct 29, 2015 3:32 pm

Dear Daniele and Davide,

-V par works, BUT only if I launch mpirun with 8 or 12 processors;
as soon as I increase the number of cores to 16, 32 or 64, it fails to run with the following error:

Code: Select all

 <03s> P0011: CPU structure provided for the Response_T_space ENVIRONMENT is incomplete. Switching to defaults
 P0011: [ERROR] STOP signal received while in :[05] Response Functions in Transition space
 P0011: [ERROR]Impossible to define an appropriate parallel structure
My system is quite large; I will try to reproduce the problem on a small system and come back with the input and output files.

Is the default parallelization strategy of yambo_4.0.2_rev.90 the same as in yambo 3.4.1?


Best wishes
Martin

Davide Sangalli

Re: mpirun fails

Post by Davide Sangalli » Thu Oct 29, 2015 3:52 pm

Dear Martin,
I guess you are doing a BSE run, and you likely get the message because the product of the numbers you set in the input for the parallelization does not match the total number of cores you are using.
E.g., if you use 64 processors you need to set something like

Code: Select all

 BS_CPU= "4.2.8"     # [PARALLEL] CPUs for each role
 BS_ROLEs= "k.eh.t"  # [PARALLEL] CPUs roles (k,eh,t)
since 4*2*8 = 64.
Is this the case?

That said, on many cores the automatic distribution is indeed not very robust; we have to improve it. In any case, the default parallelization strategy is different from 3.4.1, because the parallelization itself is completely different.

Finally, also remember the following prescriptions:

k: nk_cores <= nk
nk_cores = 1st number in the BS_CPU input
nk = total number of k-points in the IBZ

eh: neh_cores <= neh
neh_cores = 2nd number in the BS_CPU input
neh = number of transitions per k-point
This is the least efficient parallelization approach if you have a periodic system with symmetries and k-points; instead, it is the scheme to be used for calculations at Gamma.

t: nt_cores < nt
nt_cores = 3rd number in the BS_CPU input
nt = (nk/nk_cores)*(neh_cores*nk+1)/2
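The prescriptions above can be sketched as a quick sanity check. This is a minimal, hypothetical Python sketch (not part of Yambo): the nk and neh values are made-up examples, and the nt formula is taken from the post as written.

```python
# Minimal sketch: check a BS_CPU assignment for roles "k.eh.t" against
# the prescriptions above. All concrete numbers here are hypothetical.

def check_bs_cpu(nk_cores, neh_cores, nt_cores, total_cores, nk, neh):
    """Return a list of violated prescriptions (empty list = assignment OK)."""
    problems = []
    # The product of the role counts must match the number of MPI tasks.
    if nk_cores * neh_cores * nt_cores != total_cores:
        problems.append("product != total cores")
    # k: nk_cores <= nk (k-points in the IBZ)
    if nk_cores > nk:
        problems.append("nk_cores > nk")
    # eh: neh_cores <= neh (transitions per k-point)
    if neh_cores > neh:
        problems.append("neh_cores > neh")
    # t: nt_cores < nt, with nt computed as in the post
    nt = (nk // nk_cores) * (neh_cores * nk + 1) // 2
    if nt_cores >= nt:
        problems.append("nt_cores >= nt")
    return problems

# The 64-core example from the post, BS_CPU = "4.2.8", with made-up nk/neh:
print(check_bs_cpu(4, 2, 8, 64, nk=16, neh=100))  # -> []
# The same "4.2.8" launched on 32 cores violates the product rule:
print(check_bs_cpu(4, 2, 8, 32, nk=16, neh=100))  # -> ['product != total cores']
```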

Best,
D.

martinspenke

Re: mpirun fails

Post by martinspenke » Thu Oct 29, 2015 5:16 pm

Dear Davide,

Many thanks. Yes, it was a BSE run, but I also get the same error in the GW part.

When I set the numbers, it works and runs properly. The point is that I do NOT want to set any numbers. In the previous version, yambo_4.0.1_rev.89, I could avoid any settings regarding the number of processors and it ran successfully, but now this is only the case for 8 and 12 cores. For more than this it fails to run.

Summary: it works if you set numbers whose product is the total number of available cores. Fine. But I do not want this. I would like to have it like in yambo_3.4.1: /usr/bin/mpirun -np 32 yambo, without specifying the number of cores in the input file.

I hope I am being clear.
Many thanks for the prescriptions.
Best wishes
Martin

Davide Sangalli

Re: mpirun fails

Post by Davide Sangalli » Thu Oct 29, 2015 5:46 pm

Dear Martin,
Clear. As I wrote, on many cores the automatic distribution is not very robust; we have to improve it.

If you really do not want to set the input by hand you can try to change the subroutine:
src/parallel/PARALLEL_defaults.F

Try to replace

Code: Select all

 N_basis=1
 do i1=1,13
   if (mod(NC,i1)==0) N_basis=i1
 enddo
with

Code: Select all

 N_basis=2
and recompile yambo.
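For reference, the loop being replaced picks the largest divisor of the core count NC that does not exceed 13, while the patch forces the basis to 2. A minimal Python sketch of that logic (an illustration only, not the actual Yambo code path):

```python
def n_basis(nc):
    """Mimic the Fortran loop: largest divisor of nc not exceeding 13."""
    n = 1                     # N_basis=1
    for i1 in range(1, 14):   # do i1=1,13
        if nc % i1 == 0:      # if (mod(NC,i1)==0)
            n = i1            #   N_basis=i1
    return n

# For 8, 12, 16, 32 and 64 cores the loop finds these basis values,
# whereas the patched code would always return 2.
print([n_basis(nc) for nc in (8, 12, 16, 32, 64)])  # -> [8, 12, 8, 8, 8]
```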

D.

martinspenke

Re: mpirun fails

Post by martinspenke » Thu Oct 29, 2015 5:55 pm

Dear Davide,

Thanks, I will modify it.

Anyway, just for your information:
yambo_4.0.2_rev.90 on 32 cores seems to be stable.
Yesterday I ran a calculation on 32 cores after setting the numbers in the input file, and it finished successfully. For now I have no problems with stability and robustness, at least not on 32 cores.

Bests
Martin
