How to define an appropriate parallel structure

Various technical topics such as parallelism and efficiency, netCDF problems, the Yambo code structure itself, are posted here.


Stephan
Posts: 62
Joined: Thu Jan 15, 2015 12:48 pm


Post by Stephan » Tue Aug 11, 2015 1:27 pm

Hello,
I'm a real newbie in parallel calculations, but I have the opportunity to work with yambo on a cluster.
In a first test I want to do an IP linear-response calculation for optical absorption.
The setup run of yambo (yambo -o c) delivers the following input:
#
# __ __ ________ ___ __ __ _______ ______
# /_/\/_/\ /_______/\ /__//_//_/\ /_______/\ /_____/\
# \ \ \ \ \\::: _ \ \\::\| \| \ \\::: _ \ \\:::_ \ \
# \:\_\ \ \\::(_) \ \\:. \ \\::(_) \/_\:\ \ \ \
# \::::_\/ \:: __ \ \\:.\-/\ \ \\:: _ \ \\:\ \ \ \
# \::\ \ \:.\ \ \ \\. \ \ \ \\::(_) \ \\:\_\ \ \
# \__\/ \__\/\__\/ \__\/ \__\/ \_______\/ \_____\/
#
#
# GPL Version 4.0.1 Revision 88
# OpenMPI Build
# http://www.yambo-code.org
#
optics # [R OPT] Optics
chi # [R CHI] Dyson equation for Chi.
X_q_0_CPU= "" # [PARALLEL] CPUs for each role
X_q_0_ROLEs= "" # [PARALLEL] CPUs roles (k,c,v)
X_q_0_nCPU_invert= # [PARALLEL] CPUs for matrix inversion
X_finite_q_CPU= "" # [PARALLEL] CPUs for each role
X_finite_q_ROLEs= "" # [PARALLEL] CPUs roles (q,k,c,v)
X_finite_q_nCPU_invert= # [PARALLEL] CPUs for matrix inversion
Chimod= "IP" # [X] IP/Hartree/ALDA/LRC/BSfxc
%QpntsRXd
1 | 1 |
NGsBlkXd= 1 RL # [Xd] Response block size
% BndsRnXd
1 | 70 | # [Xd] Polarization function bands
%
% EnRngeXd
0.00000 | 10.00000 | eV # [Xd] Energy range
%
% DmRngeXd
0.100000 | 0.100000 | eV # [Xd] Damping range
%
ETStpsXd= 100 # [Xd] Total Energy steps
% LongDrXd
1.000000 | 0.000000 | 0.000000 | # [Xd] [cc] Electric Field
%



I want to do the test run on a single node using 10 procs.
I don't understand how to handle the PARALLEL parameters. The parallelization tutorial did not clarify the issue for me.
I tried several parameter combinations like the following:

X_q_0_CPU= "10 1 1 1" # [PARALLEL] CPUs for each role
X_q_0_ROLEs= "q k c v" # [PARALLEL] CPUs roles (k,c,v)
X_q_0_nCPU_invert=2 # [PARALLEL] CPUs for matrix inversion
X_finite_q_CPU= "q k c v" # [PARALLEL] CPUs for each role
X_finite_q_ROLEs= "10 1 1 1" # [PARALLEL] CPUs roles (q,k,c,v)
X_finite_q_nCPU_invert=2 # [PARALLEL] CPUs

but all I receive is the following error message:
[ERROR] STOP signal received while in :[05] Optics

[ERROR]Impossible to define an appropriate parallel structure

Can you please give a short introduction to what these parameters mean and how to deal with them?
I could not find an appropriate description in the tutorial.

Thanks and Regards

Stephan Ludwig
1. Physical Institute
University Stuttgart
Germany

andrea.ferretti
Posts: 206
Joined: Fri Jan 31, 2014 11:13 am

Re: How to define an appropriate parallel structure

Post by andrea.ferretti » Tue Aug 11, 2015 1:59 pm

Dear Stephan,

the parallel structure of yambo is implemented (at the moment) only for a number of MPI tasks that is
a power of 2 (this allows for flexibility in the implementation and will possibly be removed in the future,
but at the moment it is a strict condition).
Therefore, when defining the number of tasks associated with each "role", integer powers of 2 should be provided.
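
As a quick illustration of the power-of-2 condition, here is a minimal Python sketch (not part of yambo; the function names are made up for this example) that checks a task count and rounds it down to the nearest allowed value. For the 10 tasks mentioned in the question it suggests 8.

```python
def is_power_of_two(n):
    """Return True if n is a positive integer power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def largest_power_of_two_at_most(n):
    """Return the largest power of two that does not exceed n (requires n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # bit_length() gives the position of the highest set bit
    return 1 << (n.bit_length() - 1)


if __name__ == "__main__":
    requested = 10  # task count from the original question
    print(is_power_of_two(requested))
    print(largest_power_of_two_at_most(requested))
```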

This limitation is not present in the OpenMP parallelism, which can be used to optimize the filling of the machine
(for instance, when running on a machine with 12 cores, I can use 4 MPI tasks and 3 OpenMP threads, or, if the machine is powerful enough,
8 MPI tasks and 2 threads, leading to some degree of hyperthreading).
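
To make the MPI/OpenMP bookkeeping concrete, the following Python sketch (a hypothetical helper, not a yambo tool) lists the combinations of power-of-2 MPI tasks and OpenMP threads that fill a given number of cores exactly. It assumes one thread per core, so the oversubscribed 8 x 2 option on 12 cores is deliberately not listed.

```python
def hybrid_layouts(cores):
    """List (mpi_tasks, omp_threads) pairs with mpi_tasks a power of two
    and mpi_tasks * omp_threads == cores."""
    layouts = []
    tasks = 1
    while tasks <= cores:
        if cores % tasks == 0:
            layouts.append((tasks, cores // tasks))
        tasks *= 2  # only powers of two are allowed for MPI tasks
    return layouts


if __name__ == "__main__":
    for tasks, threads in hybrid_layouts(12):
        print(f"{tasks} MPI tasks x {threads} OpenMP threads")
```

For 12 cores this includes the 4 MPI tasks and 3 threads combination mentioned above.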

as a side note,


X_q_0_CPU= "10 1 1 1" # [PARALLEL] CPUs for each role
X_q_0_ROLEs= "q k c v" # [PARALLEL] CPUs roles (k,c,v)
This only requires 3 fields (k c v) and not q (since you are working at q=0, no parallelism over q is possible).
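
A small Python sketch of how the two q=0 lines could be assembled consistently with the points above: three roles (k c v) and a power-of-2 count for each. The helper name and the particular split "8 1 1" are assumptions chosen for illustration, not a recommendation for any specific system.

```python
def parallel_lines(prefix, roles, cpus):
    """Build the <prefix>_CPU and <prefix>_ROLEs input lines.

    Raises ValueError if the two lists differ in length or if any
    CPU count is not a power of two.
    """
    if len(roles) != len(cpus):
        raise ValueError("need exactly one CPU count per role")
    for n in cpus:
        if n < 1 or (n & (n - 1)) != 0:
            raise ValueError(f"{n} is not a power of two")
    cpu_str = " ".join(str(n) for n in cpus)
    role_str = " ".join(roles)
    return [f'{prefix}_CPU= "{cpu_str}"', f'{prefix}_ROLEs= "{role_str}"']


if __name__ == "__main__":
    # Hypothetical choice: all 8 MPI tasks on the k role
    for line in parallel_lines("X_q_0", ["k", "c", "v"], [8, 1, 1]):
        print(line)
```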


X_finite_q_nCPU_invert=2
ScaLAPACK linear algebra, on the other hand, is not fully implemented yet, so it is better not to use it here. Set this to 0 or 1.

regarding the choice of the parallel structure (i.e. whether to use "8 1 1 1" or "2 2 2 1" for the calculation of X),
the following comments apply:
* q parallelism does not communicate much but does not scale memory (and may lead to load imbalance)
* k parallelism does not scale memory much either
* c, v parallelism requires communicating X at the end of the calculation but is very efficient at scaling memory.
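
To see which structures are available to choose from, this Python sketch (a hypothetical helper, independent of yambo) enumerates every way of splitting a power-of-2 number of MPI tasks over a given number of roles, using only power-of-2 factors. With 8 tasks and the four roles q k c v, the list includes both "8 1 1 1" and "2 2 2 1".

```python
from itertools import product


def role_layouts(total_tasks, n_roles):
    """Return all tuples of power-of-two factors, one per role,
    whose product equals total_tasks (itself a power of two)."""
    if total_tasks < 1 or (total_tasks & (total_tasks - 1)) != 0:
        raise ValueError("total_tasks must be a power of two")
    exponent = total_tasks.bit_length() - 1  # total_tasks == 2 ** exponent
    layouts = []
    # Distribute the exponent over the roles in every possible way
    for exps in product(range(exponent + 1), repeat=n_roles):
        if sum(exps) == exponent:
            layouts.append(tuple(2 ** e for e in exps))
    return layouts


if __name__ == "__main__":
    for layout in role_layouts(8, 4):  # roles: q k c v
        print(" ".join(str(n) for n in layout))
```

Which of these runs fastest depends on the system, so the enumeration only narrows down what to test.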

overall, the difference in run time between different parallel structures is due more to load imbalance
than to communication (this is my experience so far).

take care
Andrea
Andrea Ferretti, PhD
CNR-NANO-S3 and MaX Centre
via Campi 213/A, 41125, Modena, Italy
Tel: +39 059 2055322; Skype: andrea_ferretti
URL: http://www.nano.cnr.it

Stephan
Posts: 62
Joined: Thu Jan 15, 2015 12:48 pm

Re: How to define an appropriate parallel structure

Post by Stephan » Tue Aug 11, 2015 3:11 pm

Thank you very much!
Stephan Ludwig
1. Physical Institute
University Stuttgart
Germany
