Parallel TDDFT error

Posted: Sun Jul 14, 2019 4:08 pm
by will
Dear Yambo experts,
I ran into a problem while doing parallel TDDFT calculations; my input and log files are below. I am using Yambo 4.1.3, and the system has 16 k-points.
input:->
_________________________________________
optics # [R OPT] Optics
chi # [R CHI] Dyson equation for Chi.
tddft # [R K] Use TDDFT kernel
NLogCPUs= 0 # [PARALLEL] Live-timing CPU`s (0 for all)
FFTGvecs= 4007 RL # [FFT] Plane-waves
X_q_0_CPU= "8 1 1" # [PARALLEL] CPUs for each role
X_q_0_ROLEs= "k c v" # [PARALLEL] CPUs roles (k,c,v)
X_q_0_nCPU_LinAlg_INV= 1 # [PARALLEL] CPUs for Linear Algebra
X_finite_q_CPU= "1 8 1 1" # [PARALLEL] CPUs for each role
X_finite_q_ROLEs= "q k c v" # [PARALLEL] CPUs roles (q,k,c,v)
X_finite_q_nCPU_LinAlg_INV= 1 # [PARALLEL] CPUs for Linear Algebra
Chimod= "ALDA" # [X] IP/Hartree/ALDA/LRC/BSfxc
FxcGRLc= 705 RL # [TDDFT] XC-kernel RL size
NGsBlkXd= 705 RL # [Xd] Response block size
% QpntsRXd
1 | 1 | # [Xd] Transferred momenta
%
% BndsRnXd
1 | 384 | # [Xd] Polarization function bands
%
% EnRngeXd
0.000000 | 6.000000 | eV # [Xd] Energy range
%
% DmRngeXd
0.10000 | 0.10000 | eV # [Xd] Damping range
%
ETStpsXd= 200 # [Xd] Total Energy steps
% LongDrXd
1.000000 | 0.000000 | 0.000000 | # [Xd] [cc] Electric Field
%
__________________________________________________________________________________

log:->
___________________________________________________________
<---> P0001: [M 0.059 Gb] Alloc RL_Gshells RL_Eshells ( 0.018)

____ ____ ___ .___ ___. .______ ______
\ \ / / / \ | \/ | | _ \ / __ \
\ \/ / / ^ \ | \ / | | |_) | | | | |
\_ _/ / /_\ \ | |\/| | | _ < | | | |
| | / _____ \ | | | | | |_) | | `--" |
|__| /__/ \__\ |__| |__| |______/ \______/



<---> P0001: [01] CPU structure, Files & I/O Directories
<---> P0001: CPU-Threads:8(CPU)-1(threads)-1(threads@X)-1(threads@DIP)-1(threads@SE)-1(threads@RT)-1(threads@K)
<---> P0001: CPU-Threads:X_q_0(environment)-8 1 1(CPUs)-k c v(ROLEs)
<---> P0001: CPU-Threads:X_finite_q(environment)-1 8 1 1(CPUs)-q k c v(ROLEs)
<---> P0001: [02] CORE Variables Setup
<---> P0001: [02.01] Unit cells
<01s> P0001: [02.02] Symmetries
<01s> P0001: [02.03] RL shells
<01s> P0001: [02.04] K-grid lattice
<01s> P0001: [02.05] Energies [ev] & Occupations
<01s> P0001: [03] Transferred momenta grid
<01s> P0001: [M 0.327 Gb] Alloc bare_qpg ( 0.260)
<02s> P0001: [04] External corrections
<02s> P0001: [05] Optics
<02s> P0001: [LA] SERIAL linear algebra
<02s> P0001: [PARALLEL Response_G_space_Zero_Momentum for K(ibz) on 8 CPU] Loaded/Total (Percentual):2/16(13%)
<02s> P0001: [PARALLEL Response_G_space_Zero_Momentum for CON bands on 1 CPU] Loaded/Total (Percentual):56/56(100%)
<02s> P0001: [PARALLEL Response_G_space_Zero_Momentum for VAL bands on 1 CPU] Loaded/Total (Percentual):328/328(100%)
<01m-11s> P0001: [M 0.374 Gb] Alloc WF ( 0.048)
<01m-11s> P0001: [PARALLEL distribution for Wave-Function states] Loaded/Total(Percentual):656/5248(13%)
<01m-11s> P0001: [WF] Performing Wave-Functions I/O from ./SAVE
<01m-11s> P0001: [FFT-Rho] Mesh size: 12 18 45
<01m-11s> P0001: [M 0.472 Gb] Alloc wf_disk ( 0.097)
<01m-11s> P0001: Reading wf_fragments_1_1
<01m-11s> P0001: Reading wf_fragments_1_2
<01m-11s> P0001: Reading wf_fragments_1_3
<01m-11s> P0001: Reading wf_fragments_1_4
<01m-12s> P0001: Reading wf_fragments_1_5
<01m-12s> P0001: Reading wf_fragments_1_6
<01m-12s> P0001: Reading wf_fragments_1_7
<01m-12s> P0001: Reading wf_fragments_2_1
<01m-13s> P0001: Reading wf_fragments_2_2
<01m-13s> P0001: Reading wf_fragments_2_3
<01m-13s> P0001: Reading wf_fragments_2_4
<01m-13s> P0001: Reading wf_fragments_2_5
<01m-14s> P0001: Reading wf_fragments_2_6
<01m-14s> P0001: Reading wf_fragments_2_7
<01m-14s> P0001: [M 0.374 Gb] Free wf_disk ( 0.097)
<01m-15s> P0001: [xc] Functional Perdew, Burke & Ernzerhof(X)+Perdew, Burke & Ernzerhof(C)
__________________________________________________________________________________________
job script:->
_______________________________________________________________________________________
#!/bin/bash
#SBATCH -J SnTe
#SBATCH --get-user-env
#SBATCH -e exclusive
#SBATCH -N 1
#SBATCH -n 8
#SBATCH -p C032M0128G
#SBATCH --qos low
bin=~/software/yambo-4.1.3/bin/yambo
jf="04_tddft.in"
srun hostname -s | sort -u > slurm.hosts
mpirun -n 8 -machinefile slurm.hosts $bin -F ${jf} -J 04_tddft
rm -rf slurm.hosts
_______________________________________________________________________________
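(Side note: one common cause of crashes right at startup is a mismatch between the parallel layout and the MPI task count, since Yambo expects the product of the entries in each `*_CPU` string to equal the number of MPI tasks. A minimal, illustrative sanity check, using the role string and task count from this thread:)

```shell
#!/bin/bash
# Check that the product of the X_q_0_CPU entries ("8 1 1" in the input above)
# matches the MPI task count requested with "mpirun -n 8".
roles="8 1 1"
ntasks=8
prod=1
for n in $roles; do
  prod=$((prod * n))
done
if [ "$prod" -eq "$ntasks" ]; then
  echo "X_q_0_CPU ($roles) is consistent with $ntasks MPI tasks"
else
  echo "Mismatch: product $prod != $ntasks MPI tasks" >&2
fi
```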
And the slurm log:->
___________________________________________________________________

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 432360 RUNNING AT b2u09n3
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Intel(R) MPI Library troubleshooting guide:
https://software.intel.com/node/561764
===================================================================================
_______________________________________________________________________________________

Could you help me? Thank you very much!

Best,
Xiaowei

Re: Parallel TDDFT error

Posted: Mon Jul 15, 2019 12:02 pm
by Daniele Varsano
Dear Xiaowei,
actually, I strongly suggest you perform the TDDFT calculation in transition space instead of reciprocal space:

yambo -o b -k alda -y d 
This way you do not need to specify the number of G-vectors entering the TDDFT kernel: all G-vectors are automatically taken into account.
In your previous run you may have hit a memory-allocation problem, but that needs to be verified.
Can you check whether there is a message at the end of the other log files that could help us spot the problem?
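
A quick way to do that is to print the tail of every per-task log at once (a sketch: the `LOG/` directory and file pattern below are an assumption based on how Yambo typically names per-task logs for a `-J 04_tddft` run, so adjust the glob to your setup):

```shell
#!/bin/bash
# Print the last few lines of every per-MPI-task log file.
# The path pattern is an assumption based on "-J 04_tddft";
# change it to wherever your run actually wrote its logs.
for f in LOG/l-04_tddft*_CPU_*; do
  echo "== $f =="
  tail -n 5 "$f"
done
```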

Best,
Daniele


Addendum: note that you are running a quite old version of the code, so I strongly suggest you update it, as many bugs have been fixed since then.

Re: Parallel TDDFT error

Posted: Mon Jul 15, 2019 2:19 pm
by will
Dear Daniele,
Thanks for your reply. I have uploaded the other log files, along with some test calculations in transition space as you suggested. It seems the problem is a memory limit in the transition-space calculation; my system contains 200 atoms. :(
I am also trying to run it with the latest Yambo. Thanks!
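
(For scale, a back-of-the-envelope estimate, illustrative only, using the counts printed in the log above: 16 k-points, 328 valence and 56 conduction bands, and 16 bytes per double-precision complex number. It suggests why a dense transition-space matrix is problematic for a 200-atom cell; it ignores any symmetry reduction or restriction of the transition basis, which in practice shrinks the matrix quadratically in the number of included transitions.)

```shell
#!/bin/bash
# Rough size of a dense transition-space kernel matrix.
# Band and k-point counts are taken from the log in this thread;
# this deliberately ignores symmetry and band-range restrictions.
nk=16; nv=328; nc=56
ntrans=$((nk * nv * nc))           # number of v->c transitions
bytes=$((ntrans * ntrans * 16))    # dense complex-double matrix
echo "transitions: $ntrans"
echo "full kernel: $((bytes / 1024 / 1024 / 1024)) GiB"
```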

Best,
Xiaowei