Segmentation fault when running hBN-2D-para example

Posted: Sat May 26, 2018 6:28 am
by lance xu
Hi,
I am learning how to work with Yambo in a parallel environment following the hBN-2D-para example.
http://www.yambo-code.org/wiki/index.ph ... strategies
While testing pure MPI scaling, the runs with 1, 2, 4, and 16 MPI processes all gave seemingly good results. However, the run with 8 MPI processes failed with the error message shown below.

Code: Select all

[cg17-4.agave.rc.asu.edu:mpi_rank_5][error_sighandler] Caught error: Segmentation fault (signal 11)
[cg17-4.agave.rc.asu.edu:mpi_rank_7][error_sighandler] Caught error: Segmentation fault (signal 11)
[cg17-4.agave.rc.asu.edu:mpi_rank_3][error_sighandler] Caught error: Segmentation fault (signal 11)
[cg17-4.agave.rc.asu.edu:mpi_rank_1][error_sighandler] Caught error: Segmentation fault (signal 11)
srun: error: cg17-4: tasks 1,3,5,7: Segmentation fault
I don't quite know why this only happens when 8 MPI processes are used. My Yambo (4.2.2) is compiled against mvapich2/2.3b, Intel 2018x, and QE 6.2.1. MKL is linked for BLACS, BLAS, LAPACK, ScaLAPACK, and FFT. The other required libraries are the internal ones.

Thank you very much!
Weiqing Xu

Re: Segmentation fault when running hBN-2D-para example

Posted: Mon May 28, 2018 8:32 am
by Daniele Varsano
Dear Weiqing Xu,
that sounds quite strange. We cannot say much from the error you posted. Can you attach your input, report, and log files?
Thanks,

Daniele

Re: Segmentation fault when running hBN-2D-para example

Posted: Mon May 28, 2018 6:14 pm
by lance xu
Hi Daniele,

The attached tarball contains my input, report, log, and some configuration information.
Thank you very much!

Weiqing Xu

Re: Segmentation fault when running hBN-2D-para example

Posted: Mon May 28, 2018 6:40 pm
by Daniele Varsano
Dear Weiqing,
I suspect the problem is related to the number of CPUs assigned to the linear algebra, considering that you are dealing with a very small matrix (NGsBlkXp= 4 RL).
Note that in the tutorial the size of the screening matrix is set to 4 Ry and not 4 RL.

Best,
Daniele

Re: Segmentation fault when running hBN-2D-para example

Posted: Mon May 28, 2018 7:13 pm
by lance xu
Hi Daniele,

Thank you for catching that! But the same error message still shows up even after I fixed it.
Here are the new run and log files.

Weiqing

Re: Segmentation fault when running hBN-2D-para example

Posted: Tue May 29, 2018 8:54 am
by Daniele Varsano
Dear Weiqing,
I will try to reproduce your problem. In the meantime, can you try repeating your calculations using:

Code: Select all

X_all_q_nCPU_LinAlg_INV= 1
Best,
Daniele

Re: Segmentation fault when running hBN-2D-para example

Posted: Tue May 29, 2018 10:52 pm
by lance xu
Hi Daniele,

It works: the error message no longer shows up, and I got the expected plot of time vs. number of MPI tasks. So what is special about 8 MPI processes?

Weiqing

Re: Segmentation fault when running hBN-2D-para example

Posted: Wed May 30, 2018 8:14 am
by Daniele Varsano
Dear Weiqing,
I've reproduced your problem. We will investigate what is going wrong and fix it.
In the meantime, you can safely continue to use yambo without linear algebra parallelization. As you can see, the scaling is very good even without it.

Thanks for reporting,

Best,
Daniele

Re: Segmentation fault when running hBN-2D-para example

Posted: Wed Jun 27, 2018 11:37 am
by Davide Sangalli
Dear Weiqing,

Code: Select all

X_all_q_nCPU_LinAlg_INV
needs to be set to a value that is the square of an integer, i.e. 1, 4, 9, 16, etc.

With such a value it should work.
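
As a minimal sketch of that rule (this is not part of Yambo, and the helper names are hypothetical), the Python snippet below checks whether a value is the square of an integer and picks the largest such value that does not exceed a given number of MPI tasks. Capping the value at the number of MPI tasks is an assumption made here for illustration only.

```python
import math


def is_perfect_square(n: int) -> bool:
    """Return True if n is the square of a positive integer (1, 4, 9, 16, ...)."""
    if n < 1:
        return False
    root = math.isqrt(n)  # integer square root, rounded down
    return root * root == n


def largest_square_not_exceeding(n_mpi: int) -> int:
    """Return the largest perfect square <= n_mpi.

    Assumption (not stated in this thread): the value should also not be
    larger than the number of MPI tasks in the run.
    """
    if n_mpi < 1:
        raise ValueError("number of MPI tasks must be at least 1")
    root = math.isqrt(n_mpi)
    return root * root


if __name__ == "__main__":
    # MPI task counts used in the scaling test described in this thread.
    for n_mpi in (1, 2, 4, 8, 16):
        value = largest_square_not_exceeding(n_mpi)
        print(f"{n_mpi:2d} MPI tasks -> X_all_q_nCPU_LinAlg_INV= {value}")
```

For the 8-task run discussed above, this sketch suggests 4, since 8 itself is not the square of an integer.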
Best,
D.