intel mkl crashes after reading wfcs (yambo 4.4.1 / qe 6.4.1)
Posted: Thu Jan 09, 2020 10:43 am
Dear all,
When calculating a 2D heterostack (for BSE, converging k-points), I came across instabilities of the Intel MKL (2019) linked with QE and yambo:
In DFT (QE), when increasing the k-point density from 12x12x1 to 18x18x1 (hexagonal unit cell), I had to change the parallelization strategy in QE to avoid segfaults (running QE with mpirun pw.x -nk 2, i.e. two pools of processors for k-point parallelization).
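For illustration, a minimal sketch of such a pool-parallel launch. The process count and the input/output file names are assumptions made up for this example and are not taken from my actual run; only pw.x and the -nk 2 option come from the description above. The script prints the command and only executes it if mpirun, pw.x and the input file are actually present.

```shell
#!/bin/sh
# Hypothetical values: process count and file names are assumptions.
NPROC=8        # total number of MPI processes (assumed)
NPOOL=2        # number of k-point pools, as in "-nk 2"
INPUT=scf.in   # hypothetical pw.x input file
OUTPUT=scf.out # hypothetical output file

# To my understanding pw.x expects the process count to be a multiple
# of the pool count, so check this before launching.
if [ $((NPROC % NPOOL)) -ne 0 ]; then
    echo "NPROC must be a multiple of NPOOL" >&2
    exit 1
fi

CMD="mpirun -np $NPROC pw.x -nk $NPOOL -in $INPUT"
echo "$CMD"

# Execute only when the executables and the input file exist.
if command -v mpirun >/dev/null 2>&1 \
   && command -v pw.x >/dev/null 2>&1 \
   && [ -f "$INPUT" ]; then
    $CMD > "$OUTPUT"
fi
```

With -nk 2 the k-points are split between two groups of processes, so each pool only handles its share of the k-points.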
Independent of the k-point density (also for 12x12x1), I found a segfault when yambo was reading the last part of the wavefunctions on the first processor (backtrace in the code block further down).
I therefore tried an older version of yambo without MKL (4.2.1) to process the database. Then a new error suddenly appeared: "./SAVE//ndb.gops; Variable GROT; NetCDF: Start+count exceeds dimension bound". To resolve this, I removed ndb.gops and reran the initialization step of yambo, again with the older version, and it runs. But this is unsatisfactory, because the numerical routines from the system are loaded instead, which strongly decreases the performance.
Code:
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x13787
[ 0] /usr/lib64/libpthread.so.0(+0xf5e0)[0x7f1d77d065e0]
[ 1] /beegfs-home/modules/intelmkl/compilers_and_libraries_2019/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so(cdotc+0xba)[0x7f1d7f1eaeda]
So my question is how to get yambo running efficiently with the MKL. Based on the QE experience, I also modified the parallelization strategy in yambo, without success. Does anyone have an idea? Could this be a compilation issue?
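In case it helps with the compilation question, here is a small sketch of how the MKL libraries linked into the executable could be listed. The path ./bin/yambo is an assumption and has to be adjusted to the actual build; list_mkl_libs is just a helper name chosen for this example. It only reports what is linked and does not diagnose anything by itself.

```shell
#!/bin/sh
# Path to the yambo executable: hypothetical, adjust to the actual build.
YAMBO_BIN="${YAMBO_BIN:-./bin/yambo}"

# Helper (name chosen for this example): print the MKL shared libraries
# that the dynamic linker resolves for the given executable.
list_mkl_libs() {
    ldd "$1" 2>/dev/null | grep -i 'mkl'
}

if [ -x "$YAMBO_BIN" ]; then
    list_mkl_libs "$YAMBO_BIN" || echo "no MKL libraries found in $YAMBO_BIN"
else
    echo "executable not found: $YAMBO_BIN"
fi
```

The library named in the backtrace (libmkl_intel_lp64.so) should show up in this list if the crashing executable is the one being inspected.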
Thank you very much!
Christian
P.S. Some I/O from the 6.4.1 run is in the appendix.
edit: typos.