Compiler Bug:
The following C code, which uses an OpenMP reduction in an offloaded code block of the main program, gives correct results with fewer than 9 threads but incorrect results with 9 or more. The problem occurs with the 13.0.1.117 Intel compiler, but not with the previous revision (13.0.1.079). It appears only when the code block is in the main routine and the reduction is offloaded. It does not occur when the block is executed natively or when it is in a function called within an offload, and it does not occur in similar Fortran code.
The program below is a sanitized version of the code in the "issue" (bug) reported to Intel. The bug is fixed in the next release, due within 3 weeks of this posting. The compiler command, environment settings and execution command are also given. If you have any concerns about similar code in your program, please submit a ticket.
Workaround:
Turn off fast reductions in the coprocessor-side compilation by adding this option to the compile command: -offload-option,mic,compiler,"-mP2OPT_hpo_fast_reduction=FALSE".
icc -offload-option,mic,compiler,"-mP2OPT_hpo_fast_reduction=FALSE" bug2.c
export MIC_OMP_NUM_THREADS=9
export MIC_ENV_PREFIX=MIC
./a.out
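The description above also reports that the error does not occur when the reduction sits in a function called within an offload. A minimal sketch of that arrangement follows. The helper names `sum_to_n` and `offloaded_sum` and the macro `MIC_TARGET` are hypothetical (they are not part of the reported code), and the sketch has not been checked against 13.0.1.117, so treat it as an illustration rather than a confirmed fix.

```c
/* MIC_TARGET is a hypothetical convenience macro (an assumption, not from the
   original report). With Intel offload enabled it marks the function for the
   coprocessor; with other compilers it expands to nothing. */
#ifdef __INTEL_OFFLOAD
#define MIC_TARGET __attribute__((target(mic)))
#else
#define MIC_TARGET
#endif

/* Hypothetical helper: the OpenMP reduction lives here instead of in main(). */
MIC_TARGET double sum_to_n(int n)
{
    double sum = 0.0;
    int i;
#pragma omp parallel for reduction(+:sum)
    for (i = 1; i <= n; i++)
        sum += (double)i;
    return sum;
}

/* Hypothetical wrapper: the offload block only calls the helper, so the
   reduction is no longer written directly inside the offloaded block. */
double offloaded_sum(int n)
{
    double sum = 0.0;
#pragma offload target(mic:0) inout(sum)
    {
        sum = sum_to_n(n);
    }
    return sum;
}
```

In bug2.c the offloaded block in main would then shrink to a single call such as `sum = sum_to_n(n);`. Compilers without offload support ignore the `offload` pragma and run the same code on the host.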
bug2.c code:
#include <stdio.h>
#include <omp.h>

int main()
{
    double sum; int i, n, nt;

    n = 2000;
    sum = 0.0e0;

    /* Offload the block to coprocessor 0; sum is copied in and back out. */
#pragma offload target(mic:0) inout(sum)
    {
        /* The reduction that gives wrong results with 9 or more threads. */
#pragma omp parallel for reduction(+:sum)
        for (i = 1; i <= n; i++)
        {
            sum += (double)i;
        }
        nt = omp_get_max_threads();
        printf("Hello MIC reduction %f threads: %d\n", sum, nt);
    }
    return 0;
}
Compile and run without the workaround:
icc bug2.c
export MIC_OMP_NUM_THREADS=9
export MIC_ENV_PREFIX=MIC
./a.out #gives incorrect results
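To tell a correct run from an incorrect one, the printed sum can be compared with the closed form 1 + 2 + ... + n = n(n+1)/2, which is 2001000 for n = 2000. The sketch below shows one way to do that. The function names `expected_sum` and `sum_is_correct` are hypothetical and not part of the reported code.

```c
/* Closed-form value of 1 + 2 + ... + n, used as the reference result. */
double expected_sum(int n)
{
    return 0.5 * (double)n * ((double)n + 1.0);
}

/* Returns 1 if a computed sum matches the reference, 0 otherwise.
   An exact comparison is acceptable for small n such as 2000: every partial
   sum is an integer far below 2^53, so a correct reduction is exact in
   double precision regardless of the order in which threads combine it. */
int sum_is_correct(double sum, int n)
{
    return sum == expected_sum(n);
}
```

Calling `sum_is_correct(sum, n)` after the offloaded block in bug2.c and printing the result gives a pass/fail indicator that does not depend on reading the sum by eye.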