Offload bug in OpenMP reduction in C main code

Compiler Bug:
The following C code uses an OpenMP reduction inside an offloaded code block in the main program. It gives correct results with fewer than 9 threads, but incorrect results with 9 or more. The problem occurs with the Intel compiler 13.0.1.117, but not with the previous revision (13.0.1.079), and only when the offloaded block containing the reduction is in the main routine. It does not occur when the block is executed natively, in functions called within an offload, or in similar Fortran code. The program below is a sanitized version of the code in the "issue" (bug) reported to Intel. The bug is fixed in the next release, due within 3 weeks of this posting. The compiler command, environment settings, and execution command are also given. If you have any concerns about similar code in your program, please submit a ticket.

Workaround:
Turn off fast reductions in the coprocessor compilation with the compiler option: -offload-option,mic,compiler,"-mP2OPT_hpo_fast_reduction=FALSE".

icc -offload-option,mic,compiler,"-mP2OPT_hpo_fast_reduction=FALSE" bug2.c
export MIC_OMP_NUM_THREADS=9
export MIC_PREFIX=MIC
./a.out



bug2.c code:

#include <stdio.h>
#include <omp.h>

int main()
{
  double sum; int i, n, nt;

  n = 2000;
  sum = 0.0e0;

  /* Offload the block to coprocessor 0; sum is copied in and back out. */
#pragma offload target(mic:0) inout(sum)
  {
    /* Correct result: n*(n+1)/2 = 2001000 */
#pragma omp parallel for reduction(+:sum)
    for (i = 1; i <= n; i++)
      {
        sum += (double)i;
      }
    nt = omp_get_max_threads();
    printf("Hello MIC reduction %f threads: %d\n", sum, nt);
  }
  return 0;
}


To reproduce (without the workaround):

icc bug2.c
export MIC_OMP_NUM_THREADS=9
export MIC_PREFIX=MIC
./a.out   # gives incorrect results
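
If you are worried about similar code in your own program, one option is to compare the offloaded reduction against a value computed independently on the host. The snippet below is only a minimal sketch of that idea for the sum in bug2.c. The helper names expected_sum and sum_is_correct are hypothetical (they are not part of the report to Intel), and the relative tolerance of 1e-9 is an assumed value. The helpers contain no offload or OpenMP code, so the check itself should not be affected by the reduction bug.

```c
#include <stdio.h>

/* Closed-form value of 1 + 2 + ... + n, used as the host-side reference.
   (Hypothetical helper for illustration.) */
static double expected_sum(int n)
{
    return 0.5 * (double)n * ((double)n + 1.0);
}

/* Returns 1 if the reduction result agrees with the reference to within
   an assumed relative tolerance of 1e-9, and 0 otherwise.
   (Hypothetical helper for illustration.) */
static int sum_is_correct(double sum, int n)
{
    double ref = expected_sum(n);
    double diff = sum - ref;

    if (diff < 0.0)
        diff = -diff;

    if (diff > 1e-9 * ref) {
        fprintf(stderr, "reduction mismatch: got %f, expected %f\n", sum, ref);
        return 0;
    }
    return 1;
}
```

In bug2.c, a call such as sum_is_correct(sum, n) placed after the offload block would flag a wrong result, for example by returning a nonzero exit code from main when the check fails. For reductions without a closed form, the same comparison could presumably be made against a serial loop run on the host.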