<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>mvapich2 blocks or deadlocks with default BLACS topology</title>
  <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_recent_posts?p_l_id=" />
  <subtitle>mvapich2 blocks or deadlocks with default BLACS topology</subtitle>
  <entry>
    <title>mvapich2 blocks or deadlocks with default BLACS topology</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=465236" />
    <author>
      <name>Jing Zhang</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=465236</id>
    <updated>2013-02-22T00:32:13Z</updated>
    <published>2013-02-22T00:28:28Z</published>
    <summary type="html">Hi, there seems to be a bug in mvapich2/1.9a2 when calling the BLACS functions &amp;#039;dgebs2d&amp;#039; and &amp;#039;dgebr2d&amp;#039; with TOP=&amp;#039; &amp;#039;. All processes block or deadlock (I am not sure which) and never return from &amp;#039;dgebs2d&amp;#039; or &amp;#039;dgebr2d&amp;#039;. The sample code is as follows:&lt;br /&gt;&lt;br /&gt;&lt;div class="code"&gt;&lt;span class="code-lines"&gt;&amp;nbsp;1&lt;/span&gt;c&amp;nbsp; &amp;nbsp; bcast.f&lt;br /&gt;&lt;span class="code-lines"&gt;&amp;nbsp;2&lt;/span&gt;&lt;br /&gt;&lt;span class="code-lines"&gt;&amp;nbsp;3&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;program test&lt;br /&gt;&lt;span class="code-lines"&gt;&amp;nbsp;4&lt;/span&gt;c------------------------------------------------------------&lt;br /&gt;&lt;span class="code-lines"&gt;&amp;nbsp;5&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;include &amp;#039;mpif.h&amp;#039;&lt;br /&gt;&lt;span class="code-lines"&gt;&amp;nbsp;6&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;integer nprocs,me&lt;br /&gt;&lt;span class="code-lines"&gt;&amp;nbsp;7&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;integer nprow,npcol,ictxt,myrow,mycol&lt;br /&gt;&lt;span class="code-lines"&gt;&amp;nbsp;8&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;integer n&lt;br /&gt;&lt;span class="code-lines"&gt;&amp;nbsp;9&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;parameter(n=1000)&lt;br /&gt;&lt;span class="code-lines"&gt;10&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;double precision A(n,n)&lt;br /&gt;&lt;span class="code-lines"&gt;11&lt;/span&gt;&lt;br /&gt;&lt;span class="code-lines"&gt;12&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;call blacs_pinfo(me,nprocs)&lt;br /&gt;&lt;span class="code-lines"&gt;13&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;nprow=int(sqrt(float(nprocs)))&lt;br /&gt;&lt;span class="code-lines"&gt;14&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; 
&amp;nbsp;npcol=int(nprocs/nprow)&lt;br /&gt;&lt;span class="code-lines"&gt;15&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;if(nprow*npcol.gt.nprocs)npcol=npcol-1&lt;br /&gt;&lt;span class="code-lines"&gt;16&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;print *,&amp;#039;nprow=&amp;#039;,nprow,&amp;#039; npcol=&amp;#039;,npcol&lt;br /&gt;&lt;span class="code-lines"&gt;17&lt;/span&gt;&lt;br /&gt;&lt;span class="code-lines"&gt;18&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;call blacs_get(-1, 0, ictxt)&lt;br /&gt;&lt;span class="code-lines"&gt;19&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;call blacs_gridinit(ictxt,&amp;#039;R&amp;#039;,nprow,npcol)&lt;br /&gt;&lt;span class="code-lines"&gt;20&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;call blacs_gridinfo(ictxt,nprow,npcol,myrow,mycol)&lt;br /&gt;&lt;span class="code-lines"&gt;21&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;print *,&amp;#039;myrow=&amp;#039;,myrow,&amp;#039; mycol=&amp;#039;,mycol&lt;br /&gt;&lt;span class="code-lines"&gt;22&lt;/span&gt;&lt;br /&gt;&lt;span class="code-lines"&gt;23&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;if(myrow.eq.0.and.mycol.eq.0)then&lt;br /&gt;&lt;span class="code-lines"&gt;24&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; call dgebs2d(ictxt,&amp;#039;All&amp;#039;,&amp;#039; &amp;#039;,n,n,A,n)&lt;br /&gt;&lt;span class="code-lines"&gt;25&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;else&lt;br /&gt;&lt;span class="code-lines"&gt;26&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; call dgebr2d(ictxt,&amp;#039;All&amp;#039;,&amp;#039; &amp;#039;,n,n,A,n,0,0)&lt;br /&gt;&lt;span class="code-lines"&gt;27&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;endif&lt;br /&gt;&lt;span class="code-lines"&gt;28&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;print *,&amp;#039;myrow=&amp;#039;,myrow,&amp;#039; mycol=&amp;#039;,mycol, &amp;#039;pass&amp;#039;&lt;br 
/&gt;&lt;span class="code-lines"&gt;29&lt;/span&gt;&lt;br /&gt;&lt;span class="code-lines"&gt;30&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;if(myrow.ge.0)call blacs_gridexit(ictxt)&lt;br /&gt;&lt;span class="code-lines"&gt;31&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;call blacs_exit(0)&lt;br /&gt;&lt;span class="code-lines"&gt;32&lt;/span&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;end&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The code is compiled and executed like this:&lt;br /&gt;&lt;div class="code"&gt;&lt;span class="code-lines"&gt;1&lt;/span&gt;/opt/apps/intel13/mvapich2/1.9/bin/mpif77 -O2 -traceback&amp;nbsp; -o bcast.x bcast.f -L/opt/apps/intel/13/composer_xe_2013.2.146/mkl/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core&amp;nbsp; -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm&lt;br /&gt;&lt;span class="code-lines"&gt;2&lt;/span&gt;ibrun -n 4 -o 0 ./bcast.x&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;If TOP is set to something like &amp;#039;i-ring&amp;#039; or &amp;#039;1-tree&amp;#039;, the code works well. Everything is also fine if Intel MPI is used instead of mvapich2.&lt;br /&gt;&lt;br /&gt;Is there an environment variable or setting that could fix this issue, or is this a bug that needs to be fixed?&lt;br /&gt;&lt;br /&gt;Thanks,&lt;br /&gt;Jing</summary>
    <dc:creator>Jing Zhang</dc:creator>
    <dc:date>2013-02-22T00:28:28Z</dc:date>
  </entry>
</feed>

