<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>General Discussion</title>
  <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_category?p_l_id=&amp;mbCategoryId=22180" />
  <subtitle>Please post threads under this general XSEDE User Forum category; we will create sub-categories as relevant threads emerge.</subtitle>
  <entry>
    <title>Flash scratch storage - what happens if job runs out of walltime?</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2986037" />
    <author>
      <name>Carl Lemmon</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2986037</id>
    <updated>2022-05-12T21:24:03Z</updated>
    <published>2022-05-12T21:24:03Z</published>
    <summary type="html">Hi all,&lt;br /&gt;&lt;br /&gt;I currently have a trial allocation on Expanse and I am trying to get my software (ORCA) configured as well as possible before I run some benchmarks.&lt;br /&gt;&lt;br /&gt;I often run restartable jobs, run out of walltime, and need to resubmit. I would like to use the SSD flash scratch storage to hold my running jobs, i.e. the storage accessed under &amp;#034;/scratch/$USER/job/$SLURM_JOB_ID&amp;#034;. The guide specifies that this is only accessible during a job run. After the job is done, whether it finishes successfully or fails, the next command in my Slurm script moves all of the data back to my home directory. But if the job runs out of walltime while ORCA is still running, that next command never runs. What can I do to ensure that I am not wasting the SUs of that job by losing all of the data?&lt;br /&gt;&lt;br /&gt;Other clusters have an &amp;#034;orphan&amp;#034; folder where this lost data is sometimes held for a time. Do XSEDE clusters or Expanse have an option like this?&lt;br /&gt;&lt;br /&gt;Alternatively, I have used epilog scripts before in TORQUE, and I know Slurm also has the option of epilog scripts. Will an epilog script have access to that scratch folder, and if so, can I use one to copy the data back so that it is always copied whether or not the job runs out of walltime?</summary>
    <dc:creator>Carl Lemmon</dc:creator>
    <dc:date>2022-05-12T21:24:03Z</dc:date>
  </entry>
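  <!--
    A minimal sketch (not part of the original thread) of one common way to make sure the
    copy-back step always runs: bound the solver's runtime with `timeout` so the Slurm script
    regains control shortly before the walltime limit, independent of any epilog behavior.
    The paths, partition, and ORCA invocation below are placeholders, not Expanse-specific values.

    #!/bin/bash
    #SBATCH -J orca_restart
    #SBATCH -p compute
    #SBATCH -t 24:00:00

    WORKDIR=/scratch/$USER/job/$SLURM_JOB_ID      # job-scoped flash scratch
    RESULTS=$HOME/orca_results/$SLURM_JOB_ID      # hypothetical destination
    mkdir -p "$RESULTS"
    cp $HOME/orca_inputs/* "$WORKDIR"/            # hypothetical input location
    cd "$WORKDIR"

    # Give ORCA slightly less than the walltime (here 23.5 h of a 24 h limit) so the
    # copy below always runs, whether ORCA finishes, fails, or is stopped for resubmission.
    timeout 23.5h $HOME/orca/orca job.inp > job.out

    cp -r "$WORKDIR"/. "$RESULTS"/
  -->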
  <entry>
    <title>Help with `srun nvidia-smi` for power measurements</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2975453" />
    <author>
      <name>Wileam Yonatan Phan</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2975453</id>
    <updated>2022-04-27T14:25:43Z</updated>
    <published>2022-04-27T14:25:43Z</published>
    <summary type="html">I&amp;#039;m currently writing a script to get power measurements from the NVIDIA V100 GPU on Expanse-GPU (SDSC). Out of the three possible methods (`nvidia-smi`, NVML, and CUPTI), it seems that `nvidia-smi` has the lowest barrier to entry because it comes with the NVIDIA drivers.&lt;br /&gt;&lt;br /&gt;Currently I have the following three concerns:&lt;br /&gt;&lt;br /&gt; 1. How to deal with Slurm.&lt;br /&gt;    For some reason, on Expanse-GPU, after getting an interactive node on the `gpu-shared` partition (or any other Expanse-GPU partition, for that matter), you can only run `srun nvidia-smi` but not `nvidia-smi` directly. This suggests that either Expanse has a three-tiered node hierarchy (login, batch, compute) like Summit, or for some reason the environment isn&amp;#039;t set up properly outside of `srun`. Which of these two scenarios is true?&lt;br /&gt;&lt;br /&gt;2. How to spawn and kill `nvidia-smi` properly.&lt;br /&gt;    This is a technical aspect, but I&amp;#039;m not sure how to reliably spawn and kill (or start and stop) `nvidia-smi` in the Slurm job script. I am only mildly familiar with `pidof` and `kill`.&lt;br /&gt;&lt;br /&gt;3. How to set the polling interval appropriately.&lt;br /&gt;    This is more of a conceptual thing. In quantum physics, a measurement changes the system. I think a similar concept applies here, since we&amp;#039;re polling the GPU with `nvidia-smi`. If the frequency is too high, we might incur a performance cost. If the frequency is too low, the data will be inaccurate and the program being analyzed might finish entirely within the polling interval. How can I determine the appropriate polling interval?&lt;br /&gt;&lt;br /&gt;I have also submitted ticket #21526 to the Expanse help desk.</summary>
    <dc:creator>Wileam Yonatan Phan</dc:creator>
    <dc:date>2022-04-27T14:25:43Z</dc:date>
  </entry>
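  <!--
    A minimal sketch, not from the thread, of one way to start and stop an nvidia-smi power
    logger around an application inside a Slurm batch script. It assumes the batch script itself
    executes on the compute node (so the logger can run without srun; if not, prefix it with srun),
    and the 5-second interval and application command are placeholders.

    #!/bin/bash
    #SBATCH -p gpu-shared
    #SBATCH -t 00:30:00

    # Background power logger: -q -d POWER prints only the power section,
    # -l 5 repeats every 5 seconds, -f writes the output to a file.
    nvidia-smi -q -d POWER -l 5 -f power_$SLURM_JOB_ID.log &
    LOGGER_PID=$!          # $! captures the logger's PID, so pidof is not needed

    srun ./my_gpu_app      # placeholder for the program being measured

    kill $LOGGER_PID       # stop the logger once the application has finished
  -->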
  <entry>
    <title>RE: Running ORCA on Expanse</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2698444" />
    <author>
      <name>Nicole Wolter</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2698444</id>
    <updated>2021-03-22T20:21:52Z</updated>
    <published>2021-03-22T20:21:52Z</published>
    <summary type="html">Hi Andrew,&lt;br /&gt;&lt;br /&gt;In general we would need to know a bit more to assist you: what is the full command you ran, with the complete filename?&lt;br /&gt;&lt;br /&gt;However, I recommend that you submit a help ticket to help@xsede.org so that the SDSC staff can help you directly to install ORCA on the system.&lt;br /&gt;&lt;br /&gt;Nicole</summary>
    <dc:creator>Nicole Wolter</dc:creator>
    <dc:date>2021-03-22T20:21:52Z</dc:date>
  </entry>
  <entry>
    <title>Running ORCA on Expanse</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2694422" />
    <author>
      <name>Andrew Isho</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2694422</id>
    <updated>2021-03-21T04:07:26Z</updated>
    <published>2021-03-17T01:26:31Z</published>
    <summary type="html">Hello,&lt;br /&gt;&lt;br /&gt;I received a start-up allocation for Expanse and have been trying to run geometry optimizations for a research project. I have previous experience from a class I took in my graduate program, but we submitted our jobs to the Comet cluster. My teacher provided us with a bash file template and we used a script that called on ORCA from my teacher&amp;#039;s home directory. I tried installing ORCA on Expanse by downloading the tar.xz file from the ORCA forum and transferring the file to my home directory on the Expanse cluster. I then tried to extract it using the tar -xf command and received the following:&lt;br /&gt;&lt;br /&gt;xz: (stdin): File format not recognized&lt;br /&gt;tar: Child returned status 1&lt;br /&gt;tar: Error is not recoverable: exiting now&lt;br /&gt;&lt;br /&gt;I&amp;#039;m not sure if I am using the wrong command, but from my understanding Expanse runs on CentOS, so I don&amp;#039;t know why this command wouldn&amp;#039;t work.&lt;br /&gt;&lt;br /&gt;Has anyone else encountered this error, or does anyone know how I can install ORCA to run on Expanse?</summary>
    <dc:creator>Andrew Isho</dc:creator>
    <dc:date>2021-03-17T01:26:31Z</dc:date>
  </entry>
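  <!--
    The "xz: (stdin): File format not recognized" message usually means the file on disk is not
    actually an xz archive (for example, an HTML login page or a truncated transfer) rather than
    anything specific to CentOS or Expanse. A small sketch, with a placeholder file name, of how
    one might check the download before extracting:

    file orca.tar.xz              # should say "XZ compressed data"; "HTML document" or
                                  # "ASCII text" means the download itself went wrong
    tar -tJf orca.tar.xz | head   # list the first few entries without extracting
    tar -xJf orca.tar.xz          # extract once the file checks out
  -->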
  <entry>
    <title>comet's gdal missing glibc dependency?</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2521096" />
    <author>
      <name>Tylar Murray</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2521096</id>
    <updated>2020-07-25T20:54:12Z</updated>
    <published>2020-07-25T20:54:12Z</published>
    <summary type="html">I&amp;#039;m trying to run gdaladdo on comet and getting a &amp;#034;GLIBC not found&amp;#034; error.&lt;br /&gt;I am using the gdal module but still getting this error.&lt;br /&gt;&lt;br /&gt;Is there a way for me to fix the error in my submission script?&lt;br /&gt;I tried installing with yum but got a permissions error (not surprising).&lt;br /&gt;&lt;br /&gt;Here is my submission script:&lt;br /&gt;&lt;br /&gt;&lt;div class="code"&gt;&lt;span class="code-lines"&gt;&amp;nbsp;1&lt;/span&gt;#!/bin/bash&lt;br /&gt;&lt;span class="code-lines"&gt;&amp;nbsp;2&lt;/span&gt;#SBATCH --job-name =&amp;#034;wv_mosaic_gdal&amp;#034;&lt;br /&gt;&lt;span class="code-lines"&gt;&amp;nbsp;3&lt;/span&gt;#SBATCH --mem=1G&lt;br /&gt;&lt;span class="code-lines"&gt;&amp;nbsp;4&lt;/span&gt;#SBATCH --partition=large-shared&lt;br /&gt;&lt;span class="code-lines"&gt;&amp;nbsp;5&lt;/span&gt;#SBATCH --time=24:00:00&lt;br /&gt;&lt;span class="code-lines"&gt;&amp;nbsp;6&lt;/span&gt;&lt;br /&gt;&lt;span class="code-lines"&gt;&amp;nbsp;7&lt;/span&gt;module purge&lt;br /&gt;&lt;span class="code-lines"&gt;&amp;nbsp;8&lt;/span&gt;module load gdal&lt;br /&gt;&lt;span class="code-lines"&gt;&amp;nbsp;9&lt;/span&gt;&lt;br /&gt;&lt;span class="code-lines"&gt;10&lt;/span&gt;export GDAL_VRT_ENABLE_PYTHON=YES&lt;br /&gt;&lt;span class="code-lines"&gt;11&lt;/span&gt;export GDAL_VERT_ENABLE_PYTHON&lt;br /&gt;&lt;span class="code-lines"&gt;12&lt;/span&gt;&lt;br /&gt;&lt;span class="code-lines"&gt;13&lt;/span&gt;gdaladdo --config BIGTIFF_OVERIEW YES -ro CETX_overview.vrt 1&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;and the full error:&lt;br /&gt;&lt;br /&gt;&lt;div class="code"&gt;&lt;span class="code-lines"&gt;1&lt;/span&gt;gdaladdo: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21&amp;#039; not found (required by gdaladdo)&lt;br /&gt;&lt;span class="code-lines"&gt;2&lt;/span&gt;gdaladdo: /lib64/libstdc++.so.6: version `CXXABI_1.3.8&amp;#039; not found (required by /opt/gdal/lib/libgdal.so.20)&lt;br /&gt;&lt;span class="code-lines"&gt;3&lt;/span&gt;gdaladdo: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21&amp;#039; not found (required by /opt/gdal/lib/libgdal.so.20)&lt;br /&gt;&lt;span class="code-lines"&gt;4&lt;/span&gt;gdaladdo: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20&amp;#039; not found (required by /opt/gdal/lib/libgdal.so.20)&lt;br /&gt;&lt;span class="code-lines"&gt;5&lt;/span&gt;gdaladdo: /lib64/libstdc++.so.6: version `CXXABI_1.3.9&amp;#039; not found (required by /opt/gdal/lib/libgdal.so.20)&lt;br /&gt;&lt;/div&gt;</summary>
    <dc:creator>Tylar Murray</dc:creator>
    <dc:date>2020-07-25T20:54:12Z</dc:date>
  </entry>
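  <!--
    The GLIBCXX/CXXABI errors come from an old system libstdc++ rather than from the submission
    script itself; on many clusters, loading a newer GCC module puts a newer libstdc++ on
    LD_LIBRARY_PATH and satisfies those versioned symbols. A hedged sketch (the gcc module name is
    a guess; check what the site provides). Note also two likely typos in the original script:
    GDAL_VERT_ENABLE_PYTHON should probably be GDAL_VRT_ENABLE_PYTHON, and BIGTIFF_OVERIEW
    should probably be BIGTIFF_OVERVIEW.

    module purge
    module load gdal
    module avail gcc     # see which compiler modules exist on the system
    module load gcc      # hypothetical name; GLIBCXX_3.4.21 needs GCC 5 or newer
    export GDAL_VRT_ENABLE_PYTHON=YES
    gdaladdo ...         # same gdaladdo invocation as in the original script
  -->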
  <entry>
    <title>Protein Sequence XSEDE Beginners Help</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2486140" />
    <author>
      <name>subhadra paudel</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2486140</id>
    <updated>2020-06-04T00:07:25Z</updated>
    <published>2020-06-04T00:06:58Z</published>
    <summary type="html">Hello Everyone,&lt;br /&gt;&lt;br /&gt;I am working on a bioinformatics research project related to a bacterial protein, and I am planning on utilizing XSEDE resources in my research.&lt;br /&gt;I have watched some XSEDE beginner tutorials, which give me an idea of what XSEDE is capable of, but I am not sure how and where to begin. I would be grateful if those of you with similar experience could guide me through it or share your experience as a beginner.&lt;br /&gt;&lt;br /&gt;Thank you in advance.&lt;br /&gt;&lt;br /&gt;Subha Paudel</summary>
    <dc:creator>subhadra paudel</dc:creator>
    <dc:date>2020-06-04T00:06:58Z</dc:date>
  </entry>
  <entry>
    <title>module not found</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2354404" />
    <author>
      <name>Iftikhar Ali</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2354404</id>
    <updated>2020-01-11T17:35:23Z</updated>
    <published>2020-01-11T17:35:23Z</published>
    <summary type="html">Hello,&lt;br /&gt;I load Python with the command &lt;em&gt;module load python3&lt;/em&gt;, but when I run my program, for example &lt;em&gt;python3 filename.py&lt;/em&gt;, it gives an error along the lines of &lt;em&gt;no module found matplotlib&lt;/em&gt;. The same happens for other Python libraries, like pandas or tensorflow.&lt;br /&gt;I tried to install them with the command &lt;em&gt;pip install packagename&lt;/em&gt;, but it does not work; the error message mainly says there is a connection problem or that no module is found.&lt;br /&gt;&lt;br /&gt;Any help is appreciated. Thanks.&lt;br /&gt;&lt;br /&gt;Iftikhar</summary>
    <dc:creator>Iftikhar Ali</dc:creator>
    <dc:date>2020-01-11T17:35:23Z</dc:date>
  </entry>
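  <!--
    A minimal sketch, not from the thread, of the usual pattern on these systems: the python3
    module provides the interpreter but not extra libraries, and compute nodes often have no
    outbound network access, so packages are installed once into a user-owned virtual environment
    from a login node and then activated inside the job. Names and paths are only examples.

    # On a login node (where the network is reachable):
    module load python3
    python3 -m venv $HOME/envs/myproject       # personal environment in the home directory
    source $HOME/envs/myproject/bin/activate
    pip install matplotlib pandas tensorflow   # installs into the venv, not the system

    # In the job script, before running the program:
    module load python3
    source $HOME/envs/myproject/bin/activate
    python3 filename.py
  -->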
  <entry>
    <title>DOE SBIR "HPC Cybersecurity" - collaborators? feedback? comments?</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2290118" />
    <author>
      <name>Ulrich Lang</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2290118</id>
    <updated>2019-10-04T15:54:08Z</updated>
    <published>2019-10-04T15:53:59Z</published>
    <summary type="html">Dear HPC community,&lt;br /&gt;&lt;br /&gt;I would like to ask for your help.&lt;br /&gt;&lt;br /&gt;We are currently writing an HPC cybersecurity related proposal for the DOE SBIR (Dept. of Energy Small Business Innovation Research) program. We are trying to identify partners for our project who operate an HPC center, as well as people in the field who can provide feedback on our proposed approach.&lt;br /&gt;&lt;br /&gt;We are proposing to extend our OpenPMF fine-grained access control configuration solution for HPC environments. If you are knowledgeable in HPC cybersecurity and/or DOE&amp;#039;s HPC perspective, or are interested in partnering, then please let me know.&lt;br /&gt;&lt;br /&gt;Thanks,&lt;br /&gt;Ulrich Lang&lt;br /&gt;CEO&lt;br /&gt;ObjectSecurity LLC</summary>
    <dc:creator>Ulrich Lang</dc:creator>
    <dc:date>2019-10-04T15:53:59Z</dc:date>
  </entry>
  <entry>
    <title>question about setting nodes/threads with spark-submit and srun</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2110311" />
    <author>
      <name>Weihao Ge</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2110311</id>
    <updated>2019-02-28T20:00:43Z</updated>
    <published>2019-02-28T20:00:43Z</published>
    <summary type="html">I&amp;#039;m really new to both Spark and srun. Both spark-submit and srun have options to set nodes and tasks. How should I write the settings so that the Spark jobs are distributed across the nodes and threads set in the sbatch script? Thanks!</summary>
    <dc:creator>Weihao Ge</dc:creator>
    <dc:date>2019-02-28T20:00:43Z</dc:date>
  </entry>
  <entry>
    <title>Publicly accessible Swift storage?</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1926784" />
    <author>
      <name>Dennis Heimbigner</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1926784</id>
    <updated>2018-07-20T17:31:00Z</updated>
    <published>2018-07-20T17:30:34Z</published>
    <summary type="html">I need to do some experimentation using the Swift API and it would be convenient if there was a &amp;#034;publicly&amp;#034; (i.e. XSEDE specific) Swift storage server available that I could use. Does such a thing exist?</summary>
    <dc:creator>Dennis Heimbigner</dc:creator>
    <dc:date>2018-07-20T17:30:34Z</dc:date>
  </entry>
  <entry>
    <title>RE: Distributed TensorFlow on Bridges?</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1829408" />
    <author>
      <name>Robert J Zigon</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1829408</id>
    <updated>2018-04-07T03:38:56Z</updated>
    <published>2018-04-07T03:38:56Z</published>
    <summary type="html">Adrian&lt;br /&gt;I&amp;#039;ve been trying to figure this out recently. Apparently I am not asking for help correctly.&lt;br /&gt;Did you happen to solve your distributed tensorflow problem? If so, would you be willing to share your script?&lt;br /&gt;&lt;br /&gt;Bob</summary>
    <dc:creator>Robert J Zigon</dc:creator>
    <dc:date>2018-04-07T03:38:56Z</dc:date>
  </entry>
  <entry>
    <title>[CFP] Pycon Tutorials, Talks, &amp; Posters</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1719191" />
    <author>
      <name>Jacqueline L Kazil</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1719191</id>
    <updated>2017-11-10T02:42:59Z</updated>
    <published>2017-11-10T02:41:51Z</published>
    <summary type="html">Have you dreamed of visiting Cleveland, Ohio? You should submit to PyCon&amp;#039;s Call for proposals. &lt;br /&gt;&lt;br /&gt;Tutorial proposals &amp;#x2014; deadline is 24 November 2017 AoE.&lt;br /&gt;Talk, Poster, and Education Summit proposals &amp;#x2014; deadline is 3 January 2018 AoE.&lt;br /&gt;&lt;br /&gt;All backgrounds, topics, and levels are encouraged to submit! &lt;br /&gt;If you submit early, you can get mentorship and feedback. &lt;br /&gt;&lt;br /&gt;More info here:  &lt;a href="https://us.pycon.org/2018/speaking/"&gt;https://us.pycon.org/2018/speaking/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;-Jackie Kazil&lt;br /&gt;Speaker Co-Chair&lt;br /&gt;&lt;a href="https://www.python.org/psf/"&gt;Python Software Foundation&lt;/a&gt; Board Member</summary>
    <dc:creator>Jacqueline L Kazil</dc:creator>
    <dc:date>2017-11-10T02:41:51Z</dc:date>
  </entry>
  <entry>
    <title>OpenSeesMP on Stampede2</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1636183" />
    <author>
      <name>Jazalyn Dukes</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1636183</id>
    <updated>2017-08-02T18:49:34Z</updated>
    <published>2017-08-02T18:49:34Z</published>
    <summary type="html">Has anyone run OpenSeesMP or OpenSeesSP on Stampede2, or does anyone know how we can do this? I placed a copy of the .exe file in the folder I&amp;#039;m working in, and there are tcl libraries installed already. I tried to run a job with OpenSeesMP using ibrun, but I get an error saying &amp;#034;execvp error on file OpenSeesMP.exe Permission denied&amp;#034;. Thanks.</summary>
    <dc:creator>Jazalyn Dukes</dc:creator>
    <dc:date>2017-08-02T18:49:34Z</dc:date>
  </entry>
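  <!--
    An "execvp error ... Permission denied" from ibrun usually means the binary lost its execute
    bit when it was copied; a brief sketch, with a placeholder input file name, of how one might
    check and fix that before resubmitting.

    file OpenSeesMP.exe               # confirm it is a Linux ELF executable, not a Windows build
    ls -l OpenSeesMP.exe              # check whether the x (execute) bits are present
    chmod +x OpenSeesMP.exe           # restore them if the transfer dropped them
    ibrun ./OpenSeesMP.exe model.tcl  # model.tcl is a placeholder input file
  -->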
  <entry>
    <title>RE: XSEDE user direct ssh to stampede2</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1627920" />
    <author>
      <name>Chris Hempel</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1627920</id>
    <updated>2017-07-20T20:06:22Z</updated>
    <published>2017-07-20T20:06:02Z</published>
    <summary type="html">Hello Shaowu,&lt;br /&gt;&lt;br /&gt;Multi-factor authentication (MFA) is now required to access TACC resources directly via ssh. Please visit the URL below for information on MFA and how to set it up.&lt;br /&gt;&lt;br /&gt;https://portal.tacc.utexas.edu/tutorials/multifactor-authentication&lt;br /&gt;&lt;br /&gt;Please let me know if you have additional questions.&lt;br /&gt;&lt;br /&gt;Thanks,&lt;br /&gt;Chris&lt;br /&gt;&lt;br /&gt;Chris Hempel&lt;br /&gt;Director of User Services&lt;br /&gt;Texas Advanced Computing Center</summary>
    <dc:creator>Chris Hempel</dc:creator>
    <dc:date>2017-07-20T20:06:02Z</dc:date>
  </entry>
  <entry>
    <title>XSEDE user direct ssh to stampede2</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1627330" />
    <author>
      <name>Shaowu Bao</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1627330</id>
    <updated>2017-07-20T02:48:40Z</updated>
    <published>2017-07-20T02:48:40Z</published>
    <summary type="html">Dear all,&lt;br /&gt;&lt;br /&gt;As an XSEDE user, can I ssh directly to stampede2.tacc.utexas.edu without going through the login.xsede.org server? When I tried it, it asked for a TACC token, which I do not have.&lt;br /&gt;&lt;br /&gt;Thanks,&lt;br /&gt;Shaowu Bao</summary>
    <dc:creator>Shaowu Bao</dc:creator>
    <dc:date>2017-07-20T02:48:40Z</dc:date>
  </entry>
  <entry>
    <title>Jobs running abnormally slow on Comet</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1585876" />
    <author>
      <name>Joseph Andrew Barranco</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1585876</id>
    <updated>2017-06-25T18:40:07Z</updated>
    <published>2017-06-25T18:38:47Z</published>
    <summary type="html">Recently, some jobs on Comet seem to be running a factor of 4 slower than usual (that is, they run slowly from the start and throughout the entire calculation). I have cancelled such jobs and started them over, and then they run at the expected speed. This has only occurred since I started running on 32 nodes (768 cores). What could cause the exact same code to behave like this?&lt;br /&gt;&lt;br /&gt;I am going to start keeping track of which nodes are being used... Could it be a problem with one faulty slow node? My code runs at the pace of the slowest node.&lt;br /&gt;&lt;br /&gt;Any advice on how I might diagnose this?&lt;br /&gt;&lt;br /&gt;For the record, this code has run successfully on Stampede (TACC) and Pleiades (NASA) and has not had this problem.</summary>
    <dc:creator>Joseph Andrew Barranco</dc:creator>
    <dc:date>2017-06-25T18:38:47Z</dc:date>
  </entry>
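  <!--
    A small sketch of how one might record which nodes each job used, so slow runs can later be
    correlated with particular hosts; the variables are standard Slurm ones and the job id in the
    sacct line is a placeholder.

    # Inside the job script: record the node list in the job's output file.
    echo "Job $SLURM_JOB_ID ran on: $SLURM_JOB_NODELIST"

    # After the fact, for completed jobs:
    sacct -j 1234567 -o JobID,NodeList,Elapsed,State
  -->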
  <entry>
    <title>Using Persistent Communication in Fortran MPI</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1562708" />
    <author>
      <name>Julio Cesar Mendez</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1562708</id>
    <updated>2017-05-31T13:34:06Z</updated>
    <published>2017-05-31T13:33:19Z</published>
    <summary type="html">Dear community,&lt;br /&gt;I attended the summer boot camp in 2015, where we discussed OpenMP, MPI and OpenACC.&lt;br /&gt;In the boot camp we solved the famous Laplace equation problem. However, I am working with a more complex and bigger problem, a real CFD code.&lt;br /&gt;In this code, I need to update the ghost cells at each time step, so the communication overhead is quite high. As a result, I have decided to use persistent communication, but my ghost cells are populated with zeroes.&lt;br /&gt;The code runs, but the solution is not right. As I said before, all my ghost cells are populated with zeroes; I have tried different things but nothing has worked out.&lt;br /&gt;I wanted to know if someone has used persistent communication with the Laplace equation. If so, I would greatly appreciate it if you could share the code with me so I can compare my syntax with a working version. Below I show a small part of the code.&lt;br /&gt;&lt;br /&gt;The code calls the MPI_Subroutine where I set the communication characteristics.&lt;br /&gt;&lt;br /&gt;!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!&lt;br /&gt;!Starting up MPI&lt;br /&gt;call MPI_INIT(ierr)&lt;br /&gt;call MPI_COMM_SIZE(MPI_COMM_WORLD,npes,ierr)&lt;br /&gt;call MPI_COMM_RANK(MPI_COMM_WORLD,MyRank,ierr)&lt;br /&gt;&lt;br /&gt;!Compute the size of local block (1D Decomposition)&lt;br /&gt;Jmax = JmaxGlobal&lt;br /&gt;Imax = ImaxGlobal/npes&lt;br /&gt;if (MyRank.lt.(ImaxGlobal - npes*Imax)) then&lt;br /&gt;  Imax = Imax + 1&lt;br /&gt;end if&lt;br /&gt;if (MyRank.ne.0.and.MyRank.ne.(npes-1)) then&lt;br /&gt;  Imax = Imax + 2&lt;br /&gt;Else&lt;br /&gt;  Imax = Imax + 1&lt;br /&gt;endif&lt;br /&gt;&lt;br /&gt;! Computing neighbors&lt;br /&gt;if (MyRank.eq.0) then&lt;br /&gt;  Left = MPI_PROC_NULL&lt;br /&gt;else&lt;br /&gt;  Left = MyRank - 1&lt;br /&gt;end if&lt;br /&gt;&lt;br /&gt;if (MyRank.eq.(npes -1)) then&lt;br /&gt;  Right = MPI_PROC_NULL&lt;br /&gt;else&lt;br /&gt;  Right = MyRank + 1&lt;br /&gt;end if&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;! Initializing the Arrays in each processor, according to the number of local nodes&lt;br /&gt;Call InitializeArrays&lt;br /&gt;&lt;br /&gt;!Creating the channel of communication for this computation,&lt;br /&gt;!Sending and receiving the u_old (Ghost cells)&lt;br /&gt;Call MPI_SEND_INIT(u_old(2,: ),Jmax,MPI_DOUBLE_PRECISION,Left,tag,MPI_COMM_WORLD,req(1),ierr)&lt;br /&gt;Call MPI_RECV_INIT(u_old(Imax,: ),jmax,MPI_DOUBLE_PRECISION,Right,tag,MPI_COMM_WORLD,req(2),ierr)&lt;br /&gt;Call MPI_SEND_INIT(u_old(Imax-1,: ),Jmax,MPI_DOUBLE_PRECISION,Right,tag,MPI_COMM_WORLD,req(3),ierr)&lt;br /&gt;Call MPI_RECV_INIT(u_old(1,: ),jmax,MPI_DOUBLE_PRECISION,Left,tag,MPI_COMM_WORLD,req(4),ierr)&lt;br /&gt;&lt;br /&gt;End Subroutine MPI_Subroutine&lt;br /&gt;&lt;br /&gt;!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!&lt;br /&gt;&lt;br /&gt;From the main code, where the do loop is, I call MPI_STARTALL and MPI_WAITALL in each time step.&lt;br /&gt;&lt;br /&gt;Call MPI_STARTALL(4,req,ierr)&lt;br /&gt;Call MPI_WAITALL(4,req,status,ierr)&lt;br /&gt;&lt;br /&gt;Req is an array of dimension (4), and so is status.&lt;br /&gt;&lt;br /&gt;I am using Fortran 90... Any suggestions and comments?&lt;br /&gt;Thanks beforehand</summary>
    <dc:creator>Julio Cesar Mendez</dc:creator>
    <dc:date>2017-05-31T13:33:19Z</dc:date>
  </entry>
  <entry>
    <title>'wait' - a simple question</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1185333" />
    <author>
      <name>Karen Pardos Olsen</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1185333</id>
    <updated>2016-03-31T15:51:24Z</updated>
    <published>2016-03-31T15:51:24Z</published>
    <summary type="html">Hi,&lt;br /&gt;&lt;br /&gt;I&amp;#039;m new to setting up batch files, and have a question about the &amp;#039;wait&amp;#039; command.&lt;br /&gt;&lt;br /&gt;The goal here is to run a lot of models with the &amp;#039;cloudy&amp;#039; code, but last time I ran my batch file, it only calculated 72 models out of ~700. So in my new batch file (part of it inserted below), I added an increasing &amp;#039;offset&amp;#039; (from 0 to 19),  then a &amp;#039;wait&amp;#039; to wait for the first 20 models to finish, after which the offset is reset to 0. Is that possible at all? Is there a more correct way of doing this?&lt;br /&gt;&lt;br /&gt;Batch file has been submitted but is still in the queue so I thought I&amp;#039;d just ask.&lt;br /&gt;&lt;br /&gt;Thanks in advance,&lt;br /&gt;Karen&lt;br /&gt;&lt;br /&gt;#!/bin/bash -l&lt;br /&gt;&lt;br /&gt;#----------------------------------------------------&lt;br /&gt;# SLURM job script to run SIGAME on &lt;br /&gt;# TACC&amp;#039;s Stampede system.&lt;br /&gt;#&lt;br /&gt;#----------------------------------------------------&lt;br /&gt;&lt;br /&gt;#SBATCH -J SIGAME-test              # Job name&lt;br /&gt;#SBATCH -o sigame.%j.out       # Name of stdout output file (%j expands to jobId)&lt;br /&gt;#SBATCH -p normal        # Queue name&lt;br /&gt;#SBATCH -N 20                 # Total number of nodes requested (16 cores/node)&lt;br /&gt;#SBATCH -n 20                  # Total number of mpi tasks requested&lt;br /&gt;#SBATCH -t 02:00:00           # Run time (hh:mm:ss) - 1.5 hours&lt;br /&gt;#SBATCH --mail-user=...&lt;br /&gt;#SBATCH --mail-type=begin  # email me when the job starts&lt;br /&gt;#SBATCH --mail-type=end    # email me when the job finishes&lt;br /&gt;&lt;br /&gt;#SBATCH -A ...      # &amp;lt;-- Allocation name to charge job against&lt;br /&gt;&lt;br /&gt;# Launch this&lt;br /&gt;ibrun -o 0 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_0 &amp;amp;&lt;br /&gt;ibrun -o 1 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_1 &amp;amp;&lt;br /&gt;ibrun -o 2 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_2 &amp;amp;&lt;br /&gt;ibrun -o 3 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_3 &amp;amp;&lt;br /&gt;ibrun -o 4 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_4 &amp;amp;&lt;br /&gt;ibrun -o 5 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_5 &amp;amp;&lt;br /&gt;ibrun -o 6 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_6 &amp;amp;&lt;br /&gt;ibrun -o 7 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_7 &amp;amp;&lt;br /&gt;ibrun -o 8 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_8 &amp;amp;&lt;br /&gt;ibrun -o 9 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_9 &amp;amp;&lt;br /&gt;ibrun -o 10 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_10 &amp;amp;&lt;br /&gt;ibrun -o 11 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_11 &amp;amp;&lt;br /&gt;ibrun -o 12 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_12 &amp;amp;&lt;br /&gt;ibrun -o 13 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_13 &amp;amp;&lt;br /&gt;ibrun -o 14 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_14 &amp;amp;&lt;br /&gt;ibrun -o 15 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_15 &amp;amp;&lt;br /&gt;ibrun -o 16 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_16 &amp;amp;&lt;br 
/&gt;ibrun -o 17 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_17 &amp;amp;&lt;br /&gt;ibrun -o 18 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_18 &amp;amp;&lt;br /&gt;ibrun -o 19 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_19 &amp;amp;&lt;br /&gt;wait&lt;br /&gt;ibrun -o 0 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_20 &amp;amp;&lt;br /&gt;ibrun -o 1 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_21 &amp;amp;&lt;br /&gt;ibrun -o 2 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_22 &amp;amp;&lt;br /&gt;ibrun -o 3 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_23 &amp;amp;&lt;br /&gt;ibrun -o 4 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_24 &amp;amp;&lt;br /&gt;ibrun -o 5 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_25 &amp;amp;&lt;br /&gt;ibrun -o 6 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_26 &amp;amp;&lt;br /&gt;ibrun -o 7 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_27 &amp;amp;&lt;br /&gt;ibrun -o 8 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_28 &amp;amp;&lt;br /&gt;ibrun -o 9 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_29 &amp;amp;&lt;br /&gt;ibrun -o 10 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_30 &amp;amp;&lt;br /&gt;ibrun -o 11 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_31 &amp;amp;&lt;br /&gt;ibrun -o 12 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_32 &amp;amp;&lt;br /&gt;ibrun -o 13 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_33 &amp;amp;&lt;br /&gt;ibrun -o 14 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_34 &amp;amp;&lt;br /&gt;ibrun -o 15 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_35 &amp;amp;&lt;br /&gt;ibrun -o 16 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_36 &amp;amp;&lt;br /&gt;ibrun -o 17 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_37 &amp;amp;&lt;br /&gt;ibrun -o 18 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_38 &amp;amp;&lt;br /&gt;ibrun -o 19 -n 1 /work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe -p GMC_39 &amp;amp;&lt;br /&gt;wait</summary>
    <dc:creator>Karen Pardos Olsen</dc:creator>
    <dc:date>2016-03-31T15:51:24Z</dc:date>
  </entry>
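  <!--
    In bash, `wait` with no arguments blocks until every backgrounded command in the script has
    finished, so the offset/wait pattern above does what is intended. A compact sketch of the same
    idea written as a loop, using the same cloudy path, so all ~700 models can be covered without
    writing each ibrun line by hand; NMODELS and BATCH are placeholders.

    CLOUDY=/work/04075/kpolsen/SIGAME/cloudy/source/cloudy.exe
    NMODELS=700        # total number of models
    BATCH=20           # models run concurrently, one per node

    for (( i=0; i<NMODELS; i++ )); do
      offset=$(( i % BATCH ))
      ibrun -o $offset -n 1 $CLOUDY -p GMC_$i &
      # after every block of BATCH models, wait for all of them to finish
      if (( (i + 1) % BATCH == 0 )); then wait; fi
    done
    wait   # catch any models left over in a final partial batch
  -->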
  <entry>
    <title>batch renaming files already on Ranch</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1086194" />
    <author>
      <name>Mary Katherine Clapp</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1086194</id>
    <updated>2015-11-03T20:27:05Z</updated>
    <published>2015-11-03T20:27:05Z</published>
    <summary type="html">Hi all,&lt;br /&gt;&lt;br /&gt;I have several directories of files already on Ranch archival storage that I need to batch-rename in order to run them efficiently through some code. Some of my files have spaces in their names, and others have information in the incorrect order for processing.&lt;br /&gt;&lt;br /&gt;A couple of examples: I would like to rename the directory (and all the files therein) &amp;#039;CENTER1 20150825&amp;#039; to &amp;#039;CENTER1_20150825&amp;#039;, and the directory (and contents) &amp;#039;20150606 EALA-01&amp;#039; to &amp;#039;EASTLA1_20150606&amp;#039;.&lt;br /&gt;&lt;br /&gt;Can I batch-rename using ssh from my terminal? Do I need to stage all the files in order to rename them and then archive them all again? We&amp;#039;re talking ~8TB of data, so staging and re-uploading would NOT be a trivial process.&lt;br /&gt;&lt;br /&gt;I suppose I could also just wait until I actually pull them out of Ranch into Stampede for my actual computing, and rename them then. But if there is a simple way to rename that doesn&amp;#039;t involve re-uploading them all, it&amp;#039;d be nice to take care of it en masse.&lt;br /&gt;&lt;br /&gt;I am fairly new to the unix shell, so I may need more annotation than the average XSEDE user to understand the process.&lt;br /&gt;&lt;br /&gt;Thanks!</summary>
    <dc:creator>Mary Katherine Clapp</dc:creator>
    <dc:date>2015-11-03T20:27:05Z</dc:date>
  </entry>
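  <!--
    If the directories are visible over ssh on the archive host, a rename is normally a metadata
    operation and should not require pulling the ~8TB back and re-uploading it. A hedged sketch
    (untested against Ranch's policies) of renaming directories whose names contain spaces; the
    username is a placeholder.

    ssh username@ranch.tacc.utexas.edu

    # Rename one directory, quoting the old name because it contains a space:
    mv "CENTER1 20150825" "CENTER1_20150825"

    # Or rename every directory in the current location that contains a space,
    # replacing each space with an underscore:
    for d in */ ; do
      new=${d// /_}
      if [ "$d" != "$new" ]; then mv "$d" "$new"; fi
    done
  -->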
  <entry>
    <title>Non-MPI jobs on comet (ie data parallelized)</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=959276" />
    <author>
      <name>Donald Gilbert</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=959276</id>
    <updated>2015-05-21T02:38:53Z</updated>
    <published>2015-05-21T02:38:53Z</published>
    <summary type="html">Please offer advice, and possibly add to your comet:/share/apps/examples/, on how to run tasks that are parallelized by splitting data, not by using MPI or open-MPI message passing.&lt;br /&gt;&lt;br /&gt;The examples at https://portal.xsede.org/sdsc-comet are for MPI or openMPI tasks.&lt;br /&gt;Use of &amp;#039;ibrun&amp;#039; or openmpi run methods is tricky for data-split 1-cpu tasks. This is what&lt;br /&gt;I end up testing, but I don&amp;#039;t know if it is making efficient use of the node&amp;#039;s 24 cpus, and it apparently won&amp;#039;t work for 2+ nodes:&lt;br /&gt;&lt;br /&gt;#! /bin/bash&lt;br /&gt;## env prog=blastallc.sh prodb=prot.db protin=prot.aa datad=`pwd` sbatch srun_prog.sh&lt;br /&gt;#SBATCH --job-name=&amp;#034;blasta1&amp;#034;&lt;br /&gt;#SBATCH --output=&amp;#034;blasta.%j.%N.out&amp;#034;&lt;br /&gt;#SBATCH --partition=compute&lt;br /&gt;#SBATCH --nodes=1&lt;br /&gt;#SBATCH --ntasks-per-node=24&lt;br /&gt;#SBATCH -t 39:55:00&lt;br /&gt;&lt;br /&gt;export ncpu=24&lt;br /&gt;if [ &amp;#034;X&amp;#034; = &amp;#034;X$datad&amp;#034; ]; then echo &amp;#034;ERR:datad=what?&amp;#034;; exit -1; fi&lt;br /&gt;if [ &amp;#034;X&amp;#034; = &amp;#034;X$prog&amp;#034; ]; then echo &amp;#034;ERR:prog=what?&amp;#034;; exit -1; fi&lt;br /&gt;cd $datad&lt;br /&gt;&lt;br /&gt;## * ibrun is wacky like aprun/ bigred in that it calls $prog ncpu times !! &lt;br /&gt;ibrun --npernode 1 -v $prog&lt;br /&gt;&lt;br /&gt;# $prog is a shell script that forks out $ncpu tasks on $ncpu data parts, as in&lt;br /&gt;  i=0; while [ $i -lt $ncpu ]; do { bioapp --part $i &amp;amp; i=$(($i+1)); }; done&lt;br /&gt;&lt;br /&gt;Data parallelization rather than MPI is the commonest method for bioinformatics and genomics apps, as the problems are data-bound, and splitting the data into N chunks and running single-cpu tasks on each chunk works well. Only a few of the many bio-apps are MPI-aware. For example, comet has a module with an ancient &amp;#039;mpiblast&amp;#039; which I would not recommend because it likely hasn&amp;#039;t been updated to NCBI&amp;#039;s current blast code in years. The above method is what I use for NCBI blast, splitting data into ncpu parts.&lt;br /&gt;&lt;br /&gt;- Don Gilbert</summary>
    <dc:creator>Donald Gilbert</dc:creator>
    <dc:date>2015-05-21T02:38:53Z</dc:date>
  </entry>
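  <!--
    A hedged sketch of one way to spread independent single-core tasks over more than one node
    without MPI: request the total task count from Slurm and launch each data chunk as its own
    one-task srun step, so Slurm places the steps across the allocated nodes. run_one_chunk.sh is
    a placeholder wrapper around the serial tool, and depending on the Slurm version the steps may
    need an additional srun option (see the step resource options in the srun man page) so that
    concurrent steps share node memory cleanly.

    #!/bin/bash
    #SBATCH -J datapar
    #SBATCH -p compute
    #SBATCH -N 2
    #SBATCH -n 48          # 24 single-core tasks per node on 2 nodes
    #SBATCH -t 04:00:00

    for (( i=0; i<$SLURM_NTASKS; i++ )); do
      srun -N 1 -n 1 ./run_one_chunk.sh $i &   # one chunk of the split data per step
    done
    wait   # keep the job alive until every chunk has finished
  -->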
</feed>

