<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Non-MPI jobs on Comet (i.e., data-parallelized)</title>
  <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_thread?p_l_id=&amp;threadId=959277" />
  <subtitle>Non-MPI jobs on Comet (i.e., data-parallelized)</subtitle>
  <entry>
    <title>Non-MPI jobs on Comet (i.e., data-parallelized)</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=959276" />
    <author>
      <name>Donald Gilbert</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=959276</id>
    <updated>2015-05-21T02:38:53Z</updated>
    <published>2015-05-21T02:38:53Z</published>
    <summary type="html">Please offer advice, and possibly add an example to your comet:/share/apps/examples/, on how to run tasks that are parallelized by splitting data rather than by MPI or OpenMPI message passing. &lt;br /&gt;&lt;br /&gt;The examples at https://portal.xsede.org/sdsc-comet are for MPI or OpenMPI tasks.  &lt;br /&gt;Using &amp;#039;ibrun&amp;#039; or the OpenMPI launch methods is tricky for data-split, single-CPU tasks. This is what&lt;br /&gt;I ended up testing, but I don&amp;#039;t know whether it makes efficient use of the node&amp;#039;s 24 CPUs, and it apparently won&amp;#039;t work for 2+ nodes:&lt;br /&gt;&lt;br /&gt;#! /bin/bash&lt;br /&gt;## env prog=blastallc.sh prodb=prot.db protin=prot.aa datad=`pwd` sbatch srun_prog.sh&lt;br /&gt;#SBATCH --job-name=&amp;#034;blasta1&amp;#034;&lt;br /&gt;#SBATCH --output=&amp;#034;blasta.%j.%N.out&amp;#034;&lt;br /&gt;#SBATCH --partition=compute&lt;br /&gt;#SBATCH --nodes=1&lt;br /&gt;#SBATCH --ntasks-per-node=24&lt;br /&gt;#SBATCH -t 39:55:00&lt;br /&gt;&lt;br /&gt;export ncpu=24&lt;br /&gt;if [ &amp;#034;X&amp;#034; = &amp;#034;X$datad&amp;#034; ]; then echo &amp;#034;ERR:datad=what?&amp;#034;; exit 1; fi&lt;br /&gt;if [ &amp;#034;X&amp;#034; = &amp;#034;X$prog&amp;#034; ]; then echo &amp;#034;ERR:prog=what?&amp;#034;; exit 1; fi&lt;br /&gt;cd $datad&lt;br /&gt;&lt;br /&gt;## NB: ibrun is wacky like aprun on Big Red in that it calls $prog ncpu times; --npernode 1 limits it to once per node&lt;br /&gt;ibrun --npernode 1 -v $prog&lt;br /&gt;&lt;br /&gt;# $prog is a shell script that forks $ncpu tasks on $ncpu data parts, as in:&lt;br /&gt;  i=0; while [ $i -lt $ncpu ]; do bioapp --part $i &amp;amp; i=$(($i+1)); done; wait&lt;br /&gt;# (the trailing &amp;#039;wait&amp;#039; keeps the script alive until all forked parts finish)&lt;br /&gt;&lt;br /&gt;Data parallelization rather than MPI is the most common method for bioinformatics and genomics apps: the problems are data-bound, and splitting the data into N chunks and running a single-CPU task on each chunk works well.  Only a few of the many bio-apps are MPI-aware.
For example, Comet has a module with an ancient &amp;#039;mpiblast&amp;#039;, which I would not recommend: it likely has not been updated to NCBI&amp;#039;s current BLAST code in years. The method above is what I use for NCBI BLAST, splitting the data into ncpu parts.&lt;br /&gt;&lt;br /&gt;- Don Gilbert</summary>
    <dc:creator>Donald Gilbert</dc:creator>
    <dc:date>2015-05-21T02:38:53Z</dc:date>
  </entry>
</feed>