<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Bridges: Processing large number of large files</title>
  <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_recent_posts?p_l_id=" />
  <subtitle>Bridges: Processing large number of large files</subtitle>
  <entry>
    <title>Bridges: Processing large number of large files</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2532599" />
    <author>
      <name>Mohan Sun</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2532599</id>
    <updated>2020-08-13T20:28:57Z</updated>
    <published>2020-08-13T20:27:09Z</published>
    <summary type="html">Hey guys,&lt;br /&gt;&lt;br /&gt;Do you have suggestions on the best way to submit batch jobs on Bridges for a project with a large number of large files?&lt;br /&gt;&lt;br /&gt;I have a project where I need to process around 100,000 csv.gz files. An average .gz file is 500 MB. I am currently processing them with grep | sed | awk, so the processing time is approximately 0.5 hours per file on an RM-shared core.&lt;br /&gt;&lt;br /&gt;My plan is to submit 100,000 RM-shared batch jobs, where each job processes one file. But my concern is that my later jobs will have lower priority since I will already have submitted/completed many jobs.&lt;br /&gt;&lt;br /&gt;I cannot find such a mechanism in the PSC documentation, but I think it&amp;#039;s a common thing for supercomputers. Can anyone help me verify that? Any suggestions are also welcome.</summary>
    <dc:creator>Mohan Sun</dc:creator>
    <dc:date>2020-08-13T20:27:09Z</dc:date>
  </entry>
</feed>