<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Choosing resource for large memory jobs?</title>
  <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_thread?p_l_id=&amp;threadId=1539660" />
  <subtitle>Choosing resource for large memory jobs?</subtitle>
  <entry>
    <title>Choosing resource for large memory jobs?</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1539659" />
    <author>
      <name>Gaurav Kandoi</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=1539659</id>
    <updated>2017-05-03T18:28:37Z</updated>
    <published>2017-05-03T18:14:22Z</published>
    <summary type="html">Hi&lt;br /&gt;&lt;br /&gt;We are in process of submitting a startup allocation grant and I would like to know which compute resource would be most suitable for our usage.&lt;br /&gt;&lt;br /&gt;My work will involve:&lt;br /&gt;&lt;br /&gt;&lt;ul style="list-style: disc inside;"&gt;&lt;li&gt;Protein sequence property calculation: First, I&amp;#039;ll calculate various sequence based properties in R for ~75k protein/mRNAs. Then, I&amp;#039;ll calculate the pairwise correlation between all possible pairs of ~75k protein/mRNA. And, finally store them for use in developing prediction models. This step requires large amount of RAM because I need to store the entire ~75k * ~75k correlation matrix.&lt;/li&gt;&lt;li&gt;mRNA quantification using RNA-Seq: Process ~100 RNA-Seq datasets to quantify the expression of mRNA. Calculate pairwise correlation of all mRNA&amp;#039;s for every RNA-Seq dataset. Store them for use in developing prediction models. RNA-Seq processing can be parallelized fairly easily and won&amp;#039;t be very memory intensive. However, calculating and saving the correlation matrix requires a large amount of RAM.&lt;/li&gt;&lt;li&gt; Combine features: Each of the correlation score serves as a feature for my machine learning model and hence I join these scores to get a single file where each row is a protein/mRNA pair and the columns contains the correlation scores from sequence properties and RNA-Seq dataset.&lt;/li&gt;&lt;/ul style="list-style: disc inside;"&gt;&lt;br /&gt;&lt;br /&gt;Because the project involves steps which require multiple nodes (RNA-Seq processing) and large RAM (correlation calculation), I am wondering which resources would be best? I can then ask my PI to include those in our proposal.&lt;br /&gt;&lt;br /&gt;Thanks</summary>
    <dc:creator>Gaurav Kandoi</dc:creator>
    <dc:date>2017-05-03T18:14:22Z</dc:date>
  </entry>
</feed>

