<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Help with `srun nvidia-smi` for power measurements</title>
  <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_recent_posts?p_l_id=" />
  <subtitle>Help with `srun nvidia-smi` for power measurements</subtitle>
  <entry>
    <title>Help with `srun nvidia-smi` for power measurements</title>
    <link rel="alternate" href="https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2975453" />
    <author>
      <name>Wileam Yonatan Phan</name>
    </author>
    <id>https://conferences.xsede.org/c/message_boards/find_message?p_l_id=&amp;messageId=2975453</id>
    <updated>2022-04-27T14:25:43Z</updated>
    <published>2022-04-27T14:25:43Z</published>
    <summary type="html">I&amp;#039;m currently writing a script to get power measurements from the NVIDIA V100 GPU on Expanse-GPU (SDSC). Of the three possible methods (`nvidia-smi`, NVML, and CUPTI), `nvidia-smi` seems to have the lowest barrier to entry because it comes with the NVIDIA drivers.&lt;br /&gt;&lt;br /&gt;Currently I have the following three concerns:&lt;br /&gt;&lt;br /&gt;1. How to deal with Slurm.&lt;br /&gt;    For some reason, at Expanse-GPU, after getting an interactive node on the `gpu-shared` partition (or any other Expanse-GPU partition, for that matter), you can only run `srun nvidia-smi` but not `nvidia-smi` directly. This suggests that either Expanse has a three-tiered node hierarchy (login, batch, compute) like Summit, or for some reason the environment isn&amp;#039;t set up properly outside of `srun`. Which of these two scenarios is true?&lt;br /&gt;&lt;br /&gt;2. How to spawn and kill `nvidia-smi` properly.&lt;br /&gt;    This is a technical matter: I&amp;#039;m not sure how to reliably spawn and kill (or start and stop) `nvidia-smi` in the Slurm job script. I am only mildly familiar with `pidof` and `kill`.&lt;br /&gt;&lt;br /&gt;3. How to set the polling interval appropriately.&lt;br /&gt;    This is more of a conceptual question. In quantum physics, a measurement changes the system. I think a similar concept applies here, since we&amp;#039;re polling the GPU with `nvidia-smi`. If the polling frequency is too high, we might incur a performance cost. If it is too low, the data will be inaccurate, and the program being analyzed might finish entirely within a single polling interval. How can I determine the appropriate polling interval?&lt;br /&gt;&lt;br /&gt;I have also submitted ticket #21526 to the Expanse help desk.</summary>
    <dc:creator>Wileam Yonatan Phan</dc:creator>
    <dc:date>2022-04-27T14:25:43Z</dc:date>
  </entry>
</feed>

