How to Run a Python Script on a Slurm-Based Cluster in Five Minutes
April 18, 2017
This actually took me a couple of hours to figure out (erm, more like a day if we take into account the frustration and the resulting procrastination; on the other hand, I got to listen to this amazing podcast with Kara Swisher, a fearlessly straightforward journalist. Life-changing.), so hopefully I save you some time!
This tutorial was designed with running a Python script in mind, but is pretty generalizable.
- Let’s Do It!
- The Fancy Clustery Part
- Additional Resources
Have a cluster account with a Python module installed. To verify this, log in to your ssh account and run
[hsu01@login001 ~]$ module load python
If the Python module is supported, no error should be returned. You can double-check that Python is loaded by checking the version.
[hsu01@login001 ~]$ python -V
Python 2.7.3
Let’s Do It!
On Your Local Computer
We will make the cluster run the file hello.py which is simply a ‘Hello World’ script.
The code is literally
print 'Hello World!'
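The one-liner above uses the Python 2 print statement, which matches the Python 2.7.3 the cluster reported. If your cluster's module loads Python 3 instead (an assumption, so check with python -V), that line is a syntax error. Here is a minimal sketch of a hello.py that should run under both; the greeting function is just a hypothetical helper to keep the script easy to check.

```python
# hello.py, written to run under both Python 2.7 and Python 3
from __future__ import print_function


def greeting():
    """Return the text the script prints (hypothetical helper)."""
    return "Hello World!"


if __name__ == "__main__":
    print(greeting())
```

Either version works for the rest of the tutorial, since run.sh only calls python hello.py.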
To run it, we will save the commands to run this file into a shell script, run.sh, which looks like
#!/bin/sh
module load python
python hello.py
There are several ways to get these files onto your cluster, but my favorite method is saving them on my local computer, and then copying them to the remote computer using scp (which stands for secure copy).
The hello.py and run.sh files are in my local directory. From my local computer, I will type into the command line
scp hello.py run.sh email@example.com:~/
(notice the tilde) and should be prompted for my ssh password.
On Your SSH Account
Log in again to your ssh account, and use sbatch (submit batch) to run the script
[hsu01@login001 ~]$ sbatch run.sh
Submitted batch job 12616333
The cluster should respond with the submitted batch job ID (a process you run is called a job in cluster parlance), in this case 12616333.
Once the job is done, which should be almost immediately, the output of the job will appear. If we ls (List FileS…whatever), we should see the output file slurm-12616333.out appear. Viewing it using the less command (as opposed to seeing more, I guess; these command names are pretty arbitrary), we see this
Hello World!
slurm-12616333.out (END)
Press q to end the viewing.
Yay! This is good, we now know how to run any typical Python script on the server!
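If you end up submitting jobs from a script rather than by hand, you can pull the job ID out of the sbatch response and work out which output file to look for. This is only a sketch: parse_job_id and default_output_file are hypothetical helper names of mine, and it assumes sbatch prints the same "Submitted batch job" line shown above and that no -o option has changed the default slurm-<jobid>.out file name.

```python
from __future__ import print_function

import re


def parse_job_id(sbatch_output):
    """Return the job ID from sbatch's 'Submitted batch job <id>' line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError("no job ID found in: %r" % sbatch_output)
    return int(match.group(1))


def default_output_file(job_id):
    """Name of the file Slurm writes to by default when -o is not given."""
    return "slurm-%d.out" % job_id


if __name__ == "__main__":
    # The response text from the example run in this post.
    job_id = parse_job_id("Submitted batch job 12616333")
    print(default_output_file(job_id))
```

In a real script you would feed it the text captured from running sbatch run.sh instead of the hard-coded string.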
The Fancy Clustery Part
We haven’t really exploited the strength of the cluster though, which is that we can hook our script up to multiple cores with a large amount of RAM. So far we have only logged in to the head node and let sbatch run the job with whatever default resources the scheduler hands out, which is just a very small portion of the cluster of computers we have access to. (This is super optional, but I found this short guide, written by an academic, very easy to read and informative about how the cluster is structured.)
Slurm is basically a scheduling system that allocates resources across a cluster of computers. To use more of the cluster, we need to tell the scheduler how many resources we need. This can be done by modifying the run.sh script.
#!/bin/sh
#SBATCH -N 1          # nodes requested
#SBATCH -n 1          # tasks requested
#SBATCH -c 4          # cores requested
#SBATCH --mem=10      # memory in MB
#SBATCH -o outfile    # send stdout to outfile
#SBATCH -e errfile    # send stderr to errfile
#SBATCH -t 0:01:00    # time requested in hour:minute:second

module load python
python hello.py
I used these parameters for the purpose of the tutorial because they took no time to allocate. The more resources you request, the longer it takes the scheduler to make room for your computation. You should modify the number of nodes, tasks, cores and the amount of memory to fit your task.
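Requesting four cores with -c 4 does not make hello.py any faster on its own; the script has to actually spread work across them. Here is a rough sketch of how a script might do that, assuming Slurm sets the SLURM_CPUS_PER_TASK environment variable when -c is given (it falls back to one worker otherwise). The allocated_cpus and square functions are hypothetical stand-ins for your own code.

```python
from __future__ import print_function

import os
from multiprocessing import Pool


def allocated_cpus(default=1):
    """Cores granted via -c, read from SLURM_CPUS_PER_TASK, else `default`."""
    return int(os.environ.get("SLURM_CPUS_PER_TASK", default))


def square(x):
    """Stand-in for whatever per-item work your real script does."""
    return x * x


if __name__ == "__main__":
    workers = allocated_cpus()
    pool = Pool(processes=workers)
    try:
        # Split the inputs across one worker process per allocated core.
        results = pool.map(square, range(8))
    finally:
        pool.close()
        pool.join()
    print("workers:", workers)
    print(results)
```

Reading the core count from the environment means the script and run.sh cannot drift apart: change -c in one place and the pool size follows.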
To be honest, I have much to learn about what numbers are reasonable to put here for my own script! I’ll end the post here.
Hope this helped.
Additional Resources
Quick Start for Slurm by Tufts University.
Understanding the Cluster by the Zhang Lab at the University of Michigan.