How to Run a Python Script on a Slurm-Based Cluster in Five Minutes

April 18, 2017

This actually took me a couple of hours to figure out, so hopefully I can save you some time! (Erm, more like a day if we take into account the frustration and the resulting procrastination. On the other hand, I got to listen to this amazing podcast with Kara Swisher, a fearlessly straightforward journalist. Life-changing.)

This tutorial was designed with running a Python script in mind, but is pretty generalizable.

Requirement

Have a cluster account with a Python module available. To verify this, log in to your account over SSH:

ssh hsu01@login.cluster.tufts.edu
[hsu01@login001 ~]$ module load python

If the Python module is supported, no error should be returned. You can double-check that Python is loaded by checking the version.

[hsu01@login001 ~]$ python -V
Python 2.7.3

Let’s Do It!

On Your Local Computer

We will make the cluster run the file hello.py, which is simply a ‘Hello World’ script.

The code is literally

print('Hello World!')
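Since the point of a cluster is to run on its compute nodes rather than on the login node, a slightly more talkative variant of hello.py can confirm where the job actually ran. This is a minimal sketch and not part of the original tutorial (the rest of the post uses the one-line version). The function name `greeting` is hypothetical, and the only library call is the standard `socket.gethostname()`. Using `print(...)` with a single argument works under both Python 2.7 and Python 3.

```python
import socket


def greeting():
    """Return the classic greeting plus the name of the machine running the script."""
    # When submitted with sbatch, this should be a compute node's name,
    # not the login node you typed the command on
    return "Hello World! (from %s)" % socket.gethostname()


if __name__ == "__main__":
    print(greeting())
```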

To run it, we will save the commands to run this file into a shell script, run.sh, which looks like

#!/bin/sh
module load python
python hello.py

There are several ways to get these files onto the cluster, but my favorite is to save them on my local computer and then copy them to the remote computer using scp (short for “secure copy”).

Suppose hello.py and run.sh are in my current local directory. From my local computer, I type this into the command line:

scp hello.py run.sh hsu01@login.cluster.tufts.edu:~/
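If you prefer to script the upload, the same scp call can be assembled in Python and handed to the standard subprocess module. This is a sketch under a few assumptions: scp is on your local PATH, the user name and host are the ones from the command above, and `build_scp_command` and `upload` are hypothetical helper names. The main block only prints the command, so running it copies nothing.

```python
import subprocess


def build_scp_command(files, user, host, remote_dir="~/"):
    """Return the argument list for: scp FILE... user@host:remote_dir"""
    # The "host:" part marks the destination as remote; "~/" (the tilde)
    # means your home directory on the cluster
    return ["scp"] + list(files) + ["%s@%s:%s" % (user, host, remote_dir)]


def upload(files, user, host, remote_dir="~/"):
    """Actually run scp; raises CalledProcessError if scp exits with an error."""
    subprocess.check_call(build_scp_command(files, user, host, remote_dir))


if __name__ == "__main__":
    cmd = build_scp_command(["hello.py", "run.sh"], "hsu01", "login.cluster.tufts.edu")
    print(" ".join(cmd))
```

Calling `upload(["hello.py", "run.sh"], "hsu01", "login.cluster.tufts.edu")` would then prompt for the SSH password just as the hand-typed command does.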

Notice the tilde in :~/ at the end, which puts the files in my home directory on the cluster. I should then be prompted for my SSH password.

On Your SSH Account

Log in to your SSH account again and use sbatch (short for “submit batch”) to submit the script:

[hsu01@login001 ~]$ sbatch run.sh
Submitted batch job 12616333

The cluster should respond with the ID of the submitted batch job (a process you run is called a job in cluster parlance), in this case 12616333.
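When you submit jobs from a script instead of by hand, you usually want to capture that job ID. Below is a minimal sketch, assuming sbatch prints a line of the form shown above and that no -o option is set, in which case Slurm names the output file slurm-<jobid>.out. `parse_job_id`, `default_output_file` and `submit` are hypothetical helpers; `submit` only works on a machine where sbatch exists and assumes Python 3.7 or later for `capture_output`.

```python
import re
import subprocess


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from sbatch's 'Submitted batch job N' line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError("unexpected sbatch output: %r" % sbatch_output)
    return int(match.group(1))


def default_output_file(job_id):
    """Name of the file Slurm writes stdout to when no -o option is given."""
    return "slurm-%d.out" % job_id


def submit(script):
    """Run sbatch on the given script and return the job ID it reports."""
    result = subprocess.run(["sbatch", script], capture_output=True, text=True, check=True)
    return parse_job_id(result.stdout)
```

For the example above, `parse_job_id("Submitted batch job 12616333")` returns 12616333, and `default_output_file(12616333)` returns "slurm-12616333.out".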

Once the job is done, which should be almost immediately, its output is written to a file in the directory you submitted from. If we run ls (List FileS… whatever), we should see the output file slurm-12616333.out. Viewing it with the less command (as opposed to seeing more, I guess; these command names are pretty arbitrary), we see this:

Hello World!
slurm-12616333.out (END)

Press q to exit the viewer.
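If you would rather not juggle ls and less, a few lines of Python can find the newest output file in a directory and print it. This is a sketch using only the standard library; `latest_slurm_output` is a hypothetical name, and it assumes the default slurm-<jobid>.out naming.

```python
import glob
import os


def latest_slurm_output(directory="."):
    """Return the path of the most recently modified slurm-*.out file, or None."""
    candidates = glob.glob(os.path.join(directory, "slurm-*.out"))
    if not candidates:
        return None
    # Newest by modification time, i.e. the job that wrote output most recently
    return max(candidates, key=os.path.getmtime)


if __name__ == "__main__":
    path = latest_slurm_output()
    if path is None:
        print("no slurm-*.out files here yet")
    else:
        with open(path) as f:
            print(f.read())
```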

Yay! We now know how to run a typical Python script on the cluster!

The Fancy Clustery Part

We haven’t really exploited the strength of the cluster, though, which is that we can give our script multiple cores and a large amount of RAM. So far we have only used the login (head) node to submit the job, and Slurm ran it with the cluster’s default allocation (often a single core), just a small slice of the computers we have access to. (This is super optional, but I found this short guide, written by an academic and listed under Additional Resources below, very easy to read and informative about how the cluster is structured.)

To make use of more of the cluster, we need to tell Slurm, the scheduling system that allocates resources among the cluster’s computers, how many resources we need. This is done by adding #SBATCH lines to the top of the run.sh script:

#!/bin/sh
#SBATCH -N 1      # nodes requested
#SBATCH -n 1      # tasks (processes) requested
#SBATCH -c 4      # cores per task requested
#SBATCH --mem=10  # memory per node in MB (sbatch's default unit)
#SBATCH -o outfile  # send stdout to outfile
#SBATCH -e errfile  # send stderr to errfile
#SBATCH -t 0:01:00  # time requested in hour:minute:second

module load python
python hello.py
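Asking for four cores does not make hello.py any faster by itself; the Python script has to use them. One common approach, sketched here as an assumption rather than something this post covers, is to read the SLURM_CPUS_PER_TASK environment variable, which Slurm sets inside the job only when -c (--cpus-per-task) is requested, and size a multiprocessing pool to match. `allocated_cpus` and `square` are hypothetical names; outside a Slurm job the variable is absent and the sketch falls back to one process.

```python
import os
from multiprocessing import Pool


def allocated_cpus(environ=None):
    """Number of cores Slurm gave this task via -c, or 1 outside a Slurm job."""
    environ = os.environ if environ is None else environ
    # SLURM_CPUS_PER_TASK is only exported when --cpus-per-task/-c was requested
    return int(environ.get("SLURM_CPUS_PER_TASK", "1"))


def square(x):
    """Stand-in for real per-item work."""
    return x * x


if __name__ == "__main__":
    # One worker process per allocated core
    pool = Pool(processes=allocated_cpus())
    try:
        print(pool.map(square, range(10)))
    finally:
        pool.close()
        pool.join()
```

With the run.sh above, which requests -c 4, the pool should get four workers.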

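I used these parameters for the purpose of the tutorial because they took almost no time to allocate: the more resources you request, the longer the scheduler may take to make room for your job. You should adjust the number of nodes, tasks, cores, and the amount of memory to fit your task. Also note that because of the -o and -e lines, the output now goes to outfile (and any errors to errfile) instead of slurm-<jobid>.out.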
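If you end up trying several resource settings, generating the batch script can save some copy-pasting. This is a minimal sketch, not the author’s workflow: `format_time` and `sbatch_script` are hypothetical helpers whose defaults mirror the run.sh above, and the time is written in the hours:minutes:seconds form that sbatch’s -t accepts.

```python
def format_time(seconds):
    """Format a duration in seconds as H:MM:SS for sbatch's -t option."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return "%d:%02d:%02d" % (hours, minutes, secs)


def sbatch_script(command, nodes=1, tasks=1, cores=4, mem_mb=10,
                  seconds=60, out="outfile", err="errfile"):
    """Return a batch script with #SBATCH directives like run.sh above."""
    lines = [
        "#!/bin/sh",
        "#SBATCH -N %d" % nodes,          # nodes
        "#SBATCH -n %d" % tasks,          # tasks
        "#SBATCH -c %d" % cores,          # cores per task
        "#SBATCH --mem=%d" % mem_mb,      # memory per node in MB
        "#SBATCH -o %s" % out,            # stdout file
        "#SBATCH -e %s" % err,            # stderr file
        "#SBATCH -t %s" % format_time(seconds),
        "",
        "module load python",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Same job with 8 cores, 2048 MB of memory and 10 minutes of wall time
    print(sbatch_script("python hello.py", cores=8, mem_mb=2048, seconds=600))
```

Writing the returned string to a file and passing that file to sbatch would then submit it like any hand-written script.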

To be honest, I have much to learn about what numbers are reasonable to put here for my own script! I’ll end the post here.

Hope this helped.

Additional Resources

Quick Start for Slurm by Tufts University.

Understanding the Cluster by the Zhang Lab at the University of Michigan.

How to Run A Python Script in Slurm-Based Cluster in Five Minutes - April 18, 2017 - Hang Lu Su