Setup
Approximate time: 20 minutes
Goals
- Connect to the HPC cluster via the OnDemand interface
- Download data
Log in to the HPC cluster's OnDemand interface
- Open a Chrome browser and visit ondemand.cluster.tufts.edu
- Log in with your Tufts Credentials
- On the top menu bar choose Clusters->HPC Shell Access
- Type your password at the prompt (the password will be hidden for security purposes):
whuo01@login.cluster.tufts.edu's password:
- You'll see a welcome message and a bash prompt, for example for user whuo01:
[whuo01@login001 ~]$
This indicates you are logged in to the login node.
- Type the following to clear the screen:
clear
Compute node allocation
- Get an interactive session on a compute node by typing:
srun --pty -t 3:00:00 --mem 16G -N 1 -n 4 bash
Once you press Enter, you will see something like the following, showing that the job is queued:
[whuo01@login001 ~]$ srun --pty -t 3:00:00 --mem 16G -N 1 -n 4 bash
srun: job 55918493 queued and waiting for resources
If the job stays queued, you can request the interactive partition by adding the following option before bash:
-p interactive
Or, if you are registered for the workshop, you can use the following options before bash:
-p preempt --reservation=bioworkshop
This reservation will be available for one week after the workshop starts.
You can press Ctrl-C to cancel your request and try again with different options, e.g.:
[whuo01@login001 ~]$ srun --pty -t 3:00:00 --mem 16G -N 1 -n 4 -p interactive bash
[whuo01@pcomp45 ~]$
Success is indicated by the change of node name after your username. Here it changed from login001 to pcomp45.
You may now proceed to the next step.
Note: If you go through this workshop in multiple steps, you will have to rerun this step each time you log in.
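Because the allocation step has to be repeated at each login, it can help to check which kind of node the current shell is on before starting work. The following is a minimal sketch, assuming login nodes have hostnames that start with login (as in the login001 example above). The function name node_kind is hypothetical and not part of the cluster's tooling.

```shell
#!/usr/bin/env bash
# Classify a hostname as "login" or "compute".
# Assumption: login nodes are named login*, as in the login001 example;
# any other hostname is treated as a compute node.
node_kind() {
    # Use the first argument if given, otherwise the short name of this host.
    local host="${1:-${HOSTNAME%%.*}}"
    case "$host" in
        login*) echo "login" ;;
        *)      echo "compute" ;;
    esac
}

# Report where the current shell is running.
echo "This shell is on a $(node_kind) node."
```

If the message says login, request an interactive session with srun as described above before running any analysis.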
Course data
- Since our home directory will likely not have enough space for the analysis (> 3 GB), we'll work in a course directory. Your work will be saved here for 30 days.** Change to the course directory:
cd /cluster/tufts/bio/tools/training/intro-to-rnaseq/users/
**Note: If you have a project directory for your lab, you may use it instead.
These are located in /cluster/tufts with names like /cluster/tufts/labname/username/.
If you don't know whether you have project space, please email tts-research@tufts.edu.
- Make a directory for your work (replace whuo01 in the commands below with your username):
mkdir whuo01
cd whuo01
- Copy the course files into your own directory:
cp /cluster/tufts/bio/tools/training/intro-to-rnaseq/intro-to-RNA-seq-May-2020.tar.gz .
- Extract the course archive:
tar -xvzf intro-to-RNA-seq-May-2020.tar.gz
- Take a look at the contents of the extracted directory by typing:
tree intro-to-RNA-seq
Result:
intro-to-RNA-seq/
├── ERP004763_info.txt                  <-- sample description
├── raw_data                            <-- Folder with fastq files
│   ├── sample_info.txt
│   ├── SNF2
│   │   ├── ERR458500.fastq.gz          <-- gzip compressed fastq files
│   │   ├── ERR458501.fastq.gz
│   │   ├── ERR458502.fastq.gz
│   │   ├── ERR458503.fastq.gz
│   │   ├── ERR458504.fastq.gz
│   │   ├── ERR458505.fastq.gz
│   │   └── ERR458506.fastq.gz
│   └── WT
│       ├── ERR458493.fastq.gz
│       ├── ERR458494.fastq.gz
│       ├── ERR458495.fastq.gz
│       ├── ERR458496.fastq.gz
│       ├── ERR458497.fastq.gz
│       ├── ERR458498.fastq.gz
│       └── ERR458499.fastq.gz
└── scripts                             <-- Folder with all commands
    ├── fastqc.sh
    ├── featurecounts.sh
    ├── intro.R
    ├── sbatch_star_align_individual.sh
    ├── sbatch_star_align.sh
    └── sbatch_star_align_SNF2.sh
4 directories, 22 files
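The directory, copy, and extract steps above can also be wrapped in one function so they are easy to repeat. This is a sketch under stated assumptions rather than part of the course scripts: the function name fetch_course_data and its three arguments are hypothetical, and it extracts quietly (no -v) into the per-user directory.

```shell
#!/usr/bin/env bash
# Copy a .tar.gz archive into a per-user work directory and extract it there.
# Arguments (hypothetical names for this sketch):
#   $1  path to the archive
#   $2  parent directory that will hold the per-user directory
#   $3  username (defaults to $USER)
# Prints the work directory on success.
fetch_course_data() {
    local archive="$1" parent="$2" user="${3:-$USER}"
    local workdir="$parent/$user"

    # Stop early if the archive is missing.
    [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }

    mkdir -p "$workdir" || return 1
    cp "$archive" "$workdir/" || return 1
    # -x extract, -z decompress with gzip, -f archive file, -C target directory
    tar -xzf "$workdir/$(basename "$archive")" -C "$workdir" || return 1
    echo "$workdir"
}
```

On the cluster, the call corresponding to the steps above would be fetch_course_data /cluster/tufts/bio/tools/training/intro-to-rnaseq/intro-to-RNA-seq-May-2020.tar.gz /cluster/tufts/bio/tools/training/intro-to-rnaseq/users, followed by cd into the printed directory.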
Data for the class
Publication: Statistical Models for RNA-seq Data Derived From a Two-Condition 48-replicate Experiment.
Purpose: The experiment seeks to compare a wild type Saccharomyces cerevisiae with a mutant that contains a knock-out in the gene SNF2. The purpose of the study is to analyze variability in sequencing replicates.
Project accession number: PRJEB5348
Samples: The WT folder contains 7 sequencing files from a wild type yeast sample; the SNF2 folder contains 7 sequencing files from a yeast sample with a knock-out mutation in the gene SNF2.
Note that, for the purposes of this workshop, we are treating the 7 sequencing files in each folder as if they originated from separate biological replicates.
Organism: Saccharomyces cerevisiae
Sequencing: Illumina HiSeq, Single End, 50bp read length
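To compare a downloaded file against this description, the reads and their lengths can be counted directly from the compressed FASTQ. The sketch below assumes the standard four-line FASTQ record layout (header, sequence, "+", qualities). The function name fastq_summary is hypothetical and is not one of the course scripts.

```shell
#!/usr/bin/env bash
# Print the number of reads and the shortest and longest read length
# in a gzip-compressed FASTQ file.
# Assumption: every record spans exactly 4 lines, with the sequence on line 2.
fastq_summary() {
    local fq="$1"
    gzip -dc "$fq" | awk '
        NR % 4 == 2 {                      # sequence line of each record
            n = length($0); reads++
            if (reads == 1 || n < min) min = n
            if (reads == 1 || n > max) max = n
        }
        END { printf "reads=%d min_len=%d max_len=%d\n", reads, min, max }'
}
```

For example, fastq_summary intro-to-RNA-seq/raw_data/WT/ERR458493.fastq.gz prints one summary line for that file. Run it on a compute node rather than the login node, since it decompresses the whole file.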