Tufts TTS Research Technology Tutorials - Beta

Background

Sequencing data analysis typically focuses on either DNA or RNA. As a reminder, here is the interplay between DNA, RNA, and protein: DNA is transcribed into RNA, and RNA is translated into protein.

DNA Sequencing

- The number of copies of a gene per cell is fixed
- Analysis goal: variant calling and interpretation

RNA Sequencing

- The number of copies of a transcript per cell depends on gene expression
- Analysis goal: differential expression and interpretation

Note

Here we are working with DNA sequencing data.

Next Generation Sequencing

Here we will analyze DNA using next generation sequencing (NGS) data. The data are generated in the following steps:

Library Preparation: DNA is fragmented, and adapters are added to the ends of these fragments.

Cluster Amplification: The library is loaded onto a flow cell, where the adapters allow the fragments to hybridize to the flow cell. Each fragment is then amplified to form a clonal cluster.

Sequencing: Fluorescently labelled nucleotides are added to the flow cell. Each time a labelled nucleotide pairs with a base in the fragment, a light signal is emitted that tells the sequencer which base it is.

Alignment & Data Analysis: The sequenced fragments, or reads, are then aligned to a reference sequence to identify differences.
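The sequencing and alignment steps above can be illustrated with a toy Python sketch. It assumes error-free reads sampled from random positions (a stand-in for fragmentation and sequencing) and a brute-force, ungapped alignment that tries every offset in the reference. The names `simulate_reads`, `align`, and `differences` and the example sequences are hypothetical; real short-read aligners use indexed search and also handle sequencing errors, insertions, and deletions.

```python
import random


def simulate_reads(genome, read_len, n_reads, seed=0):
    """Toy stand-in for fragmentation + sequencing: sample error-free reads
    of a fixed length from random positions in the genome."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


def align(read, reference):
    """Brute-force ungapped alignment: try every offset and keep the first
    one with the fewest mismatches. Returns (position, mismatch_count)."""
    best_pos, best_mm = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(a != b for a, b in zip(read, window))
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


def differences(read, reference, pos):
    """List (reference_position, reference_base, read_base) for each mismatch."""
    return [(pos + i, reference[pos + i], base)
            for i, base in enumerate(read)
            if reference[pos + i] != base]


reference = "ACGTACCTGAGTTCAGGATCCAAT"
sample = "ACGTACCTGAGTACAGGATCCAAT"          # made-up sample: T -> A at position 12

read = sample[8:16]                          # "GAGTACAG"
pos, mm = align(read, reference)             # (8, 1)
print(differences(read, reference, pos))     # [(12, 'T', 'A')]

for r in simulate_reads(sample, read_len=8, n_reads=3):
    print(r, align(r, reference))
```

Reads that overlap position 12 align with one mismatch at that position, which is how aligned reads reveal a difference from the reference.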

Single-End v. Paired-End Data

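- Single-end: each DNA fragment is sequenced from one end only.
- Paired-end: each DNA fragment is sequenced from both ends. Paired-end data is useful when sequencing highly repetitive sequences, because the two reads of a pair come from the same fragment and lie a roughly known distance apart, so a read that falls in a repeat can often be placed using its mate.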
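To make the difference concrete, here is a minimal Python sketch. It assumes a standard paired-end library in which read 2 is sequenced from the opposite strand, so it is reported as the reverse complement of the fragment's far end. The helper names and the toy fragment are hypothetical, and sequences are assumed to be uppercase.

```python
# Map each base to its complement (N stays N); assumes uppercase input.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def single_end_read(fragment, read_len):
    """Single-end: read only the first read_len bases from one end."""
    return fragment[:read_len]


def paired_end_reads(fragment, read_len):
    """Paired-end: read 1 starts at one end of the fragment; read 2 starts at
    the other end on the opposite strand, hence the reverse complement."""
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment[-read_len:])
    return read1, read2


fragment = "ACGTTGCAAGGCTTAC"           # made-up 16 bp fragment
print(single_end_read(fragment, 5))     # ACGTT
print(paired_end_reads(fragment, 5))    # ('ACGTT', 'GTAAG')
```

Because both reads come from the same fragment, an aligner can check that they map in opposite orientations roughly one fragment length apart; that constraint is what helps place a read whose own sequence falls in a repeat.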

Variant Calling

Ploidy

When discussing variant calling, it is worth mentioning an organism's ploidy. Ploidy is the number of copies of each chromosome in a cell.
- Human somatic cells are diploid for the autosomes; for the sex chromosomes, male (XY) cells are haploid while female (XX) cells are diploid
- Bacteria are haploid
- Viruses and yeast can be haploid or diploid

Variant callers can use ploidy to improve specificity (avoid false positives) because ploidy determines the expected fraction of reads carrying a variant. For example, in a diploid organism:
- Homozygous
  - both copies contain the variant
  - fraction of reads with the variant ~1
- Heterozygous
  - one copy contains the variant
  - fraction of reads with the variant ~0.5
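The expected fractions above can be turned into a toy genotype assignment. This Python sketch assumes error-free counts of reads with and without the variant at a single position, and simply picks the number of variant copies whose expected fraction (copies / ploidy) is closest to the observed fraction. The function names and the read counts are hypothetical; production variant callers use probabilistic models that also weigh base quality and read depth.

```python
def expected_fractions(ploidy):
    """Expected fraction of reads carrying the variant when k of the
    `ploidy` chromosome copies carry it, for k = 0..ploidy."""
    return {k: k / ploidy for k in range(ploidy + 1)}


def naive_genotype(variant_reads, total_reads, ploidy=2):
    """Toy caller: return the number of variant copies whose expected
    fraction is closest to the observed fraction of variant reads."""
    if total_reads <= 0:
        raise ValueError("no reads cover this position")
    observed = variant_reads / total_reads
    fractions = expected_fractions(ploidy)
    return min(fractions, key=lambda k: abs(fractions[k] - observed))


# Labels for the diploid case described above.
DIPLOID_LABELS = {0: "homozygous reference", 1: "heterozygous", 2: "homozygous variant"}

for variant_reads, total in [(29, 30), (14, 30), (3, 30)]:
    k = naive_genotype(variant_reads, total)
    print(f"{variant_reads}/{total} variant reads -> {DIPLOID_LABELS[k]}")
```

With these made-up counts, 3 of 30 reads (10%) is closest to an expected fraction of 0 in a diploid, so it is not called as a variant; this is the kind of low-fraction signal, which may be sequencing error, that a ploidy-aware caller can discount.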

References