RNA-Seq Analysis: Quantifying Gene Expression

Overview

RNA sequencing (RNA-seq) has become the standard method for measuring gene expression, largely replacing hybridization-based approaches such as microarrays. By converting RNA molecules into a cDNA library and sequencing millions of fragments, RNA-seq reveals both the identity and the abundance of transcripts in a biological sample. Unlike microarrays, RNA-seq can detect novel transcripts, splice variants, and non-coding RNAs without prior probe design. It also offers single-base resolution and a dynamic range spanning several orders of magnitude. RNA-seq data underpin most modern transcriptomics studies, from model organisms to clinical specimens.

Key Concepts

The RNA-seq analysis pipeline begins with raw sequencing reads, which undergo quality control (FastQC) and adapter trimming (Trimmomatic or Cutadapt) before alignment to a reference genome with a splice-aware aligner such as STAR or HISAT2. Reads are then quantified at the gene or transcript level with tools such as featureCounts, RSEM, or Salmon; Salmon can also estimate transcript abundance directly from reads without a full genome alignment. Expression values are commonly reported as RPKM (for single-end reads), FPKM (for paired-end fragments), or TPM, all of which normalize for sequencing depth and transcript length. Detecting alternative splicing and novel transcripts typically relies on assemblers such as Cufflinks or StringTie, which reconstruct transcripts from aligned reads. Standard quality metrics include the mapping rate, the distribution of reads across gene features, and saturation analysis, which checks whether sequencing depth is sufficient to capture the expressed genes.
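The normalization units above differ mainly in how they scale each sample. TPM divides counts by length first and then rescales each sample to sum to one million, whereas FPKM/RPKM divides by library size and length together, so FPKM totals vary between samples. The following Python sketch illustrates both under simplifying assumptions: gene names, counts, and lengths are hypothetical; it uses raw gene lengths rather than the effective lengths that tools such as RSEM and Salmon estimate; and by default it takes library size as the sum of the supplied counts.

```python
import math
from typing import Dict, Optional


def fpkm(counts: Dict[str, int], lengths_bp: Dict[str, int],
         total_mapped: Optional[int] = None) -> Dict[str, float]:
    """FPKM (RPKM for single-end reads): count / (length in kb * mapped fragments in millions)."""
    if total_mapped is None:
        # Assumption: use the sum of the supplied counts as the library size.
        total_mapped = sum(counts.values())
    millions = total_mapped / 1e6
    return {g: counts[g] / (lengths_bp[g] / 1e3) / millions for g in counts}


def tpm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """TPM: normalize by length first, then rescale so the sample sums to one million."""
    # Reads per kilobase, using raw (not effective) lengths as a simplification.
    rates = {g: counts[g] / (lengths_bp[g] / 1e3) for g in counts}
    scale = math.fsum(rates.values()) / 1e6
    return {g: r / scale for g, r in rates.items()}


if __name__ == "__main__":
    # Hypothetical read counts and gene lengths (bp) for a single sample.
    counts = {"geneA": 500, "geneB": 1000, "geneC": 250}
    lengths_bp = {"geneA": 1000, "geneB": 4000, "geneC": 500}
    for name, values in (("FPKM", fpkm(counts, lengths_bp)), ("TPM", tpm(counts, lengths_bp))):
        print(name, {g: round(v, 1) for g, v in values.items()})
```

With these inputs, geneA and geneC receive equal values in both units because they have the same reads per kilobase, while geneB, despite having the most reads, ranks lowest because it is four times longer than geneA. TPM values always sum to one million per sample; FPKM values generally do not.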

Applications

RNA-seq is applied across virtually all areas of biology. It profiles gene expression changes between diseased and healthy tissue, identifies biomarkers for cancer subtypes, and tracks developmental gene expression programs. In microbiology, RNA-seq reveals transcriptional responses to antibiotics and environmental stress. As an application of next-generation sequencing (NGS), RNA-seq is frequently combined with other NGS-based workflows. It has also enabled the discovery of previously uncharacterized classes of RNA, including long non-coding RNAs and circular RNAs, expanding our understanding of the transcriptome’s complexity.
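At its simplest, the disease-versus-healthy comparison described above reduces to a per-gene ratio of normalized expression. The Python sketch below computes a naive log2 fold change from hypothetical TPM values, adding a pseudocount of 1 (an arbitrary, assumed choice) so that genes with zero expression do not break the logarithm. It performs no statistical testing; judging whether a change is significant requires dedicated differential-expression methods that model raw counts across biological replicates.

```python
import math
from typing import Dict


def log2_fold_change(case: Dict[str, float], control: Dict[str, float],
                     pseudocount: float = 1.0) -> Dict[str, float]:
    """Naive per-gene log2(case / control); the pseudocount keeps zeros out of the log."""
    shared = case.keys() & control.keys()  # only genes measured in both conditions
    return {g: math.log2((case[g] + pseudocount) / (control[g] + pseudocount))
            for g in sorted(shared)}


if __name__ == "__main__":
    # Hypothetical TPM values (e.g. averaged over replicates) for diseased vs. healthy tissue.
    disease = {"geneA": 63.0, "geneB": 7.0, "geneC": 0.0}
    healthy = {"geneA": 15.0, "geneB": 7.0, "geneC": 3.0}
    print(log2_fold_change(disease, healthy))
    # {'geneA': 2.0, 'geneB': 0.0, 'geneC': -2.0}
```

Positive values indicate higher expression in the disease samples and negative values lower expression; the pseudocount damps the large, unstable ratios that would otherwise arise for genes with very low expression.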