DE Analysis

This guide contains a brief discription of DE analysis methods used within Dprofiler. These methods are primarily incorporated within each iteration of the Differential Heterogeneity Analysis.

Introduction

Differential gene expression analysis has become an increasingly popular tool in determining and viewing up and/or down experssed genes between two or more sets of samples. The goal of Differential expression analysis is to find genes, transcripts or regions whose difference in expression/count, when accounting for the variance within condition, is higher than expected by chance. DESeq2 is one of the highly used package in R available via Bioconductor and is designed to normalize count data from high-throughput sequencing assays such as RNA-Seq and test for differential expression (Love et al. 2014). For more information on the DESeq2 algorithm, you can read its documentation below;

DESeq2 userguide

With multiple parameters such as padjust values, log fold changes, and plot styles, altering plots, created with your DE data can be a hassle as well as time consuming. The Dprofiler uses DESeq2, EdgeR, and Limma coupled with shiny to produce real-time changes within your plot queries and allows for interactive browsing of your DE results. In addition to DE analysis, Dprofiler also offers a variety of other plots and analysis tools to help visualize your data even further.

DESeq2

For the details please check the user guide. DESeq2 userguide

DESeq2 performs multiple steps in order to analyze the data you’ve provided for it. The first step is to indicate the condition that each column (experiment) in the table represent. You can group multiple samples into one condition column. DESeq2 will compute the probability that a gene is differentially expressed (DE) for ALL genes in the table. It outputs both a nominal and a multiple hypothesis corrected p-value (padj) using a negative binomial distribution.

Un-normalized counts

DESeq2 requires count data as input obtained from RNA-Seq or another high-thorughput sequencing experiment in the form of matrix values. Here we convert un-integer values to integer to be able to run DESeq2. The matrix values should be un-normalized, since DESeq2 model internally corrects for library size. So, transformed or normalized values such as counts scaled by library size should not be used as input. Please use edgeR or limma for normalized counts.

Used parameters for DESeq2

  • fitType:

    either “parametric”, “local”, or “mean” for the type of fitting of dispersions to the mean intensity. See estimateDispersions for description.

  • betaPrior:

    whether or not to put a zero-mean normal prior on the non-intercept coefficients See nbinomWaldTest for description of the calculation of the beta prior. By default, the beta prior is used only for the Wald test, but can also be specified for the likelihood ratio test.

  • testType:

    either “Wald” or “LRT”, which will then use either Wald significance tests (defined by nbinomWaldTest), or the likelihood ratio test on the difference in deviance between a full and reduced model formula (defined by nbinomLRT)

  • rowsum.filter:

    regions/genes/isoforms with total count (across all samples) below this value will be filtered out

EdgeR

For the details please check the user guide. EdgeR userguide.

Used parameters for EdgeR

  • Normalization:

    Calculate normalization factors to scale the raw library sizes. Values can be “TMM”,”RLE”,”upperquartile”,”none”.

  • Dispersion:

    either a numeric vector of dispersions or a character string indicating that dispersions should be taken from the data object.

  • testType:

    exactTest or glmLRT. exactTest: Computes p-values for differential abundance for each gene between two samples, conditioning on the total count for each gene. The counts in each group are assumed to follow a binomial distribution. glmLRT: Fits a negative binomial generalized log-linear model to the read counts for each gene and conducts genewise statistical tests.

  • rowsum.filter:

    regions/genes/isoforms with total count (across all samples) below this value will be filtered out

Limma

For the details please check the user guide. Limma userguide.

Limma is a package to analyse of microarray or RNA-Seq data. If data is normalized with spike-in or any other scaling, tranforamtion or normalization method, Limma can be ideal. In that case, prefer limma rather than DESeq2 or EdgeR.

Used parameters for Limma

  • Normalization:

    Calculate normalization factors to scale the raw library sizes. Values can be “TMM”,”RLE”,”upperquartile”,”none”.

  • Fit Type:

    fitting method; “ls” for least squares or “robust” for robust regression

  • Norm. Bet. Arrays:

    Normalization Between Arrays; Normalizes expression intensities so that the intensities or log-ratios have similar distributions across a set of arrays.

  • rowsum.filter:

    regions/genes/isoforms with total count (across all samples) below this value will be filtered out

ComBat

For more details on ComBat, please check the user guide. ComBat userguide.

ComBat is part of the SVA R Bioconductor package which specializes in corecting for known batch effects. No additional parameters are selected or altered when running SVA’s ComBat.