---
title: "Performing differential co-expression analysis using dcanr"
author:
  - name: Dharmesh D Bhuva
    affiliation:
      - Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
      - School of Mathematics and Statistics, University of Melbourne, Parkville, VIC 3010, Australia
date: "`r Sys.Date()`"
package: dcanr
output:
  BiocStyle::html_document:
    toc_float: true
    fig_caption: true
    number_sections: true
vignette: >
  %\VignetteIndexEntry{1. Differential co-expression analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Introduction
This package provides methods to perform differential co-expression analysis and 
to evaluate differential co-expression methods using simulated data. Differential 
co-expression analysis attempts to identify gene-gene associations that change 
across conditions. Currently, 10 methods that identify changes between binary
conditions are included: 8 are novel Bioconductor implementations of previously
published methods, and; 2 are accessed through interfaces to existing packages.

This vignette focuses on the application of differential co-expression inference
methods to real data. Available methods, putative pipelines, and visualisations
provided by the method are introduced.

# Installation

Download the package from Bioconductor 

```{r getPackage, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("dcanr")
```

Or install the development version of the package from Github.

```{r, eval = FALSE}
BiocManager::install("DavisLaboratory/dcanr")
```

Load the installed package into an R session.

```{r Load, message=FALSE}
library(dcanr)
```

# Available inference methods
The package implements 10 methods to infer differential co-expression networks
across binary conditions. The list of available methods can be accessed by the
`dcMethods()` function.

```{r echo=TRUE, message=FALSE, warning=FALSE}
library(dcanr)
dcMethods()
```

# A generic differential co-expression analysis pipeline
A differential co-expression pipeline generally consists of 4 steps:

1. Generating scores/statistics for each pair of genes
2. Assessing scores using statistical tests
3. Correcting for multiple hypothesis testing
4. Selecting differential associations

Not all methods follow this pipeline. EBcoexpress computes posterior
probabilities therefore no statistical test needs to be performed and steps 2-3
can be skipped. Like-wise DiffCoEx does not perform any statistical tests and
instead performs a soft-thresholding on the scores. FTGI performs a statistical
test and $p$-values from this test are used as scores, therefore step 2 is
skipped. A standard analysis with the z-score method using all 4 steps is shown
here.

# Load an example dataset (simulated)
We first load an example simulated dataset (included in the package) to extract
the expression matrix and condition vector. Please note that multiple knock-down
experiments are performed per simulation and we use one such knock-down as a
condition here. The list of all knock-downs can be retrieved using
`getConditionNames()`.

```{r}
#load data
data(sim102)
#get available conditions
getConditionNames(sim102)
#get expression data and conditions for 'UME6' knock-down
simdata <- getSimData(sim102, cond.name = 'UME6', full = FALSE)
emat <- simdata$emat
ume6_kd <- simdata$condition
print(emat[1:5, 1:5]) #149 genes and 406 samples
head(ume6_kd) #NOTE: binary conditions encoded with 1's and 2's
```

# Step 1: Compute scores
All inference methods can be accessed using the same call therefore making it
easier to change between methods. Method specific parameters can be passed to
this function and will be managed accordingly. The default inference method is
z-score therefore it does not need to be specified via `dc.method`. We recommend
using the Spearman correlation as a measure of correlation as it is robust to
outliers which may be present in RNA-seq data.

```{r}
#apply the z-score method with Spearman correlations
z_scores <- dcScore(emat, ume6_kd, dc.method = 'zscore', cor.method = 'spearman')
print(z_scores[1:5, 1:5])
```

# Step 2: Perform a statistical test
Appropriate statistical tests are automatically selected for the method applied.
Tests are applied on the result of the `dcScore()` function (z-test for the 
z-score method and permutation tests for other methods). The testing
function returns the score matrix (unmodified) if the method is either 
EBcoexpress, FTGI or DiffCoEx.

*NOTE: Do NOT modify the result of the scoring method as this will result in
failure of the testing function. This is intended as tests should be performed
for all computed scores to prevent bias in the subsequent correction for
multiple hypothesis testing. The same applies for the next step.*

```{r}
#perform a statistical test: the z-test is selected automatically
raw_p <- dcTest(z_scores, emat, ume6_kd)
print(raw_p[1:5, 1:5])
```

For methods such as MINDy that require a permutation test, the number of
permutations can be specified by the `B` parameter. Permutation tests are
computationally expensive therefore we also provide a parallelised
implementation. See the help page of `dcTest` for examples.

# Step 3: Correcting for multiple hypothesis testing
Since all pairwise combinations of genes are tested, $p$-values need to be
adjusted. Given $n$ genes, the total number of hypothesis is $\frac{n(n-1)}{2}$
as the score matrices are symmetric. Adjustment is performed accordingly. The
default adjustment function is `stats::p.adjust` with the 'fdr' method used,
however, custom functions and their parameters can be specified instead.
`dcAdjust` provides a wrapper to apply an adjustment method to the raw $p$-value
matrix. Results from EBcoexpress and DiffCoEx remain unmodified.

```{r}
#adjust p-values (raw p-values from dcTest should NOT be modified)
adj_p <- dcAdjust(raw_p, f = p.adjust, method = 'fdr')
print(adj_p[1:5, 1:5])
```

# Step 4: Generating the differential co-expression network
The last step is thresholding the score/adjusted $p$-value matrix to select
differential associations. Default adjusted $p$-value thresholds of 0.1 are applied
where statistical tests are performed (to control for FDR at 0.1). Results are
presented as an `igraph` object shown below where edges are coloured based on
the score (negative to positive scores are represented using the purple to green
gradient of colours).

```{r message=FALSE, warning=FALSE, fig.wide=TRUE}
library(igraph)

#get the differential network
dcnet <- dcNetwork(z_scores, adj_p)
plot(dcnet, vertex.label = '')
#convert to an adjacency matrix
adjmat <- as_adj(dcnet, sparse = FALSE)
print(adjmat[1:5, 1:5])
#convert to a data.frame
edgedf <- as_data_frame(dcnet, what = 'edges')
print(head(edgedf))
```

# Session info {.unnumbered}
```{r sessionInfo, echo=FALSE}
sessionInfo()
```