---
title: "Getting Started With SEraster"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started With SEraster}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width=6,
  fig.height=4,
  dpi = 72,
  dev = "png"
)
```

# Introduction

`SEraster` is a rasterization preprocessing framework that aggregates cellular information into spatial pixels to reduce resource requirements for spatial omics data analysis.

`SEraster` reduces the number of spatial points in spatial omics datasets for downstream analysis through a process of rasterization where single cells' gene expression or cell-type labels are aggregated into equally sized pixels based on a user-defined `resolution`. Here, we refer to a particular `resolution` of rasterization by the side length of the pixel such that finer `resolution` indicates smaller pixel size and coarser `resolution` indicates larger pixel size.

More details describing `SEraster` are available in [our paper](https://doi.org/10.1093/bioinformatics/btae412).

This tutorial walks you through the basic functionalities of `SEraster` and two examples of downstream analysis that can be performed with the rasterized spatial omics data. Additional vignettes are available on [our documentation website](jef.works/SEraster/).

For downstream analyses, we will be using [`nnSVG`](https://bioconductor.org/packages/nnSVG) for spatially variable gene (SVG) analysis and [`CooccurrenceAffinity`](https://cran.r-project.org/package=CooccurrenceAffinity) for cell-type co-enrichment analysis.

References for nnSVG and CooccurrenceAffinity can be found below:

-   [Weber, L. et al. (2023), "nnSVG for the scalable identification of spatially variable genes using nearest-neighbor Gaussian processes", *Nature Communications*](https://doi.org/10.1038/s41467-023-39748-z)
-   [Mainali, K. et al. (2021), "A better index for analysis of co-occurrence and similarity", *Science Advances*](https://doi.org/10.1126/sciadv.abj9204)
-   [Mainali,K. et al. (2022), "CooccurrenceAffinity: An R package for computing a novel metric of affinity in co-occurrence data that corrects for pervasive errors in traditional indices", *bioRxiv*](https://doi.org/10.1101/2022.11.01.514801)

# Installation

To install this package using Bioconductor, start R (version "4.4.0") and enter:

```{r, eval=FALSE}
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("SEraster")
```

The latest development version can also be installed from [GitHub](https://github.com/JEFworks-Lab/SEraster) using `remotes`:

```{r, eval=FALSE}
require(remotes)
remotes::install_github('JEFworks-Lab/SEraster')
```

In addition, `SEraster` is also compatible with `SeuratObject` through `SeuratWrappers`. `SeuratWrappers` implementation can be installed using `remotes`:

```{r, eval=FALSE}
require(remotes)
remotes::install_github('satijalab/seurat-wrappers@SEraster')
```

Documentation and tutorial for the `SeuratWrappers` implementation can be found in the `SEraster` branch of the [`SeuratWrappers` GitHub repository](https://github.com/satijalab/seurat-wrappers/tree/SEraster).

# Dataset

In the examples below, we will use the MERFISH mouse preoptic area (mPOA) dataset available in the `SEraster` package. The dataset is taken from [Moffitt, J. et al. (2018), "Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region", *Science*](https://doi.org/10.1126/science.aau5324) and formatted as a `SpatialExperiment` object as shown [here](https://jef.works/SEraster/reference/merfish_mousePOA.html).

Please refer to the following documentations to see how you can format your data into a `SpatialExperiment` object:

-   [SpatialExperiment](https://bioconductor.org/packages/SpatialExperiment) package
-   [Formatting a SpatialExperiment Object for SEraster](https://jef.works/SEraster/articles/formatting-SpatialExperiment-for-SEraster.html)

# Tutorial

## Load libraries

```{r, message=FALSE}
library(SEraster)
library(SpatialExperiment)
library(nnSVG)
library(CooccurrenceAffinity)
library(ggplot2)
```

## Load example dataset

```{r}
data("merfish_mousePOA")

# check the dimension of the genes-by-cells matrix at single-cell resolution
dim(merfish_mousePOA)

# check the number of cell-types
length(unique(colData(merfish_mousePOA)$celltype))
```

This MERFISH mPOA dataset contains 6,509 cells and 16 cell-types.

```{r}
# plot at single-cell resolution
df <- data.frame(spatialCoords(merfish_mousePOA), celltype = colData(merfish_mousePOA)$celltype)

ggplot(df, aes(x = x, y = y, col = celltype)) +
  coord_fixed() +
  geom_point(size = 1.5, stroke = 0) +
  guides(col = guide_legend(override.aes = list(size = 3))) +
  labs(x = "x (μm)",
       y = "y (μm)",
       col = "Cell-types") +
  theme_bw() +
  theme(panel.grid = element_blank())
```

## SEraster basic functionalities

`SEraster` reduces the number of spatial points in spatial omics datasets for downstream analysis through a process of rasterization where single cells’ gene expression or cell-type labels are aggregated into equally sized square or hexagonal pixels (can be changed using the `square` argument) based on a user-defined resolution.

Here, we demonstrate the basic functionalities of `SEraster`.

### Rasterize gene expression

For continuous variables such as gene expression or other molecular information (e.g. protein expression if you are using spatial proteomics datasets), `SEraster` aggregates the observed raw counts or normalized expression values for each molecule within each pixel using means by default (can be changed using the `fun` argument).

Let's try rasterizing the gene expression of the MERFISH mouse POA dataset we loaded.

```{r}
rastGexp <- SEraster::rasterizeGeneExpression(merfish_mousePOA, assay_name="volnorm", resolution = 50)

# check the dimension of the genes-by-cells matrix after rasterizing gene expression
dim(rastGexp)
```

As you can see, `SEraster` aggregated 6,509 single cells into 1,301 pixels.

```{r}
# plot total rasterized gene expression
SEraster::plotRaster(rastGexp, name = "Total rasterized gene expression")
```

```{r}
# plot a specific gene
SEraster::plotRaster(rastGexp, feature_name = "Esr1", name = "Esr1")
```

### Rasterize gene expression within cell-type

Such rasterization can also be performed in a cell-type-specific manner by restricting to cells of a particular cell-type prior to rasterization. Here, we subset the dataset to Inhibitory cell-type and run `SEraster` on the subsetted dataset.

```{r}
# rasterize cell-type specific gene expression by subsetting to cell-type of interest
ct_interest <- "Inhibitory"
spe_subset <- merfish_mousePOA[,merfish_mousePOA$celltype == ct_interest]

# rasterize gene expression
rastGexpSubset <- SEraster::rasterizeGeneExpression(spe_subset, assay_name="volnorm", resolution = 50)
```

```{r}
# plot
SEraster::plotRaster(rastGexpSubset, name = paste0("Total rast gexp in ", ct_interest))
```

```{r}
SEraster::plotRaster(rastGexpSubset, feature_name = "Esr1", name = paste0("Esr1 in ", ct_interest))
```

### Rasterize cell-type

For categorical variables such as cell-type or cluster labels, `SEraster` aggregates the number of cells for each label within each pixel using sums by default (can be changed using the `fun` argument).

Let's try rasterizing the cell-type labels of the MERFISH mouse POA dataset.

```{r}
rastCt <- SEraster::rasterizeCellType(merfish_mousePOA, col_name = "celltype", resolution = 50)

# check the dimension of the cell-types-by-cells matrix after rasterizing cell-type labels
dim(rastGexp)
```

```{r}
# plot total cell counts
SEraster::plotRaster(rastCt, name = "cell counts", option = "inferno")
```

```{r}
# plot specific cell-type
SEraster::plotRaster(rastCt, feature_name = "Inhibitory", name = "Inhibitory neuron counts", option = "inferno")
```

### Setting rasterization resolution

Rasterization resolution can be controlled by the `resolution` argument of the `rasterizeGeneExpression` and `rasterizeCellType` functions. Here, we refer to a particular resolution of rasterization by the side length for square pixels and the distance between opposite edges for hexagonal pixels such that finer resolution indicates smaller pixel size and vice versa.

Let's see how the rasterized MERFISH mouse POA dataset look with various resolutions using square pixels.

```{r}
resolutions <- c(50, 100, 200)
for (resolution in resolutions) {
  # rasterize at defined resolution
  temp <- SEraster::rasterizeGeneExpression(merfish_mousePOA, assay_name="volnorm", resolution = resolution)
  # plot a specific gene
  plt <- SEraster::plotRaster(temp, feature_name = "Esr1", name = "Esr1", plotTitle = paste0("resolution: ", resolution))
  show(plt)
}
```

Now, let's see the same resolutions using hexagonal pixels.

```{r}
for (resolution in resolutions) {
  # rasterize at defined resolution
  temp <- SEraster::rasterizeGeneExpression(merfish_mousePOA, assay_name="volnorm", resolution = resolution, square = FALSE)
  # plot a specific gene
  plt <- SEraster::plotRaster(temp, feature_name = "Esr1", name = "Esr1", plotTitle = paste0("resolution: ", resolution))
  show(plt)
}
```

### Creating and rasterizing permutations

Since rasterized values may be sensitive to edge effects such as the specific boundaries of grids upon rasterization, `SEraster` enables permutation by rotating the dataset at various angles before rasterization.

For example, let's create 3 permutations of the MERFISH mouse POA dataset, which would output a `list` of 3 `SpatialExperiment` objects with x,y coordinates rotated at 0, 120, and 240 degrees around the midrange point.

In addition to a single `SpatialExperiment` object, `rasterizeGeneExpression` and `rasterizeCellType` functions can both take a `list` of `SpatialExperiment` objects. This essentially allows users to streamline the preprocessing of permutations with `SEraster`; followed by a downstream analysis of choice. For instance, in our manuscript, we have shown that permutations can be used to improve the performance of SVG analysis.

```{r}
# permutate
spe_list <- permutateByRotation(merfish_mousePOA, n_perm = 3)

# rasterize permutated datasets at once
out_list <- rasterizeGeneExpression(spe_list, assay_name = "volnorm", resolution = 50)

for (i in seq_along(out_list)) {
  # extract rotated angle
  angle <- gsub("rotated_", "", paste0("rotated ", names(out_list)[[i]], " degrees"))
  # plot a specific gene
  plt <- SEraster::plotRaster(out_list[[i]], feature_name = "Esr1", name = "Esr1", plotTitle = angle)
  show(plt)
}
```

As you can see from the plots above, when `SEraster` rasterizes a `list` of `SpatialExperiment` objects, all `SpatialExperiment` objects in the inputted `list` are rasterized with the same pixel coordinate framework (same bounding box, resolution, centroid coordinates). This feature may not be particularly useful for permutations; however, it can potentially be applied to compare two or more datasets, such as structurally aligned tissues as well as healthy vs. disease tissues.

## Examples of downstream analyses after SEraster preprocessing

### Spatial variable gene (SVG) analysis

Here, we use a previously developed tool called `nnSVG`. Please refer to [nnSVG](https://bioconductor.org/packages/nnSVG) for more details about the package. We can directly input rasterized gene expression `SpatialExperiment` object from `SEraster` into `nnSVG`.

```{r}
# run nnSVG
set.seed(0)
rastGexp <- nnSVG(rastGexp, assay_name = "pixelval")
```

```{r}
# number of significant SVGs based on the selected adjusted p-value threshold
table(rowData(rastGexp)$padj <= 0.05)
```

```{r}
# plot rasterized gene expression of top-ranked SVG
top_svg <- which(rowData(rastGexp)$rank == 1)
top_svg_name <- rownames(rowData(rastGexp))[top_svg]
SEraster::plotRaster(rastGexp, feature_name = top_svg_name, name = top_svg_name)
```

We can also perform cell-type specific SVG analysis by subsetting the dataset prior to applying `SEraster`.

```{r}
# subset data
ct_interest <- "Excitatory"
spe_sub <- merfish_mousePOA[,merfish_mousePOA$celltype == ct_interest]

# run SEraster
rastGexp_sub <- SEraster::rasterizeGeneExpression(spe_sub, assay_name="volnorm", resolution = 50)

# run nnSVG
set.seed(0)
rastGexp_sub <- nnSVG(rastGexp_sub, assay_name = "pixelval")
```

```{r}
# number of significant SVGs
table(rowData(rastGexp_sub)$padj <= 0.05)
```

```{r}
# plot rasterized gene expression of top-ranked SVG
top_svg <- which(rowData(rastGexp_sub)$rank == 1)
top_svg_name <- rownames(rowData(rastGexp_sub))[top_svg]
SEraster::plotRaster(rastGexp_sub, feature_name = top_svg_name, name = top_svg_name)
```

### Cell-type co-enrichment analysis

Rasterized cell-type labels can be used to analyze pair-wise cell-type co-enrichment To do so, we binarize the rasterized cell-type labels using a relative enrichment metric and a previously developed tool called `CooccurrenceAffinity`. Please refer to our paper for more details about the methodology and [CooccurrenceAffinity](https://CRAN.R-project.org/package=CooccurrenceAffinity) for more details about the package.

```{r}
# extract cell-type labels
ct_labels <- as.factor(colData(merfish_mousePOA)$celltype)

# compute relative enrichment (RE) metric
mat <- assay(rastCt, "pixelval")
mat_re <- do.call(rbind, lapply(rownames(rastCt), function(ct_label) {
    mat[ct_label,] / (sum(mat[ct_label,]) / sum(mat) * colSums(mat))
}))
rownames(mat_re) <- rownames(mat)

# binarize
mat_bin <- ifelse(mat_re >= 1, 1, 0)

# add RE and binarized layers to SpatialExperiment object
assays(rastCt) <- list(pixelval = assay(rastCt, "pixelval"), re = mat_re, bin = mat_bin)
```

```{r}
ct_interest <- "Ependymal"

# plot pixel value for a cell-type of interest
plotRaster(rastCt, assay_name = "pixelval", feature_name = ct_interest, name = "cell-type counts", option = "inferno")
```

```{r}
# plot RE value for a cell-type of interest
plotRaster(rastCt, assay_name = "re", feature_name = ct_interest, name = "RE", option = "inferno")
```

```{r}
# plot binarized value for a cell-type of interest
plotRaster(rastCt, assay_name = "bin", feature_name = ct_interest, factor_levels = c(0,1), name = "binarized", option = "inferno")
```

```{r}
# run CooccurrenceAffinity
ct_coocc <- CooccurrenceAffinity::affinity(data = mat_bin, row.or.col = "row", squarematrix = c("all"))

# plot maximum likelihood estimates of affinity metric (alpha MLE)
CooccurrenceAffinity::plotgg(data = ct_coocc, variable = "alpha_mle", legendlimit = "datarange")
```

# Session Information

```{r}
sessionInfo()
```