---
title: "Introduction to the SpatialExperiment class"
author: "Dario Righelli, Helena L. Crowell, Lukas M. Weber"
date: "`r format(Sys.Date(), '%b %d, %Y')`"
output:
    BiocStyle::html_document:
        toc: true
        number_sections: true
        toc_depth: 3
        toc_float:
            collapsed: true
vignette: >
    %\VignetteIndexEntry{Introduction to the SpatialExperiment class}
    %\VignetteEncoding{UTF-8}
    %\VignetteEngine{knitr::rmarkdown}
editor_options: 
    chunk_output_type: inline
---

<style type="text/css"> .smaller { font-size: 10px } </style>

```{r setup, include = FALSE}
knitr::opts_chunk$set(cache = TRUE, autodep = TRUE, cache.lazy = FALSE)
```


# Class structure

## Introduction

The `SpatialExperiment` class is an R/Bioconductor S4 class for storing 
data from spatially resolved transcriptomics (ST) experiments. The class 
extends the `SingleCellExperiment` class for single-cell data to support 
storage and retrieval of additional information from spot-based and 
molecule-based ST platforms, including spatial coordinates, images, and 
image metadata. A specialized constructor function is included for data 
from the 10x Genomics Visium platform.

The following schematic illustrates the `SpatialExperiment` class 
structure.

```{r, echo=FALSE, out.width = "100%", fig.cap="Overview of the SpatialExperiment class structure."}
knitr::include_graphics("SPE.png")
```

As shown, an object consists of: (i) `assays` containing expression counts, 
(ii) `rowData` containing information on features, i.e. genes, (iii) 
`colData` containing information on spots or cells, including nonspatial 
and spatial metadata, (iv) `spatialCoords` containing spatial coordinates, 
and (v) `imgData` containing image data. For spot-based ST data (e.g. 10x 
Genomics Visium), a single `assay` named `counts` is used. For molecule-based 
ST data (e.g. seqFISH), two `assays` named `counts` and `molecules` can be used.

Additional details on the class structure are provided in our 
[preprint](https://www.biorxiv.org/content/10.1101/2021.01.27.428431v3).


## Load data

For demonstration of the general class structure, we load an example 
`SpatialExperiment` (abbreviated as SPE) object (variable `spe`):

```{r message = FALSE}
library(SpatialExperiment)
example(read10xVisium, echo = FALSE)
spe
```


## `spatialCoords`

In addition to observation metadata stored inside the `colData` slot, 
the `SpatialExperiment` class stores spatial coordinates as:

- `spatialCoords`, a numeric matrix of spatial coordinates (e.g. `x` and `y`)

`spatialCoords` are stored inside the `int_colData`, and are directly 
accessible via the corresponding accessor:

```{r}
head(spatialCoords(spe))
```

The corresponding column names can be also accessed with `spatialCoordsNames()`:

```{r}
spatialCoordsNames(spe)
```


## `imgData`

All image related data are stored inside the `int_metadata`'s 
`imgData` field as a `DataFrame` of the following structure: 

* each row corresponds to one image for a given sample  
and with a given unique image identifier (e.g. its resolutions)
* for each image, columns specify:
  * which `sample_id` the image belongs to
  * a unique `image_id` in order to accommodate multiple images  
  for a given sample (e.g. of different resolutions)
  * the image's `data` (a `SpatialImage` object)
  * the `scaleFactor` that adjusts pixel positions of the original,  
  full-resolution image to pixel positions in the image

The `imgData()` accessor can be used to retrieve 
the image data stored within the object:

```{r}
imgData(spe)
```


### The `SpatialImage` class

Images are stored inside the `data` field of the `imgData` as a list of 
`SpatialImage`s. Each image may be of one of the following sub-classes:

* `LoadedSpatialImage`
  * represents an image that is fully realized into memory as a `raster` object
  * `@image` contains a `raster` object: a matrix 
  of RGB colors for each pixel in the image
* `StoredSpatialImage`
  * represents an image that is stored in a local file (e.g., as  
  .png, .jpg or .tif), and loaded into memory only on request
  * `@path` specifies a local file from which to retrieve the image
* `RemoteSpatialImage`
  * represents an image that is remotely hosted  
  (under some URL), and retrieved only on request
  * `@url` specifies where to retrieve the image from

A `SpatialImage` can be accessed using `getImg()`, 
or retrieved directly from the `imgData()`:

```{r}
(spi <- getImg(spe))
identical(spi, imgData(spe)$data[[1]])
```

Data available in an object of class `SpatialImage` may be 
accessed via the `imgRaster()` and `imgSource()` accessors:

```{r fig.small = TRUE}
plot(imgRaster(spe))
```


### Adding or removing images

Image entries may be added or removed from a `SpatialExperiment`'s 
`imgData` `DataFrame` using `addImg()` and `rmvImg()`, respectively.

Besides a path or URL to source the image from and a numeric scale factor, 
`addImg()` requires specification of the `sample_id` the new image belongs to, 
and an `image_id` that is not yet in use for that sample:

```{r fig.small = TRUE, eval = TRUE}
url <- "https://i.redd.it/3pw5uah7xo041.jpg"
spe <- addImg(spe, 
    sample_id = "section1", 
    image_id = "pomeranian",
    imageSource = url, 
    scaleFactor = NA_real_, 
    load = TRUE)
img <- imgRaster(spe, 
    sample_id = "section1", 
    image_id = "pomeranian")
plot(img)
```

The `rmvImg()` function is more flexible in the specification 
of the `sample_id` and `image_id` arguments. Specifically:

- `TRUE` is equivalent to *all*, e.g.  
`sample_id = "<sample>"`, `image_id = TRUE`  
will drop all images for a given sample
- `NULL` defaults to the first entry available, e.g.  
`sample_id = "<sample>"`, `image_id = NULL`  
will drop the first image for a given sample

For example, `sample_id = TRUE`, `image_id = TRUE` will specify all images; 
`sample_id = NULL`, `image_id = NULL` corresponds to the first image entry in the `imgData`; 
`sample_id = TRUE`, `image_id = NULL` equals the first image for all samples; and 
`sample_id = NULL`, `image_id = TRUE` matches all images for the first sample.

Here, we remove `section1`'s `pomeranian` image added in the previous 
code chunk; the image is now completely gone from the `imgData`:

```{r}
imgData(spe <- rmvImg(spe, "section1", "pomeranian"))
```


# Object construction

## Manually

The `SpatialExperiment` constructor provides several arguments 
to give maximum flexibility to the user.

In particular, these include: 

- `spatialCoords`, a numeric `matrix` containing spatial coordinates
- `spatialCoordsNames`, a character vector specifying which  
`colData` fields correspond to spatial coordinates

`spatialCoords` can be supplied via `colData`
by specifying the column names that correspond to spatial coordinates 
with `spatialCoordsNames`:

```{r}
n <- length(z <- letters)
y <- matrix(nrow = n, ncol = n)
cd <- DataFrame(x = seq(n), y = seq(n), z)

spe1 <- SpatialExperiment(
    assay = y, 
    colData = cd, 
    spatialCoordsNames = c("x", "y"))
```

Alternatively, `spatialCoords` may be supplied separately
as a `matrix`, e.g.:

```{r}
xy <- as.matrix(cd[, c("x", "y")])

spe2 <- SpatialExperiment(
    assay = y, 
    colData = cd["z"], 
    spatialCoords = xy)
```

Importantly, both of the above `SpatialExperiment()` function calls 
lead to construction of the exact same object:

```{r}
identical(spe1, spe2)
```

Finally, `spatialCoords(Names)` are optional, i.e., 
we can construct a SPE using only a subset of the above arguments:

```{r}
spe <- SpatialExperiment(
    assays = y)
isEmpty(spatialCoords(spe))
```

In general, `spatialCoordsNames` takes precedence over `spatialCoords`,
i.e., if both are supplied, the latter will be ignored. In other words,
`spatialCoords` are preferentially extracted from the `DataFrame`
provided via `colData`. E.g., in the following function call, 
`spatialCoords` will be ignored, and xy-coordinates are instead extracted
from the input `colData` according to the specified `spatialCoordsNames`. 
In this case, a message is also provided:

```{r results = "hide"}
n <- 10; m <- 20
y <- matrix(nrow = n, ncol = m)
cd <- DataFrame(x = seq(m), y = seq(m))
xy <- matrix(nrow = m, ncol = 2)
colnames(xy) <- c("x", "y")

SpatialExperiment(
    assay = y, 
    colData = cd,
    spatialCoordsNames = c("x", "y"),
    spatialCoords = xy)
```


## Spot-based

When working with spot-based ST data, such as *10x Genomics Visium* or other 
platforms providing images, it is possible to store the image information 
in the dedicated `imgData` structure.

Also, the `SpatialExperiment` class stores a `sample_id` value in the
`colData` structure, which is possible to set with the `sample_id` 
argument (default is "sample_01").

Here we show how to load the default *Space Ranger* data files from a 
10x Genomics Visium experiment, and build a `SpatialExperiment` object.

In particular, the `readImgData()` function is used to build an `imgData`
`DataFrame` to be passed to the `SpatialExperiment` constructor.
The `sample_id` used to build the `imgData` object must be the same one 
used to build the `SpatialExperiment` object, otherwise an error is returned.

```{r}
dir <- system.file(
   file.path("extdata", "10xVisium", "section1", "outs"),
   package = "SpatialExperiment")

# read in counts
fnm <- file.path(dir, "raw_feature_bc_matrix")
sce <- DropletUtils::read10xCounts(fnm)

# read in image data
img <- readImgData(
    path = file.path(dir, "spatial"),
    sample_id = "foo")

# read in spatial coordinates
fnm <- file.path(dir, "spatial", "tissue_positions_list.csv")
xyz <- read.csv(fnm, header = FALSE,
    col.names = c(
        "barcode", "in_tissue", "array_row", "array_col",
        "pxl_row_in_fullres", "pxl_col_in_fullres"))

# construct observation & feature metadata
rd <- S4Vectors::DataFrame(
    symbol = rowData(sce)$Symbol)

# construct 'SpatialExperiment'
(spe <- SpatialExperiment(
    assays = list(counts = assay(sce)),
    rowData = rd, 
    colData = DataFrame(xyz), 
    spatialCoordsNames = c("pxl_col_in_fullres", "pxl_row_in_fullres"),
    imgData = img,
    sample_id = "foo"))
```

Alternatively, the `read10xVisium()` function facilitates the import of 
*10x Genomics Visium* data to handle one or more samples organized in
folders reflecting the default *Space Ranger* folder tree organization, 
as illustrated below (where "raw/filtered" refers to either "raw" or 
"filtered" to match the `data` argument). Note that the base directory 
"outs/" from Space Ranger can either be included manually in the paths 
provided in the `samples` argument, or can be ignored; if ignored, it will 
be added automatically. The `.h5` files are used if `type = "HDF5"`.

```{bash, eval = FALSE}
sample
 . | — outs
 · · | — raw/filtered_feature_bc_matrix.h5
 · · | — raw/filtered_feature_bc_matrix
 · · · | — barcodes.tsv.gz
 · · · | — features.tsv.gz
 · · · | — matrix.mtx.gz
 · · | — spatial
 · · · | — scalefactors_json.json
 · · · | — tissue_lowres_image.png
 · · · | — tissue_positions_list.csv
```

Using `read10xVisium()`, the above code to construct the same SPE is reduced to:

```{r}
dir <- system.file(
    file.path("extdata", "10xVisium"),
    package = "SpatialExperiment")

sample_ids <- c("section1", "section2")
samples <- file.path(dir, sample_ids, "outs")

(spe10x <- read10xVisium(samples, sample_ids,
    type = "sparse", data = "raw",
    images = "lowres", load = FALSE))
```

Or alternatively, omitting the base directory `outs/` from Space Ranger:

```{r}
samples2 <- file.path(dir, sample_ids)

(spe10x2 <- read10xVisium(samples2, sample_ids,
    type = "sparse", data = "raw",
    images = "lowres", load = FALSE))
```


## Molecule-based

To demonstrate how to accommodate molecule-based ST data 
(e.g. *seqFISH* platform) inside a `SpatialExperiment` object, 
we generate some mock data of 1000 molecule coordinates across 
50 genes and 20 cells. These should be formatted into a `data.frame` 
where each row corresponds to a molecule, and columns specify the 
xy-positions as well as which gene/cell the molecule has been assigned to: 

```{r message = FALSE, warning = FALSE}
n <- 1e3  # number of molecules
ng <- 50  # number of genes
nc <- 20  # number of cells
# sample xy-coordinates in [0, 1]
x <- runif(n)
y <- runif(n)
# assign each molecule to some gene-cell pair
gs <- paste0("gene", seq(ng))
cs <- paste0("cell", seq(nc))
gene <- sample(gs, n, TRUE)
cell <- sample(cs, n, TRUE)
# assure gene & cell are factors so that
# missing observations aren't dropped
gene <- factor(gene, gs)
cell <- factor(cell, cs)
# construct data.frame of molecule coordinates
df <- data.frame(gene, cell, x, y)
head(df)
```

Next, it is possible to re-shape the above table into a 
`r BiocStyle::Biocpkg("BumpyMatrix")` using `splitAsBumpyMatrix()`, which takes 
as input the xy-coordinates, as well as arguments specifying the row and column 
index of each observation:

```{r message = FALSE, warning = FALSE}
# construct 'BumpyMatrix'
library(BumpyMatrix)
mol <- splitAsBumpyMatrix(
    df[, c("x", "y")], 
    row = gene, col = cell)
```

Finally, it is possible to construct a `SpatialExperiment` object with two data 
slots: 

- The `counts` assay stores the number of molecules per gene and cell  
(equivalent to transcript counts in spot-based data)
- The `molecules` assay holds the spatial molecule positions (xy-coordinates)

Each entry in the `molecules` assay is a `DFrame` that contains the positions 
of all molecules from a given gene that have been assigned to a given cell. 

```{r message = FALSE, warning = FALSE}
# get count matrix
y <- with(df, table(gene, cell))
y <- as.matrix(unclass(y))
y[1:5, 1:5]
# construct SpatialExperiment
spe <- SpatialExperiment(
    assays = list(
        counts = y, 
        molecules = mol))
spe
```

The `BumpyMatrix` of molecule locations can be accessed 
using the dedicated `molecules()` accessor:

```{r message = FALSE, warning = FALSE}
molecules(spe)
```


# Common operations

## Subsetting

Subsetting objects is automatically defined to synchronize across 
all attributes, as for any other Bioconductor *Experiment* class.

For example, it is possible to `subset` by `sample_id` as follows:

```{r}
sub <- spe10x[, spe10x$sample_id == "section1"]
```

Or to retain only observations that map to tissue via:

```{r}
sub <- spe10x[, colData(spe10x)$in_tissue]
sum(colData(spe10x)$in_tissue) == ncol(sub)
```


## Combining samples

To work with multiple samples, the `SpatialExperiment` class provides the `cbind`
method, which assumes unique `sample_id`(s) are provided for each sample.

In case the `sample_id`(s) are duplicated across multiple samples, the `cbind`
method takes care of this by appending indices to create unique sample identifiers.

```{r}
spe1 <- spe2 <- spe
spe3 <- cbind(spe1, spe2)
unique(spe3$sample_id)
```

Alternatively (and preferentially), we can create unique 
`sample_id`(s) prior to `cbind`ing as follows:

```{r}
# make sample identifiers unique
spe1 <- spe2 <- spe
spe1$sample_id <- paste(spe1$sample_id, "A", sep = ".")
spe2$sample_id <- paste(spe2$sample_id, "B", sep = ".")

# combine into single object
spe3 <- cbind(spe1, spe2)
```


## Sample ID replacement

In particular, when trying to replace the `sample_id`(s) of a `SpatialExperiment`
object, these must map uniquely with the already existing ones, otherwise an 
error is returned.

```{r, error=TRUE}
new <- spe3$sample_id
new[1] <- "section2.A"
spe3$sample_id <- new
new[1] <- "third.one.of.two"
spe3$sample_id <- new
```

Importantly, the `sample_id` `colData` field is *protected*, i.e., 
it will be retained upon attempted removal (= replacement by `NULL`):

```{r}
# backup original sample IDs
tmp <- spe$sample_id
# try to remove sample IDs
spe$sample_id <- NULL
# sample IDs remain unchanged
identical(tmp, spe$sample_id)
```


## Image transformations

Both the `SpatialImage` (SpI) and `SpatialExperiment` (SpE) class currently support 
two basic image transformations, namely, rotation (via `rotateImg()`) and 
mirroring (via `mirrorImg()`). Specifically, for a SpI/E `x`:

* `rotateImg(x, degrees)` expects as `degrees` a single numeric in +/-[0,90,...,360].  
Here, a (negative) positive value corresponds to (counter-)clockwise rotation.
* `mirrorImg(x, axis)` expects as `axis` a character string specifying  
whether to mirror horizontally (`"h"`) or vertically (`"v"`).

Here, we apply various transformations using both a SpI (`spi`) and SpE (`spe`)
as input, and compare the resulting images to the original:


### Rotation

```{r}
# extract first image
spi <- getImg(spe10x)
# apply counter-/clockwise rotation
spi1 <- rotateImg(spi, -90)
spi2 <- rotateImg(spi, +90)
# visual comparison
par(mfrow = c(1, 3))
plot(as.raster(spi))
plot(as.raster(spi1))
plot(as.raster(spi2))
```

```{r}
# specify sample & image identifier
sid <- "section1"
iid <- "lowres"
# counter-clockwise rotation
tmp <- rotateImg(spe10x, 
    sample_id = sid, 
    image_id = iid,
    degrees = -90)
# visual comparison
par(mfrow = c(1, 2))
plot(imgRaster(spe10x, sid, iid))
plot(imgRaster(tmp, sid, iid))
```


### Mirroring

```{r}
# extract first image
spi <- getImg(spe10x)
# mirror horizontally/vertically
spi1 <- mirrorImg(spi, "h")
spi2 <- mirrorImg(spi, "v")
# visual comparison
par(mfrow = c(1, 3))
plot(as.raster(spi))
plot(as.raster(spi1))
plot(as.raster(spi2))
```

```{r}
# specify sample & image identifier
sid <- "section2"
iid <- "lowres"
# mirror horizontally
tmp <- mirrorImg(spe10x, 
    sample_id = sid, 
    image_id = iid,
    axis = "h")
# visual comparison
par(mfrow = c(1, 2))
plot(imgRaster(spe10x, sid, iid))
plot(imgRaster(tmp, sid, iid))
```


# Session Info {.smaller}

```{r tidy = TRUE}
sessionInfo()
```