---
title: "CoreGx: Class and Function Abstractions for PharmacoGx, RadioGx and ToxicoGx"

author:
- name: Petr Smirnov
- name: Ian Smith
- name: Christopher Eeles
- name: Benjamin Haibe-Kains

output: BiocStyle::html_document

vignette: |
  %\VignetteEngine{knitr::knitr}
  %\VignetteIndexEntry{CoreGx: Class and Function Abstractions}
---
# CoreGx

```{r setup, include=FALSE}
library(knitr)
knitr::opts_chunk$set(echo = TRUE)
```

This package provides a foundation for the PharmacoGx, RadioGx and ToxicoGx
packages. It is not intended for standalone use, only as a dependency for the
aforementioned software. Its existence allows abstracting generic definitions,
method definitions and class structures common to all three of the Gx suite
packages.

## Importing and Using CoreGx

Load the pacakge:

```{r coregx_load, hide=TRUE, message=FALSE}
library(CoreGx)
library(Biobase)
library(SummarizedExperiment)
```

## The CoreSet Class

The `CoreSet` class is intended as a general purpose data structure for storing
multiomic treatment response data. Extensions of this class have been
customized for their respective fields of study. For example, the `PharmacoSet`
class inherits from the `CoreSet` and is specialized for storing and analyzing
drug sensitivity and perturbation experiments on cancer cell lines together with
associated multiomic data for each treated sample. The RadioSet class serves
a role similar to the PharmacoSet with radiation instead of drug treatments.
Finally, the ToxicoSet class is used to store toxicity data for healthy
human and rat hepatocytes along with the associated multiomic profile for each
treatment.

```{r CoreSet_class}
getClass("CoreSet")
```

The `annotation` slot holds the CoreSet name, the original constructor call, and
a range of metadata about the R session in which the constructor was called.
This allows easy comparison of `CoreSet` versions across time and ensures the
code used to generate a `CoreSet` is well-documented and reproducible.

The `molecularProfiles` slot contains a list of `SummarizedExperiment` objects
for each multi-omic molecular datatype available for a given experiment. Within
the `SummarizedExperiments` are feature and sample annotations for each data
type. We are currently in the process of adopting the `MultiAssayExperiment`
class instead of a `list` for storing molecular profile `SummarizedExperiment`s.
However, the `list` version of the `molecularProfiles` slot is still supported
for backwards compatability.

The `sample` slot contains a `data.frame` with annotations for samples used in
the molecularProfiles or sensitivity slot. It should at minimum have the
standardized column 'sampleid', with a unique identifier for each sample in the
`CoreSet`.

The `treatment` slot contains a `data.frame` of metadata for treatments applied
to samples in the `molecularProfiles` or `treatmentResponse` slot. It should at
minimum have the standarized column 'treatmentid', containing a unique
identifier for each treatment in the `CoreSet`.

The `datasetType` slot contains a character vector indicating the experiment
type the `CoreSet` contains. This slot is soft deprecated and may be removed
in future updates.

The `treatmentResponse` slot contains a list of raw, curated and meta data for
treatment-response experiments. We are currently in the process of adopting
our new S4-class, the `TreamtentResponseExperiment` to store treatment-response
data within a `CoreSet` and inheriting classes. However, the old `list` format
for sensitivity experiments will continue to be support for backwards
compatability.

The `perturbation` slot contains a list of raw, curated and meta data for
perturbation experiments. This slot is soft-deprecated and may be removed
in the future. The reason is that treatment perturbation experiments can be
efficiently stored in the `colData` slot of their respective
`SummarizedExperiment` objects and thus no longer require their own space within
a `CoreSet`.

The `curation` slot contains a list of manually curated identifiers
such as standardized cell-line, tissue and treatment names. Inclusion of such
identifiers ensures a consistent nomenclature is used across all datasets
curated into the classes inheriting from the `CoreSet`, enabling results from
such datasets to be easily compared to validate results from published studies or
combine them for use in larger meta-analyses. The slot contains a list of
`data.frame`s, one for each entity, and should at minimum include a mapping from
curated identifiers used throughout the object to those used in the original
dataset publication.

The `CoreSet` class provides a set of standardized accessor methods which
simplify curation, annotation, and retrieval of data associated with a specfic
treatment response experiment. All accessors are implemented as generics to
allow new methods to be defined on classes inheriting from the `CoreSet`.

```{r CoreSet_accessors}
methods(class="CoreSet")
```

We have provided a sample `CoreSet` (cSet) in this package. In the below code
we load the example cSet and demonstrate a few of the accessor methods.

```{r the_cSet_object}
data(clevelandSmall_cSet)
clevelandSmall_cSet
```

Access a specific molecular profiles:

```{r cSet_accessor_demo}
mProf <- molecularProfiles(clevelandSmall_cSet, "rna")
mProf[seq_len(5), seq_len(5)]
```

Access cell-line metadata:

```{r cSet_accessor_demo1}
cInfo <- sampleInfo(clevelandSmall_cSet)
cInfo[seq_len(5), seq_len(5)]
```

Access treatment-response data:

```{r cSet_accessor_demo2}
sensProf <- sensitivityProfiles(clevelandSmall_cSet)
sensProf[seq_len(5), seq_len(5)]
```

For more information about the accessor methods available for the `CoreSet`
class please see the `class?CoreSet` help page.

## Extending the CoreSet Class

Given that the CoreSet class is intended for extension, we will show some
examples of how to define a new class based on it and implement new methods
for the generics provided for the CoreSet class.

Here we will define a new class, the `DemoSet`, with an additional slot, the
`demoSlot`. We will then view the available methods for this class as well as
define new S4 methods on it.

```{r defining_a_new_class}
DemoSet <- setClass("DemoSet",
                    representation(demoSlot="character"),
                    contains="CoreSet")
getClass("DemoSet")
```

Here we can see the class extending `CoreSet` has all of the same slots as the
original `CoreSet`, plus the new slot we defined: `demoSlot`.

We can see which methods are available for this new class.

```{r inherit_methods}
methods(class="DemoSet")
```

We see that all the accessors defined for the `CoreSet` are also defined for the
inheriting `DemoSet`. These methods all assume the inherit slots have the same
structure as the `CoreSet`. If this is not true, for example, if molecularProfiles
holds `ExpressionSet`s instead of `SummarizedExperiment`s, we can redefine
existing methods as follows:

```{r defining_methods_for_a_new_class}
clevelandSmall_dSet <- DemoSet(clevelandSmall_cSet)
class(clevelandSmall_dSet@molecularProfiles[['rna']])

expressionSets <- lapply(molecularProfilesSlot(clevelandSmall_dSet), FUN=as,
  'ExpressionSet')
molecularProfilesSlot(clevelandSmall_dSet) <- expressionSets

# Now this will error
tryCatch({molecularProfiles(clevelandSmall_dSet, 'rna')},
         error=function(e)
             print(paste("Error: ", e$message)))
```

Since we changed the data in the `molecularProfiles` slot of the `DemoSet`,
the original method from `CoreGx` no longer works. Thus we get an error when
trying to access that slot. To fix this we will need to set a new S4 method
for the molecularProfiles generic function defined in `CoreGx`.

```{r redefine_the_method}
setMethod(molecularProfiles,
          signature("DemoSet"),
          function(object, mDataType) {
            pData(object@molecularProfiles[[mDataType]])
          })
```

This new method is now called whenever we use the `molecularProfiles` method
on a `DemoSet`. Since the new method uses `ExpressionSet` accessor methods
instead of `SummarizedExperiment` accessor methods, we now expect to be able
to access the data in our modified slot.

```{r testing_new_method}
# Now we test our new method
mProf <- molecularProfiles(clevelandSmall_dSet, 'rna')
head(mProf)[seq_len(5), seq_len(5)]
```

We can see our new method works! In order to finish updating the methods
for our new class, we would have to redefine all the methods which access the
modified slot.

However, additional work needs to be done to define accessors for the new
`demoSlot`. Since no generics are available in CoreGx to access this slot,
we need to first define a generic, then implement methods which dispatch on
the 'DemoSet' class to retrieve data in the slot.

```{r defining_setter_methods}
# Define generic for setter method
setGeneric('demoSlot<-', function(object, value) standardGeneric('demoSlot<-'))

# Define a setter method
setReplaceMethod('demoSlot',
                 signature(object='DemoSet', value="character"),
                 function(object, value) {
                   object@demoSlot <- value
                   return(object)
                 })

# Lets add something to our demoSlot
demoSlot(clevelandSmall_dSet) <- c("This", "is", "the", "demoSlot")
```

```{r defining_getter_methods}
# Define generic for getter method
setGeneric('demoSlot', function(object, ...) standardGeneric("demoSlot"))

# Define a getter method
setMethod("demoSlot",
          signature("DemoSet"),
          function(object) {
            paste(object@demoSlot, collapse=" ")
          })

# Test our getter method
demoSlot(clevelandSmall_dSet)
```

Now you should have all the knowledge you need to extend the CoreSet class
for use in other treatment-response experiments!

For more information about this package and the possibility of collaborating on
its extension please contact benjamin.haibe.kains@utoronto.ca.

# sessionInfo

```{r sessionInfo}
sessionInfo()
```