Cardinal 3 lays the groundwork for future improvements to the existing toolbox of pre-processing, visualization, and statistical methods for mass spectrometry (MS) imaging experiments. Cardinal has been updated to support matter 2, and legacy support has been dropped.
Despite minimal user-visible changes in Cardinal (at first), the entire matter package that provides the backend for Cardinal’s computing on larger-than-memory MS imaging datasets has been rewritten. This should provide more robust support for larger-than-memory computations, as well as greater flexibility in handling many data files in the future.
Further changes will be coming soon to Cardinal 3 in future point updates that are aimed to greatly improve the user experience and simplify the code that users need to write to process and analyze MS imaging data.
Major improvements from earlier versions are further described below.
Cardinal 2 provides new classes and methods for the manipulation, transformation, visualization, and analysis of imaging experiments–specifically MS imaging experiments.
MS imaging is a rapidly advancing field with consistent improvements in instrumentation for both MALDI and DESI imaging experiments. Both mass resolution and spatial resolution are steadily increasing, and MS imaging experiments grow increasingly complex.
The first version of Cardinal was written with certain assumptions about MS imaging data that are no longer true. For example, the basic assumption that the raw spectra can be fully loaded into memory is unreasonable for many MS imaging experiments today.
Cardinal 2 was re-written from the ground up to handle the evolving needs of high-resolution MS imaging experiments. Some advancements include:
New imaging experiment classes such as ImagingExperiment, SparseImagingExperiment, and MSImagingExperiment provide better support for out-of-memory datasets and a more flexible representation of complex experiments
New imaging metadata classes such as PositionDataFrame and MassDataFrame make it easier to manipulate experimental runs, pixel coordinates, and m/z-values by storing them as separate slots rather than ordinary columns
New plot() and image() visualization methods that can handle non-gridded pixel coordinates and allow assigning the resulting plot (and data) to a variable for later re-plotting
Support for writing imzML in addition to reading it; more options and support for importing out-of-memory imzML for both “continuous” and “processed” formats
Data manipulation and summarization verbs such as subset(), aggregate(), and summarizeFeatures(), etc. for easier subsetting and summarization of imaging datasets
Delayed pre-processing via a new process() method that allows queueing of delayed pre-processing methods such as normalize() and peakPick() for later execution
Parallel processing support via the BiocParallel package for all pre-processing methods and any statistical analysis methods with a BPPARAM option
Classes from older versions of Cardinal should be coerced to their Cardinal 2 equivalents. For example, to return an updated MSImageSet object called x, use as(x, "MSImagingExperiment").
Cardinal can be installed via the BiocManager package.
install.packages("BiocManager")
BiocManager::install("Cardinal")The same function can be used to update Cardinal and other Bioconductor packages.
Once installed, Cardinal can be loaded with library():
library(Cardinal)MSImagingExperimentIn Cardinal, imaging experiment datasets are composed of multiple sets of metadata, in addition to the actual experimental data. These are (1) pixel metadata, (2) feature (\(m/z\)) metadata, (3) the actual imaging data, and (4) a class that holds all of these and represents the experiment as a whole.
MSImagingExperiment is a matrix-like container for complete MS imaging experiments, where the “rows” represent the mass features, and the columns represent the pixels. An MSImagingExperiment object contains the following components:
pixelData() accesses the pixel information, stored in a PositionDataFrame
featureData() accesses the feature information, stored in MassDataFrame
imageData() accesses the spectral information, stored in a ImageArrayList
Unlike many software packages designed for analysis of MS imaging experiments, Cardinal is designed to work with multiple datasets simultaneously and can integrate all aspects of experimental design and metadata.
We will use simulateImage() to prepare an example dataset.
set.seed(2020)
mse <- simulateImage(preset=1, npeaks=10, nruns=2, baseline=1)
mse## An object of class 'MSContinuousImagingExperiment'
##   <3919 feature, 800 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(1): circle
##     metadata(1): design
##     run(2): run0 run1
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range:  426.5285 to 2044.4400 
##     centroided: FALSEPositionDataFrameThe pixelData() accessor extracts the pixel information for an MSImagingExperiment. The pixel data are stored in a PositionDataFrame object, which is a type of data frame that separately tracks pixel coordinates and experimental run information.
pixelData(mse)## PositionDataFrame with 800 rows and 1 column
##        :run:   coord:x   coord:y    circle
##     <factor> <integer> <integer> <logical>
## 1       run0         1         1     FALSE
## 2       run0         2         1     FALSE
## 3       run0         3         1     FALSE
## 4       run0         4         1     FALSE
## 5       run0         5         1     FALSE
## ...      ...       ...       ...       ...
## 796     run1        16        20     FALSE
## 797     run1        17        20     FALSE
## 798     run1        18        20     FALSE
## 799     run1        19        20     FALSE
## 800     run1        20        20     FALSEThe coord() accessor specifically extracts the data frame of pixel coordinates.
coord(mse)## DataFrame with 800 rows and 2 columns
##             x         y
##     <integer> <integer>
## 1           1         1
## 2           2         1
## 3           3         1
## 4           4         1
## 5           5         1
## ...       ...       ...
## 796        16        20
## 797        17        20
## 798        18        20
## 799        19        20
## 800        20        20The run() accessor specifically extracts the vector of experimental runs.
run(mse)[1:10]##  [1] run0 run0 run0 run0 run0 run0 run0 run0 run0 run0
## Levels: run0 run1MassDataFrameThe featureData() accessor extracts the feature information for an MSImagingExperiment. The feature data are stored in a MassDataFrame object, which is a type of data frame that separately tracks the m/z-values associated with the mass spectral features.
featureData(mse)## MassDataFrame with 3919 rows and 0 columns
##           :mz:
##      <numeric>
## 1      426.529
## 2      426.699
## 3      426.870
## 4      427.041
## 5      427.211
## ...        ...
## 3915   2041.17
## 3916   2041.99
## 3917   2042.81
## 3918   2043.62
## 3919   2044.44The mz() accessor specifically extracts the vector of m/z-values.
mz(mse)[1:10]##  [1] 426.5285 426.6991 426.8699 427.0406 427.2115 427.3824 427.5534 427.7245
##  [9] 427.8956 428.0668The imageData() accessor extracts the image data for an MSImagingExperiment. The data are stored in an ImageArrayList, which is a list of matrix-like objects.
It is possible to store multiple matrices of intensities in this list. Typically, only the first entry will be used by pre-processing and analysis methods.
imageData(mse)## MSContinuousImagingSpectraList of length 1 
##  names(1):    intensity
##  class(1):       matrix
##    dim(1): <3919 x 800>
##    mem(1):     25.08 MBEntries in this list can be extracted by name with iData().
iData(mse, "intensity")The spectra() accessor is a shortcut for accessing the first data matrix.
spectra(mse)[1:5, 1:5]##           [,1]      [,2]      [,3]      [,4]      [,5]
## [1,] 0.9295940 0.9779923 0.9415157 0.9115036 0.8960595
## [2,] 1.0087009 1.3108664 1.0928983 1.0243944 1.0706272
## [3,] 1.0578001 1.0625834 1.2407371 0.9319758 0.8822412
## [4,] 0.8949165 1.1568158 0.9250994 0.9499621 1.0127282
## [5,] 1.0660395 1.0123048 1.0291570 0.8999156 1.1126816Rows of these matrices correspond to mass features. Columns correspond to pixels. In other words, each column is a mass spectrum, and each row is a flattened vector of images.
In order to specialize to the needs of different datasets, Cardinal provides specialized versions of MSImagingExperiment that reflect different spectra storage strategies.
MSContinuousImagingExperiment is a specialization that enforces the data matrices be stored as dense, column-major matrices. These include R’s native matrix and matter_matc objects from the matter package.
mse## An object of class 'MSContinuousImagingExperiment'
##   <3919 feature, 800 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(1): circle
##     metadata(1): design
##     run(2): run0 run1
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range:  426.5285 to 2044.4400 
##     centroided: FALSEimageData(mse)## MSContinuousImagingSpectraList of length 1 
##  names(1):    intensity
##  class(1):       matrix
##    dim(1): <3919 x 800>
##    mem(1):     25.08 MBMSProcessedImagingExperiment is a specialization that enforces the data matrices be stored as sparse, column-major matrices. These include sparse_matc objects from the matter package. This specialization is useful for processed data.
set.seed(2020)
mse2 <- simulateImage(preset=3, npeaks=10, nruns=2)
mse3 <- as(mse2, "MSProcessedImagingExperiment")
mse3## An object of class 'MSProcessedImagingExperiment'
##   <3919 feature, 800 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(3): square1 square2 circle
##     metadata(1): design
##     run(2): run0 run1
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range:  426.5285 to 2044.4400 
##     centroided: FALSEimageData(mse3)## MSProcessedImagingSpectraList of length 1 
##  names(1):    intensity
##  class(1):   sparse_mat
##    dim(1): <3919 x 800>
##    mem(1):     25.15 MBBecause the data is stored sparsely, spectra from MSProcessedImagingExperiment objects are binned on-the-fly to the m/z-values specified by mz().
The resolution of the binning can be accessed by resolution().
resolution(mse3)## ppm 
## 400The resolution can be set to change how the binning is performed.
resolution(mse3) <- c(ppm=200)Changing the binned mass resolution will typically change the effective dimensions of the experiment.
dim(mse2)## Features   Pixels 
##     3919      800dim(mse3)## Features   Pixels 
##     7837      800The effective m/z-values are also updated to reflect the new bins.
mz(mse2)[1:10]##  [1] 426.5285 426.6991 426.8699 427.0406 427.2115 427.3824 427.5534 427.7245
##  [9] 427.8956 428.0668mz(mse3)[1:10]##  [1] 426.5285 426.6138 426.6991 426.7845 426.8699 426.9552 427.0406 427.1261
##  [9] 427.2115 427.2969Note that the underlying data will remain unchanged, but the binned values for the intensities will be different.
Cardinal 2 natively supports reading and writing imzML (both “continuous” and “processed” versions) and Analyze 7.5 formats via the readMSIData() and writeMSIData() functions.
We will focus on imzML.
The imzML format is an open standard designed specifically for interchange of mass spectrometry imaging datasets. Many other formats can be converted to imzML with the help of free applications available online at .
Let’s create a small image to demonstrate data import/export.
set.seed(2020)
tiny <- simulateImage(preset=1, from=500, to=600, dim=c(3,3))
tiny## An object of class 'MSContinuousImagingExperiment'
##   <456 feature, 9 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(1): circle
##     metadata(1): design
##     run(1): run0
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range: 500.0000 to 599.8071 
##     centroided: FALSEWe’ll also create a “processed” version for writing “processed” imzML.
tiny2 <- as(tiny, "MSProcessedImagingExperiment")
tiny2## An object of class 'MSProcessedImagingExperiment'
##   <456 feature, 9 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(1): circle
##     metadata(1): design
##     run(1): run0
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range: 500.0000 to 599.8071 
##     centroided: FALSENote that despite the name, the only difference between “continuous” and “processed” imzML is how the data are stored, rather than what processing has been applied to the data. “Continuous” imzML stores spectra densely, with each spectrum sharing the same m/z-values. “Processed” imzML stores spectra sparsely, and each spectrum can have its own distinct m/z-values.
Use writeMSIData() to write datasets to imzML and Analyze formats.
Internally, writeMSIData() will call either writeImzML() or writeAnalyze() depending on the value of outformat. The default is outformat="imzML".
path <- tempfile()
writeMSIData(tiny, file=path, outformat="imzML")Two files are produced with extensions “.imzML” and “.ibd”. The former contains an XML description of the dataset, and the latter contains the actual intensity data.
## [1] "file2d4262e9d3efd.ibd"   "file2d4262e9d3efd.imzML"Because tiny is a MSContinuousImagingExperiment, it is written as “continuous” imzML.
mzml <- readLines(paste0(path, ".imzML"))
grep("continuous", mzml, value=TRUE)## [1] "\t\t\t<cvParam cvRef=\"IMS\" accession=\"IMS:1000030\" name=\"continuous\" value=\"\" />"We can also write “processed” imzML if we export a MSProcessedImagingExperiment file.
path2 <- tempfile()
writeMSIData(tiny2, file=path2, outformat="imzML")mzml2 <- readLines(paste0(path2, ".imzML"))
grep("processed", mzml2, value=TRUE)## [1] "\t\t\t<cvParam cvRef=\"IMS\" accession=\"IMS:1000031\" name=\"processed\" value=\"\" />"If an experiment contains multiple runs, then each run will be written to a separate imzML file.
set.seed(2020)
tiny3 <- simulateImage(preset=1, from=500, to=600, dim=c(3,3), nruns=2)
runNames(tiny3)## [1] "run0" "run1"path3 <- tempfile()
writeMSIData(tiny3, file=path3, outformat="imzML")## [1] "file2d42622517f0bb-run0.ibd"   "file2d42622517f0bb-run0.imzML"
## [3] "file2d42622517f0bb-run1.ibd"   "file2d42622517f0bb-run1.imzML"Use readMSIData() to read datasets in imzML or Analyze formats.
Internally, readMSIData() will guess the format based on the file extension (which must be included) and call either readImzML() or readAnalyze().
path_in <- paste0(path, ".imzML")
tiny_in <- readMSIData(path_in, attach.only=TRUE)
tiny_in## An object of class 'MSContinuousImagingExperiment'
##   <456 feature, 9 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(2): 3DPositionX 3DPositionY
##     metadata(8): spectrum representation ibd binary type ... files name
##     run(1): file2d4262e9d3efd
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range: 500.0000 to 599.8071 
##     centroided: FALSEThe attach.only argument is used to specify that the intensity data should not be loaded into memory, but instead attached as a file-based matrix using the matter package.
Starting in Cardinal 2, the default is attach.only=TRUE. This is more memory-efficient, but some methods may be more time-consuming due to the file I/O.
imageData(tiny_in)## MSContinuousImagingSpectraList of length 1 
##  names(1):  intensity
##  class(1): matter_mat
##    dim(1):  <456 x 9>
##    mem(1):    6.22 KBspectra(tiny_in)## <456 row x 9 col> matter_mat :: out-of-memory double matrix
##            [,1]       [,2]       [,3]       [,4]       [,5]       [,6] ...
## [1,] 0.00000000 0.04087833 0.00000000 0.00000000 0.13038468 0.03781159 ...
## [2,] 0.00000000 0.16240965 0.00000000 0.16388728 0.00000000 0.00000000 ...
## [3,] 0.00000000 0.00000000 0.03591078 0.00000000 0.00000000 0.09754688 ...
## [4,] 0.18744168 0.04057066 0.08853593 0.00000000 0.01664007 0.03274257 ...
## [5,] 0.00000000 0.19307238 0.00000000 0.00000000 0.00000000 0.00000000 ...
## [6,] 0.30000976 0.00000000 0.08789141 0.01251065 0.07945876 0.08794949 ...
## ...         ...        ...        ...        ...        ...        ... ...
## (6.22 KB real | 16.42 KB virtual)An out-of-memory matter matrix can be subsetted like an ordinary R matrix. The data values are only read from file and loaded into memory when they are requested (via subsetting).
spectra(tiny_in)[1:5, 1:5]##           [,1]       [,2]       [,3]      [,4]       [,5]
## [1,] 0.0000000 0.04087833 0.00000000 0.0000000 0.13038468
## [2,] 0.0000000 0.16240965 0.00000000 0.1638873 0.00000000
## [3,] 0.0000000 0.00000000 0.03591078 0.0000000 0.00000000
## [4,] 0.1874417 0.04057066 0.08853593 0.0000000 0.01664007
## [5,] 0.0000000 0.19307238 0.00000000 0.0000000 0.00000000If loading the data fully into memory is desired, either set attach.only=FALSE when reading the data, or use as.matrix() on the intensity matrix.
spectra(tiny_in) <- as.matrix(spectra(tiny_in))
imageData(tiny_in)## MSContinuousImagingSpectraList of length 1 
##  names(1): intensity
##  class(1):    matrix
##    dim(1): <456 x 9>
##    mem(1):  33.05 KBUsing collect() on the MSImagingExperiment will also load all data into memory.
tiny_in <- collect(tiny_in)For “processed” imzML files, the spectra must be binned to common m/z-values. By default, readImzML() will detect the mass range from the data. This requires reading a large proportion of data from the file, even if attach.only=TRUE.
path2_in <- paste0(path2, ".imzML")
tiny2_in <- readMSIData(path2_in)
tiny2_in## An object of class 'MSProcessedImagingExperiment'
##   <456 feature, 9 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(2): 3DPositionX 3DPositionY
##     metadata(8): spectrum representation ibd binary type ... files name
##     run(1): file2d426261a41c7a
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range: 500.0000 to 599.8071 
##     centroided: FALSEIf known, it can be much more efficient to specify mass.range when importing “processed” imzML data. This can also be used to pre-filter the data to a smaller mass range.
The size of the m/z bins can be changed with the resolution argument.
tiny2_in <- readMSIData(path2_in, mass.range=c(510,590),
                        resolution=100, units="ppm")
tiny2_in## An object of class 'MSProcessedImagingExperiment'
##   <1458 feature, 9 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(2): 3DPositionX 3DPositionY
##     metadata(8): spectrum representation ibd binary type ... files name
##     run(1): file2d426261a41c7a
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range: 510.000 to 589.993 
##     centroided: FALSEThe resolution for the m/z bins can be changed after importing the data by setting the resolution() of the dataset.
resolution(tiny2_in) <- c(ppm=400)While importing formats besides imzML and Analyze are not explicitly supported by Cardinal, if the data can be read into R, it is easy to construct a MSImagingExperiment object from its components.
set.seed(2020)
s <- simulateSpectrum(n=9, peaks=10, from=500, to=600)
coord <- expand.grid(x=1:3, y=1:3)
run <- factor(rep("run0", nrow(coord)))
fdata <- MassDataFrame(mz=s$mz)
pdata <- PositionDataFrame(run=run, coord=coord)
out <- MSImagingExperiment(imageData=s$intensity,
                            featureData=fdata,
                            pixelData=pdata)
out## An object of class 'MSContinuousImagingExperiment'
##   <456 feature, 9 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(0):
##     run(1): run0
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range: 500.0000 to 599.8071 
##     centroided: FALSEFor loading data into R, read.csv() and read.table() can be used to read CSV and tab-delimited text files, respectively.
Likewise, write.csv() and write.table() can be used to write pixel metadata and feature metadata after coercing them to an ordinary R data.frame with as.data.frame().
Use saveRDS() and readRDS() to save and read and entire R object such as a MSImagingExperiment. Note that if intensity data is to be saved as well, it should be pulled into memory and coerced to an R matrix with as.matrix() first.
Visualization of mass spectra and molecular ion images is vital for exploratory analysis of MS imaging experiments. Cardinal provides a plot() method for plotting mass spectra and an image() method for plotting ion images.
Cardinal 2 provides some useful visualization updates compared to previous versions:
A new default color scale (viridis) for images that doesn’t use the flawed rainbow color scheme
Non-gridded pixel coordinates are allowed, and plotting of non-rastered image data is better supported
The new plot() and image() methods return values that can be assigned to a variable for later re-plotting
plot()Use plot() to plot mass spectra. Either pixel or coord can be specified to determine which mass spectra should be plotted.
plot(mse, pixel=c(211, 611))The plusminus argument can be used to specify that multiple spectra should be averaged and plotted together.
plot(mse, coord=list(x=10, y=10), plusminus=1, fun=mean)A formula can be specified to further customize mass spectra plotting. The LHS of the formula should specify one or more elements of imageData() and the RHS of the formula should be a variable in featureData().
plot(mse, intensity + I(-log1p(intensity)) ~ mz, pixel=211, superpose=TRUE)image()Use image() to plot ion images. Either feature or mz can be specified to determine which mass features should be plotted.
image(mse, mz=1200)The plusminus argument can be used to specify that multiple mass features should be averaged and plotted together.
image(mse, mz=1227, plusminus=0.5, fun=mean)By default, images from all experimental runs are plotted. If this is not desired, a subset can be specified instead.
image(mse, mz=1227, subset=run(mse) == "run0")image(mse, mz=c(1200, 1227), subset=circle)Multiplicative variance in spectral intensities can cause images to be noisy and dark due to hot spots.
Often, images may require some type of processing and enhancement to improve interpretation.
image(mse, mz=1200, smooth.image="gaussian")image(mse, mz=1200, contrast.enhance="histogram")The default viridis colorscale is chosen to enable ease of interpretation. However, other colorscales can be chosen. All colorscales from the viridisLite package are available in Cardinal, including cividis, magma, inferno, and plasma.
The magma colorscale is used below with a different dataset.
image(mse2, mz=1136, colorscale=magma)
Cardinal will try to choose an appropriate layout for the images. However, a user-defined one can be specified by 
layout. Use layout=NULL to use the current plotting device as-is.
image(mse2, mz=c(781, 1361), layout=c(1,4), colorscale=magma)
Multiple images can be superposed with 
superpose=TRUE. Use normalize.image="linear" to normalize all images to the same intensity range.
image(mse2, mz=c(781, 1361), superpose=TRUE, normalize.image="linear")Like plot(), a formula can be specified. The LHS should specify one or more elements of imageData() and the RHS should specify two to three variables from pixelData().
image(mse2, log1p(intensity) ~ x * y, mz=1136, colorscale=magma)3D imaging datasets can be plotted with image3D().
set.seed(1)
mse3d <- simulateImage(preset=9, nruns=7, dim=c(7,7), npeaks=10,
                        peakheight=c(2,4), representation="centroid")
image3D(mse3d, mz=c(1001, 1159), colorscale=plasma, cex=3, theta=30, phi=30)Any dataset with 3 or more spatial coordinates (columns in coord()), can be plotted in 3D.
Use selectROI() to select regions-of-interest on an ion image. It is important to specify a subset so that selection is only made on a single experimental run, otherwise results may be unexpected. The form of selectROI() is the same as image().
sampleA <- selectROI(mse, mz=1200, subset=run(mse) == "run0")
sampleB <- selectROI(mse, mz=1200, subset=run(mse) == "run1")selectROI() returns a logical vector specifying which pixels from the imaging experiment are contained in the selected region.
makeFactor() can then be used to combine multiple logical vectors (e.g., from selectROI()) into a single factor.
regions <- makeFactor(A=sampleA, B=sampleB)Plots and images can be saved to a file by using R’s built-in graphics devices.
Use pdf() to initialize a PDF graphics device, create the plot, and then use dev.off() to turn off the graphics device.
Any plots printed while the graphics device is active will be saved to the specified file(s).
pdf_file <- paste0(tempfile(), ".pdf")
pdf(file=pdf_file, width=9, height=4)
image(mse, mz=1200)
dev.off()Graphics devices for png(), jpeg(), bmp(), and tiff() are also available. See their documentation for usage.
While many software for MS imaging data use a light-on-dark theme, Cardinal uses a transparent background by default. However, a dark theme is also provided through darkmode().
darkmode()
image(mse, mz=1200)darkmode()
image(mse2, mz=1135, colorscale=magma)
Any future plots will use the new mode. The default mode can be restored with 
lightmode().
lightmode()While plotting spectra should typically be fast for most datasets regardless of data format, plotting images will typically be slower for out-of-memory (file-based) datasets and for MSProcessedImagingExperiment objects in general.
This is due to the way the spectra are stored in imzML and Analyze files, and the way sparse spectra are stored (regardless of location). Calculating and decompressing the images simply takes longer than reading the spectra.
For the fastest visualization of images, experiments should be coerced to an in-memory MSContinuousImagingExperiment.
Also note that all Cardinal 2 visualizations produce a print()-able object that can be assigned to a variable and print()-ed later without the need to read the data again.
g <- image(mse, mz=1200)
print(g)This is useful for re-creating or updating plots without accessing the data again.
MSImagingExperimentMSImagingExperiment are matrix-like objects that can be subsetted using the [ operator.
When subsetting, the “rows” are the mass features, and the “columns” are the pixels.
# subset first 10 mass features and first 5 pixels
mse[1:10, 1:5]## An object of class 'MSContinuousImagingExperiment'
##   <10 feature, 5 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(1): circle
##     metadata(1): design
##     run(1): run0
##     raster dimensions: 5 x 1
##     coord(2): x = 1..5, y = 1..1
##     mass range: 426.5285 to 428.0668 
##     centroided: FALSESubsetting the dataset this way requires knowing the desired row and column indices.
features() returns row indices based on specified feature metadata.
# get row index corresponding to m/z 1200
features(mse, mz=1200)## [1] 2587# get row indices corresponding to m/z 1199 - 1201
features(mse, 1199 < mz & mz < 1201)## [1] 2585 2586 2587 2588 2589pixels() returns column indices based on specified pixel metadata.
# get column indices corresponding to x = 10, y = 10 in all runs
pixels(mse, coord=list(x=10, y=10))## [1] 190 590# get column indices corresponding to x <= 3, y <= 3 in "run0"
pixels(mse, x <= 3, y <= 3, run == "run0")## [1]  1  2  3 21 22 23 41 42 43These methods can be used to determine row/column indices of particular m/z-values or pixel coordinates to use for subsetting.
fid <- features(mse, 1199 < mz & mz < 1201)
pid <- pixels(mse, x <= 3, y <= 3, run == "run0")
mse[fid, pid]## An object of class 'MSContinuousImagingExperiment'
##   <5 feature, 9 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(1): circle
##     metadata(1): design
##     run(1): run0
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range: 1199.043 to 1200.963 
##     centroided: FALSEMSImagingExperiment represents the data as a matrix, where each column is a mass spectrum, rather than as a true “data cube”. This is typically simpler when operating on the mass spectra, and more space efficient when the data is pixel-sparse (i.e., non-rectangular).
Sometimes, however, it is useful to index into the data as an actual “data cube”, with explicit array dimensions for each spatial dimension.
Use slice() to slice an MSImagingExperiment as a data cube and extract images.
# slice image for first mass feature
a <- slice(mse, 1)
dim(a)##   x   y run 
##  20  20   2Any arguments to slice() are passed to features(), making it easy to select the desired image slices.
By default, array dimensions with only one level are dropped; use .preserve=TRUE to keep all dimensions.
# slice image for m/z 1200
a2 <- slice(mse, mz=1200, drop=FALSE)
dim(a2)##       x       y     run feature 
##      20      20       2       1Note that when plotting images from raw arrays, the images are upside-down due to differing coordinate conventions used by graphics::image().
image(a2[,,1,1], col=bw.colors(100))Because MSImagingExperiment is matrix-like, rbind() and cbind() can be used to combine multiple MSImagingExperiment objects by “row” or “column”, assumping certain conditions are met.
Use cbind() to combine datasets from different experimental runs. The m/z-values must match between all datasets to succesfully combine them.
# divide dataset into separate objects for each run
mse_run0 <- mse[,run(mse) == "run0"]
mse_run1 <- mse[,run(mse) == "run1"]
mse_run0## An object of class 'MSContinuousImagingExperiment'
##   <3919 feature, 400 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(1): circle
##     metadata(1): design
##     run(1): run0
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range:  426.5285 to 2044.4400 
##     centroided: FALSEmse_run1## An object of class 'MSContinuousImagingExperiment'
##   <3919 feature, 400 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(1): circle
##     metadata(1): design
##     run(1): run1
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range:  426.5285 to 2044.4400 
##     centroided: FALSE# recombine the separate datasets back together
mse_cbind <- cbind(mse_run0, mse_run1)
mse_cbind## An object of class 'MSContinuousImagingExperiment'
##   <3919 feature, 800 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(1): circle
##     metadata(1): design
##     run(2): run0 run1
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range:  426.5285 to 2044.4400 
##     centroided: FALSESome processing may be necessary to ensure datasets are compatible before combining them.
Most components of an MSImagingExperiment that can be accessed through getter functions like pixelData(), featureData(), and imageData(), can also be re-assigned with analogous setter functions. These can likewise be used to get and set columns of the pixel and feature metadata.
pData() and fData() are aliases for pixelData() and featureData(), respectively.
The $ operator will access the corresponding columns of pixelData().
mse$region <- makeFactor(circle=mse$circle,
                            bg=!mse$circle)
pData(mse)## PositionDataFrame with 800 rows and 2 columns
##        :run:   coord:x   coord:y    circle   region
##     <factor> <integer> <integer> <logical> <factor>
## 1       run0         1         1     FALSE       bg
## 2       run0         2         1     FALSE       bg
## 3       run0         3         1     FALSE       bg
## 4       run0         4         1     FALSE       bg
## 5       run0         5         1     FALSE       bg
## ...      ...       ...       ...       ...      ...
## 796     run1        16        20     FALSE       bg
## 797     run1        17        20     FALSE       bg
## 798     run1        18        20     FALSE       bg
## 799     run1        19        20     FALSE       bg
## 800     run1        20        20     FALSE       bgiData() can be used to access elements of the imageData() list by name or index.
Using iData() with no arguments besides the dataset will get or set the first element of imageData(). Providing a name or index will get or set that element.
iData(mse, "log2intensity") <- log2(iData(mse) + 1)
imageData(mse)## MSContinuousImagingSpectraList of length 2 
##  names(2):    intensity log2intensity
##  class(2):       matrix        matrix
##    dim(2): <3919 x 800>  <3919 x 800>
##    mem(2):     25.08 MB      25.08 MBFor MSImagingExperiment, spectra() is an alias for iData().
spectra(mse, "log2intensity")[1:5, 1:5]##           [,1]      [,2]      [,3]      [,4]      [,5]
## [1,] 0.9482973 0.9840368 0.9571834 0.9347079 0.9230042
## [2,] 1.0062627 1.2084339 1.0655022 1.0174904 1.0500678
## [3,] 1.0411028 1.0444525 1.1639734 0.9500770 0.9124515
## [4,] 0.9221343 1.1089029 0.9449329 0.9634461 1.0091524
## [5,] 1.0468679 1.0088489 1.0208805 0.9259353 1.0790753Whether or not the spectra have been centroided or not can be accessed using centroided()
centroided(mse)## [1] FALSEThis can also be used to set whether the spectra should be treated as centroided or not.
centroided(mse) <- FALSECardinal 2 implements several convenient data manipulation verbs for subsetting and summarizing MSImagingExperiment objects.
subset() subsets an MSImagingExperiment
subsetFeatures() subsets an MSImagingExperiment by row, i.e., mass features
subsetPixels() subsets an MSImagingExperiment by column, i.e., pixels
aggregate() summarizes an MSImagingExperiment by either feature or pixel
summarizeFeatures() summarizes an MSImagingExperiment by feature (e.g., mean spectrum)
summarizePixels() summarizes an MSImagingExperiment by pixel (e.g., TIC)
The %>% operator can be used to chain these operations together. For file-based data, the subsetting should be quick, as only metadata is modified.
# subset by mass range
subsetFeatures(mse, mz > 700, mz < 900)## An object of class 'MSContinuousImagingExperiment'
##   <628 feature, 800 pixel> imaging dataset
##     imageData(2): intensity log2intensity
##     featureData(0):
##     pixelData(2): circle region
##     metadata(1): design
##     run(2): run0 run1
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range: 700.1392 to 899.7160 
##     centroided: FALSE# subset by pixel coordinates
subsetPixels(mse, x <= 3, y <= 3, run == "run0")## An object of class 'MSContinuousImagingExperiment'
##   <3919 feature, 9 pixel> imaging dataset
##     imageData(2): intensity log2intensity
##     featureData(0):
##     pixelData(2): circle region
##     metadata(1): design
##     run(1): run0
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range:  426.5285 to 2044.4400 
##     centroided: FALSE# subset by mass range + pixel coordinates
subset(mse, mz > 700 & mz < 900, x <=3 & y <= 3 & run == "run0")## An object of class 'MSContinuousImagingExperiment'
##   <628 feature, 9 pixel> imaging dataset
##     imageData(2): intensity log2intensity
##     featureData(0):
##     pixelData(2): circle region
##     metadata(1): design
##     run(1): run0
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range: 700.1392 to 899.7160 
##     centroided: FALSE# chain verbs together
mse %>%
    subsetFeatures(mz > 700, mz < 900) %>%
    subsetPixels(x <= 3, y <= 3, run == "run0")## An object of class 'MSContinuousImagingExperiment'
##   <628 feature, 9 pixel> imaging dataset
##     imageData(2): intensity log2intensity
##     featureData(0):
##     pixelData(2): circle region
##     metadata(1): design
##     run(1): run0
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range: 700.1392 to 899.7160 
##     centroided: FALSE# calculate mean spectrum
summarizeFeatures(mse, "mean", as="DataFrame")## MassDataFrame with 3919 rows and 1 column
##           :mz:      mean
##      <numeric> <numeric>
## 1      426.529  0.997692
## 2      426.699  0.999218
## 3      426.870  0.996708
## 4      427.041  0.994184
## 5      427.211  0.994789
## ...        ...       ...
## 3915   2041.17 0.0362793
## 3916   2041.99 0.0418589
## 3917   2042.81 0.0406153
## 3918   2043.62 0.0362828
## 3919   2044.44 0.0410249# calculate tic
summarizePixels(mse, c(tic="sum"), as="DataFrame")## PositionDataFrame with 800 rows and 1 column
##        :run:   coord:x   coord:y       tic
##     <factor> <integer> <integer> <numeric>
## 1       run0         1         1   932.111
## 2       run0         2         1   939.237
## 3       run0         3         1   937.598
## 4       run0         4         1   933.216
## 5       run0         5         1   937.512
## ...      ...       ...       ...       ...
## 796     run1        16        20   934.901
## 797     run1        17        20   934.547
## 798     run1        18        20   938.088
## 799     run1        19        20   947.894
## 800     run1        20        20   931.592# calculate mean spectrum of circle region
mse %>%
    subsetPixels(region == "circle") %>%
    summarizeFeatures("mean", as="DataFrame")## MassDataFrame with 3919 rows and 1 column
##           :mz:      mean
##      <numeric> <numeric>
## 1      426.529  1.002838
## 2      426.699  1.003761
## 3      426.870  0.995426
## 4      427.041  0.990888
## 5      427.211  0.999006
## ...        ...       ...
## 3915   2041.17 0.0351678
## 3916   2041.99 0.0430222
## 3917   2042.81 0.0389528
## 3918   2043.62 0.0376079
## 3919   2044.44 0.0414156By default, Cardinal 2 does not load the spectra from imzML and Analyze files into memory, but retrieves them from files when necessary.
For very large datasets, this is necessary and memory-efficient.
However, for datasets that are known to fit in computer memory, this may be unnecessarily slow, especially when plotting images (which are perpendicular to how data are stored in the files).
# coerce spectra to file-based matter matrix
spectra(mse) <- matter::as.matter(spectra(mse))
spectra(mse)## <3919 row x 800 col> matter_mat :: out-of-memory double matrix
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6] ...
## [1,] 0.9295940 0.9779923 0.9415157 0.9115036 0.8960595 0.9951040 ...
## [2,] 1.0087009 1.3108664 1.0928983 1.0243944 1.0706272 1.0655016 ...
## [3,] 1.0578001 1.0625834 1.2407371 0.9319758 0.8822412 0.9776461 ...
## [4,] 0.8949165 1.1568158 0.9250994 0.9499621 1.0127282 1.1561683 ...
## [5,] 1.0660395 1.0123048 1.0291570 0.8999156 1.1126816 0.9558122 ...
## [6,] 0.9217598 1.0664798 0.9839717 0.8448350 1.1348211 1.0852472 ...
## ...        ...       ...       ...       ...       ...       ... ...
## (5.82 KB real | 25.08 MB virtual)imageData(mse)## MSContinuousImagingSpectraList of length 2 
##  names(2):    intensity log2intensity
##  class(2):   matter_mat        matrix
##    dim(2): <3919 x 800>  <3919 x 800>
##    mem(2):      5.82 KB      25.08 MBUse as.matrix() on the spectra() to pull the spectra into memory as a dense matrix.
spectra(mse) <- as.matrix(spectra(mse))
imageData(mse)## MSContinuousImagingSpectraList of length 2 
##  names(2):    intensity log2intensity
##  class(2):       matrix        matrix
##    dim(2): <3919 x 800>  <3919 x 800>
##    mem(2):     25.08 MB      25.08 MBUse as() to coerce between different MSImagingExperiment sub-classes.
mse3## An object of class 'MSProcessedImagingExperiment'
##   <7837 feature, 800 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(3): square1 square2 circle
##     metadata(1): design
##     run(2): run0 run1
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range:  426.5285 to 2044.4400 
##     centroided: FALSEas(mse3, "MSContinuousImagingExperiment")## An object of class 'MSContinuousImagingExperiment'
##   <7837 feature, 800 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(3): square1 square2 circle
##     metadata(1): design
##     run(2): run0 run1
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range:  426.5285 to 2044.4400 
##     centroided: FALSEThis will often change the underlying data representation, so some information may be lost depending on the coercion.
Objects of the older MSImageSet class can also be coerced to MSImagingExperiment this way.
# requires CardinalWorkflows installed
data(cardinal, package="CardinalWorkflows")
cardinal2 <- as(cardinal, "MSImagingExperiment")A major change in Cardinal 2 from earlier versions is how pre-processing is handled.
Instead of applying pre-processing immediately, each pre-processing step is queued to the dataset, and only applied once process() is called.
This approach is more computationally and memory efficient in most cases, as ideally each spectrum is only processed once, and no extraneous copies of the data are made.
process()On its own, the process() method queues a new pre-processing function, and then applies all currently queued processing functions.
For example, the following code applies very basic total-ion-current (TIC) normalization to all spectra.
mse_tic <- process(mse, function(x) length(x) * x / sum(x), label="norm")
mse_tic## An object of class 'MSContinuousImagingExperiment'
##   <3919 feature, 800 pixel> imaging dataset
##     imageData(2): intensity log2intensity
##     featureData(0):
##     pixelData(2): circle region
##     metadata(1): design
##     processing complete(1): norm
##     processing pending(0):
##     run(2): run0 run1
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range:  426.5285 to 2044.4400 
##     centroided: FALSEBy default, this is applied immediately. However, delay=TRUE delays this, and allows us to queue multiple pre-processing functions at once.
mse_pre <- mse %>%
    process(function(x) length(x) * x / sum(x), label="norm", delay=TRUE) %>%
    process(function(x) signal::sgolayfilt(x), label="smooth", delay=TRUE)
processingData(mse_pre)## List of length 2
## names(2): norm smoothWe can view all pending and completed pre-processing steps in more detail.
mcols(processingData(mse_pre))[,-1]## DataFrame with 2 rows and 6 columns
##               kind   pending  complete   has_pre  has_post  has_plot
##        <character> <logical> <logical> <logical> <logical> <logical>
## norm         pixel      TRUE     FALSE     FALSE     FALSE     FALSE
## smooth       pixel      TRUE     FALSE     FALSE     FALSE     FALSECalling process() on the dataset again without any other arguments will apply all queued pre-processing steps.
mse_proc <- process(mse_pre)
mse_proc## An object of class 'MSContinuousImagingExperiment'
##   <3919 feature, 800 pixel> imaging dataset
##     imageData(2): intensity log2intensity
##     featureData(0):
##     pixelData(2): circle region
##     metadata(1): design
##     processing complete(2): norm smooth
##     processing pending(0):
##     run(2): run0 run1
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range:  426.5285 to 2044.4400 
##     centroided: FALSENote that subsetting retains any queued pre-processing steps.
mse_pre %>%
    subsetPixels(x <= 3, y <= 3) %>%
    subsetFeatures(mz <= 1000) %>%
    process()## An object of class 'MSContinuousImagingExperiment'
##   <2131 feature, 18 pixel> imaging dataset
##     imageData(2): intensity log2intensity
##     featureData(0):
##     pixelData(2): circle region
##     metadata(1): design
##     processing complete(2): norm smooth
##     processing pending(0):
##     run(2): run0 run1
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range: 426.5285 to 999.9239 
##     centroided: FALSEAll pre-processing methods for MSImagingExperiment queue delayed processing by default.
If this is not desired, you can set options(Cardinal.delay=FALSE) to apply all pre-processing steps immediately.
Use normalize() to queue normalization to an MSImagingExperiment.
mse_pre <- normalize(mse, method="tic")method="tic" performs total-ion-current (TIC) normalization
method="rms" performs root-mean-square (RMS) normalization
method="reference" normalizes all spectra to a reference feature
TIC normalization is one of the most common normalization methods for mass spectrometry imaging. For comparison between datasets, TIC normalization requires that all spectra are the same length. RMS normalization is more appropriate when spectra are of different lengths.
Normalization to a reference is the most reliable form of normalization, but is only possible when the experiment contains a known reference peak with a constant abundance throughout the dataset. This is often not possible in unsupervised and exploratory experiments.
Use smoothSignal() to queue spectral smoothing to an MSImagingExperiment.
mse %>% smoothSignal(method="gaussian") %>%
    subsetPixels(x==10, y==10, run=="run0") %>%
    process(plot=TRUE,
            par=list(main="Gaussian smoothing", layout=c(3,1)))
mse %>% smoothSignal(method="ma") %>%
    subsetPixels(x==10, y==10, run=="run0") %>%
    process(plot=TRUE, par=list(main="Moving average smoothing"))
mse %>% smoothSignal(method="sgolay") %>%
    subsetPixels(x==10, y==10, run=="run0") %>%
    process(plot=TRUE, par=list(main="Savitzky-Golay smoothing"))mse_pre <- smoothSignal(mse_pre, method="gaussian")method="gaussian" performs smoothing with a Gaussian kernel
method="ma" performs moving average smoothing
method="sgolay" applies a Savitzky-Golay smoothing filter
Use reduceBaseline() to queue baseline correction to an MSImagingExperiment.
mse %>% reduceBaseline(method="locmin") %>%
    subsetPixels(x==10, y==10, run=="run0") %>%
    process(plot=TRUE, par=list(main="Local minima", layout=c(2,1)))
mse %>% reduceBaseline(method="median") %>%
    subsetPixels(x==10, y==10, run=="run0") %>%
    process(plot=TRUE, par=list(main="Median interpolation"))mse_pre <- reduceBaseline(mse_pre, method="locmin")method="locmin" interpolates a baseline from local minima
method="median" splits a spectrum into blocks and interpolates from binned medians
Peak processing encompasses multiple steps, including (1) picking peaks, (2) aligning peaks, (3) filtering peaks, and (4) binning profile spectra to the detected peaks.
Prior to peak detection is a good time to apply the previous processing steps.
mse_pre <- process(mse_pre)(This is optional, and not necessary if only the peaks are desired, and if it is acceptable to have peaks with intensities of zero in pixels where that peak was not detected.)
Use peakPick() to queue peak picking to an MSImagingExperiment.
mse_pre %>% subsetPixels(x==10, y==10, run=="run0") %>%
    process() %>%
    peakPick(method="mad") %>%
    process(plot=TRUE, par=list(main="MAD noise", layout=c(3,1)))
mse_pre %>% subsetPixels(x==10, y==10, run=="run0") %>%
    process() %>%
    peakPick(method="simple") %>%
    process(plot=TRUE,
            par=list(main="Simple SD noise"))
mse_pre %>% subsetPixels(x==10, y==10, run=="run0") %>%
    process() %>%
    peakPick(method="adaptive") %>%
    process(plot=TRUE, par=list(main="Adaptive SD noise"))mse_peaks <- peakPick(mse_pre, method="mad")method="mad" calculates adaptive noise from interpolating local mean absolute deviations
method="simple" calculates a constant noise from standard deviations of low-kurtosis bins
method="adaptive" calculates adaptive noise from standard deviations of low-kurtosis bins
Use peakAlign() to queue peak alignment to an MSImagingExperiment.
mse_peaks <- peakAlign(mse_pre, tolerance=200, units="ppm")Peaks are matched based on proximity of their m/z-values, according to tolerance, in either parts-per-millin (“ppm”) or absolute (“mz”) units.
The m/z-values of known reference peaks can be provided. If no reference is provided, the mean spectrum is calculated, and the local maxima of the mean spectrum are used as the reference.
Use peakFilter() to queue peak filtering to an MSImagingExperiment.
mse_peaks <- peakFilter(mse_pre, freq.min=0.05)The proportions of pixels where a peak was detected at each m/z-value are calculated. Only peaks with frequencies greater than freq.min are retained.
Use peakBin() to queue binning of spectra to reference peaks.
Typically, this is applied to processed profile spectra after peak detection, to extract a more accurate representation of the peak intensities.
mse_peaks <- process(mse_peaks)
mse_peaks <- peakBin(mse_pre, ref=mz(mse_peaks), type="height")
mse_peaks <- mse_peaks %>% process()A tolerance in either parts-per-million (“ppm”) or absolute (“mz”) units is used to match the reference peaks to local maxima in each spectrum.
The peak is then expanded to the nearest local minima in both directions. The intensity of the peak is then summarized either by the maximum intensity (type="height") or sum of intensities (type="area").
Rather than use the m/z-values of the detected peaks, we can also use known reference peaks (in this case, from the design of the simulated data).
mzref <- mz(metadata(mse)$design$featureData)
mse_peaks <- peakBin(mse_pre, ref=mzref, type="height")
mse_peaks <- mse_peaks %>% subsetPixels(x %in% 9:11, y %in% 9:11) %>% process()
plot(mse_peaks, coord=list(x=10, y=10))We typically recommend binning the processed profile spectra to detected or reference peaks to produce centroided spectra suitable for downstream analysis.
It is possible to use the detected and aligned peaks directly (after applying peakAlign() and peakFilter()), but pixels where a peak was not detected will have zero intensities for those peaks. Moreover, all peak intensities will be the peak heights, even when peak areas may be desired instead. However, using the detected peaks directly does mean that there is no need to calculate and store the processed profile spectra, so this saves on storage space.
Generally, it is preferable and more accurate to bin the processed profile spectra to the detected peaks whenever possible and reasonable.
Although peak alignment and peak binning will generally account for small differences in m/z between spectra, alignment of the profile spectra is sometimes desireable as well.
Use mzAlign() to queue alignment of spectra so that peaks will have a consistent m/z-value.
First, we need to simulate spectra that are in need of calibration.
set.seed(2020)
mse4 <- simulateImage(preset=1, npeaks=10, from=500, to=600,
                            sdmz=750, units="ppm")
plot(mse4, pixel=185:195, xlim=c(535, 570), key=FALSE, superpose=TRUE)To align the spectra, we need to provide a vector of reference m/z values of expected peaks. Here, we will simply use the peaks of the mean spectrum.
mse4_mean <- summarizeFeatures(mse4)
mse4_peaks <- peakPick(mse4_mean, SNR=2)
mse4_peaks <- peakAlign(mse4_peaks, tolerance=1000, units="ppm")
mse4_peaks <- process(mse4_peaks)
fData(mse4_peaks)## MassDataFrame with 7 rows and 0 columns
##        :mz:
##   <numeric>
## 1   509.897
## 2   527.953
## 3   541.644
## 4   548.182
## 5   553.470
## 6   561.049
## 7   590.050mse4_align1 <- mzAlign(mse4, ref=mz(mse4_peaks), tolerance=2000, units="ppm")
mse4_align1 <- process(mse4_align1)
plot(mse4_align1, pixel=185:195, xlim=c(535, 570), key=FALSE, superpose=TRUE)If no reference spectrum is provided, the mean spectrum is calculated automatically and used as the reference.
mse4_align2 <- mzAlign(mse4, tolerance=2000, units="ppm")
mse4_align2 <- process(mse4_align2)
plot(mse4_align2, pixel=185:195, xlim=c(535, 570), key=FALSE, superpose=TRUE)The algorithm will try to match the most intense peaks in the reference to local maxima in each spectrum, within tolerance. If tolerance is too small, the matching local maxima may not be found. If tolerance is too large, then peaks may be matched to the wrong local maxima.
While the m/z binning scheme of MSProcessedImagingExperiment objects can be adjusted on-the-fly, this does not apply to other types of MSImagingExperiment.
Use mzBin() to queue binning of spectra to new m/z-values.
mse_bin <- mzBin(mse_pre, from=1000, to=1600, resolution=1000, units="ppm")
mse_bin <- subsetPixels(mse_bin, x %in% 9:11, y %in% 9:11) %>% process()
plot(mse_bin, coord=list(x=10, y=10))This is useful if you need to combine datasets with different m/z-values.
Sometimes it is necessary to filter the mass features of a dataset that has not been peak picked and aligned. For example, to remove noisy or low-intensity m/z-values.
Use mzFilter() to queue filtering of mass features.
mse_filt <- mzFilter(mse_pre, freq.min=0.05)
mse_filt <- subsetPixels(mse_filt, x %in% 9:11, y %in% 9:11) %>% process()
plot(mse_filt, coord=list(x=10, y=10), type='h')mzFilter() and peakFilter() are actually the same function internally, but with different defaults for profile and centroided spectra, respectively.
Note that mzFilter() this does not set centroided() to TRUE; it is up to the user to decide whether the result represent centroided spectra or not, and set centroided() appropriately.
Delayed pre-processing makes it easy to chain together multiple pre-processing steps with the %>% operator, and then apply them all at once.
mse_proc <- mse %>%
    normalize() %>%
    smoothSignal() %>%
    reduceBaseline() %>%
    peakPick()
# preview processing
mse_proc %>%
    subsetPixels(x == 10, y == 10, run == "run0") %>%
    process(plot=TRUE, par=list(layout=c(2,2)))# process detected peaks
mse_peakref <- mse_proc %>%
    peakAlign() %>%
    peakFilter() %>%
    process()
# bin profile spectra to peaks
mse_peaks <- mse %>%
    normalize() %>%
    smoothSignal() %>%
    reduceBaseline() %>%
    peakBin(mz(mse_peakref))MSImagingExperimentInternally, most methods that iterate over pixels or mass features use some combination of pixelApply(), featureApply(), and spatialApply().
While summarize() is useful for summarizing a MSImagingExperiment, it is limited in that it can only apply summary functions that return a single value, which can be simplified into the columns of a MassDataFrame or PositionDataFrame.
By contrast, these functions (like apply() and lapply()) can return any value.
pixelApply() and featureApply()pixelApply() is used to apply functions over pixels.
# calculate the TIC for every pixel
tic <- pixelApply(mse, sum)
image(mse, tic ~ x * y)featureApply() is used to apply functions over features.
# calculate the mean spectrum
smean <- featureApply(mse, mean)
plot(mse, smean ~ mz)Note that featureApply() will typically be slower than pixelApply() for out-of-memory data for MSProcessedImagingExperiment objects, due to the way the data is stored.
All pre-processing methods and most statistical analysis methods in Cardinal 2 can be executed in parallel using BiocParallel.
By default, a serial backend is used (no parallelization). This is for maximum stability and compatibility.
BPPARAMAny method that supports parallelization includes BPPARAM as an argument (see method documentation). The BPPARAM argument can be used to specify a parallel backend for the operation, such as SerialParam(), MulticoreParam(), SnowParam(), etc.
# run in parallel, rather than serially
tic <- pixelApply(mse, sum, BPPARAM=MulticoreParam())Several parallelization backends are available, depending on OS:
SerialParam() creates a serial (non-parallel) backend. Use this to avoid potential issues caused by parallelization.
MulticoreParam() creates a multicore backend by forking the current R session. This is typically the fastest parallelization option, but is only available on macOS and Linux.
SnowParam() creates a SNOW backend by creating new R sessions via socket connections. This is typically slower than multicore, but is available on all platforms including Windows.
Use of MulticoreParam() will frequently improve speed on macOS and Linux dramatically. However, due to the extra overhead of SnowParam(), Windows users may prefer the default SerialParam(), depending on the size of the dataset.
Available backends can be viewed with BiocParallel::registered().
BiocParallel::registered()## $MulticoreParam
## class: MulticoreParam
##   bpisup: FALSE; bpnworkers: 4; bptasks: 0; bpjobname: BPJOB
##   bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
##   bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
##   bpexportglobals: TRUE; bpexportvariables: FALSE; bpforceGC: FALSE
##   bpfallback: TRUE
##   bplogdir: NA
##   bpresultdir: NA
##   cluster type: FORK
## 
## $SnowParam
## class: SnowParam
##   bpisup: FALSE; bpnworkers: 4; bptasks: 0; bpjobname: BPJOB
##   bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
##   bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
##   bpexportglobals: TRUE; bpexportvariables: TRUE; bpforceGC: FALSE
##   bpfallback: TRUE
##   bplogdir: NA
##   bpresultdir: NA
##   cluster type: SOCK
## 
## $SerialParam
## class: SerialParam
##   bpisup: FALSE; bpnworkers: 1; bptasks: 0; bpjobname: BPJOB
##   bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
##   bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
##   bpexportglobals: FALSE; bpexportvariables: FALSE; bpforceGC: FALSE
##   bpfallback: FALSE
##   bplogdir: NA
##   bpresultdir: NAThe current backend used by Cardinal can be viewed with getCardinalBPPARAM():
getCardinalBPPARAM()## class: SerialParam
##   bpisup: FALSE; bpnworkers: 1; bptasks: 0; bpjobname: BPJOB
##   bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
##   bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
##   bpexportglobals: FALSE; bpexportvariables: FALSE; bpforceGC: FALSE
##   bpfallback: FALSE
##   bplogdir: NA
##   bpresultdir: NAA new default backend can be set for use with Cardinal by calling setCardinalBPPARAM().
# register a SNOW backend
setCardinalBPPARAM(SnowParam())See the BiocParallel package documentation for more details on available parallel backends.
For methods that rely on random number generation to be reproducible when run in parallel, the RNG must be set to “L’Ecuyer-CMRG” before setting a seed.
RNGkind("L'Ecuyer-CMRG")
set.seed(1)sessionInfo()## R version 4.3.0 RC (2023-04-13 r84269)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] Cardinal_3.2.1      S4Vectors_0.38.1    EBImage_4.42.0     
## [4] BiocParallel_1.34.0 BiocGenerics_0.46.0 ProtGenerics_1.32.0
## [7] BiocStyle_2.28.0   
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.5          tiff_0.1-11         bitops_1.0-7       
##  [4] jpeg_0.1-10         lattice_0.21-8      digest_0.6.31      
##  [7] magrittr_2.0.3      evaluate_0.20       grid_4.3.0         
## [10] bookdown_0.33       fftwtools_0.9-11    fastmap_1.1.1      
## [13] jsonlite_1.8.4      Matrix_1.5-4        matter_2.2.0       
## [16] DBI_1.1.3           mclust_6.0.0        biglm_0.9-2.1      
## [19] BiocManager_1.30.20 viridisLite_0.4.2   codetools_0.2-19   
## [22] jquerylib_0.1.4     abind_1.4-5         cli_3.6.1          
## [25] rlang_1.1.1         Biobase_2.60.0      cachem_1.0.8       
## [28] yaml_2.3.7          tools_4.3.0         parallel_4.3.0     
## [31] locfit_1.5-9.7      R6_2.5.1            png_0.1-8          
## [34] magick_2.7.4        htmlwidgets_1.6.2   MASS_7.3-59        
## [37] irlba_2.3.5.1       bslib_0.4.2         Rcpp_1.0.10        
## [40] xfun_0.39           highr_0.10          knitr_1.42         
## [43] htmltools_0.5.5     nlme_3.1-162        rmarkdown_2.21     
## [46] signal_0.7-7        compiler_4.3.0      RCurl_1.98-1.12    
## [49] sp_1.6-0