Package: MsBackendMgf
Authors: RforMassSpectrometry Package Maintainer [cre],
Laurent Gatto [aut] (https://orcid.org/0000-0002-1520-2268),
Johannes Rainer [aut] (https://orcid.org/0000-0002-6977-7147),
Sebastian Gibb [aut] (https://orcid.org/0000-0001-7406-4443),
Michael Witting [ctb] (https://orcid.org/0000-0002-1462-4426)
Last modified: 2023-10-24 14:40:11.694535
Compiled: Tue Oct 24 18:01:36 2023
The Spectra package provides a central infrastructure for the
handling of Mass Spectrometry (MS) data. The package supports
interchangeable use of different backends to import MS data from a
variety of sources (such as mzML files). The MsBackendMgf package
allows the import of MS/MS data from mgf (Mascot Generic Format) files.
This vignette illustrates the usage of the MsBackendMgf package.
To install this package, start R and enter:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("MsBackendMgf")
This will install the package together with any missing dependencies.
Mgf files store one or more spectra, typically centroided and of MS level 2. In the short example below, we load the 4 mgf files that are provided with this package. We first load all required packages and define the paths to the mgf files.
library(Spectra)
library(MsBackendMgf)
fls <- dir(system.file("extdata", package = "MsBackendMgf"),
           full.names = TRUE, pattern = "mgf$")
fls
## [1] "/tmp/RtmpsQxjZx/Rinst2e36de5815bf5d/MsBackendMgf/extdata/spectra.mgf"
## [2] "/tmp/RtmpsQxjZx/Rinst2e36de5815bf5d/MsBackendMgf/extdata/spectra2.mgf"
## [3] "/tmp/RtmpsQxjZx/Rinst2e36de5815bf5d/MsBackendMgf/extdata/spectra3_empty_peaks.mgf"
## [4] "/tmp/RtmpsQxjZx/Rinst2e36de5815bf5d/MsBackendMgf/extdata/spectra4.mgf"
MS data can be accessed and analyzed through Spectra objects. Below
we create a Spectra object with the data from these mgf files. To this end
we provide the file names and specify MsBackendMgf() as the
source backend to enable data import. Note that we also disable
parallel processing by explicitly registering serial processing as the
default. See ?bpparam for more details on parallel processing options with the
BiocParallel package.
library(BiocParallel)
register(SerialParam())
sps <- Spectra(fls, source = MsBackendMgf())
## Start data import from 4 files ... done
With that we now have full access to all imported spectra variables, which we list below.
spectraVariables(sps)
##  [1] "msLevel"                 "rtime"
##  [3] "acquisitionNum"          "scanIndex"              
##  [5] "dataStorage"             "dataOrigin"             
##  [7] "centroided"              "smoothed"               
##  [9] "polarity"                "precScanNum"            
## [11] "precursorMz"             "precursorIntensity"     
## [13] "precursorCharge"         "collisionEnergy"        
## [15] "isolationWindowLowerMz"  "isolationWindowTargetMz"
## [17] "isolationWindowUpperMz"  "TITLE"                  
## [19] "RAWFILE"                 "CLUSTER_ID"             
## [21] "MSLEVEL"
Besides default spectra variables, such as msLevel, rtime and
precursorMz, we also have additional spectra variables such as the
TITLE of each spectrum in the mgf file.
sps$rtime
##  [1] 1028.000 1117.000 1127.000 2678.940 2373.511 2511.030       NA  162.070
##  [9] 1028.000 1028.000
sps$TITLE
##  [1] "File193 Spectrum1719 scans: 2162"
##  [2] "File193 Spectrum1944 scans: 2406"                                                         
##  [3] "File193 Spectrum1968 scans: 2432"                                                         
##  [4] "mzspec:PXD004732:01650b_BC2-TUM_first_pool_53_01_01-3xHCD-1h-R2:scan:41840:WNQLQAFWGTGK/2"
##  [5] "mzspec:PXD002084:TCGA-AA-A01D-01A-23_W_VU_20121106_A0218_5I_R_FR15:scan:5209:DLTDYLMK/2"  
##  [6] "mzspec:MSV000080679:j11962_C1orf144:scan:10671:DLTDYLMK/2"                                
##  [7] "CCMSLIB00000840351"                                                                       
##  [8] "blank_2-A,1_01_29559.812.812.1"                                                           
##  [9] "File193 Spectrum1719 scans: 2162"                                                         
## [10] "File193 Spectrum1719 scans: 2162"
By default, fields in the mgf file are mapped to spectra variable names using
the mapping returned by the spectraVariableMapping function:
spectraVariableMapping(MsBackendMgf())
##              rtime     acquisitionNum        precursorMz precursorIntensity
##      "RTINSECONDS"            "SCANS"          "PEPMASS"       "PEPMASSINT" 
##    precursorCharge 
##           "CHARGE"
The names of this character vector are the spectra variable names (such as
"rtime") and its values are the names of the fields in the mgf file that
contain that information (such as "RTINSECONDS"). Note that it is also possible
to overwrite this mapping (e.g. for certain mgf dialects) or to add further
mappings. Below we add the mapping of the mgf field "TITLE" to a spectra
variable called "spectrumName".
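As a rough illustration of what such a mapping expresses, the base R sketch below renames the fields of a made-up mgf entry according to a named character vector. The default mapping is copied from the output above; the dialect field "RT", the field "NOTE" and the helper rename_fields are assumptions introduced only for this illustration, and the sketch does not show how MsBackendMgf applies the mapping internally.

```r
# Default mapping as printed above: names are spectra variables,
# values are the corresponding mgf field names.
default_map <- c(rtime = "RTINSECONDS", acquisitionNum = "SCANS",
                 precursorMz = "PEPMASS", precursorIntensity = "PEPMASSINT",
                 precursorCharge = "CHARGE")

# Hypothetical mgf dialect that stores the retention time in a field "RT".
dialect_map <- default_map
dialect_map["rtime"] <- "RT"

# Made-up fields of a single mgf entry (field name = value).
fields <- c(RT = "1028", SCANS = "2162", PEPMASS = "816.33826",
            NOTE = "example")

# Hypothetical helper: rename mgf fields to spectra variable names.
# Fields without an entry in the mapping keep their original name.
rename_fields <- function(fields, mapping) {
    idx <- match(names(fields), mapping)
    mapped <- !is.na(idx)
    names(fields)[mapped] <- names(mapping)[idx[mapped]]
    fields
}

renamed <- rename_fields(fields, dialect_map)
names(renamed)
```

In this sketch the unmapped field "NOTE" keeps its name, which mirrors how additional fields such as TITLE appear as spectra variables under their original name in the import above.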
map <- c(spectrumName = "TITLE", spectraVariableMapping(MsBackendMgf()))
map
##       spectrumName              rtime     acquisitionNum        precursorMz
##            "TITLE"      "RTINSECONDS"            "SCANS"          "PEPMASS" 
## precursorIntensity    precursorCharge 
##       "PEPMASSINT"           "CHARGE"
We can then pass this mapping to the backendInitialize method, or the
Spectra constructor.
sps <- Spectra(fls, source = MsBackendMgf(), mapping = map)
## Start data import from 4 files ... done
We can now access the spectrum’s title with the newly created spectra variable
"spectrumName":
sps$spectrumName
##  [1] "File193 Spectrum1719 scans: 2162"
##  [2] "File193 Spectrum1944 scans: 2406"                                                         
##  [3] "File193 Spectrum1968 scans: 2432"                                                         
##  [4] "mzspec:PXD004732:01650b_BC2-TUM_first_pool_53_01_01-3xHCD-1h-R2:scan:41840:WNQLQAFWGTGK/2"
##  [5] "mzspec:PXD002084:TCGA-AA-A01D-01A-23_W_VU_20121106_A0218_5I_R_FR15:scan:5209:DLTDYLMK/2"  
##  [6] "mzspec:MSV000080679:j11962_C1orf144:scan:10671:DLTDYLMK/2"                                
##  [7] "CCMSLIB00000840351"                                                                       
##  [8] "blank_2-A,1_01_29559.812.812.1"                                                           
##  [9] "File193 Spectrum1719 scans: 2162"                                                         
## [10] "File193 Spectrum1719 scans: 2162"
In addition we can also access the m/z and intensity values of each spectrum.
mz(sps)
## NumericList of length 10
## [[1]] 102.0548 103.00494 103.03531 ... 1388.58691 1405.59729 1406.57666
## [[2]] 101.07074 102.05486 103.00227 ... 1331.56726 1348.58496 1349.59241
## [[3]] 102.05556 103.00014 115.05058 ... 1333.599 1334.61304 1335.64368
## [[4]] 101.07122 109.68925 115.86999 120.0811 ... 1260.6073 1261.614 1272.6572
## [[5]] 130.164459228516 144.150299072266 ... 1019.23852539062 1020.52404785156
## [[6]] 110.070594787598 120.080627441406 ... 887.756652832031 998.447387695312
## [[7]] 51.022404 57.033543 57.060638 ... 636.130188 660.481445 753.358521
## [[8]] numeric(0)
## [[9]] 102.0548 103.00494 103.03531 ... 1388.58691 1405.59729 1406.57666
## [[10]] 102.0548 103.00494 103.03531 ... 1388.58691 1405.59729 1406.57666
intensity(sps)
## NumericList of length 10
## [[1]] 753.738 385.376 315.441 413.206 ... 3038.73 2016.43 1146.04 704.175
## [[2]] 1228.93 1424.66 1550.9 1455.45 ... 7380.41 4960.92 5743.83 1780.76
## [[3]] 1340.44 1714.76 1938.82 1450.36 2019 ... 5323.02 2265.43 4768.14 1532.12
## [[4]] 81011.57 4123.349 4006.9321 66933.17 ... 22042.248 18096.48 12666.438
## [[5]] 14.1766004562378 18.5806427001953 ... 22.7096385955811 14.864013671875
## [[6]] 1748.57495117188 8689.9951171875 ... 2907.08422851562 2663.30908203125
## [[7]] 65.219513 178.758606 13.01786 119.898499 ... 22.05921 30.57095 14.11111
## [[8]] numeric(0)
## [[9]] 753.738 385.376 315.441 413.206 ... 3038.73 2016.43 1146.04 704.175
## [[10]] 753.738 385.376 315.441 413.206 ... 3038.73 2016.43 1146.04 704.175
The MsBackendMgf backend also supports exporting data in mgf format. Below we
export the data to a temporary file by calling the export function on our
Spectra object, specifying backend = MsBackendMgf() to use this backend for
the export. Note that we again use our custom mapping of variables
such that the spectra variable "spectrumName" is exported as the
spectrum’s title.
fl <- tempfile()
export(sps, backend = MsBackendMgf(), file = fl, mapping = map)
We next read the first lines from the exported file to verify that the title was exported properly.
readLines(fl)[1:12]
##  [1] "BEGIN IONS"
##  [2] "msLevel=2"                             
##  [3] "RTINSECONDS=1028"                      
##  [4] "SCANS=2162"                            
##  [5] "centroided=TRUE"                       
##  [6] "PEPMASS=816.33826"                     
##  [7] "CHARGE=2+"                             
##  [8] "TITLE=File193 Spectrum1719 scans: 2162"
##  [9] "102.0548 753.738"                      
## [10] "103.00494 385.376"                     
## [11] "103.03531 315.441"                     
## [12] "115.05001 413.206"
Note that MsBackendMgf exports all spectra variables as fields in the mgf
file. To illustrate this, we add a new spectra variable to the object and
export the data again.
sps$new_variable <- "A"
export(sps, backend = MsBackendMgf(), file = fl)
readLines(fl)[1:12]
##  [1] "BEGIN IONS"
##  [2] "TITLE=msLevel 2; retentionTime ; scanNum "    
##  [3] "msLevel=2"                                    
##  [4] "RTINSECONDS=1028"                             
##  [5] "SCANS=2162"                                   
##  [6] "centroided=TRUE"                              
##  [7] "PEPMASS=816.33826"                            
##  [8] "CHARGE=2+"                                    
##  [9] "spectrumName=File193 Spectrum1719 scans: 2162"
## [10] "new_variable=A"                               
## [11] "102.0548 753.738"                             
## [12] "103.00494 385.376"
We can see that our newly defined variable was exported as well. Also, because we
did not provide our custom variable mapping this time, the variable
"spectrumName" was not used as the spectrum’s title.
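The exported lines shown above follow the simple layout of the mgf format: each spectrum is enclosed by "BEGIN IONS" and "END IONS", field lines have the form KEY=VALUE, and the remaining lines list one peak each as an m/z and intensity pair. The base R sketch below splits one such entry into fields and peaks. It is meant only to make the file layout tangible; the helper parse_mgf_entry is hypothetical and is not the parser used by MsBackendMgf, and the closing "END IONS" line is added by hand because the listings above stop after 12 lines.

```r
# Lines of one mgf entry, taken from the exported file shown above and
# shortened to two peaks; "END IONS" closes the entry.
mgf_lines <- c(
    "BEGIN IONS",
    "RTINSECONDS=1028",
    "SCANS=2162",
    "PEPMASS=816.33826",
    "CHARGE=2+",
    "TITLE=File193 Spectrum1719 scans: 2162",
    "102.0548 753.738",
    "103.00494 385.376",
    "END IONS"
)

# Hypothetical helper: split one mgf entry into its fields and its peaks.
parse_mgf_entry <- function(x) {
    # Drop the lines that delimit the entry.
    x <- x[!x %in% c("BEGIN IONS", "END IONS")]
    # Field lines have the form KEY=VALUE; all other lines are peaks.
    is_field <- grepl("^[A-Za-z_]+=", x)
    kv <- x[is_field]
    fields <- sub("^[^=]+=", "", kv)
    names(fields) <- sub("=.*$", "", kv)
    # Peak lines hold m/z and intensity separated by whitespace.
    pks <- do.call(rbind, strsplit(x[!is_field], "[[:space:]]+"))
    peaks <- matrix(as.numeric(pks), ncol = 2,
                    dimnames = list(NULL, c("mz", "intensity")))
    list(fields = fields, peaks = peaks)
}

entry <- parse_mgf_entry(mgf_lines)
names(entry$fields)
entry$peaks
```

Listing the field names of an exported file in this way can also help to check which fields an external tool would encounter.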
Sometimes it is necessary to export only a subset of the spectra variables, since some
exported fields might not be recognized or supported by external tools. Using the
selectSpectraVariables function we can reduce the Spectra object to contain
only the relevant spectra variables before exporting it. Below we restrict the data to
m/z, intensity, retention time, acquisition number, precursor m/z and precursor
charge and export these to an mgf file. In addition, some external tools don’t support
the "TITLE" field in the mgf file. Export of the spectrum ID/title can be disabled
with exportTitle = FALSE.
sps_ex <- selectSpectraVariables(sps, c("mz", "intensity", "rtime",
                                        "acquisitionNum", "precursorMz",
                                        "precursorCharge"))
export(sps_ex, backend = MsBackendMgf(), file = fl, exportTitle = FALSE)
readLines(fl)[1:12]
##  [1] "BEGIN IONS"        "RTINSECONDS=1028"  "SCANS=2162"
##  [4] "PEPMASS=816.33826" "CHARGE=2+"         "102.0548 753.738" 
##  [7] "103.00494 385.376" "103.03531 315.441" "115.05001 413.206"
## [10] "115.08686 588.273" "120.08063 800.016" "124.10555 526.761"
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] MsBackendMgf_1.10.0 Spectra_1.12.0      ProtGenerics_1.34.0
## [4] BiocParallel_1.36.0 S4Vectors_0.40.0    BiocGenerics_0.48.0
## [7] BiocStyle_2.30.0   
## 
## loaded via a namespace (and not attached):
##  [1] cli_3.6.1              knitr_1.44             rlang_1.1.1           
##  [4] xfun_0.40              clue_0.3-65            jsonlite_1.8.7        
##  [7] htmltools_0.5.6.1      sass_0.4.7             rmarkdown_2.25        
## [10] evaluate_0.22          jquerylib_0.1.4        MASS_7.3-60           
## [13] fastmap_1.1.1          yaml_2.3.7             IRanges_2.36.0        
## [16] bookdown_0.36          MsCoreUtils_1.14.0     BiocManager_1.30.22   
## [19] cluster_2.1.4          compiler_4.3.1         codetools_0.2-19      
## [22] fs_1.6.3               MetaboCoreUtils_1.10.0 digest_0.6.33         
## [25] R6_2.5.1               parallel_4.3.1         bslib_0.5.1           
## [28] tools_4.3.1            cachem_1.0.8