---
title: "cadd.v1.6.hg19 AnnotationHub Resource Metadata"
author:
- name: Robert Castelo
  affiliation: 
  - &id Dept. of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
  email: robert.castelo@upf.edu
package: "`r pkg_ver('cadd.v1.6.hg19')`"
abstract: >
  Store cadd.v1.6.hg19 AnnotationHub Resource Metadata.
vignette: >
  %\VignetteIndexEntry{cadd.v1.6.hg19 AnnotationHub Resource Metadata}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
output: 
  BiocStyle::html_document:
    toc: true
    toc_float: true
    number_sections: true
---

```{r setup, echo=FALSE}
options(width=80)
```

# Retrieval of cadd.v1.6.hg19 pathogenicity scores through AnnotationHub resources

The `cadd.v1.6.hg19` package provides metadata for the
`r Biocpkg("AnnotationHub")` resources associated with human CADD
pathogenicity scores [@kircher2014general]. The original data can be found at
the University of Washington download
[site](https://cadd.gs.washington.edu/download).
Details about how those original data were processed into
`r Biocpkg("AnnotationHub")` resources can be found in the source
file:

```
cadd.v1.6.hg19/scripts/make-metadata_cadd.v1.6.hg19.R
```

The pathogenicity scores for `cadd.v1.6.hg19` can be retrieved using
the `r Biocpkg("AnnotationHub")`,
which is a web resource that provides a central location where genomic files
(e.g., VCF, bed, wig) and other resources from standard (e.g., UCSC, Ensembl) and
distributed sites, can be found. A Bioconductor `r Biocpkg("AnnotationHub")` web
resource creates and manages a local cache of files retrieved by the user,
helping with quick and reproducible access.

While the `r Biocpkg("AnnotationHub")` API can be used to query those resources,
we encourage to use the `r Biocpkg("GenomicScores")` API
[@puigdevall2018genomicscores], as follows. The first step to retrieve genomic
scores is to check the ones available to download.

```{r, echo=FALSE}
avgs <- readRDS(system.file("extdata", "avgs.rds", package="GenomicScores"))
```
```{r retrieve2, message=FALSE, cache=FALSE, eval=FALSE}
availableGScores()
```
```{r, message=FALSE, cache=FALSE, echo=FALSE}
library(AnnotationHub)
library(GenomicScores)
setAnnotationHubOption("MAX_DOWNLOADS", 30)
avgs
```

The selected resource can be downloaded with the function getGScores().
After the resource is downloaded the first time, the cached copy will
enable a quicker retrieval later.

```{r retrieve3, message=FALSE, cache=FALSE}
cadd <- getGScores("cadd.v1.6.hg19")
cadd
citation(cadd)
```

Finally, the CADD pathogenicity score of a particular genomic position
is retrieved using the function 'gscores()'. Please consult the documentation
of the `r Biocpkg("GenomicScores")` package for details on how to use it. For
instance, @cheng2023accurate report likely pathogenic scores for variants in
the human glucose sensor GCK. If we would like to retrieve the CADD
score of the variant
[NM_000162.5(GCK):c.1174C>T (p.Arg392Cys)](https://www.ncbi.nlm.nih.gov/clinvar/variation/585909),
classified as pathogenic in the ClinVar database, we should call `gscores()`
as follows.

```{r retrieve4, message=FALSE}
gscores(cadd, GRanges("chr7:44185175"), ref="C", alt="T")
```

## Building an annotation package from a GScores object

Retrieving genomic scores through `AnnotationHub` resources requires an internet
connection and we may want to work with such resources offline. For that purpose,
we can create ourselves an annotation package, such as
[phastCons100way.UCSC.hg19](https://bioconductor.org/packages/phastCons100way.UCSC.hg19),
from a `GScores` object corresponding to a downloaded `AnnotationHub` resource.
To do that we use the function `makeGScoresPackage()` as follows:

```{r eval=FALSE}
makeGScoresPackage(cadd, maintainer="Me <me@example.com>", author="Me", version="1.0.0")
```
```{r echo=FALSE}
cat("Creating package in ./cadd.v1.6.hg19\n")
```

An argument, `destDir`, which by default points to the current working
directory, can be used to change where in the filesystem the package is created.
Afterwards, we should still build and install the package via, e.g.,
`R CMD build` and `R CMD INSTALL`, to be able to use it offline.

# Session information

```{r session_info, cache=FALSE}
sessionInfo()
```

# References