%\VignetteIndexEntry{AnnotationDbi: Introduction To Bioconductor Annotation Packages}
%\VignetteDepends{org.Hs.eg.db,TxDb.Hsapiens.UCSC.hg19.knownGene,hom.Hs.inp.db}

\documentclass[11pt]{article}


\usepackage{Sweave}
\usepackage[usenames,dvipsnames]{color}
\usepackage{graphics}
\usepackage{latexsym, amsmath, amssymb}
\usepackage{authblk}
\usepackage[colorlinks=true, linkcolor=Blue, urlcolor=Blue,
  citecolor=Blue]{hyperref}

%% Simple macros

\newcommand{\code}[1]{{\texttt{#1}}}
\newcommand{\file}[1]{{\texttt{#1}}}

\newcommand{\software}[1]{\textsl{#1}}
\newcommand\R{\textsl{R}}
\newcommand\Bioconductor{\textsl{Bioconductor}}
\newcommand\Rpackage[1]{{\textsl{#1}\index{#1 (package)}}}
\newcommand\Biocpkg[1]{%
  {\href{http://bioconductor.org/packages/devel/bioc/html/#1.html}%
    {\textsl{#1}}}%
  \index{#1 (package)}}
\newcommand\Rpkg[1]{%
  {\href{http://cran.fhcrc.org/web/devel/#1/index.html}%
    {\textsl{#1}}}%
  \index{#1 (package)}}
\newcommand\Biocdatapkg[1]{%
  {\href{http://bioconductor.org/packages/devel/data/experiment/html/#1.html}%
    {\textsl{#1}}}%
  \index{#1 (package)}}
\newcommand\Robject[1]{{\small\texttt{#1}}}
\newcommand\Rclass[1]{{\textit{#1}\index{#1 (class)}}}
\newcommand\Rfunction[1]{{{\small\texttt{#1}}\index{#1 (function)}}}
\newcommand\Rmethod[1]{{\texttt{#1}}}
\newcommand\Rfunarg[1]{{\small\texttt{#1}}}
\newcommand\Rcode[1]{{\small\texttt{#1}}}

%% Question, Exercise, Solution
\usepackage{theorem}
\theoremstyle{break}
\newtheorem{Ext}{Exercise}
\newtheorem{Question}{Question}


\newenvironment{Exercise}{
  \renewcommand{\labelenumi}{\alph{enumi}.}\begin{Ext}%
}{\end{Ext}}
\newenvironment{Solution}{%
  \noindent\textbf{Solution:}\renewcommand{\labelenumi}{\alph{enumi}.}%
}{\bigskip}




\title{AnnotationDbi: Introduction To Bioconductor Annotation Packages}
\author{Marc Carlson}

\SweaveOpts{keep.source=TRUE}
\begin{document}

\maketitle


\begin{figure}[ht]
\centering
\includegraphics[width=.6\textwidth]{databaseTypes.pdf}
\caption{Annotation Packages: the big picture}
\label{fig:dbtypes}
\end{figure}

\Bioconductor{} provides extensive annotation resources. These can be
\emph{gene centric}, or \emph{genome centric}. Annotations can be
provided in packages curated by \Bioconductor, or obtained from
web-based resources.  This vignette is primarily concerned with
describing the annotation resources that are available as packages.
This includes both how to extract data from them and also what steps
are required to expose other databases in a similar fashion.

Gene centric \Rpackage{AnnotationDbi} packages
include:
\begin{itemize}
  \item Organism level: e.g. \Rpackage{org.Mm.eg.db}.
  \item Platform level: e.g. \Rpackage{hgu133plus2.db},
        \Rpackage{hgu133plus2.probes}, \Rpackage{hgu133plus2.cdf}.
  \item Homology level: e.g. \Rpackage{hom.Dm.inp.db}.
  \item System-biology level: \Rpackage{GO.db} or \Rpackage{KEGG.db}.
\end{itemize}
Genome centric \Rpackage{GenomicFeatures} packages include
\begin{itemize}
  \item Transcriptome level: e.g. \Rpackage{TxDb.Hsapiens.UCSC.hg19.knownGene}
  \item Generic genome features: Can generate via \Rpackage{GenomicFeatures}
\end{itemize}
One web-based resource accesses
\href{http://www.biomart.org/}{biomart}, via the \Rpackage{biomaRt}
package:
\begin{itemize}
  \item Query web-based `biomart' resource for genes, sequence,
        SNPs, and etc.
\end{itemize}


The most popular annotation packages have been modified so that they
can make use of a new set of methods to more easily access their
contents.  These four methods are named: \Rmethod{cols},
\Rmethod{keytypes}, \Rmethod{keys} and \Rmethod{select}.  And they are
described in this vignette.  They can currently be used with all chip,
organism, and \Rclass{TranscriptDb} packages along with the popular
GO.db package.

For the older less popular packages, there are still conventient ways
to retrieve the data.  The \emph{How to use bimaps from the ".db"
annotation packages} vignette in the \Rpackage{AnnotationDbi} package
is a key reference for learnign about how to use bimap objects.

Finally, all of the `.db' (and most other \Bioconductor{} annotation
packages) are updated every 6 months corresponding to each release of
\Bioconductor{}.  Exceptions are made for packages where the actual
resources that the packages are based on have not themselves been
updated.


\subsection{AnnotationDb objects and the select method}

As previously mentioned, a new set of methods have been added that
allow a simpler way of extracting identifier based annotations.  All
the annotation packages that support these new methods expose an
object named exactly the same as the package itself.  These objects
are collectively called \Rclass{AnntoationDb} objects for the class
that they all inherit from.  The more specific classes (the ones that
you will actually see in the wild) have names like \Rclass{OrgDb},
\Rclass{ChipDb} or \Rclass{TranscriptDb} objects.  These names
correspond to the kind of package (and underlying schema) being
represented.  The methods that can be applied to all of these objects
are \Rmethod{cols}, \Rmethod{keys}, \Rmethod{keytypes} and
\Rmethod{select}.


\subsection{ChipDb objects and the select method}

An extremely common kind of Annotation package is the so called
platform based or chip based package type.  This package is intended
to make the manufacturer labels for a series of probes or probesets to
a wide range of gene-based features.  A package of this kind will load
an \Rclass{ChipDb} object.  Below is a set of examples to show how you
might use the standard 4 methods to interact with an object of this
type.

First we need to load the package:

<<loadChip>>=
library(hgu95av2.db)
@ 

If we list the contents of this package, we can see that one of the
many things loaded is an object named after the package "hgu95av2.db":

<<listContents>>=
ls("package:hgu95av2.db")
@ 

We can look at this object to learn more about it:

<<show>>=
hgu95av2.db
@ 

If we want to know what kinds of data are retriveable via
\Rmethod{select}, then we should use the \Rmethod{cols} method like
this:

<<cols>>=
cols(hgu95av2.db)
@ 

If we are further curious to know more about those values for cols, we
can consult the help pages.  Asking about any of these values will
pull up a manual page describing the different fields and what they
mean.

<<help, eval=FALSE>>=
help("SYMBOL")
@ 

If we are curious about what kinds of fields we could potentiall use
as keys to query the database, we can use the \Rmethod{keytypes}
method.  In a perfect world, this method will return values very
similar to what was returned by \Rmethod{cols}, but in reality, some
kinds of values make poor keys and so this list is often shorter.

<<keytypes>>=
keytypes(hgu95av2.db)
@ 

If we want to extract some sample keys of a particular type, we can
use the \Rmethod{keys} method.

<<keys>>=
head(keys(hgu95av2.db, keytype="SYMBOL"))
@ 

And finally, if we have some keys, we can use \Rmethod{select} to
extract them.  By simply using appropriate argument values with select
we can specify what keys we want to look up values for (keys), what we
want returned back (cols) and the type of keys that we are passing in
(keytype)

<<selectChip>>=
#1st get some example keys
k <- head(keys(hgu95av2.db,keytype="PROBEID"))
# then call select
select(hgu95av2.db, keys=k, cols=c("SYMBOL","GENENAME"), keytype="PROBEID")
@ 

And as you can see, when you call the code above, select will try to
return a data.frame with all the things you asked for matched up to
each other.

\subsection{OrgDb objects and the select method}
An organism level package (an `org' package) uses a central gene
identifier (e.g. Entrez Gene id) and contains mappings between this
identifier and other kinds of identifiers (e.g. GenBank or Uniprot
accession number, RefSeq id, etc.).  The name of an org package is
always of the form \emph{org.<Ab>.<id>.db}
(e.g. \Rpackage{org.Sc.sgd.db}) where \emph{<Ab>} is a 2-letter
abbreviation of the organism (e.g. \Rpackage{Sc} for
\emph{Saccharomyces cerevisiae}) and \emph{<id>} is an abbreviation
(in lower-case) describing the type of central identifier
(e.g. \Rpackage{sgd} for gene identifiers assigned by the
Saccharomyces Genome Database, or \Rpackage{eg} for Entrez Gene ids).

Just as the chip packages load a \Rclass{ChipDb} object, the org
packages will load a \Rclass{OrgDb} object.  The following exercise
should acquaint you with the use of these methods in the context of an
organism package.

%% select exercise
\begin{Exercise}
  Display the \Rclass{OrgDb} object for the \Biocpkg{org.Hs.eg.db} package.
  
  Use the \Rmethod{cols} method to discover which sorts of
  annotations can be extracted from it. Is this the same as the result
  from the \Rfunction{keytypes} method?  Use the \Rfunction{keytypes}
  method to find out.  
  
  Finally, use the \Rfunction{keys} method to extract UNIPROT
  identifiers and then pass those keys in to the \Rfunction{select}
  method in such a way that you extract the gene symbol and KEGG pathway
  information for each.  Use the help system as needed to learn which
  values to pass in to cols in order to achieve this.
\end{Exercise}
\begin{Solution}
<<selectOrg1>>=
library(org.Hs.eg.db)
cols(org.Hs.eg.db)
@ 
<<selectOrg2, eval=FALSE>>=
help("SYMBOL") ## for explanation of these cols and keytypes values
@
<<selectOrg3>>=
keytypes(org.Hs.eg.db)
uniKeys <- head(keys(org.Hs.eg.db, keytype="UNIPROT"))
cols <- c("SYMBOL", "PATH")
select(org.Hs.eg.db, keys=uniKeys, cols=cols, keytype="UNIPROT")
@
\end{Solution}

So how could you use select to annotate your results? This next
exercise should hlep you to understand how that should generally work.

\begin{Exercise}
  Please run the following code snippet (which will load a fake data
  result that I have provided for the purposes of illustration):
  
<<selectData>>=
load(system.file("extdata", "resultTable.Rda", package="AnnotationDbi"))
head(resultTable)
@

  The rownames of this table happen to provide entrez gene identifiers
  for each row (for human).  Find the gene symbol and gene name for each
  of the rows in resultTable and then use the merge method to attach
  those annotations to it.
\end{Exercise}
\begin{Solution}
<<selectOrgData>>=
annots <- select(org.Hs.eg.db, keys=rownames(resultTable),
                 cols=c("SYMBOL","GENENAME"), keytype="ENTREZID")
resultTable <- merge(resultTable, annots, by.x=0, by.y="ENTREZID")
head(resultTable)
@
\end{Solution}


\subsection{Using select with GO.db}

When you load the GO.db package, a \Rclass{GODb} object is also
loaded. This allows you to use the \Rmethod{cols}, \Rmethod{keys},
\Rmethod{keytypes} and \Rmethod{select} methods on the contents of the
GO ontology.  So if for example, you had a few GO IDs and wanted to know
more about it, you could do it like this:

<<selectGO>>=
library(GO.db)
GOIDs <- c("GO:0042254","GO:0044183")
select(GO.db, keys=GOIDs, cols="DEFINITION", keytype="GOID")
@



% This here is the point where I am planning to fork my talk to the GenomicFeatures vignette

%\subsection{Using select with HomDb packages}



\subsection{Using select with TranscriptDb packages}

A \Rclass{TranscriptDb} package (a 'TxDb' package) connects a set of genomic
coordinates to various transcript oriented features.  The package can
also contain Identifiers to features such as genes and transcripts,
and the internal schema describes the relationships between these
different elements.  All TranscriptDb containing packages follow a
specific naming scheme that tells where the data came from as well as
which build of the genome it comes from.


%% select exercise TranscriptDb
\begin{Exercise}
  Display the \Rclass{TranscriptDb} object for the 
  \Biocpkg{TxDb.Hsapiens.UCSC.hg19.knownGene} package.
  
  As before, use the \Rmethod{cols} and \Rmethod{keytypes} methods to
  discover which sorts of annotations can be extracted from it.
  
  Use the \Rmethod{keys} method to extract just a few gene identifiers and
  then pass those keys in to the \Rmethod{select} method in such a
  way that you extract the transcript ids and transcript starts for each.
\end{Exercise}
\begin{Solution}
<<selectTxDb>>=
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
txdb
cols(txdb)
keytypes(txdb)
keys <- head(keys(txdb, keytype="GENEID"))
cols <- c("TXID", "TXSTART")
select(txdb, keys=keys, cols=cols, keytype="GENEID")

@
\end{Solution}

As is widely known, in addition to providing access via the
\Rmethod{select} method, \Rclass{TranscriptDb} objects also provide
access via the more familiar \Rfunction{transcripts},
\Rfunction{exons}, \Rfunction{cds}, \Rfunction{transcriptsBy},
\Rfunction{exonsBy} and \Rfunction{cdsBy} methods.  For those who do
not yet know about these other methods, more can be learned by seeing
the vignette called: \emph{Making and Utilizing TranscriptDb Objects}
in the \Rpackage{GenomicFeatures} package.





\subsection{Advanced topic: Creating other kinds of Annotation packages}
%% It would not be a terrible idea to make a whole vignette from this
%% that starts with code from last years advanced course on how to make a
%% database in R.


A few options already exist for generating various kinds of annotation
packages.For users who seek to make custom chip packages, users should
see the \emph{SQLForge: An easy way to create a new annotation package
with a standard database schema.} in the \Rpackage{AnnotationDbi}
package.  And, for users who seek to make a probe package, there is
another vignette called \emph{Creating probe packages} that is also in
the AnnotationDbi package. And finally, for custom organism packages
users should look at the manual page for
\Rfunction{makeOrgPackageFromNCBI}.  This function will attempt to
make you an simplified organism package from NCBI resources.  However,
this function is not meant as a way to refresh annotation packages
between releases.  It is only meant for people who are working on less
popular model organisms (so that annotations can be made available in
this format).


But what if you had another kind of database resource and you wanted
to expose it to the world using something like this new
\Rmethod{select} method interface?  How could you go about this?


The 1st step would be to make a package that contains a SQLite
database.  For the sake of expediency, lets look at an existing example
of this in the \Rpackage{hom.Hs.inp.db} package.  If you download this
tarball from the website you can see that it contains a .sqlite
database inside of the inst/extdata directory.  There are a couple of
important details though about this database.  The 1st is that we
recommend that the database have the same name as the package, but end
with the extension .sqlite.  The second detail is that we recommend
that the metadata table contain some important fields.  This is the
metadata from the current \Rpackage{hom.Hs.inp.db} package.

<<getMetadata, echo=FALSE>>=
library(hom.Hs.inp.db)
hom.Hs.inp_dbInfo()
@ 

As you can see there are a number of very useful fields stored in the
metadata table and if you list the equivalent table for other packages
you will find even more useful information than you find here.  But
the most important fields here are actually the ones called "package"
and "Db type".  Those fields specify both the name of the package with
the expected class definition, and also the name of the object that
this database is expected to be represented by in the R session
respectively. If you fail to include this information in your metadata
table, then \Rmethod{loadDb} will not know what to do with the
database when it is called.  In this case, the class definition has
been stored in the \Rpackage{AnnotationDbi} package, but it could live
anywhere you need it too.  By specifying the metadata field, you
enable \Rmethod{loadDb} to find it.



Once you have set up the metadata you will need to create a class for
your package that extends the \Rclass{AnnotationDb} class.  In the
case of the hom.Hs.inp.db package, the class is defined to be a
\Rclass{InparanoidDb} class. This code is inside of
\Rpackage{AnnotationDbi}.

<<referenceClass,eval=FALSE>>=
.InparanoidDb <-
    setRefClass("InparanoidDb", contains="AnnotationDb")
@ 

Finally the \Rfunction{.onLoad} call for your package will have to
contain code that will call the \Rmethod{loadDb} method.  This is what
it currently looks like in the Rpackage{hom.Hs.inp.db} package.

<<onLoad,eval=FALSE>>=
sPkgname <- sub(".db$","",pkgname)
txdb <- loadDb(system.file("extdata", paste(sPkgname,
               ".sqlite",sep=""), package=pkgname, lib.loc=libname),
               packageName=pkgname)
dbNewname <- AnnotationDbi:::dbObjectName(pkgname,"InparanoidDb")
ns <- asNamespace(pkgname)
assign(dbNewname, txdb, envir=ns)
namespaceExport(ns, dbNewname)
@ 


When the code above is run (at load time) the name of the package (AKA
"pkgname", which is a parameter that will be passed into
\Rfunction{.onLoad}) is then used to derive the name for the object.
Then that name, is used by \Rmethod{onload} to create an
\Rclass{InparanoidDb} object.  This object is then assigned to the
namespace for this package so that it will be loaded for the user.



\subsection{Creating package accessors}

At this point, all that remains is to create the means for accessing
the data in the database.  This should prove a lot less difficult than
it may initially sound.  For the new interface, only the four methods
that were described earlier are really required:
\Rmethod{cols},\Rmethod{keytypes},\Rmethod{keys} and \Rmethod{select}.

In order to do this you need to know a small amount of SQL and a few
tricks for accessing the database from R.  The point of providing
these 4 accessors is to give users of these packages a more unified
experience when retrieving data from the database.  But other kinds of
accessors (such as those provided for the \Rclass{TranscriptDb}
objects) may also be warranted.



\subsubsection{Getting a connection}

If all you know is the name of the SQLite database, then to get a DB
connection you need to do something like this:

<<classicConn,results=hide>>=
drv <- SQLite()
library("org.Hs.eg.db")
con <- dbConnect(drv, dbname=system.file("extdata", "org.Hs.eg.sqlite",
                        package = "org.Hs.eg.db"))
con
dbDisconnect(con)
@

But in our case the connection is already here as part of the object:

<<ourConn>>=
str(hom.Hs.inp.db)
@

So we can do something like below:

<<ourConn2>>=
hom.Hs.inp.db$conn
## or better we can use a helper function to wrap this:
AnnotationDbi:::dbConn(hom.Hs.inp.db)
## or we can just call the provided convenience function 
## from when this package loads:
hom.Hs.inp_dbconn()
@


\subsubsection{Getting data out}

Now we just need to get our data out of the DB.  There are several
useful functions for doing this.  Most of these come from the RSQLite
or DBI packages.  For the sake of simplicity, I will only discuss
those that are immediately useful for exploring and extracting data
from a database in this vignette.  One pair of useful methods are the
\Rmethod{dbListTables} and \Rmethod{dbListFields} which are useful for
exploring the schema of a database.

<<dbListTables>>=
con <- AnnotationDbi:::dbConn(hom.Hs.inp.db)
head(dbListTables(con))
dbListFields(con, "Mus_musculus")
@ 

And for actually executing SQL to retrieve data, you probably want to
use something like \Rmethod{dbGetQuery}.  The only caveat is that this
will actually require you to know a little SQL.

<<dbGetQuery>>=
dbGetQuery(con, "SELECT * FROM metadata")
@ 


\subsubsection{Some basic SQL}

The good news is that SQL is pretty easy to learn.  Especially if you
are primarily interested in just retrieving data from an existing
database.  Here is a quick run-down to get you started on writing
simple SELECT statements.  Consider a table that looks like this:\\

Table sna:
\begin{tabular}{cc}
foo & bar \\\hline
1   & baz \\
2   & boo \\
\end{tabular}\\

\noindent This statement:\\

SELECT bar FROM sna;\\

\noindent Tells SQL to get the "bar" field from the "foo" table.  If
we wanted the other field called "sna" in addition to "bar", we could
have written it like this:\\

SELECT foo, bar FROM sna;\\

\noindent Or even this (* is a wildcard character here)\\

SELECT * FROM sna;\\

\noindent Now lets suppose that we wanted to filter the results.  We could also
have said something like this:\\

SELECT * FROM sna where bar='boo';\\

\noindent That query will only retrieve records from foo that match
the criteria for bar.  But there are two other things to notice.
First notice that a single = was used for testing equality.  Second
notice that I used single quotes to demarcate the string.  I could
have also used double quotes, but when working in R this will prove to
be less convenient as the whole SQL statement itself will frequently
have to be wrapped as a string. \\

\noindent What if we wanted to be more general?  Then you can use LIKE.  Like
this:\\

SELECT * FROM sna where bar LIKE 'boo\%';\\

\noindent That query will only return records where bar starts with "boo", (the
\% character is acting as another kind of wildcard in this context)\\

\noindent You will often find that you need to get things from two or
more different tables at once.  Or, you may even find that you need to
combine the results from two different queries.  Sometimes these two
queries may even come from the same table.  In any of these cases, you
want to do a join.  The simplest and most common kind of join is an
inner join.  Lets suppose that we have two tables:\\

Table sna:
\begin{tabular}{cc}
foo & bar \\\hline
1   & baz \\
2   & boo \\
\end{tabular}\\ \\

Table fu:
\begin{tabular}{cc}
foo & bo \\\hline
1   & hi \\
2   & ca \\
\end{tabular}\\

\noindent And we want to join them where the records match in their
corresponding "foo" columns.  We can do this query to join them:\\

SELECT * FROM sna,fu WHERE sna.foo=fu.foo;\\

\noindent Something else we can do is tidy this up by using aliases like so:\\

SELECT * FROM sna AS s,fu AS f WHERE s.foo=f.foo;\\

\noindent This last trick is not very useful in this particular
example since the query ended up being longer than we started with,
but is still great for other cases where queries can become really
long.


\subsubsection{Exploring the SQLite database from R}

Now that we know both some SQL and also about some of the methods in
\Rpackage{DBI} and \Rpackage{RSQLite} we can begin to explore the
underlying database from R.  How should we go about this?  Well the
1st thing we always want to know are what tables are present.  We
already know how to learn this:

<<dbListTables2>>=
head(dbListTables(con))
@ 

And we also know that once we have a table we are curious about, we
can then look up it's fields using \Rmethod{dbListFields}

<<dbListFields2>>=
dbListFields(con, "Apis_mellifera")
@ 

And once we know something about which fields are present in a table,
we can compose a SQL query.  perhaps the most straightforward query is
just to get all the results from a given table.  We know that the SQL
for that should look like: \\

SELECT * FROM Apis\_mellifera; \\

\noindent So we can now call a query like that from R by using \Rmethod{dbGetQuery}: 

<<dbGetQuery2>>=
head(dbGetQuery(con, "SELECT * FROM Apis_mellifera"))
@ 



\begin{Exercise}
Now use what you have learned to explore the \Rpackage{hom.Hs.inp.db}
database.  The formal scientific name for one of the mosquitos that
carry the malaria parasite is Anopheles gambiae.  Now find the table
for that organism in the \Rpackage{hom.Hs.inp.db} database and extract
it into R.  How many species are present in this table?  Inparanoid
uses a five letter designation for each species that is composed of
the 1st 2 letters of the genus followed by the 1st 3 letters of the
species.  Using this fact, write a SQL query that will retrieve only
records from this table that are from humans (Homo sapiens).
\end{Exercise}
\begin{Solution}
<<Anopheles,eval=FALSE>>=
head(dbGetQuery(con, "SELECT * FROM Anopheles_gambiae"))
## Then only retrieve human records
## Query: SELECT * FROM Anopheles_gambiae WHERE species='HOMSA'
head(dbGetQuery(con, "SELECT * FROM Anopheles_gambiae WHERE species='HOMSA'"))
dbDisconnect(con)
@
\end{Solution}



\subsubsection{Example: creating a cols method}

Now lets suppose that we want to define a \Rmethod{cols} method for
our \Rclass{hom.Hs.inp.db} object. And lets also suppose that we want
is for it to tell us about the actual organisms for which we can
extract identifiers.  How could we do that?

<<cols,eval=FALSE>>=
.cols <- function(x){
  con <- AnnotationDbi:::dbConn(x)
  list <- dbListTables(con)
  ## drop unwanted tables
  unwanted <- c("map_counts","map_metadata","metadata")
  list <- list[!list %in% unwanted]
  ## Then just to format things in the usual way
  list <- toupper(list)
  dbDisconnect(con)
  list
}

## Then make this into a method
setMethod("cols", "InparanoidDb", .cols(x))
## Then we can call it
cols(hom.Hs.inp.db)
@

Notice how I formatted the output to all uppercase characters?  This
is just done to make the interface look consistent with what has been
done before for the other \Rmethod{select} interfaces.  But doing this
means that we will have to do a tiny bit of extra work when we
implement out other methods.


\begin{Exercise}
Now use what you have learned to try and define a method for
\Rmethod{keytypes} on \Rclass{hom.Hs.inp.db}.  The keytypes method
should return the same results as cols (in this case).  What if you
needed to translate back to the lowercase table names?  Also write an
quick helper function to do that.
\end{Exercise}
\begin{Solution}
<<keytypes,eval=FALSE>>=
setMethod("keytypes", "InparanoidDb", .cols(x))
## Then we can call it
keytypes(hom.Hs.inp.db)

## refactor of .cols
.getLCcolnames <- function(x){
  con <- AnnotationDbi:::dbConn(x)
  list <- dbListTables(con)
  ## drop unwanted tables
  unwanted <- c("map_counts","map_metadata","metadata")
  list <- list[!list %in% unwanted]
  dbDisconnect(con)
  list
}
.cols <- function(x){
  list <- .getLCcolnames(x)
  ## Then just to format things in the usual way
  toupper(list)
}
## Test:
cols(hom.Hs.inp.db)

## new helper function:
.getTableNames <- function(x){
  LC <- .getLCcolnames(x)
  UC <- .cols(x)
  names(UC) <- LC
  UC
}
.getTableNames(hom.Hs.inp.db)
@
\end{Solution}


\begin{Exercise}
Now define a method for \Rmethod{keys} on \Rclass{hom.Hs.inp.db}.  The
keys method should return the keys from a given organism based on the
appropriate keytype.  Since each table has rows that correspond to
both human and non-human IDs, it will be necessary to filter out the
human rows from the result
\end{Exercise}
\begin{Solution}
<<keys,eval=FALSE>>=
.keys <- function(x, keytype){
  ## translate keytype back to table name
  tabNames <- .getTableNames(x)
  lckeytype <- names(tabNames[tabNames %in% keytype])
  ## get a connection
  con <- AnnotationDbi:::dbConn(x)
  sql <- paste("SELECT inp_id FROM",lckeytype, "WHERE species!='HOMSA'")
  res <- dbGetQuery(con, sql)
  res <- as.vector(t(res))
  dbDisconnect(con)
  res
}

setMethod("keys", "InparanoidDb", .keys(x, keytype))
## Then we can call it
keys(hom.Hs.inp.db, "TRICHOPLAX_ADHAERENS")
@
\end{Solution}



%% I will only have 45 minutes.


%% I need to now intro some basic SQL and some helpful functions like getDBQuery() and dbListTables, dbListFields etc.
%% Then I need to do an example by doing cols() for people



\end{document}