\name{generateSeeds-methods}
\docType{methods}
\alias{generateSeeds-methods}
\alias{generateSeeds}
\alias{generateSeeds,eSet-method}
\alias{generateSeeds,matrix-method}
\title{Generate seeds for biclustering}
\description{
  \code{generateSeeds} takes either a matrix or
  an \code{\linkS4class{ExpressionSet}} object and generates seeds. Seeds
  are defined as pairs of genes (edges) that share coincident
  expression levels across samples. The higher the coincidence, the higher
  the score of the seed. Seeds are generated by comparing each pair of
  genes in turn. Once all seeds have been produced, they
  are sorted by coincidence score and returned as an object. See
  the Details section for notes on the implementation.
}
\section{Methods}{
  In the \code{rqubic} package, \code{generateSeeds} currently supports
  two data types: \code{\linkS4class{ExpressionSet}} (a class derived
  from \code{\linkS4class{eSet}}) and numeric matrix.

  Both methods additionally accept a parameter, \code{minColWidth},
  specifying the minimum number of conditions shared by the two genes of
  each seed. Its default value is 2. When the default value is used,
  the minimum coincidence score is defined as \eqn{max(2, ncol/20)},
  where \eqn{ncol} is the number of conditions. When a
  non-default value is provided, that value is used as given to select seeds.
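
  The threshold rule above can be sketched in plain R. The helper
  \code{effectiveMinScore} is hypothetical and not part of
  \code{rqubic}; whether the package rounds \eqn{ncol/20} is not stated
  here, so the sketch leaves the quotient unrounded.

  \preformatted{
## Hypothetical helper illustrating the documented rule; not an rqubic function.
effectiveMinScore <- function(ncol, minColWidth = 2) {
  ## Default value: derive the threshold from the number of conditions.
  ## Non-default value: use it as given.
  if (minColWidth == 2) max(2, ncol / 20) else minColWidth
}

effectiveMinScore(26)      # 2, since 26/20 is below 2
effectiveMinScore(100)     # 5, since 100/20 exceeds 2
effectiveMinScore(100, 8)  # 8, the user-supplied value
  }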
  
  \describe{
    \item{\code{signature(object = "eSet")}}{An object representing
      expression data. Note that the \code{exprs} matrix must contain
      integers; otherwise the method warns and coerces the storage mode
      of the matrix to integer.}
    
    \item{\code{signature(object = "matrix")}}{A matrix of integers. If
      it contains non-integer values, the method warns and coerces the
      storage mode to integer.}
}}
\section{Details}{
  The function compares all pairs of genes, that is, all edges of a
  complete graph whose vertices are the genes. The weight of each edge is
  defined as the number of samples in which the two genes have the same
  expression level. This weight, also known as the \emph{coincidence score},
  reflects the co-regulation relationship between the two genes.
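
  As an illustration only, the coincidence score can be computed in
  plain R on a toy discretized matrix. The names \code{disc},
  \code{coincidence} and \code{genePairs} are made up for this sketch,
  which counts every matching level as described above. The packaged
  implementation is written in C and may differ in detail.

  \preformatted{
## Toy discretized matrix: genes in rows, samples in columns.
disc <- matrix(c( 1L, 0L, -1L, 1L,
                  1L, 0L,  1L, 1L,
                 -1L, 0L, -1L, 0L),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("g1", "g2", "g3"), NULL))

## Edge weight: number of samples in which two genes share a level.
coincidence <- function(m, i, j) sum(m[i, ] == m[j, ])

## All edges of the complete graph on the genes, one pair per row.
genePairs <- t(combn(nrow(disc), 2))
scores <- apply(genePairs, 1, function(p) coincidence(disc, p[1], p[2]))

data.frame(geneA = rownames(disc)[genePairs[, 1]],
           geneB = rownames(disc)[genePairs[, 2]],
           score = scores)
## scores are 3 (g1-g2), 2 (g1-g3) and 1 (g2-g3)
  }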

  Seeds are chosen by picking edges whose scores are higher than the
  minimum score, which is set by the \code{minColWidth} parameter (default:
  2).

  To implement this selection algorithm, a \emph{Fibonacci heap} is
  constructed in the C code. Its size is predefined as a constant,
  which should be reduced if the number of genes is too large for the
  algorithm to run. Each new seed whose coincidence score is higher than
  the minimum is inserted into the heap. Depending on whether the heap
  is full, the seed is either inserted by squeezing the minimum seed
  out, or put into the heap directly.
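
  The insertion policy can be sketched with a simplified stand-in. The
  sketch uses a plain numeric vector of scores instead of a Fibonacci
  heap, and \code{insertSeed} and \code{capacity} are hypothetical
  names. It also assumes that a new seed scoring no higher than the
  current minimum of a full heap is discarded, which is not stated
  above.

  \preformatted{
## Simplified stand-in for the bounded heap: a numeric vector of scores.
insertSeed <- function(heap, score, capacity) {
  if (length(heap) < capacity) {
    return(c(heap, score))     # heap not full: put the seed in directly
  }
  worst <- which.min(heap)
  if (score > heap[worst]) {
    heap[worst] <- score       # heap full: squeeze the minimum seed out
  }
  heap                         # assumption: weaker seeds are dropped
}

heap <- numeric(0)
for (s in c(4, 7, 3, 9, 5)) heap <- insertSeed(heap, s, capacity = 3)

## Dump the heap in decreasing order of score.
sort(heap, decreasing = TRUE)  # 9 7 5
  }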

  Once all pairs of genes have been examined and the heap is filled, it
  is dumped into an array of edge pointers ordered by decreasing
  score. This array is captured as an external
  pointer, attached as an attribute of an \code{rqubicSeeds} object.

  An \code{rqubicSeeds} object holds an integer, which records the
  height of the heap. It has (besides the class identifier) two
  attributes: one for the external pointer, and the other one for the
  threshold of the coincidence score.
}
\note{
  In the \code{rqubic} implementation, the variable \code{arr_c[i][j]}
  holds the level symbols (\eqn{-1, 0, 1} in the default case), whereas in
  the \code{QUBIC} implementation, this variable holds the index of
  level symbols, and the level symbols are saved in the global variable
  \code{symbols}.
}
\author{Jitao David Zhang <jitao_david.zhang@roche.com>}
\examples{
data(sample.ExpressionSet, package="Biobase")
sample.disc <- quantileDiscretize(sample.ExpressionSet)
sample.seeds <- generateSeeds(sample.disc)
sample.seeds

## with a higher threshold for the coincidence score
sample.seeds.higher <- generateSeeds(sample.disc, minColWidth=5)
sample.seeds.higher
}