---
title: "lineagespot User Guide"
author:
- name: Nikolaos Pechlivanis
  affiliation: Institute of Applied Biosciences, 
    Centre for Research and Technology Hellas, Thessaloniki, GR
  email: nikosp41@certh.gr
- name: Maria Tsagiopoulou
  affiliation: Institute of Applied Biosciences, 
    Centre for Research and Technology Hellas, Thessaloniki, GR
- name: Maria Christina Maniou
  affiliation: Institute of Applied Biosciences, 
    Centre for Research and Technology Hellas, Thessaloniki, GR
- name: Anastasis Togkousidis
  affiliation: Institute of Applied Biosciences, 
    Centre for Research and Technology Hellas, Thessaloniki, GR
- name: Evangelia Mouchtaropoulou
  affiliation: Institute of Applied Biosciences, 
    Centre for Research and Technology Hellas, Thessaloniki, GR
- name: Taxiarchis Chassalevris
  affiliation: School of Veterinary Medicine, 
    Aristotle University of Thessaloniki, GR
- name: Serafeim Chaintoutis
  affiliation: School of Veterinary Medicine, 
    Aristotle University of Thessaloniki, GR
- name: Chrysostomos Dovas
  affiliation: School of Veterinary Medicine, 
    Aristotle University of Thessaloniki, GR
- name: Maria Petala
  affiliation: Dept. of Civil Engineering, 
    Aristotle University of Thessaloniki, GR
- name: Margaritis Kostoglou
  affiliation: Dept. of Chemistry, 
    Aristotle University of Thessaloniki, GR
- name: Thodoris Karapantsios
  affiliation: Dept. of Chemistry, 
    Aristotle University of Thessaloniki, GR
- name: Stamatia Laidou
  affiliation: Institute of Applied Biosciences, 
    Centre for Research and Technology Hellas, Thessaloniki, GR
- name: Elisavet Vlachonikola
  affiliation: Institute of Applied Biosciences, 
    Centre for Research and Technology Hellas, Thessaloniki, GR
- name: Anastasia Chatzidimitriou
  affiliation: Institute of Applied Biosciences, 
    Centre for Research and Technology Hellas, Thessaloniki, GR
- name: Agis Papadopoulos
  affiliation: EYATH S.A., 
    Thessaloniki Water Supply and Sewerage Company S.A., Thessaloniki, GR
- name: Nikolaos Papaioannou
  affiliation: School of Veterinary Medicine, 
    Aristotle University of Thessaloniki, GR
- name: Anagnostis Argiriou
  affiliation: Institute of Applied Biosciences, 
    Centre for Research and Technology Hellas, Thessaloniki, GR
- name: Fotis E. Psomopoulos
  affiliation: Institute of Applied Biosciences, 
    Centre for Research and Technology Hellas, Thessaloniki, GR

date: "`r BiocStyle::doc_date()`"
package: "`r BiocStyle::pkg_ver('lineagespot')`"
output: 
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{lineagespot User Guide}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r vignetteSetup, echo=FALSE, message=FALSE, warning = FALSE}
## For links
library("BiocStyle")
## Bib setup
library("RefManageR")
## Write bibliography information
bib <- c(
    R = citation(),
    BiocStyle = citation("BiocStyle")[1],
    data.table = citation("data.table")[1],
    stringr = citation("stringr")[1]
)
```


<style>
<!-- h1, h2, h3, h4 { -->
<!--   color:#17247a; -->
<!-- } -->

strong {
    color:#eb6b1c;
}

</style>

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
    comment = "#>",
    error = FALSE,
    warning = FALSE,
    message = FALSE,
    crop = NULL
)
```
# Introduction

`lineagespot` is a framework written in R, and aims to identify 
SARS-CoV-2 related mutations based on a single (or a list) of variant(s) 
file(s) (i.e., variant calling format). The method can facilitate the 
detection of SARS-CoV-2 lineages in wastewater samples using next 
generation sequencing, and attempts to infer the potential distribution 
of the SARS-CoV-2 lineages. 

# Quick start

## Installation

`lineagespot` is distributed as a [Bioconductor](https://www.bioconductor.org/) 
package and requires `R` (version "4.1"), which can be installed on any 
operating system from [CRAN](https://cran.r-project.org/), and 
Bioconductor (version "3.14").

To install `lineagespot` package enter the following commands in 
your `R` session:

```{r eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("lineagespot")

## Check that you have a valid Bioconductor installation
BiocManager::valid()
```

## Raw data analysis

Example fastq files are provided through 
[zenodo](https://zenodo.org/badge/DOI/10.5281/zenodo.4564182.svg). 
For the pre processing steps of them, the bioinformatics analysis pipeline is 
provided [here](../inst/scripts/raw-data-analysis.md).

## Running lineagespot 

Once `lineagespot` is successfully installed, it can be loaded as follow:

```{r setup}
library(lineagespot)
```

`lineagespot` can be run by calling one function that implements the overall 
pipeline:


```{r eval=TRUE}
results <- lineagespot(vcf_folder = system.file("extdata", "vcf-files", 
                                                package = "lineagespot"),

                      gff3_path = system.file("extdata", 
                                              "NC_045512.2_annot.gff3", 
                                              package = "lineagespot"),

                      ref_folder = system.file("extdata", "ref", 
                                               package = "lineagespot"))
```

## Explore the results

The function returns three tables:

- An overall variant table containing all variants included in the input VCF 
files, along with the related information (gene, location etc.)  

```{r}
# overall table
head(results$variants.table)
```
- A table with the identified overlaps/hits between the variant table and the 
given lineage reports.

```{r}
# lineages' hits
head(results$lineage.hits)
```

- A lineage report table where metrics for the abundance of each lineage are 
computed. 
  To this end, the mean AF (allele frequence) of the variants per lineage, 
  the mean AF of the unique variants per lineage and the non-zero min. AF of the
  lineage's unique variants are computed.
  Moreover given a AF threshold the number of variants in each sample is 
  computed along with the resulting proportion (Number of variants to the number
  of lineage rules).

```{r}
# lineagespot report
head(results$lineage.report)
```

# Session info {.unnumbered}

Here is the output of `sessionInfo()` on the system on which this document was
compiled running pandoc ``r rmarkdown::pandoc_version()``:

```{r sessionInfo, echo=FALSE}
sessionInfo()
```