Document Type

Dissertation - Open Access

Award Date

2018

Degree Name

Doctor of Philosophy (PhD)

Department / School

Mathematics and Statistics

First Advisor

Qin Ma

Second Advisor

Anne Fennell

Keywords

algorithm development, computational tools, RNA-Sequencing

Abstract

Next-Generation Sequencing technologies allow for a substantial increase in the amount of data available for various biological studies. To analyze these data effectively and efficiently, computational approaches combining mathematics, statistics, computer science, and biology are implemented. Despite the substantial efforts devoted to developing these approaches, numerous issues and pitfalls remain. One of these issues is mapping uncertainty, in which read alignment results are biased by the inherent difficulty of accurately aligning RNA-Sequencing reads. GeneQC is an alignment quality control tool that provides insight into the severity of mapping uncertainty for each annotated gene in the alignment results. GeneQC uses feature extraction to identify three levels of information for each gene and applies elastic net regularization and mixture model fitting to assess the severity of mapping uncertainty and the quality of read alignment. In combination with GeneQC, the Ambiguous Reads Mapping (ARM) algorithm re-aligns ambiguous reads by integrating motif prediction from metabolic pathways to establish co-regulatory gene modules, which are then used for re-alignment with a negative binomial distribution-based probabilistic approach. These two tools work in tandem to address mapping uncertainty and provide more accurate read alignments, and thus more accurate expression estimates. This dissertation also presents two approaches to interpreting the expression estimates. The first is IRIS-EDA, an integrated Shiny web server that combines numerous analyses for investigating gene expression data generated from RNA-Sequencing data. The second is ViDGER, an R/Bioconductor package that quickly generates high-quality visualizations of differential gene expression results, assisting users with the non-trivial task of comprehensively interpreting those results.
These four tools cover a variety of aspects of modern RNA-Seq analysis and aim to address bottlenecks arising from algorithmic and computational issues, as well as to provide more efficient and effective implementations.
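The abstract states that ARM re-aligns ambiguous reads with a negative binomial distribution-based probabilistic approach but does not give the model. The following is only a minimal illustrative sketch of what such a step could look like, not ARM's actual method. It assumes each candidate gene for an ambiguous read has a current count and an expected mean count (for example, derived from its co-regulatory module) and that all genes share one dispersion ("size") parameter. It scores each gene by the negative binomial likelihood ratio of gaining one more read, then normalizes the scores into assignment probabilities. The function names `nb_logpmf` and `assign_ambiguous_read` are hypothetical.

```python
import math
from typing import List, Sequence


def nb_logpmf(k: int, mu: float, size: float) -> float:
    """Log-probability of count k under a negative binomial with mean mu
    and dispersion parameter `size` (variance = mu + mu**2 / size)."""
    return (
        math.lgamma(k + size)
        - math.lgamma(size)
        - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )


def assign_ambiguous_read(
    counts: Sequence[int], means: Sequence[float], size: float
) -> List[float]:
    """Hypothetical: probability of assigning one ambiguous read to each
    candidate gene. `counts` are current read counts and `means` are assumed
    expected counts (must be > 0), e.g. from a co-regulatory module."""
    if len(counts) != len(means) or not counts:
        raise ValueError("counts and means must be non-empty and equal length")
    if size <= 0 or any(mu <= 0 for mu in means):
        raise ValueError("size and all means must be positive")
    # Log likelihood ratio of each gene receiving one additional read.
    log_w = [
        nb_logpmf(k + 1, mu, size) - nb_logpmf(k, mu, size)
        for k, mu in zip(counts, means)
    ]
    # Normalize in a numerically stable way (subtract the max before exp).
    top = max(log_w)
    weights = [math.exp(x - top) for x in log_w]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Two candidate genes with equal current counts; gene 0's assumed
    # expected count is much higher, so it receives the larger share.
    print(assign_ambiguous_read([10, 10], [50.0, 5.0], size=2.0))
```

Using a likelihood ratio rather than raw likelihoods means a gene that is already above its expected count is penalized for receiving yet another read, which is one simple way to let an assumed module-level expectation guide the reassignment.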

Library of Congress Subject Headings

Algorithms.
Bioinformatics -- Mathematics.
Nucleotide sequence -- Statistical methods.
Biometry.
Genetics -- Statistics.
RNA -- Analysis.

Description

Includes bibliographical references

Format

application/pdf

Number of Pages

137

Publisher

South Dakota State University

Rights Statement

In Copyright