Document Type
Dissertation - University Access Only
Award Date
2011
Degree Name
Doctor of Philosophy (PhD)
Department / School
Mathematics and Statistics
First Advisor
Xijin Ge
Abstract
Many plant genes have been identified through whole-genome and deep transcriptome sequencing and other methods, yet our knowledge of the function of many of these genes remains limited. The integration and analysis of large gene-expression datasets allows researchers to formulate hypotheses concerning the function of, and interactions between, different groups of correlated genes. Two approaches to the analysis and study of gene co-expression are described in this dissertation. In the first approach, the non-negative matrix factorization (NMF) algorithm was applied to the AtGenExpress dataset, which consists of 783 microarray samples (29 separate experimental series) from the model plant Arabidopsis thaliana. Fifteen metagenes (groups of genes sharing correlated expression patterns) were identified. Functional roles of these metagenes were established by observing the enriched Gene Ontology (GO) categories using Gene Set Enrichment Analysis (GSEA). Activity levels of these metagenes in various experimental conditions were also analyzed to associate metagenes with stimuli and conditions. A metagene correlation network, constructed from the results of the NMF analysis, revealed many new interactions between the metagenes. The second approach was the development of a web-based genomic search engine (http://www.arraysearch.org) to empower researchers to explore Arabidopsis gene expression datasets using queries derived from their own experiments. It achieves this by finding statistical correlations between newly observed gene expression profiles and a database of curated expression profiles. In contrast to other search tools, which store a compiled list of gene expression profiles, ArraySearch provides researchers with a way to query the database using expression profiles derived from their own experiments.
It is hoped that this study will provide a guide for other researchers looking to apply advanced data mining techniques to gene expression data from Arabidopsis or other species. Although the NMF method was used in this analysis, there is no reason other methods, such as Principal Component Analysis, could not be used to calculate the metagenes. Additionally, future versions of ArraySearch will provide advanced user functionality as well as gene expression datasets from species other than Arabidopsis. Hopefully the tool will be of benefit to the bioinformatics community.
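The two approaches summarized in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation, which the abstract does not describe in detail. It assumes NMF is computed with the standard multiplicative-update rules for the Frobenius objective, and it assumes that the "statistical correlation" used for searching is a Pearson correlation; both are illustrative choices. The function names `nmf` and `rank_by_correlation`, the toy matrix size, and the choice of 3 metagenes are hypothetical.

```python
import numpy as np


def nmf(V, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (genes x samples) as W @ H.

    Columns of W play the role of metagenes (gene weights); rows of H give
    each metagene's activity across samples. Uses multiplicative updates
    for the Frobenius-norm objective (an assumed variant of NMF).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Update H with W fixed, then W with H fixed; eps avoids division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def rank_by_correlation(query, database):
    """Rank stored profiles (columns of database) by Pearson correlation
    with a query profile measured over the same genes (assumed metric)."""
    q = query - query.mean()
    D = database - database.mean(axis=0)
    scores = (q @ D) / (np.linalg.norm(q) * np.linalg.norm(D, axis=0))
    order = np.argsort(-scores)  # best match first
    return order, scores


# Toy data standing in for an expression matrix: 50 genes x 12 samples.
rng = np.random.default_rng(42)
V = rng.random((50, 12))

W, H = nmf(V, k=3)
err_short = np.linalg.norm(V - np.matmul(*nmf(V, k=3, n_iter=1)))
err_long = np.linalg.norm(V - W @ H)

# Query the "database" with one of its own profiles.
order, scores = rank_by_correlation(V[:, 4], V)
```

In this sketch, `err_long` should not exceed `err_short`, since the multiplicative updates do not increase the reconstruction error, and the stored profile identical to the query is ranked first. Substituting another decomposition such as Principal Component Analysis, as the abstract suggests is possible, would change only the `nmf` step.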
Library of Congress Subject Headings
Arabidopsis thaliana -- Genetic aspects
Plant gene expression
Publisher
South Dakota State University
Recommended Citation
Wilson, Tyler James, "Gene Co-Expression Analysis of Arabidopsis thaliana" (2011). Electronic Theses and Dissertations. 2107.
https://openprairie.sdstate.edu/etd2/2107