Document Type

Thesis - Open Access

Award Date


Degree Name

Master of Science (MS)

Department / School

Agronomy, Horticulture, and Plant Science

First Advisor

Anne Fennell


Keywords

cell-type-specific regulon, cis-regulatory elements, deep learning, NGS data analyses, regulatory motif identification, single-cell RNA-seq


Abstract

Identifying precise transcription factor binding sites (TFBSs), or regulatory DNA motifs, is fundamental to studying transcriptional regulatory mechanisms in cells and to constructing regulatory networks for biological investigation. Chromatin immunoprecipitation followed by sequencing (ChIP-seq), and its variant that adds lambda exonuclease digestion before high-throughput sequencing (ChIP-exo), enable researchers to identify TFBSs on a genome scale with improved resolution. Several algorithms have been developed for motif identification, but they employ widely different methods and often give divergent results, and their prediction accuracy remains limited. This thesis focuses on the development of improved regulatory DNA motif identification techniques. We designed an integrated framework, WTSA, that reliably combines the experimental signals from ChIP-exo data at base-pair (bp) resolution to predict statistically significant DNA motifs. The algorithm improves prediction accuracy and extends the scope of applicability of existing methods. We applied the framework to the Escherichia coli K-12 genome and evaluated WTSA's prediction performance by comparison with seven existing programs. The evaluation indicated that WTSA provides reliable predictive power for regulatory motifs using ChIP-exo data. An important application of DNA motif identification is the elucidation of transcriptional regulatory mechanisms. The rapid development of single-cell RNA sequencing (scRNA-seq) technologies provides an unprecedented opportunity to study gene transcriptional regulation at the single-cell level. A critical step in scRNA-seq analysis is identifying cell-type-specific regulons (CTS-Rs), each of which is a group of genes co-regulated by the same transcription regulator in a specific cell type.
We developed a web server, IRIS3 (Integrated Cell-type-specific Regulon Inference Server from Single-cell RNA-Seq), to solve this problem by integrating data preprocessing, cell type prediction, gene module identification, and cis-regulatory motif analyses. Compared with other packages, IRIS3 predicts more efficiently and provides more accurate regulons from scRNA-seq data. These CTS-Rs can substantially improve the elucidation of heterogeneous regulatory mechanisms among various cell types and allow reliable construction of the global transcriptional regulation networks encoded in a specific cell type. Also presented in this thesis is DESSO (DEep Sequence and Shape mOtif), which uses deep neural networks and a binomial distribution model to identify DNA motifs. DESSO outperformed existing tools, including DeepBind, on 690 human ENCODE ChIP-seq datasets. DESSO also further expanded motif identification power by integrating the detection of DNA shape features.

Library of Congress Subject Headings

DNA -- Analysis.
Nucleotide sequence -- Identification.
Genetic regulation.



Number of Pages



South Dakota State University



Rights Statement

In Copyright