Document Type

Dissertation - Open Access

Award Date


Degree Name

Doctor of Philosophy (PhD)

Department / School

Mathematics and Statistics

First Advisor

Xijin Ge


Repetitive DNA elements are abundant in the genomes of a wide range of organisms. In mammals, repetitive elements comprise about 40-50% of the genome. However, their biological functions remain largely unknown. Analysis of their abundance and distribution may shed light on how they affect genome structure, function, and evolution. We conducted a detailed comparative analysis of repetitive DNA elements across ten eukaryotic organisms: chicken (G. gallus), zebrafish (D. rerio), Fugu (T. rubripes), fruit fly (D. melanogaster), and nematode worm (C. elegans), along with five mammals, namely human (H. sapiens), mouse (M. musculus), cow (B. taurus), rat (R. norvegicus), and rhesus (M. mulatta). Our results show that repetitive DNA content varies widely, from 7.3% in the Fugu genome to 52% in the zebrafish genome, based on RepeatMasker data. The most frequently observed transposable elements (TEs) in mammals are SINEs (Short Interspersed Nuclear Elements), followed by LINEs (Long Interspersed Nuclear Elements). In contrast, LINEs, DNA transposons, simple repeats, and low complexity repeats are the most frequently observed repeat classes in the chicken, zebrafish, fruit fly, and nematode worm genomes, respectively. LTRs (Long Terminal Repeats) have significant genomic coverage and diversity, which may make them suitable for regulatory roles. With the exception of the nematode worm and fruit fly, the frequency of the repetitive elements follows a log-normal distribution, characterized by a few highly prevalent repeats in each organism. In mammals, SINEs are enriched near genic regions, whereas LINEs are often found away from genes. We also identified many LTRs that are specifically enriched in promoter regions, some with a strong bias towards the same strand as the nearby gene. This raises the possibility that these LTRs play a regulatory role.
Surprisingly, most intronic repeats, with the exception of DNA transposons, have a strong tendency to be on the opposite DNA strand from the host gene. One possible explanation is that intronic RNAs which result from splicing may contribute to retrotransposition to the original intronic loci. Moreover, our observations of the enrichment of repetitive DNA elements near genic regions and, specifically, in the promoter regions of genes raise the question of whether repetitive DNA elements have a significant impact on gene expression in both human and mouse genomes. To investigate the impact of these repeats on gene expression, we calculated the total number of base pairs (bp) of these repeats in two regions upstream of the genes, namely the 2kbp and 20kbp promoter regions. In addition, we quantified gene expression levels in both human and mouse tissues using RNA-seq analysis. We then used several statistical modeling approaches to investigate the association between repetitive DNA elements and gene expression in the two promoter regions. Although most transposable elements are primarily involved in reduced gene expression, our models showed that Alu elements in both human and mouse are significantly associated with higher average expression in the promoter region. Furthermore, we found that B2 elements in the mouse 2kbp and 20kbp regions, and hAT.Charlie elements in the human 20kbp region, are also significantly associated with up-regulated gene expression in the 2kbp promoter. In addition to Alu and B2 at 2kbp, we found that ERV1 elements have a significant association with higher average expression in the 20kbp promoter in mouse tissues. We also found that L1 and Simple_repeat elements are significantly associated with lower average expression in both human and mouse tissues. Furthermore, in human, we found that MIR elements are also associated with lower average expression.
The effects of Alu elements in both human and mouse are stronger at 2kbp than at 20kbp. In contrast, the L1 effect at 20kbp is stronger than at 2kbp. Our results indicate that comparative studies of repetitive DNA elements in multiple organisms can provide insights into their evolution and expansion, and lead to the elucidation of their potential functions. The non-random distribution of repeats across multiple organisms adds to the existing evidence that some repetitive DNA elements are drivers of genome evolution, rather than just “junk” DNA.
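
The promoter-region quantity described above (total repeat base pairs within 2kbp or 20kbp upstream of a gene) can be illustrated with a minimal sketch. This is not the dissertation's code: the function names, the 0-based half-open coordinate convention, the use of a single transcription start site (TSS) per gene, and the choice to merge overlapping repeat annotations before counting are all assumptions made for illustration.

```python
from typing import Iterable, Tuple

# Assumed convention: 0-based, half-open genomic intervals [start, end).
Interval = Tuple[int, int]


def promoter_window(tss: int, strand: str, size: int) -> Interval:
    """Return the `size`-bp window immediately upstream of the TSS.

    `tss` is the 0-based position of the first transcribed base (assumption).
    Upstream means lower coordinates on "+" and higher coordinates on "-".
    """
    if strand == "+":
        return (max(0, tss - size), tss)
    if strand == "-":
        return (tss + 1, tss + 1 + size)
    raise ValueError("strand must be '+' or '-'")


def repeat_bp_in_window(repeats: Iterable[Interval], window: Interval) -> int:
    """Count bases of `window` covered by at least one repeat interval.

    Repeats are clipped to the window and overlapping ones are merged,
    so no base is counted twice (an illustrative choice).
    """
    w_start, w_end = window
    clipped = sorted(
        (max(s, w_start), min(e, w_end))
        for s, e in repeats
        if s < w_end and e > w_start
    )
    total = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Close the previous merged block and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Hypothetical repeat annotations of one family on one chromosome.
example_repeats = [(7900, 8100), (9000, 9300), (9200, 9500), (10050, 10200)]
bp_2k = repeat_bp_in_window(example_repeats, promoter_window(10000, "+", 2000))
bp_20k = repeat_bp_in_window(example_repeats, promoter_window(10000, "+", 20000))
```

Computing one such value per gene and per repeat family, for each window size, would yield the kind of predictor matrix that could then be related to expression levels in a statistical model.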

Library of Congress Subject Headings

DNA -- Analysis.
Gene expression.
Nucleotide sequence.


Includes bibliographical references (pages 161-164)



Number of Pages



South Dakota State University



Rights Statement

In Copyright