De novo assembly of heterozygous genomes remains a challenge that requires improved assembly approaches. In this study, three strategies were used to develop de novo Vitis vinifera ‘Sultanina’ genome assemblies for comparison with the inbred V. vinifera (PN40024 12X.v2) reference genome and a published Sultanina ALLPATHS-LG assembly (AP). The strategies were: 1) a default PLATANUS assembly (PLAT_d) for direct comparison with the AP assembly; 2) an iterative merging strategy using METASSEMBLER to combine the PLAT_d and AP assemblies (MERGE); and 3) PLATANUS parameter modifications plus GapCloser (PLAT*_GC).
The three new assemblies were greater in size than the AP assembly. PLAT*_GC had the greatest number of scaffolds aligning to the V. vinifera (PN40024 12X.v2) reference genome with a minimum of 95% identity and an alignment length of ≥1000 bp. SNP analysis also identified additional high-quality SNPs. A greater percentage of sequence reads mapped back with zero mismatches to the PLAT_d, MERGE, and PLAT*_GC assemblies (>94%) than to the AP assembly (87%), indicating greater fidelity to the original sequence data in the new assemblies than in the AP assembly. A de novo gene prediction conducted using seedless RNA-seq data predicted >30,000 coding sequences for each of the three new de novo assemblies, with the greatest number (30,544) in PLAT*_GC, compared with only 26,515 for the AP assembly. Transcription factor analysis indicated good family coverage, but some genes found in the VCOST.v3 annotation were not identified in any of the de novo assemblies, particularly some from the MYB and ERF families.
The PLAT_d and PLAT*_GC assemblies had a greater number of synteny blocks with the V. vinifera (PN40024 12X.v2) reference genome than AP or MERGE. PLAT*_GC provided the most contiguous assembly, with only 1.2% scaffold N, in contrast to AP (10.7% N), PLAT_d (6.6% N) and MERGE (6.4% N). A PLAT*_GC pseudo-chromosome assembly with chromosome alignment to the V. vinifera (PN40024 12X.v2) reference genome provides new information for use in seedless grape genetic mapping studies. An annotated de novo gene prediction for the PLAT*_GC assembly, aligned with VitisNet pathways, provides a new seedless grapevine-specific transcriptomic resource that has excellent fidelity with the seedless short-read sequence data.
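As an illustration of how the scaffold N percentages quoted above can be read, the following minimal Python sketch computes the share of N bases and the N50 for scaffolds above a length cutoff. It is not the Assemblathon script used in the study; the function names `read_fasta` and `scaffold_stats` and the 1000 bp default cutoff are assumptions made for this example.

```python
def read_fasta(handle):
    """Yield one sequence string per FASTA record from an iterable of lines."""
    seq = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if seq:
                yield "".join(seq)
            seq = []
        elif line:
            seq.append(line)
    if seq:
        yield "".join(seq)


def scaffold_stats(sequences, min_len=1000):
    """Return (total_bp, percent_n, n50) for scaffolds of at least min_len bp.

    min_len is a hypothetical cutoff chosen for illustration.
    """
    kept = [s.upper() for s in sequences if len(s) >= min_len]
    total = sum(len(s) for s in kept)
    if total == 0:
        return 0, 0.0, 0

    # Percentage of all retained bases that are unresolved (N).
    n_count = sum(s.count("N") for s in kept)
    percent_n = 100.0 * n_count / total

    # N50: length of the scaffold at which the running sum of lengths,
    # taken from longest to shortest, first reaches half the total.
    running = 0
    n50 = 0
    for length in sorted((len(s) for s in kept), reverse=True):
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return total, percent_n, n50
```

With a real assembly, the sequences could be supplied as `scaffold_stats(read_fasta(open(path)))`, where `path` is a placeholder for a scaffold FASTA file.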
DOI of Published Version
Copyright © 2018 the Author(s)
Patel, S., Lu, Z., Jin, X. et al. BMC Genomics (2018) 19: 57. https://doi.org/10.1186/s12864-018-4434-2
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Assembly statistics for four Vitis vinifera ‘Sultanina’ de novo assemblies. a: Assembly statistics for four Vitis vinifera ‘Sultanina’ de novo assemblies. All assemblies were evaluated using Assemblathon metrics and scaffold size limited to 1 kbp. b: Assembly statistics for PLAT*_GC assembly steps. All assemblies were evaluated using Assemblathon metrics and scaffold size limited to 1 kbp. c: Assembly statistics for four Vitis vinifera ‘Sultanina’ de novo assemblies. All full assemblies (scaffold ≥ 500 nt) were evaluated using Assemblathon metrics. (XLSX 21 kb)
12864_2018_4434_MOESM2_ESM.jpg (107 kB)
Figure S1. Protein alignment with V. vinifera (PN40024 12X.v2, VCOST.v3) proteins. a. Orthologous proteins for all seedless grape assemblies in relation to the V. vinifera VCOST.v3 (V. vinifera V3). b. Comparison of AP with the three de novo seedless assemblies. (JPEG 107 kb)
12864_2018_4434_MOESM3_ESM.xlsx (11 kB)
Functional characterization of predicted genes for the four assemblies using Blast2GO, BLASTX and BLASTP. (XLSX 11 kb)
12864_2018_4434_MOESM4_ESM.xlsx (15 kB)
Plant transcription factor identification for all four assemblies and V. vinifera (VCOST.v3) using PlantTFDB. (XLSX 15 kb)