Description
Bioinformatics tools have significantly improved data acquisition and accuracy, especially with the near-real-time results available from nanopore DNA sequencing. With large amounts of data, reproducibility is critical, and long workflows can become convoluted. Snakemake, a Python-based workflow management system, aims to alleviate this with readable formatting, reproducibility, and portability to any machine. Using 97 FASTQ files, the usability of these three traits was compared between a Bash workflow and a Snakemake workflow across a range of one to twelve threads. Snakemake was faster than Bash in every test; at its fastest, it was 27% faster. Reproducibility of both workflows was verified using an MD5 hash of the results. The hashes differed between the two workflows, which may be a result of executing them in two different terminal environments. Despite this, hashing remains a valid way to confirm reproducibility between runs of the same workflow. Beyond speed, Snakemake offers quality-of-life features that put it ahead of Bash. Isolating a workflow's software in Conda environments is one example. The ability to require specific software versions within a workflow boosts reproducibility. Portability also increases, because the environment can be recreated almost anywhere and the required software is downloaded on an as-needed basis. With readability comes maintainability, and Snakemake will almost always pull ahead of Bash in this regard with its simple input, output, and shell fields. The field of bioinformatics is moving very quickly, and traditional Bash scripts can struggle to keep up in certain respects. While Bash remains essential for executing some software, more powerful tools like Snakemake are required to handle the execution of an entire, complex workflow.
Publication Date
2021
Publisher
South Dakota State University
Recommended Citation
Loecker, Josh and Ewing, Patrick, "Benefits of the Snakemake Workflow Management Software in Comparison to Traditional Programming (Paper)" (2021). Honors Capstone Projects. 8.
https://openprairie.sdstate.edu/honors_isp/8