
Improving in silico scientific reproducibility with Provenance Replay software

Keefe, Christopher (2022) Improving in silico scientific reproducibility with Provenance Replay software. Masters thesis, Northern Arizona University.

Keefe_2022_improving_silico_scientific_reproducibility_with_provenance.pdf - Published Version
Restricted to Repository staff only



Bioinformatics workflows are often complex, consisting of dozens or hundreds of processes, with significant variation possible in computer hardware and software systems, input data, method selection, and parameterization during each step. This complexity creates known challenges for study organization, reporting, and reproducibility, which may prevent study replication. Here I present Provenance Replay, software for the documentation and enactment of in silico reproducibility in QIIME 2, a prominent free and open platform for microbiome science. QIIME 2 packages the full history (i.e., “provenance”) of every analysis result within the result itself, including software versions, methods, parameters, and user-provided metadata. Provenance Replay parses this captured provenance data into directed acyclic graphs and generates reproducibility documentation, including full-analysis citation lists and executable scripts capable of replicating the Result(s) in question from the original input data, providing a robust tool for methods reproducibility. These executables may also be applied directly to similarly structured data, modified, or extended, supporting results reproducibility and generalization. This reproducibility documentation can be used in the automation of repeated analyses, and has potential to reduce record-keeping, training, and communication burdens in collaborative research contexts.

Demonstrations, surveys, and focus groups were conducted with an alpha version of the software, targeting feature elicitation and requirements verification. In survey results, demonstration participants reported high perceived ease of use (mean PEOU 5.82 of 7), high perceived usefulness (mean PU 5.96 of 7), and a net promoter score of +78.95%. Overall, respondents report a positive general attitude toward using Provenance Replay and a high likelihood of recommending the software to others.
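The general idea of turning recorded provenance into a directed acyclic graph and then into an ordered, replayable script can be illustrated with a minimal Python sketch. It is not the Provenance Replay implementation or API: the record layout, the result IDs, the action names, the parameter names, and the helper functions `build_graph`, `replay_order`, and `render_script` are all hypothetical, and the emitted lines are placeholders rather than real QIIME 2 commands. The only library used is `graphlib` from the Python standard library.

```python
from graphlib import TopologicalSorter

# Hypothetical provenance records (NOT the QIIME 2 on-disk format):
# each result ID maps to the action that produced it, the parameters
# used, and the IDs of the results it consumed as inputs.
provenance = {
    "raw": {"action": "import-data", "params": {}, "inputs": []},
    "table": {"action": "denoise", "params": {"trunc_len": 120}, "inputs": ["raw"]},
    "tree": {"action": "build-tree", "params": {}, "inputs": ["table"]},
    "diversity": {
        "action": "compute-diversity",
        "params": {"sampling_depth": 1000},
        "inputs": ["table", "tree"],
    },
}


def build_graph(records):
    """Map each result ID to the set of result IDs it depends on."""
    return {rid: set(rec["inputs"]) for rid, rec in records.items()}


def replay_order(records):
    """Return result IDs ordered so every input precedes its consumers."""
    return list(TopologicalSorter(build_graph(records)).static_order())


def render_script(records):
    """Emit one placeholder command line per result, in replay order."""
    lines = []
    for rid in replay_order(records):
        rec = records[rid]
        inputs = " ".join(rec["inputs"])
        params = " ".join(f"--{k} {v}" for k, v in sorted(rec["params"].items()))
        # Collapse the extra spaces left behind by empty input/param strings.
        lines.append(" ".join(f"{rec['action']} {inputs} {params} -> {rid}".split()))
    return lines


if __name__ == "__main__":
    for line in render_script(provenance):
        print(line)
```

Because the ordering comes from a topological sort, each step is emitted only after the steps that produce its inputs, which is the property a replay script needs in order to rerun an analysis from the original input data. `TopologicalSorter` also raises `CycleError` if the records contain a cycle, so malformed history fails loudly rather than producing a misordered script.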

Item Type: Thesis (Masters)
Publisher’s Statement: © Copyright is held by the author. Digital access to this material is made possible by the Cline Library, Northern Arizona University. Further transmission, reproduction or presentation of protected items is prohibited except with permission of the author.
Keywords: microbiome; provenance; QIIME 2; scientific reproducibility; software engineering; technology assessment model
Subjects: Q Science > QA Mathematics
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Q Science > QP Physiology
NAU Depositing Author Academic Status: Student
Department/Unit: Graduate College > Theses and Dissertations
College of Engineering, Informatics, and Applied Sciences > School of Informatics, Computing, and Cyber Systems
Date Deposited: 16 May 2023 22:49
Last Modified: 16 May 2023 22:49
URI: https://openknowledge.nau.edu/id/eprint/5890
