Reproducibility Tools

The Moore-Sloan Data Science Environment Open Science & Reproducibility Working Group at NYU has been developing a suite of tools that simplify the process of making experiments reproducible, including:
  • ReproMatch
    • ReproMatch (short for Reproducibility Match) is designed to help you find the tool or tools that best match your reproducibility needs. The tools in the ReproMatch catalog are classified according to the reproducibility tasks they support, which we organized in a taxonomy. Please see Reproducibility Tasks for a detailed description of this taxonomy.
    • You can browse all the tools here.
  • ReproZip
    • ReproZip is a tool aimed at simplifying the process of creating reproducible experiments from command-line executions. It tracks operating system calls and creates a package that contains all the binaries, files, and dependencies required to run a given command in the author’s computational environment. A reviewer can then extract the experiment in their own environment to reproduce the results, even if that environment runs a different operating system from the original one. Currently, ReproZip can only pack experiments that were originally run on Linux.
  • VisTrails
    • VisTrails is an open-source scientific workflow and provenance management system that provides support for simulations, data exploration, and visualization. Whereas workflows have traditionally been used to automate repetitive tasks, in applications that are exploratory in nature, such as simulations, data analysis, and visualization, very little is repeated: change is the norm. As an engineer or scientist generates and evaluates hypotheses about the data under study, a series of different, albeit related, workflows is created as each workflow is adjusted in an interactive process. VisTrails was designed to manage these rapidly evolving workflows.
  • noWorkflow
    • noWorkflow is a non-intrusive tool that doesn’t require researchers to change the way they work. Instead, it lets them capture a variety of provenance information and use the analyses it supports, including graph-based visualization, differencing over provenance trails, and inference queries. noWorkflow is written in Python and can currently capture the provenance of Python scripts. It uses software engineering techniques such as abstract syntax tree (AST) analysis, reflection, and profiling to collect provenance without requiring a version control system or any other environment.
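
To give a rough idea of what the AST analysis mentioned for noWorkflow can look like, here is a minimal sketch that uses only Python's standard `ast` module. It is not noWorkflow's actual implementation or API; the names `SCRIPT` and `collect_definitions` are hypothetical and exist only for this illustration. The sketch parses a small script without running it and records which modules it imports, which functions it defines, and which names it calls.

```python
import ast

# A small example script to analyze (hypothetical; stands in for a researcher's script).
SCRIPT = '''
import math

def area(radius):
    return math.pi * radius ** 2

def report(radius):
    print(area(radius))

report(2.0)
'''


def collect_definitions(source):
    """Statically collect imports, function definitions, and call names.

    This is an illustrative helper, not part of noWorkflow.
    """
    tree = ast.parse(source)
    imports, functions, calls = set(), set(), set()

    # ast.walk visits every node in the tree, in no guaranteed order,
    # so the results are gathered into sets and sorted at the end.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x".
            imports.add(node.module or ".")
        elif isinstance(node, ast.FunctionDef):
            functions.add(node.name)
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                calls.add(func.id)        # plain call: f(...)
            elif isinstance(func, ast.Attribute):
                calls.add(func.attr)      # attribute call: obj.f(...)

    return {
        "imports": sorted(imports),
        "functions": sorted(functions),
        "calls": sorted(calls),
    }


summary = collect_definitions(SCRIPT)
for kind, names in summary.items():
    print(kind, names)
```

Because the script is only parsed, never executed, this kind of analysis captures what the code defines and references rather than what actually happened at run time. That run-time side is what techniques such as reflection and profiling would cover.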