Reproducible Science with Python

Oct. 29, 2016, 3:45 p.m. - 4:10 p.m.

In science, results that peer scientists cannot reproduce have no value. Good practices for reproducible science are to publish the code used under open source licenses, perform code reviews, capture computational environments in containers (e.g., Docker), use open data formats, use a data management system, and record the provenance of all actions.

This talk shows how to record the provenance of code development, code execution, and data management using a standard provenance format and accompanying Python libraries. In particular, it covers how to gather the provenance of a development process based on Git, of any Python script or IPython/Jupyter notebook, and of a paper written in LaTeX. Finally, the talk shows how to use Python to analyze and explore the provenance, which is stored in a graph database (Neo4j).
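
To give a rough idea of what recording the provenance of code execution can look like, here is a minimal sketch using only the Python standard library. The abstract does not name the provenance format or the libraries used in the talk, so everything here is an assumption for illustration: the relation names `used` and `wasGeneratedBy` are borrowed from the W3C PROV vocabulary, while the `PROVENANCE` store, the `record_provenance` decorator, and the `mean` function are hypothetical and not taken from any real library.

```python
import hashlib
import json
from datetime import datetime, timezone
from functools import wraps

# Hypothetical in-memory provenance store (an assumption for this sketch,
# not the API of any real provenance library).
PROVENANCE = {"entity": {}, "activity": {}, "used": [], "wasGeneratedBy": []}


def _entity_id(value):
    """Register a JSON-serializable value as an entity, keyed by a content hash."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    entity_id = "entity:" + hashlib.sha256(encoded).hexdigest()[:12]
    PROVENANCE["entity"][entity_id] = {"type": type(value).__name__}
    return entity_id


def record_provenance(func):
    """Decorator that records which inputs a call used and what it generated."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        activity_id = "activity:{}:{}".format(
            func.__name__, len(PROVENANCE["activity"])
        )
        started = datetime.now(timezone.utc).isoformat()
        result = func(*args, **kwargs)
        ended = datetime.now(timezone.utc).isoformat()
        PROVENANCE["activity"][activity_id] = {
            "startTime": started,
            "endTime": ended,
        }
        # Each argument is an entity that the activity used.
        for value in list(args) + list(kwargs.values()):
            PROVENANCE["used"].append((activity_id, _entity_id(value)))
        # The return value is an entity generated by the activity.
        PROVENANCE["wasGeneratedBy"].append((_entity_id(result), activity_id))
        return result
    return wrapper


@record_provenance
def mean(values):
    return sum(values) / len(values)


result = mean([1.0, 2.0, 3.0])
print(json.dumps(PROVENANCE, indent=2))
```

Each decorated call adds one activity, links it to its inputs, and links its output back to it. Records of this shape map naturally onto the nodes and edges of a graph database, although the sketch only keeps them in memory and handles JSON-serializable values.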

Andreas Schreiber

Data and rocket scientist at DLR, leading DLR’s Intelligent and Distributed Systems department. Python user since 1991. Chair of PyCon DE, PyHPC, and PyData Cologne.
