Season of Docs 2021 Proposal

Chris Fonnesbeck edited this page Mar 26, 2021 · 4 revisions

Bringing Probabilistic Programming to the Masses: Fundamental Updates to PyMC3's Teaching Resources

About PyMC3

PyMC3 (Salvatier, Wiecki, & Fonnesbeck, 2016) is a prominent open source probabilistic programming framework for Python, having been forked over 1,300 times and starred over 5,400 times on GitHub. It provides a high-level user API for specifying and fitting Bayesian models that allows applied users to focus on solving their particular scientific problem, rather than on the mechanics of Bayesian statistical computation. Being a general-purpose probabilistic programming library, PyMC3 is used across a range of scientific domains in both industry and academic settings. Given that Python is now the dominant scientific programming language, and that PyMC3 is designed in the style and philosophy of Python (i.e. Pythonic), it lowers barriers to entry for Bayesian modeling by offering familiar data structures, conventions and execution environments, while still providing state-of-the-art inference methods. PyMC3 makes an impact largely because it allows scientists of varying backgrounds to apply Bayesian methods to their data without having to be either expert programmers or statisticians.

About the project

The current documentation for PyMC3 is largely the product of a slow accumulation of high-quality examples and case studies of how the software can be used. The advent of Jupyter notebooks over the past 10 years has made it extremely easy for developers and users to summarize and present data science applications. Thus, the project has expediently constructed a very large corpus of notebooks that serves as the primary documentation. While these notebooks are generally of high quality and cover most of PyMC's important functionality, the result is a somewhat ad hoc set of documentation that is lacking in consistency and cohesion. We seek documentation that is better integrated, and designed to satisfy a range of learning goals, in addition to being a general-purpose reference for users.

A large share of PyMC3's current (and hopefully future) user base consists of scientists, engineers, and other applied users who are looking to draw inferences from their data, using the most effective tools available to them. Our project prides itself on making relatively advanced analytical methods available to a diverse set of users, primarily through an intuitive, expressive, and powerful software API that makes model building and fitting as simple as possible. We believe that having effective documentation will dramatically increase the adoption of PyMC3 by new users, and allow all users to maximize the utility of the software by teaching them to use it optimally.

Season of Docs Project Scope

We are confident that the 2021 Season of Docs will fundamentally improve the usability and effectiveness of PyMC3's documentation. There are three main aspects to the project that will help to realize this goal:

  • Integration of the existing standalone content into learner-focused guides that link components to one another in order to help users make sound decisions regarding the use of the software.
  • Revision of the tutorials and guides to reflect the important changes to the library that are currently underway, and to give them consistency, in terms of notation and language.
  • Expansion of the scope of the documentation in two ways: 1) toward helping users better understand the methods that PyMC3 offers, including how and when to use them effectively, and 2) supporting developers by properly documenting the underlying software architecture.

Thus, we are not looking for a complete re-write of the PyMC3 documentation, but rather a re-application of the existing material, with augmentation in the identified areas of weakness.

We anticipate that this project will require 3-4 months to complete.

Measuring Project Success

The effectiveness of Season of Docs updates to the PyMC3 documentation will be measured qualitatively and quantitatively. Qualitative information will include:

  • questions and discussion on the PyMC3 Discourse page related to software functionality not previously covered by the documentation.
  • referrals of Discourse questions to sections of the improved documentation.
  • feedback from users and contributors that directly references the documentation updates.

Quantitative measures of the impact of the new documentation will include:

  • change in number of visits to the documentation website year over year, as measured by Google Analytics.
  • change in the number of downloads and forks of the project on GitHub, relative to the change in past years.
  • change in the number of active participants on the PyMC3 Discourse site, relative to that in past years.

Clearly, each of these is a noisy, indirect measurement of the success of any new documentation, since it is confounded with any number of other factors that may influence the metric. However, given the demonstrated importance of documentation to the success of other projects, we feel confident that some signal will be correlated with a successful (or, though we hope not, unsuccessful) rollout of the upgraded documentation. In this sense, we are evaluating even the quantitative information qualitatively.
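As a rough illustration of what "change relative to past years" could mean for any of the quantitative measures, the sketch below compares the most recent year-over-year change against the average of earlier ones. The function names and the example counts are hypothetical placeholders, not real PyMC3 statistics, and the proposal does not prescribe this particular calculation.

```python
def yoy_changes(counts):
    """Fractional year-over-year changes for yearly counts (oldest first).

    Assumes every count is positive, so the division is well defined.
    """
    return [(curr - prev) / prev for prev, curr in zip(counts, counts[1:])]


def change_relative_to_past(counts):
    """Latest year-over-year change minus the mean of the earlier changes.

    A positive value means the metric grew faster in the most recent
    year than it did, on average, in the preceding years.
    """
    changes = yoy_changes(counts)
    if len(changes) < 2:
        raise ValueError("need at least three yearly counts")
    *past, latest = changes
    baseline = sum(past) / len(past)
    return latest - baseline


# Hypothetical placeholder counts (e.g. yearly site visits), oldest first.
example_counts = [100, 110, 121, 145.2]
excess_growth = change_relative_to_past(example_counts)
```

A value near zero would suggest growth in line with past trends; as noted above, any difference remains confounded with factors unrelated to the documentation.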

Project Budget

Most of the budget is dedicated to compensating the writer who will be hired to execute this project. We will also have two mentors from the PyMC3 core dev team to provide the writer with information, guidance and other resources to maximize their productivity during the course of the project. We would also like to recognize the external contributors who we anticipate will assist informally as the documentation revision takes place; we will give them coffee mugs inscribed with the PyMC3 logo.

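| Item                | Cost  | Cumulative | Notes                                              |
|---------------------|-------|------------|----------------------------------------------------|
| Writer Compensation | $8000 | $8000      |                                                    |
| 2 Mentors           | $1000 | $9000      |                                                    |
| 5 PyMC3 Coffee Mugs | $120  | $9120      | Appreciation for active documentation contributors |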