GSoD 2021 Meeting Minutes

Oriol Abril-Pla edited this page Jan 6, 2022 · 16 revisions

Thursday, January 6th

  • Update and polish website before 4.0 release
    • There are TODOs written down and visible in the website that need to be addressed
    • The API docs need to be fixed and reviewed to ensure they are complete. See issue
      • NOTE: the priority now is updating the layout and structure of the API pages; if the docstrings need work too, that will come later.
      • Meenal will take care of some modules
    • We need to list the most relevant tasks as issues and add them to the milestones to attract volunteers and indicate they are blockers
  • pymc-examples
    • We need a call for reviewers. Will try that at the project meeting on January 7th
    • Sayam will tackle a notebook to update it to v4 and write an "update a notebook to v4" guide

Thursday, November 25th

  • Announcement: Now that GSoD is over, Martina will take December off. She will create TODOs for the Documentation team and coordinate with Oriol to close any remaining tasks.

  • New website updates and TODOs

    • Martina did an overview of the new website style and structure
    • ToDos will be created during the last days of November, priority will be specified so the Docs team can continue the work.
      • Lorenzo is available to work on Examples or other small tasks. He can pick tasks from the ToDos pool.
  • PyMC examples repo

    • Who is reviewing?
      • Improve speed of reviews and merging of PRs
      • Particularly, we need more people reviewing PyMC examples
      • We could have “specialized” reviewers, focusing on different aspects of PRs
  • PyMC examples are still unclear about their level: a notebook might mix advanced sections with very basic ones

    • Use categories to classify material according to:
      • Level (basic/intermediate/advanced)
      • Diataxis type of material (tutorial, reference..)
  • Oriol will work with Eric Ma to get autoexecution of docs (approximately once a week for versioned docs, and once a month for pymc-examples)

  • TODOs: https://app.clickup.com/t/1mew60u

    • Reduce front page width
  • Use includes in the learning section

  • Use more toctrees in the learning section, to display the learning levels

  • Next lab meeting: request pymc version 4 devs to review the api documentation and check everything is correct and nothing is missing (we asked Larry to bring this request to the lab meeting).

  • Add section to Contributing similar to this one https://arviz--1903.org.readthedocs.build/en/1903/contributing/content_structure.html

Thursday, October 14

Agenda

  • Prepare for PyData sprint
  • New website updates
  • Questions for devs regarding beta
    • What changes should we make to the API documentation?
    • Coordinating timeline until release is out

Notes

Documentation team needs to coordinate with Developers team to get materials ready for PyMC 4 release. Milestones are:

  • Before beta release comes out
    • Migration guide
    • installation guides
    • API docs
  • Before alpha comes out:
    • updating as many notebooks as possible to PyMC 4

Current time estimation: 1 month till beta 1 release (development milestones: https://github.com/pymc-devs/pymc/milestones)

Beta release announcement: https://hackmd.io/I5F8t9swRqKPfGQPYsR_cg?both

Migration guide

Ravin has been doing work on the "migration guide" (more of a renaming guide actually https://hackmd.io/oB_ahfcIRKegO6hR0OW8eg)

API documentation

https://hackmd.io/7_BJXPrDT1ShtDMxghGfHA

  • This “release notes” hackMD will be the main source of truth for the docs team regarding api changes. (documenting in github would cause conflicts due to constant updates)
  • Everything in these release notes should be documented in the migration guide.
  • Everything that is called out as a change in the release notes and migration guide needs to appear in the API documentation. For example if there's a new function, or a function that works differently from before, it needs to show up in the API docs.
  • assumption: we make 3.11.5 release and these release notes call out what changed between that and 4.0 beta.
  • all breaking changes need to be called out. Breaking changes can be reduced by backporting functionality and adding compatibility so things can be moved from unexpected breaking changes to expected breaks.

Martina tasks:

  • break down work until completion of the docs
  • create issues
  • choose issues for the pydata sprint
  • Solve requirements issues

requirements-tests are related to CI; was this breaking a bit? Documentation doesn't touch those requirements. Devs to see if there's anything that needs to be fixed here.

Documentation team

Olga: working on quickstart guide with realistic example (lake water volume prediction). Meenal: working on solving issues with decorator documentation using Sphinx. Could use some help.

Also preparing a Data Umbrella sprint for people who are new to open source contributing. It's mostly dev focused but we could add some documentation contribution tips. One session a week, probably 3 or 4 sessions, an hour each. The first session is tentatively planned for November 13th; it will be a pre-session, an introduction to Python basics. Clear and accessible documentation would be important to help participants understand the basics. Oriol will probably host one session about Sphinx and documentation. He's putting together a Read the Docs site for the occasion.

Martina: hosting the October 30 PyData sprint focused on documentation. Core team dev help appreciated!

Docstrings: devs need guidelines on how to format them for compatibility with the docs, e.g. single quotes or double quotes? Answer: use double quotes to wrap the docstrings. Formatting can be added following reStructuredText and Sphinx markup syntax.
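
As an illustration of the docstring answer above, here is a minimal sketch of a docstring wrapped in double quotes and laid out in the NumPy docstring style. The function `standardize` and its parameters are hypothetical and only serve as an example; they are not part of PyMC.

```python
def standardize(values, ddof=0):
    """Standardize a sequence of numbers to zero mean and unit variance.

    Parameters
    ----------
    values : sequence of float
        Observations to standardize.
    ddof : int, default 0
        Delta degrees of freedom used when computing the variance.

    Returns
    -------
    list of float
        The standardized values, in the same order as ``values``.
    """
    # Hypothetical example function; the point is the docstring layout.
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - ddof)
    std = variance ** 0.5
    return [(v - mean) / std for v in values]
```

The triple double quotes wrap the whole docstring, and the section headers underlined with dashes follow the NumPy convention, which Sphinx can render when a suitable extension is enabled.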

September 9, 2021

Agenda

  • Review progress of last two weeks
    • Blockers?
  • Discuss documentation team goals, see also https://app.clickup.com/t/nnduaf and subtasks
  • Discuss the notebook style guide:
    • Recent changes
    • Future changes
      • Notation guidance: https://github.com/pymc-devs/pymc3/issues/4820 and https://github.com/pymc-devs/pymc3/issues/4821. Martina already made some updates to the style guide related to that, which may be enough to close the first issue; the second needs some work. Some questions that I (Oriol) think could help the discussion:
        • Do we consider “weights” or “intercept” too long to be variable names?
        • Would it have helped you if different notebooks used the same names for key objects/variables?
        • Do we maybe want to provide some suggestions for common things, to be used only if the writer doesn’t have a better variable name? e.g. whenever we gather observations synthetically or without any order (known to the reader), I always use obs_id for this dimension.
      • Authorship attribution: https://github.com/pymc-devs/pymc-examples/issues/198. Some of you have authored notebooks while others have made changes to notebooks of varying depth so I think it is a good group to find some solution that considers everyone. Also related to the next point about the frontmatter.
      • “Frontmatter”. https://github.com/pymc-devs/pymc-examples/issues/200. We have already added the title+post directive rule, but I (Oriol) feel there is more information that would be good for all notebooks to have:
        • dependencies that are not in pymc3 requirements (seaborn, bambi…),
          • One simple solution for extra dependencies - conda/pip install at the start of notebooks
        • binder/colab badges (maybe we could automate that 🤔 )
        • assumed knowledge/”required” reading, authors (see above),
        • or maybe only something as simple as adding a myst anchor at the beginning of each notebook.
        • We could also add a common preamble as a dropdown with instructions on how to run the notebooks in beginner-level notebooks. This is technically easy as we could simply make users add a single line with the include directive and have a markdown file with the dropdown content.
      • Writing guidance: https://github.com/pymc-devs/pymc-examples/issues/199. I (Oriol) never know how I should write (which person or voice to use, for example) or how to structure code, markdown and comments (e.g. I find it very unpleasant to have a code cell with 2 lines of comment and 1 line of code; why aren’t the comments text in a preceding markdown cell?). I also think it would be good to have some links to writing resources and antipatterns/discouraged language. Another idea: adding captions to images that get automatically added as alt text in the website.

Notes

  • New contributors updates:
    • Olga and Raúl working on some Glossary terms, Lorenzo focused contribution on notebooks.
    • Another external contributor started working on terms after issue regarding Glossary terms was publicly posted on github.
    • TODO: Merge https://github.com/pymc-devs/pymc-examples/issues/222 and https://github.com/pymc-devs/pymc3/issues/4899
    • Idea: add “hover” feature so that, for terms that are linked to the Glossary, a mouse hover will show the definition.
    • What is a good enough definition? Hard to get right the first time, some back and forth is expected as part of the process. Should be brief but sufficient. We can link to external resources for further reading. Wikipedia is not always useful. We have books in our documentation that may be of use.
  • Discuss documentation team goals, see also https://app.clickup.com/t/nnduaf and subtasks
    • Docs team governance overview
    • How do we measure success of the docs team?
      • Page views? Not necessarily related to quality of content (could spike, for example, due to a new release coming out)
      • Reduction of the amount of questions in Discourse that are answered by a link to a relevant page already in the documentation → would mean that users are finding what they need.
    • Documentation PR guidelines
      • API Style guide is already provided by Numpy docs
      • Notebook Style guide should be moved out of wiki and into main
      • We need to do a bit of promoting of the style guide so that all reviewers are aware of it and it's actually serving its purpose.
  • Discuss the notebook style guide:
    • Recent changes
    • Future changes
      • Notation guidance: https://github.com/pymc-devs/pymc3/issues/4820 and https://github.com/pymc-devs/pymc3/issues/4821.
        • Further conversation needs to happen on notation conventions.
      • Authorship attribution: https://github.com/pymc-devs/pymc-examples/issues/198.
        • Better at the end than at the beginning, to allow for all contributors to be mentioned without distracting the reader. Could be list or table format.
        • TODO: track and add authorship attribution of repos that don't have it by looking at PR history.
      • “Frontmatter”. https://github.com/pymc-devs/pymc-examples/issues/200. We have already added the title+post directive rule, but I (Oriol) feel there is more information that would be good for all notebooks to have:
        • dependencies that are not in pymc3 requirements (seaborn, bambi…),
          • One simple solution for extra dependencies - conda/pip install at the start of notebooks
          • There is only a small number of dependencies that are outside requirements
        • binder/colab badges (maybe we could automate that 🤔 )
          • probably more work to automate than to do manually
        • assumed knowledge/”required” reading, authors (see above),
          • helpful for beginner notebooks
        • or maybe only something as simple as adding a myst anchor at the beginning of each notebook.
        • We could also add a common preamble as a dropdown with instructions on how to run the notebooks in beginner-level notebooks. This is technically easy as we could simply make users add a single line with the include directive and have a markdown file with the dropdown content.
        • Is it possible to add a DOI? Zenodo is apparently already integrated?
      • Writing guidance: https://github.com/pymc-devs/pymc-examples/issues/199.
        • Research accessibility/inclusivity best practices
        • Addressing reader in first person is a widespread convention
        • There seems to be consensus on not crowding code cells with comments, and on favoring markdown cells for text
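
For the extra-dependencies item in the frontmatter discussion above, one possible shape for a cell at the start of a notebook is sketched below. It only checks which extras are missing and prints the install command rather than installing anything. The list `EXTRA_DEPENDENCIES` and the helper `missing_packages` are hypothetical names chosen for this sketch.

```python
import importlib.util

# Hypothetical list of packages this notebook needs beyond the pymc3 requirements.
EXTRA_DEPENDENCIES = ["seaborn", "bambi"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(EXTRA_DEPENDENCIES)
if missing:
    # Tell the reader what to install instead of failing later with an ImportError.
    print("Missing extra dependencies. Install them with:")
    print("    pip install " + " ".join(missing))
else:
    print("All extra dependencies are available.")
```

Since only a small number of notebooks have dependencies outside the requirements, a short cell like this could be added by hand where needed.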

Friday, July 30

Agenda

Friday, June 18

Agenda

  • Documentation rendering and hosting

    • Migration to Read the Docs
    • Explain changes that are being introduced. More detail
    • Keep building the examples and the docs together or separate them? This affects the configuration of ablog and the use of javascript.
      • Pros and cons: do we want to version notebooks like we version docs?
      • Proposal: move away from javascript in pymc-examples and use pure rST/MyST with sphinx_panels
    • Versioned docs
      • Estimating the amount of work involved and the availability to do it.
      • Potential obstacles:
        • Configuration can take a long time to get right,
        • Configuring custom URL
        • Using RTD server (Aesara?)
        • Unknown unknowns
    • Search (new version should fix this with no additional work, but let's make sure we have that)
  • Overall structure of the docs

    • Cementing the structure tree
    • Tags and categories
      • Tags: topics (e.g. Linear regression, A/B testing)
      • Categories: levels (beginner, intermediate, advanced)
    • Getting started
    • Style guide
  • Feedback

    • Reviewing options

After-meeting Notes

Development tasks

Oriol will focus on finishing the changes he already started, while Ravin does a quick test in the next couple of weeks to try to see exactly how difficult it might be to set up Read the Docs in order to have versioned docs. We can get in touch with Read the Docs devs who are willing to lend a hand.

Regarding this item:

Keep building the examples and the docs together or separate them?

Building them together is unnecessary since they don't need to be versioned and some take a very long time to build (up to three days).

The search bar is working fine now.

Non-development tasks

We agreed on using categories to label the different levels of documentation and tags to label the topics.
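
To make the distinction concrete, here is a small sketch of how notebook metadata could carry one category (the level) and several tags (the topics). The notebook titles, the `NOTEBOOKS` list and both helper functions are hypothetical; the real site would rely on whatever the blog extension provides.

```python
from collections import defaultdict

# Hypothetical metadata: exactly one category (level) and any number of tags (topics).
NOTEBOOKS = [
    {"title": "Notebook A", "category": "beginner", "tags": ["linear regression"]},
    {"title": "Notebook B", "category": "advanced", "tags": ["A/B testing", "linear regression"]},
    {"title": "Notebook C", "category": "beginner", "tags": ["A/B testing"]},
]


def group_by_category(notebooks):
    """Map each level to the titles of the notebooks labeled with it."""
    groups = defaultdict(list)
    for notebook in notebooks:
        groups[notebook["category"]].append(notebook["title"])
    return dict(groups)


def find_by_tag(notebooks, tag):
    """Return the titles of all notebooks that carry the given topic tag."""
    return [nb["title"] for nb in notebooks if tag in nb["tags"]]


levels = group_by_category(NOTEBOOKS)
regression_notebooks = find_by_tag(NOTEBOOKS, "linear regression")
```

Keeping the level as a single value and the topics as a list mirrors the agreement: a notebook sits at one level but can cover several topics.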

We're playing around with some ideas for site redesign.

  • Sections: Home, Installation, Learn, API, Developers, Community
    • Home: includes more information on what is PyMC3 and why use it, sponsors, marketing (PyMC for enterprise), governance (if updated), info on the difference between PyMC3 and PyMC3 V4.
    • Installation: info on the home page is broken and could be more detailed (i.e. troubleshooting)
    • API: nothing changes, we keep the automatically generated API documentation.
    • Learn: contains getting started and the entire tree structure for users.
    • Developers: contains the entire tree structure for developers
    • Community: Discourse, conferences, meetups, community guidelines.
  • The current about section can be deleted and its content placed in the other sections.
  • A footer will be visible in every page with links to find help (like Discourse) and socials.

We're not yet focused on this, but when we get to plan out the developer branch of documentation, we need to make sure people understand how to become PyMC3 developers so that more people feel welcome to join.

Idea: create a group focused on documentation. There are people that have been doing many contributions who are not part of the PyMC core team. We could start by inviting them. Martina will write a proposal for this.

Guidelines for deprecating notebooks (since it's hard to maintain them all): we can check if nobody uses them using Google Analytics. We should not have redundant notebooks.

Martina will write the style guide for notebooks over the weekend so that Abhipsha can use those guidelines in the work she's doing. The style guide will be included in the PR template.

Abhipsha will call out any duplicate/obsolete notebook she sees and flag it for deprecation, and also check out opportunities for tagging and categorizing notebooks by level of difficulty.

To think about: Some case studies that are intended to showcase the power of pymc and are useful in different ways at all levels - where's the best place to put them?


Friday May 21, 2021

These are the topics that were discussed during the first meeting, plus some notes taken after the meeting.

Sources

These are the sources I’m looking at. Am I missing anything?

  • docs.pymc.io
  • videos+books (linked in the docs)
  • GitHub readmes
  • Discourse

Integration

Integration of the existing standalone content into learner-focused guides that link components to one another in order to help users make sound decisions regarding the use of the software.

Step 1 for everyone who is getting started with PyMC3 is currently this quickstart guide.

  • Is there anything you would add/do differently to that guide?

I envision a tree where the trunk is the Step 1 guide (everybody starts there) and Step 2 might not be the same for everyone since they have different goals. As people make use of more specific techniques, their paths branch out further.

  • How can we help beginners progress from step 1 to step 2, assisting them in choosing the right path? How many “branches” would you say we need to document? Is there existing information we can refactor?

Revision

Revision of the tutorials and guides to reflect the important changes to the library that are currently underway, and to give them consistency, in terms of notation and language.

  • The first part might overlap partially/completely with the scope of Abhipsha’s work. How should we proceed?
  • As for the second part (consistency in terms of notation and language), are there specific examples that come to mind?

Expansion

  • Do you think a brief introduction to basic concepts would be helpful? These could include:
    • What is Probabilistic Programming?
    • What is MCMC?
    • What are variational inference algorithms?
    • What is a Bayesian model?
    • What is ArviZ?
  • What other concepts do beginners tend to need in order to use PyMC3?

Developer guide

As the current developer documentation consists of an API reference and a single developer guide notebook, we also view this as an opportunity to make PyMC3’s developer resources more robust, with the ultimate goal of attracting and retaining more contributors, and allowing all users the opportunity to better understand the underlying implementation of their favorite probabilistic programming methods.

  • Who should I interact with on this topic?

After-meeting Notes

All notes regarding the docs will be stored in the GSoD wiki

https://github.com/pymc-devs/pymc3/wiki/Season-of-Docs-2021-Proposal

Integration

Right now, creating a good structure is the main goal. Once the structure is created, people will fill in the gaps as they go. We just need to create the proper space for it to happen.

We can get inspiration from the Scikit-learn model in order to structure the documentation.

There are currently two starting points for beginners: the Quickstart guide (big button on the front page) and the Getting Started guide (which is the one that receives more visits, maybe because it's linked in the paper). We agreed to unify both. That will be step 1 for beginners.

As for Step 2 (the branches of the tree), I (Martina) will look into the Discourse forum and see what users try to do after getting started, so we can orient them better. I also encourage PyMC3 collaborators to try and think what "categories” we can divide users into, according to their needs.

Revision

  • Revision of the tutorials and guides to reflect the important changes to the library that are currently underway
    • Abhipsha is in charge of this task and we will communicate as we move forwards to see where we need to collaborate.
  • Consistency in terms of notation and language
    • Most examples were created by mathematicians/statisticians (lots of equations) or computer scientists (lots of code)(*). For the most advanced notebooks, this is fine, because advanced users will know how to interpret them. Notebooks intended for beginners should be friendlier and more careful when introducing technical terms (maybe provide a quick definition or link to a useful explanation).
    • (*) Some of the notebooks produced by the latter are just code and markdown text. It would be neater to have plain text instead of markdown, but that's not something we need to solve right away; we can create an issue for the time being.

Expansion

  • There are blog posts by Oriol, Thomas Wiecki, Ravin and Colin that provide explanations for many topics that might be helpful to link or work into the documentation. The same goes for some books, for example Bayesian Methods for Hackers. I will see if the open source policy of these books allows us to use examples or paragraphs in PyMC3's docs. Videos could be written down if we need.
  • A proper place needs to be created for people to add developer documentation. This is not only aimed at advanced users, anyone should be able to contribute according to their skills.
  • There's a plan to move documentation to https://readthedocs.org/. If this happens, there will be “multi-version” documentation that is updated immediately, since it's not version-dependent, and version-controlled documentation. For the time being, this is not 100% certain.

Next steps

  • Go through the Discourse and Github issues to find common questions from beginners, and try to map out the tree structure we discussed. This will be a first draft that will most likely need multiple iterations before we're happy with it.
  • Research the open source policies of the books to see whether we can quote big chunks of them in the docs if we need to.
  • Scan the information contained in the blogs created by the community