GSoD 2021 Meeting Minutes
- Review progress of last two weeks
- Blockers?
- Discuss documentation team goals, see also https://app.clickup.com/t/nnduaf and subtasks
- Discuss the notebook style guide:
- Recent changes
- tags and categories: list of tags and categories we're using https://github.com/pymc-devs/pymc3/wiki/Categories-and-Tags-for-PyMC-Examples
- bibliography: https://github.com/pymc-devs/pymc-examples/issues/203
- Future changes
- Notation guidance: https://github.com/pymc-devs/pymc3/issues/4820 and https://github.com/pymc-devs/pymc3/issues/4821. Martina has already made some updates to the style guide that may be enough to close the first issue; the second needs more work. Some questions that I (Oriol) think could help the discussion:
- Do we consider “weights” or “intercept” too long to be variable names?
- Would it have helped you if different notebooks used the same names for key objects/variables?
- Do we maybe want to provide some suggestions for common things, to be used only if the writer doesn’t have a better variable name? E.g. whenever we gather observations synthetically or without any order (known to the reader), I always use `obs_id` for this dimension.
- Authorship attribution: https://github.com/pymc-devs/pymc-examples/issues/198. Some of you have authored notebooks while others have made changes to notebooks of varying depth so I think it is a good group to find some solution that considers everyone. Also related to the next point about the frontmatter.
- “Frontmatter”. https://github.com/pymc-devs/pymc-examples/issues/200. We have already added the title+post directive rule, but I (Oriol) feel there is more information that would be good for all notebooks to have:
- dependencies that are not in pymc3 requirements (seaborn, bambi…),
- One simple solution for extra dependencies: `conda/pip install` at the start of notebooks
- binder/colab badges (maybe we could automate that 🤔 )
- assumed knowledge/“required” reading, authors (see above),
- or maybe only something as simple as adding a myst anchor at the beginning of each notebook.
- We could also add a common preamble as a dropdown with instructions on how to run the notebooks in beginner-level notebooks. This is technically easy, as we could simply make users add a single line with the include directive and have a markdown file with the dropdown content.
- Writing guidance: https://github.com/pymc-devs/pymc-examples/issues/199. I (Oriol) never know how I should write (which person or voice, for example) or how to structure code, markdown and comments (e.g. I find it very unpleasant to have a code cell with 2 lines of comments and 1 line of code; why aren’t the comments text in a preceding markdown cell?). I also think it would be good to have some links to writing resources and antipatterns/discouraged language. Another idea: adding captions to images that get automatically added as alt text on the website.
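As a rough illustration of the extra-dependencies idea in the agenda above, a notebook could open with a cell that checks for packages outside the pymc3 requirements and prints the install command if any are missing. This is only a sketch: the helper name `missing_packages` and the package list are assumptions for illustration, not an agreed convention.

```python
import importlib.util


def missing_packages(packages):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


# Hypothetical list of extra dependencies for one notebook.
EXTRA_DEPENDENCIES = ["seaborn", "bambi"]

missing = missing_packages(EXTRA_DEPENDENCIES)
if missing:
    # Tell the reader what to run instead of failing later with an ImportError.
    print("Missing extra dependencies, install with: pip install " + " ".join(missing))
else:
    print("All extra dependencies are available.")
```

A check like this keeps the install instruction next to the code that needs it, without forcing an install on readers who already have the packages.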
- New contributors updates:
- Olga and Raúl are working on some Glossary terms; Lorenzo's contributions are focused on notebooks.
- Another external contributor started working on terms after the issue regarding Glossary terms was publicly posted on GitHub.
- TODO: Merge https://github.com/pymc-devs/pymc-examples/issues/222 and https://github.com/pymc-devs/pymc3/issues/4899
- Idea: add “hover” feature so that, for terms that are linked to the Glossary, a mouse hover will show the definition.
- What is a good enough definition? Hard to get right the first time, some back and forth is expected as part of the process. Should be brief but sufficient. We can link to external resources for further reading. Wikipedia is not always useful. We have books in our documentation that may be of use.
- Discuss documentation team goals, see also https://app.clickup.com/t/nnduaf and subtasks
- Docs team governance overview
- How do we measure success of the docs team?
- Page views? Not necessarily related to quality of content (could spike, for example, due to a new release coming out)
- Reduction in the number of questions in Discourse that are answered by a link to a relevant page already in the documentation → would mean that users are finding what they need.
- Documentation PR guidelines
- API Style guide is already provided by the NumPy docs
- Notebook Style guide should be moved out of wiki and into main
- We need to do a bit of promoting of the style guide so that all reviewers are aware of it and it's actually serving its purpose.
- Discuss the notebook style guide:
- Recent changes
- tags and categories: list of tags and categories we're using https://github.com/pymc-devs/pymc3/wiki/Categories-and-Tags-for-PyMC-Examples
- bibliography: https://github.com/pymc-devs/pymc-examples/issues/203
- Future changes
- Notation guidance: https://github.com/pymc-devs/pymc3/issues/4820 and https://github.com/pymc-devs/pymc3/issues/4821.
- Further conversation needs to happen on notation conventions.
- Authorship attribution: https://github.com/pymc-devs/pymc-examples/issues/198.
- Better at the end than at the beginning, to allow for all contributors to be mentioned without distracting the reader. Could be list or table format.
- TODO: track and add authorship attribution of repos that don't have it by looking at PR history.
- “Frontmatter”. https://github.com/pymc-devs/pymc-examples/issues/200. We have already added the title+post directive rule, but I (Oriol) feel there is more information that would be good for all notebooks to have:
- dependencies that are not in pymc3 requirements (seaborn, bambi…),
- One simple solution for extra dependencies: `conda/pip install` at the start of notebooks
- There is only a small number of dependencies that are outside requirements
- binder/colab badges (maybe we could automate that 🤔 )
- probably more work to automate than to do manually
- assumed knowledge/“required” reading, authors (see above),
- helpful for beginner notebooks
- or maybe only something as simple as adding a myst anchor at the beginning of each notebook.
- We could also add a common preamble as a dropdown with instructions on how to run the notebooks in beginner-level notebooks. This is technically easy, as we could simply make users add a single line with the include directive and have a markdown file with the dropdown content.
- Is it possible to add a DOI? Zenodo is apparently already integrated.
- Writing guidance: https://github.com/pymc-devs/pymc-examples/issues/199.
- Research accessibility/inclusivity best practices
- Addressing the reader in first person is a widespread convention
- There seems to be consensus on not crowding code cells with comments, and on favoring markdown cells for text
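The point above about not crowding code cells with comments could be checked mechanically. The following is a minimal sketch, assuming notebooks are available as parsed `.ipynb` JSON; the function name `comment_heavy_cells` and the `max_ratio` threshold are hypothetical and not part of any agreed style guide.

```python
def comment_heavy_cells(notebook, max_ratio=1.0):
    """Return indices of code cells whose comment lines exceed max_ratio times their code lines."""
    flagged = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        # The notebook format stores source as a list of lines or as a single string.
        text = "".join(source) if isinstance(source, list) else source
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        comments = sum(line.startswith("#") for line in lines)
        code = len(lines) - comments
        if comments > max_ratio * code:
            flagged.append(index)
    return flagged


# Tiny in-memory notebook: the second cell has 2 comment lines and 1 line of code.
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["Some explanation."]},
        {"cell_type": "code", "source": ["# load the data\n", "# and inspect it\n", "data.head()"]},
        {"cell_type": "code", "source": ["x = 1\n", "y = x + 1"]},
    ]
}
print(comment_heavy_cells(example))  # → [1]
```

For a real notebook, the dictionary can be obtained with `json.load` on the `.ipynb` file, since notebooks are stored as JSON.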
- I’m looking at how to organize info in the new versioned docs site. This is a draft of how it could be organized. Not everything is there (I used up all the 200 elements the free version has to offer), but it’s something we can use as a basis for discussion and to visualize how we want information to be organized. https://app.moqups.com/eS8Ttyrfsp/edit/page/aa9df7b72
- Creation of a docs team. Basic operational guidelines. Steps for new people to join in.
- API documentation: How is it currently structured and can we improve that?
- Space for any updates @Abhipsha wants to make on notebooks
- Can we agree on guidelines for standardizing notation? See:
- https://github.com/pymc-devs/pymc3/issues/4820
- https://github.com/pymc-devs/pymc3/issues/4821
- https://github.com/pymc-devs/pymc-examples/issues/198
- https://github.com/pymc-devs/pymc-examples/issues/199
- https://github.com/pymc-devs/pymc-examples/issues/200
- https://github.com/pymc-devs/pymc-examples/issues/203
- Clickup task overview and next steps
- Documentation rendering and hosting
- Migration to Read the Docs
- Explain changes that are being introduced. More detail
- Keep building the examples and the docs together or separate them? This affects the configuration of ablog and the use of javascript.
- Pros and cons: do we want to version notebooks like we version docs?
- Proposal: move away from JavaScript in pymc-examples and use pure rST/MyST with sphinx_panels
- Versioned docs
- Estimating the amount of work involved and the availability to do it.
- Potential obstacles:
- Configuration can take a long time to get right,
- Configuring custom URL
- Using RTD server (Aesara?)
- Unknown unknowns
- Search (new version should fix this with no additional work, but let's make sure we have that)
- Overall structure of the docs
- Cementing the structure tree
- Tags and categories
- Tags: topics (e.g. Linear regression, A/B testing)
- Categories: levels (beginner, intermediate, advanced)
- Getting started
- Style guide
- Feedback
- Reviewing options
Oriol will focus on finishing the changes he already started, while Ravin does a quick test in the next couple of weeks to try to see exactly how difficult it might be to set up Read the Docs in order to have versioned docs. We can get in touch with Read the Docs devs who are willing to lend a hand.
Regarding this item:
Keep building the examples and the docs together or separate them?
Building them together is unnecessary, since the examples don't need to be versioned and some take a very long time to build (up to three days).
The search bar is working fine now.
We agreed on using categories to label the different levels of documentation and tags to label the topics.
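To make the split concrete, here is a small sketch of how a notebook's labels could be checked against this agreement. The three level names come from the agenda above; the function name `check_labels` and the idea of validating labels in code are assumptions for illustration only.

```python
# Categories label the level; tags label the topics and are free-form.
ALLOWED_CATEGORIES = {"beginner", "intermediate", "advanced"}


def check_labels(categories, tags):
    """Return a list of problems found in a notebook's categories and tags."""
    problems = []
    unknown = sorted(set(categories) - ALLOWED_CATEGORIES)
    if unknown:
        problems.append("unknown categories: " + ", ".join(unknown))
    if not categories:
        problems.append("no category (level) given")
    if not tags:
        problems.append("no tags (topics) given")
    return problems


# A correctly labeled notebook produces no problems.
print(check_labels(["beginner"], ["linear regression"]))
# A topic used as a category, with no tags, is reported.
print(check_labels(["linear regression"], []))
```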
We're playing around with some ideas for site redesign.
- Sections: Home, Installation, Learn, API, Developers, Community
- Home: includes more information on what PyMC3 is and why to use it, sponsors, marketing (PyMC for enterprise), governance (if updated), info on the difference between PyMC3 and PyMC3 V4.
- Installation: info on the home page is broken and could be more detailed (e.g. troubleshooting)
- API: nothing changes, we keep the automatically generated API documentation.
- Learn: contains getting started and the entire tree structure for users.
- Developers: contains the entire tree structure for developers
- Community: Discourse, conferences, meetups, community guidelines.
- The current about section can be deleted and its content placed in the other sections.
- A footer will be visible in every page with links to find help (like Discourse) and socials.
We're not yet focused on this, but when we get to plan out the developer branch of documentation, we need to make sure people understand how to become PyMC3 developers so that more people feel welcome to join.
Idea: create a group focused on documentation. There are people who have made many contributions but are not part of the PyMC core team. We could start by inviting them. Martina will write a proposal for this.
Guidelines for deprecating notebooks (since it's hard to maintain them all): we can use Google Analytics to check whether anyone still uses them. We should not have redundant notebooks.
Martina will write the style guide for notebooks over the weekend so that Abhipsha can use those guidelines in the work she's doing. The style guide will be included in the PR template.
Abhipsha will call out any duplicate/obsolete notebook she sees and flag it for deprecation, and also check out opportunities for tagging and categorizing notebooks by level of difficulty.
To think about: Some case studies that are intended to showcase the power of pymc and are useful in different ways at all levels - where's the best place to put them?
These are the topics that were discussed during the first meeting, plus some notes taken after the meeting.
These are the sources I’m looking at. Am I missing anything?
- docs.pymc.io
- videos+books (linked in the docs)
- GitHub readmes
- Discourse
Integration of the existing standalone content into learner-focused guides that link components to one another in order to help users make sound decisions regarding the use of the software.
Step 1 for everyone who is getting started with PyMC3 is currently this quickstart guide.
- Is there anything you would add/do differently to that guide?
I envision a tree where the trunk is the Step 1 guide (everybody starts there) and Step 2 might not be the same for everyone since they have different goals. As people make use of more specific techniques, their paths branch out further.
- How can we help beginners progress from step 1 to step 2, assisting them in choosing the right path? How many “branches” would you say we need to document? Is there existing information we can refactor?
Revision of the tutorials and guides to reflect the important changes to the library that are currently underway, and to give them consistency, in terms of notation and language.
- The first part might overlap partially/completely with the scope of Abhipsha’s work. How should we proceed?
- As for the second part (consistency in terms of notation and language), are there specific examples that come to mind?
- Do you think a brief introduction to basic concepts would be helpful? These could include:
- What is Probabilistic Programming?
- What is MCMC?
- What are variational inference algorithms?
- What is a Bayesian model?
- What is ArviZ?
- What other concepts do beginners tend to need in order to use PyMC3?
As the current developer documentation consists of an API reference and a single developer guide notebook, we also view this as an opportunity to make PyMC3’s developer resources more robust, with the ultimate goal of attracting and retaining more contributors, and allowing all users the opportunity to better understand the underlying implementation of their favorite probabilistic programming methods.
- Who should I interact with on this topic?
All notes regarding the docs will be stored in the GSoD wiki
https://github.com/pymc-devs/pymc3/wiki/Season-of-Docs-2021-Proposal
Right now, creating a good structure is the main goal. Once the structure is created, people will fill in the gaps as they go. We just need to create the proper space for it to happen.
We can get inspiration from the Scikit-learn model in order to structure the documentation.
There are currently two starting points for beginners: the Quickstart guide (big button on the front page) and the Getting Started guide (which is the one that receives more visits, maybe because it's linked in the paper). We agreed to unify both. That will be step 1 for beginners.
As for Step 2 (the branches of the tree), I (Martina) will look into the Discourse forum and see what users try to do after getting started, so we can orient them better. I also encourage PyMC3 collaborators to think about which “categories” we can divide users into, according to their needs.
- Revision of the tutorials and guides to reflect the important changes to the library that are currently underway
- Abhipsha is in charge of this task and we will communicate as we move forwards to see where we need to collaborate.
- Consistency in terms of notation and language
- Most examples were created by mathematicians/statisticians (lots of equations) or computer scientists (lots of code)(*). For the most advanced notebooks, this is fine, because advanced users will know how to interpret them. Notebooks intended for beginners should be friendlier and more careful when introducing technical terms (maybe provide a quick definition or link to a useful explanation).
- (*) Some of the notebooks produced by the latter are just code and markdown text. It would be neater to have plain text instead of markdown, but that's not something we need to solve right away; we can create an issue for the time being.
### Expansion
- There are blog posts by Oriol, Thomas Wiecki, Ravin and Colin that provide explanations for many topics that might be helpful to link or work into the documentation. The same goes for some books, for example Bayesian Methods for Hackers. I will see if the open source policy of these books allows us to use examples or paragraphs in PyMC3's docs. Videos could be transcribed if needed.
- A proper place needs to be created for people to add developer documentation. This is not only aimed at advanced users, anyone should be able to contribute according to their skills.
- There's a plan to move documentation to https://readthedocs.org/. If this happens, there will be “multi-version” documentation that is updated immediately, since it's not version-dependent, and version-controlled documentation. For the time being, this is not 100% certain.
- Go through the Discourse and GitHub issues to find common questions from beginners, and try to map out the tree structure we discussed. This will be a first draft that will most likely need multiple iterations before we're happy with it.
- Research the open source policies of the books to see whether we can quote big chunks of them in the docs if we need to.
- Scan the information contained in the blogs created by the community