Documentation revamp #232
It's probably worth adding Experiment analysis scorecards.
Hey @luizhsuperti, thanks for the PR! I've had a quick read, let me know what you think.
README.md
Outdated
- 🏢 **Cluster randomization**
- 🔄 **Switchback experiments**

### 🛠 **Data Preprocessing**
I'd remove this section. I don't think the pandas integration is very relevant, and there are no tools for data preparation in the lib.
see 1st comment
README.md
Outdated
### 📊 **Comprehensive Experiment Analysis**
##### **✅ Metrics**
I'd drop the metrics one for now since it looks like we have a bug (see last issue)
README.md
Outdated
- 📌 **Generalized Estimating Equations (GEE)**
- 📌 **Mixed Linear Models** for robust inference
- 📌 **Ordinary Least Squares (OLS)** and **Clustered OLS** with covariates
- 📌 **T-tests** with variance reduction techniques (**CUPED, CUPAC**)
I'd merge this and the one above, and not mention t-tests, since it's mostly OLS with covariates, CUPED, CUPAC.
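For readers less familiar with the variance reduction mentioned in this bullet, here is a minimal sketch of the CUPED adjustment using only the standard library. It is not the library's implementation: `cuped_adjust` and the synthetic `pre`/`post` data are hypothetical names made up for illustration, and the pre-experiment covariate is assumed to be correlated with the outcome.

```python
import random
from statistics import fmean, variance


def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes: y - theta * (x - mean(x)).

    theta is the sample covariance of (x, y) divided by the sample
    variance of x, i.e. the slope of a simple regression of y on x.
    """
    x_bar, y_bar = fmean(x), fmean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


# Hypothetical synthetic data: the outcome is the pre-period value plus noise.
rng = random.Random(0)
pre = [rng.gauss(10, 2) for _ in range(2000)]
post = [xi + rng.gauss(0, 1) for xi in pre]

adjusted = cuped_adjust(post, pre)
print("variance before:", variance(post))
print("variance after: ", variance(adjusted))
```

The adjusted metric keeps the same mean as the raw one but has lower variance whenever the covariate is correlated with the outcome, which is why it can be described as a regression with a covariate rather than a separate test.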
📦 **Installation:**
```sh
pip install cluster-experiments
=======
# MDE calculation
mde = npw.mde(df, power=0.8)
```
For the MDE example, I have two asks: it needs to be reproducible (so the dataframe needs to be created), and it should show the methods `power_analysis`, `mde`, `power_line` and `mde_line`. wdyt?
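To make the ask concrete, here is a rough standard-library sketch of what a reproducible power/MDE example computes under a normal approximation. It deliberately does not call the library: `standard_error`, `normal_power` and `normal_mde` are hypothetical helpers, the data is seeded synthetic noise, and a simple two-arm, two-sided z-test is assumed.

```python
import random
from math import sqrt
from statistics import NormalDist, variance

STD_NORMAL = NormalDist()  # standard normal, used for quantiles and CDF


def standard_error(control, treatment):
    """Standard error of the difference in means between two arms."""
    return sqrt(variance(control) / len(control) + variance(treatment) / len(treatment))


def normal_power(se, effect, alpha=0.05):
    """Power of a two-sided z-test to detect `effect` given standard error `se`."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect / se
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def normal_mde(se, power=0.8, alpha=0.05):
    """Smallest effect detectable with the requested power (far tail ignored)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return (z_crit + STD_NORMAL.inv_cdf(power)) * se


# Reproducible synthetic data: a fixed seed and an A/A split of one sample.
rng = random.Random(42)
outcomes = [rng.gauss(0, 1) for _ in range(10_000)]
control, treatment = outcomes[::2], outcomes[1::2]
se = standard_error(control, treatment)

# A "line" is the same calculation repeated over a grid of inputs.
power_line = {effect: normal_power(se, effect) for effect in (0.02, 0.05, 0.1)}
mde_line = {power: normal_mde(se, power) for power in (0.7, 0.8, 0.9)}
print(power_line)
print(mde_line)
```

A README example would swap these helpers for the library's own methods; the point of the sketch is only that the data is created in the snippet and seeded, so every reader gets the same numbers.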
Definitely what we should do. I'm thinking that, for a 1st time user, we should show the MDE calculation process and scorecard. wdyt?
The other question is where this example should be, in Readme (Home) or quickstart.
> Definitely what we should do. I'm thinking that, for a 1st time user, we should show the MDE calculation process and scorecard. wdyt?

Yes! In the simplest set-up, but yes.

> The other question is where this example should be, in Readme (Home) or quickstart.

I really think there's value in a reproducible hello-world example in what all the users see, which is the README.
I think the variance reduction example can go to quickstart instead of readme
!!! info "Python Version Support"
    **Cluster Experiments** requires **Python 3.9 or higher**. Make sure your environment meets this requirement before proceeding with the installation.
it's 3.8 I think
README.md
Outdated
## Quick Start

### Power Analysis Example
**`cluster experiments`** is a comprehensive Python library for end-to-end A/B testing workflows, designed for seamless integration with Pandas in production environments.
> designed for seamless integration with Pandas in production environments.
I'd remove any production mention, I don't think it's fair to call this production. "seamless integration" sounds generated by an LLM, do you have a more natural equivalent?
Hey! Finally had time to return. Thank you for the feedback! I'm adding the answers to the other suggestions you made in the README file. I modified the following as suggested:
i. I added the simulation-based / normal approximation support at the beginning.
ii. For support of complex experimental designs, I kept it as is; I'd keep Variance Reduction Techniques under statistical methods of analysis, to clarify the difference between experimental design and how we evaluate effects.
iii. Deleted t-tests
iv. Deleted metrics
v. Added the Scorecards feature; it's a neat feature that's definitely a plus
vi. Removed pre-processing
vii. Dropped the "Why use it?" section (How did you know it was LLM adjusted? :D )
- the README section should already be clear with the examples
It's been a while, so the pkg got updated; let me know of additional changes.
Designing and analyzing experiments can feel overwhelming at times. After formulating a testable hypothesis,
you're faced with a series of routine tasks. From collecting and transforming raw data to measuring the statistical significance of your experiment results and constructing confidence intervals,
it can quickly become a repetitive and error-prone process.
*Cluster Experiments* is here to change that. Built on top of well-known packages like `pandas`, `numpy`, `scipy` and `statsmodels`, it automates the core steps of an experiment, streamlining your workflow, saving you time and effort, while maintaining statistical rigor.
I'd make the paragraph shorter and stress what it automates: being MDE/power calculation and inference scorecards
Given the next examples, I think it's worth mentioning that you're describing the simulation-based power analysis, and that there are other pipelines, like power analysis based on normal approximation and scorecard generation.
I like the explanation style, maybe you could write a similar thing for NormalPowerAnalysis and AnalysisPlan
```python
from cluster_experiments import TTestClusteredAnalysis

analysis = TTestClusteredAnalysis(
```
let's use ClusteredOLS, I think this analysis method is a bit weird
Hey @luizhsuperti, I was playing with this and found an issue: all examples are under switchback. Whenever you can, have a look please :)
Hey! Added comments on the suggestions in the Review doc. After that I'm jumping in the QuickStart and finally in the Examples we should have in the pkg
Thanks @luizhsuperti, I'll have a look later! Be careful with the conflicts in mkdocs.yml, you'll need to resolve them.
Asked for a general iteration. Keep in mind the conflicts with the main branch!
power_line_normal = npw.power_line(df, average_effects=[0.1, 0.2, 0.3])
### 📌 Experiment Design & Planning
- **Power analysis** and **Minimal Detectable Effect (MDE)** estimation
- **Normal Approximation (CLT-based)**: Fast, analytical formulas assuming approximate normality
- **Normal Approximation (CLT-based)**: Fast, analytical formulas assuming approximate normality
- Best for large sample sizes and standard A/B tests
- **Monte Carlo Simulation**: Empirically estimate power or MDE by simulating many experiments
- Ideal for complex or non-standard designs (e.g., clustering, non-normal outcomes)
Suggested change:
Before: - Ideal for complex or non-standard designs (e.g., clustering, non-normal outcomes)
After: - Ideal for complex or non-standard designs (e.g., small samples, complex effect distribution)
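The difference between the two approaches in these bullets can be sketched in a few lines of standard-library Python. This is an illustration under simple assumptions (two equal arms, unit-variance normal outcomes, a two-sided z-test), not the library's simulation pipeline; `simulated_power` and `analytic_power` are hypothetical names.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance


def simulated_power(effect, n_per_arm=200, n_simulations=500, alpha=0.05, seed=0):
    """Monte Carlo power: share of simulated experiments that reject the null."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_simulations):
        # Simulate one experiment; the treatment arm is shifted by `effect`.
        control = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        treatment = [rng.gauss(effect, 1) for _ in range(n_per_arm)]
        se = sqrt(variance(control) / n_per_arm + variance(treatment) / n_per_arm)
        z_stat = (fmean(treatment) - fmean(control)) / se
        if abs(z_stat) > z_crit:
            rejections += 1
    return rejections / n_simulations


def analytic_power(effect, n_per_arm=200, alpha=0.05):
    """Normal-approximation power for the same design (known unit variance)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / sqrt(2 / n_per_arm)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


print("simulated:", simulated_power(0.3))
print("analytic: ", analytic_power(0.3))
```

In this simple setting the two estimates should roughly agree, and the analytic one is far cheaper. The simulation becomes worth its cost once the data-generating step or the test inside the loop is something no closed-form formula covers.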
`cluster experiments` empowers analysts and data scientists with **scalable, reproducible, and statistically robust** A/B testing workflows.

🔗 **Get Started:** [Documentation Link]
missing a link?
- Dimension: api/dimension.md
- Hypothesis Test: api/hypothesis_test.md
- Analysis Plan: api/analysis_plan.md
- Home: ../README.md
You can install **Cluster Experiments** via pip:

```bash
```
I recommend adding simpler examples, like dictionary-based inputs in the quickstart. Not saying that we should remove what we have, but I'd also add the simplest use of the library
- Pre experiment outcome model: api/cupac_model.md
- Power config: api/power_config.md
- Power analysis: api/power_analysis.md
- Washover: api/washover.md
can this move into switchback?
- Power config: api/power_config.md
- Power analysis: api/power_analysis.md
- Washover: api/washover.md
- Metric: api/metric.md
can you group all of these into experiment analysis?
- Synthetic control: synthetic_control.ipynb
- Delta Method Analysis: delta_method.ipynb
- Experiment analysis workflow: experiment_analysis.ipynb
- Contribute:
Either we create these docs or we remove this, right?
Proposal for Documentation Reorganization
linked issue
I was thinking about reorganizing the documentation to improve clarity and usability. Here’s the proposed structure:
- Home (README)
- Quickstart (Installation + Package Overview: Splitter, Perturbator, and Analysis classes)
- Documentation (APIs, details on classes and methods)
- Usage Examples (Notebooks with practical examples)
- Contribution (Guidelines for contributing)
In my fork:
- 👉 I revamped the README to be more appealing to a broader audience
- I also modified the Quickstart section to align with this idea
I’d love your feedback on whether the Quickstart section information is accurate—I’m still new to the package, so there might be some errors in definitions or concepts. (The Python notebooks currently in the docs are safe and should still be included in some form.)
For inspiration, I looked at the documentation structures of:
- Ambrosia
- pysurvival
Why This Change?
- Makes it easier for new users to navigate and understand the package.
- Provides a clearer structure for future contributors.
- The examples section can be refined over time for better clarity and accuracy.
- In the future, we could also add a "Stats 101" section for foundational concepts.
Let me know what you think!