Documentation revamp #232
base: main

Changes from all commits: 5c6558c, b6babcc, 6363d80, aa04ca8, a9673c0, ba44999

@@ -170,3 +170,4 @@ todos.txt

#
experiments/
cluster-experiments.code-workspace

@@ -11,89 +11,40 @@ https://codecov.io/gh/david26694/cluster-experiments/branch/main/graph/badge.svg

![Downloads](https://pepy.tech/badge/cluster-experiments)
[![PyPI version](https://img.shields.io/pypi/v/cluster-experiments.svg)](https://pypi.python.org/pypi/cluster-experiments)

A Python library for end-to-end A/B testing workflows, featuring:
- Experiment analysis and scorecards
- Power analysis (simulation-based and normal approximation)
- Variance reduction techniques (CUPED, CUPAC)
- Support for complex experimental designs (cluster randomization, switchback experiments)

## Key Features

### 1. Power Analysis
- **Simulation-based**: Run Monte Carlo simulations to estimate power
- **Normal approximation**: Fast power estimation using CLT
- **Minimum Detectable Effect**: Calculate required effect sizes
- **Multiple designs**: Support for:
  - Simple randomization
  - Variance reduction techniques in power analysis
  - Cluster randomization
  - Switchback experiments
- **Dict config**: Easy to configure power analysis with a dictionary

### 2. Experiment Analysis
- **Analysis Plans**: Define structured analysis plans
- **Metrics**:
  - Simple metrics
  - Ratio metrics
- **Dimensions**: Slice results by dimensions
- **Statistical Methods**:
  - GEE
  - Mixed Linear Models
  - Clustered / regular OLS
  - T-tests
  - Synthetic Control
- **Dict config**: Easy to define analysis plans with a dictionary

### 3. Variance Reduction
- **CUPED** (Controlled-experiment Using Pre-Experiment Data):
  - Use historical outcome data to reduce variance, at any granularity
  - Support for several covariates
- **CUPAC** (Control Using Predictors as Covariates):
  - Use any scikit-learn compatible estimator to predict the outcome with pre-experiment data

## Quick Start

### Power Analysis Example

**`cluster-experiments`** is a comprehensive Python library for end-to-end A/B testing workflows.

```python
import numpy as np
import pandas as pd
from cluster_experiments import PowerAnalysis, NormalPowerAnalysis

# Create sample data
N = 1_000
df = pd.DataFrame({
    "target": np.random.normal(0, 1, size=N),
    "date": pd.to_datetime(
        np.random.randint(
            pd.Timestamp("2024-01-01").value,
            pd.Timestamp("2024-01-31").value,
            size=N,
        )
    ),
})

# Simulation-based power analysis
config = {
    "analysis": "ols",
    "perturbator": "constant",
    "splitter": "non_clustered",
    "n_simulations": 50,
}
pw = PowerAnalysis.from_dict(config)
power = pw.power_analysis(df, average_effect=0.1)

# Normal approximation (faster)
npw = NormalPowerAnalysis.from_dict({
    "analysis": "ols",
    "splitter": "non_clustered",
    "n_simulations": 5,
    "time_col": "date",
})
power_normal = npw.power_analysis(df, average_effect=0.1)
power_line_normal = npw.power_line(df, average_effects=[0.1, 0.2, 0.3])

---

## 🚀 Key Features

### 📌 Experiment Design & Planning
- **Power analysis** and **Minimal Detectable Effect (MDE)** estimation
  - **Normal Approximation (CLT-based)**: Fast, analytical formulas assuming approximate normality
    - Best for large sample sizes and standard A/B tests
  - **Monte Carlo Simulation**: Empirically estimate power or MDE by simulating many experiments
    - Ideal for complex or non-standard designs (e.g., clustering, non-normal outcomes)
- Supports complex **experimental designs** (a dict-config sketch follows below), including:
  - 🏢 **Cluster randomization**
  - 🔄 **Switchback experiments**
  - 📊 **Observational studies**, including **synthetic control**
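
For instance, a cluster-randomized power analysis can be configured in a few lines through the dictionary interface. This is a rough sketch reusing the dict-config pattern from the Quick Start example above; the toy data and column names are invented, and the exact config strings are worth double-checking against the API reference.

```python
import numpy as np
import pandas as pd

from cluster_experiments import PowerAnalysis

# Toy dataset: 20 clusters of 50 units each, with a numeric target
df = pd.DataFrame({
    "cluster": np.repeat([f"c{i}" for i in range(20)], 50),
    "target": np.random.normal(0, 1, size=1000),
})

# Dict config for a cluster-randomized design (GEE analysis, constant effect)
config = {
    "analysis": "gee",
    "perturbator": "constant",
    "splitter": "clustered",
    "cluster_cols": ["cluster"],
    "n_simulations": 50,
}
pw = PowerAnalysis.from_dict(config)
print(pw.power_analysis(df, average_effect=0.1))
```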

### 🧪 Statistical Methods for Analysis
- 📌 **Ordinary Least Squares (OLS)** and **Clustered OLS**, with support for covariates
- 🎯 **Variance Reduction Techniques**: **CUPED** and **CUPAC** (sketched below)
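
To make the variance reduction idea concrete: a CUPED-style adjustment boils down to adding a pre-experiment metric as a covariate in the regression. The snippet below is a minimal sketch with invented column names; the `covariates` argument of `OLSAnalysis` should be checked against the API reference. CUPAC follows the same pattern, except the covariate is the out-of-sample prediction of a scikit-learn estimator trained on pre-experiment data.

```python
import numpy as np
import pandas as pd

from cluster_experiments import OLSAnalysis

# Toy experiment data with a pre-experiment covariate correlated with the target
df = pd.DataFrame({
    "treatment": np.random.choice(["A", "B"], size=1000),
    "pre_target": np.random.normal(0, 1, size=1000),
})
df["target"] = 0.7 * df["pre_target"] + np.random.normal(0, 1, size=1000)

# CUPED-style variance reduction: the pre-experiment metric enters as a covariate
analysis = OLSAnalysis(covariates=["pre_target"])
print(analysis.get_pvalue(df))
```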

### 📈 Scalable Experiment Analysis with Scorecards
- Generate **Scorecards** to summarize experiment results, allowing analysis of multiple metrics at once (a sketch follows below)
- Include **confidence intervals, relative and absolute effect sizes, and p-values**
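
A rough sketch of what a dictionary-defined scorecard could look like is shown below. The constructor and method names used here (`AnalysisPlan.from_metrics_dict`, `analyze`, `to_dataframe`) are assumptions based on the dict-config pattern described above, and the column names are invented; check the experiment analysis documentation for the exact API.

```python
import numpy as np
import pandas as pd

from cluster_experiments import AnalysisPlan

# Hypothetical experiment-level data
exp_df = pd.DataFrame({
    "customer_id": range(1000),
    "treatment": np.random.choice(["control", "treatment_1"], size=1000),
    "order_value": np.random.normal(30, 5, size=1000),
})

# Assumed dict-based definition of an analysis plan / scorecard
plan = AnalysisPlan.from_metrics_dict({
    "metrics": [{"alias": "AOV", "name": "order_value"}],
    "variants": [
        {"name": "control", "is_control": True},
        {"name": "treatment_1", "is_control": False},
    ],
    "variant_col": "treatment",
    "alpha": 0.05,
    "analysis_type": "clustered_ols",
    "analysis_config": {"cluster_cols": ["customer_id"]},
})

scorecard = plan.analyze(exp_df)
print(scorecard.to_dataframe())
```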

`cluster-experiments` empowers analysts and data scientists with **scalable, reproducible, and statistically robust** A/B testing workflows.

🔗 **Get Started:** [Documentation Link]

> **Review comment:** missing a link?

📦 **Installation:**
```sh
pip install cluster-experiments
```

# MDE calculation
mde = npw.mde(df, power=0.8)

> **Review comment:** for the MDE example, I have to ask: it needs to be reproducible (so the dataframe needs to be created), and it should show the methods power_analysis, mde, power_line and mde_line. wdyt?

> **Review comment:** Definitely what we should do. I'm thinking that, for a first-time user, we should show the MDE calculation process and scorecard. wdyt?

> **Review comment:** yes! in the simplest set-up, but yes. I really think there's value in a reproducible hello-world example in what all the users see, which is the readme.

> **Review comment:** I think the variance reduction example can go to quickstart instead of readme.

@@ -106,7 +57,6 @@ mde_timeline = npw.mde_time_line(

print(power, power_line_normal, power_normal, mde, mde_timeline)
```

### Experiment Analysis Example

```python

@@ -0,0 +1,134 @@
# Quickstart

## Installation

You can install **Cluster Experiments** via pip:

> **Review comment:** I recommend adding simpler examples, like dictionary-based inputs in the quickstart. Not saying that we should remove what we have, but I'd also add the simplest use of the library.

```bash
pip install cluster-experiments
```

!!! info "Python Version Support"
    **Cluster Experiments** requires **Python 3.9 or higher**. Make sure your environment meets this requirement before proceeding with the installation.

> **Review comment:** it's 3.8 I think

---

## Usage

Designing and analyzing experiments can feel overwhelming at times. After formulating a testable hypothesis, you're faced with a series of routine tasks: from collecting and transforming raw data to measuring the statistical significance of your experiment results and constructing confidence intervals, it can quickly become a repetitive and error-prone process.
*Cluster Experiments* is here to change that. Built on top of well-known packages like `pandas`, `numpy`, `scipy` and `statsmodels`, it automates the core steps of an experiment, streamlining your workflow and saving you time and effort while maintaining statistical rigor.

> **Review comment:** I'd make the paragraph shorter and stress what it automates, namely MDE/power calculation and inference scorecards.

> **Review comment:** Given the next examples, I think it's worth mentioning that you're describing the simulation-based power analysis, and there are other pipelines like power analysis based on normal approximation and scorecard generation.

> **Review comment:** I like the explanation style, maybe you could write a similar thing for NormalPowerAnalysis and AnalysisPlan.
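
Before looking at the individual building blocks, here is roughly the simplest, dictionary-configured use of the library: a non-clustered A/B test analyzed with OLS, whose power is estimated by simulation. This is a minimal sketch in the spirit of the README example; the simulation-based pipeline shown here is only one of the available workflows (normal-approximation power analysis and scorecard generation are others).

```python
import numpy as np
import pandas as pd

from cluster_experiments import PowerAnalysis

# Toy data: a single numeric target column
df = pd.DataFrame({"target": np.random.normal(0, 1, size=1_000)})

pw = PowerAnalysis.from_dict({
    "analysis": "ols",
    "perturbator": "constant",
    "splitter": "non_clustered",
    "n_simulations": 50,
})
print(pw.power_analysis(df, average_effect=0.1))
```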

## Key Features
- **Modular Design**: Each component (`Splitter`, `Perturbator`, and `Analysis`) is independent, reusable, and can be combined in any way you need.
- **Flexibility**: Whether you're conducting a simple A/B test or a complex clustered experiment, Cluster Experiments adapts to your needs.
- **Statistical Rigor**: Built-in support for advanced statistical methods, including clustered standard errors and variance reduction techniques like CUPED and CUPAC, ensures that your experiments maintain high standards.

The core functionality of *Cluster Experiments* revolves around several intuitive, self-contained classes and methods:

- **Splitter**: Define how your control and treatment groups are split.
- **Perturbator**: Specify the type of effect you want to test.
- **Analysis**: Perform statistical inference to measure the impact of your experiment.

---

### `Splitter`: Defining Control and Treatment Groups

The `Splitter` classes are responsible for dividing your data into control and treatment groups. The way you split your data depends on the **metric** (e.g., simple, ratio) you want to observe and the unit of observation (e.g., users, sessions, time periods).

#### Features:

- **Randomized Splits**: Simple random assignment of units to control and treatment groups.
- **Stratified Splits**: Ensure balanced representation of key segments (e.g., geographic regions, user cohorts).
- **Time-Based Splits**: Useful for switchback experiments or time-series data.

```python
from cluster_experiments import RandomSplitter

splitter = RandomSplitter(
    cluster_cols=["cluster_id"],  # Split by clusters
    treatment_col="treatment",    # Name of the treatment column
)
```
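
To see what the splitter actually does, you can apply it to a small DataFrame. This continues the snippet above and assumes the splitter exposes an `assign_treatment_df` method; the toy data and column values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10 clusters of 10 units each
df = pd.DataFrame({
    "cluster_id": np.repeat(range(10), 10),
    "outcome": np.random.normal(0, 1, size=100),
})

# Returns a copy of the data with the treatment column filled in,
# keeping every unit of a cluster in the same group
treated_df = splitter.assign_treatment_df(df)
print(treated_df["treatment"].value_counts())
```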

---

### `Perturbator`: Simulating the Treatment Effect

The `Perturbator` classes define the type of effect you want to test. A perturbator simulates the treatment effect on your data, allowing you to evaluate the impact of your experiment.

#### Features:

- **Absolute Effects**: Add a fixed uplift to the treatment group.
- **Relative Effects**: Apply a percentage-based uplift to the treatment group.
- **Custom Effects**: Define your own effect size or distribution.

```python
from cluster_experiments import ConstantPerturbator

perturbator = ConstantPerturbator(
    average_effect=5.0  # Simulate a constant additive uplift of 5 units
)
```
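
Applying the perturbator is a one-liner once the data has a treatment assignment. A minimal sketch, assuming the `perturbate` method and using invented toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical data that already has a treatment column (e.g. produced by a splitter)
df = pd.DataFrame({
    "treatment": np.random.choice(["A", "B"], size=100),
    "target": np.random.normal(0, 1, size=100),
})

# Returns a copy of the data where the configured effect has been
# added to the target of the treated rows
perturbed_df = perturbator.perturbate(df)
```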

---

### `Analysis`: Measuring the Impact

Once your data is split and the treatment effect is applied, the `Analysis` component helps you measure the statistical significance of the experiment results. It provides tools for calculating effects, confidence intervals, and p-values.

You can use it for both **experiment design** (pre-experiment phase) and **analysis** (post-experiment phase).

#### Features:

- **Statistical Tests**: Perform t-tests, OLS regression, and other hypothesis tests.
- **Effect Size**: Calculate both absolute and relative effects.
- **Confidence Intervals**: Construct confidence intervals for your results.

Example:

```python
from cluster_experiments import TTestClusteredAnalysis

analysis = TTestClusteredAnalysis(
    cluster_cols=["cluster_id"],  # Cluster-level analysis
    treatment_col="treatment",    # Name of the treatment column
    target_col="outcome"          # Metric to analyze
)
```

> **Review comment:** let's use ClusteredOLS, I think this analysis method is a bit weird
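
With an analysis object in hand, getting a p-value for the treatment effect is a single call. The sketch below continues the snippet above, assumes the `get_pvalue` method, and uses invented toy data in which the treatment is constant within each cluster:

```python
import numpy as np
import pandas as pd

# Hypothetical experiment data: 10 clusters of 20 units, treatment assigned per cluster
df = pd.DataFrame({
    "cluster_id": np.repeat(range(10), 20),
    "treatment": np.repeat(np.tile(["A", "B"], 5), 20),
    "outcome": np.random.normal(0, 1, size=200),
})

# Runs the configured test and returns the p-value of the treatment effect
p_value = analysis.get_pvalue(df)
print(p_value)
```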

---

### Putting It All Together for Experiment Design

You can combine all of these classes as inputs to the `PowerAnalysis` class, which lets you analyze different experiment settings, power lines, and Minimal Detectable Effects (MDEs).

```python
import numpy as np
import pandas as pd

from cluster_experiments import PowerAnalysis
from cluster_experiments import RandomSplitter, ConstantPerturbator, TTestClusteredAnalysis

# Create a small synthetic dataset: 20 clusters of 50 units each
df = pd.DataFrame({
    "cluster_id": np.repeat(range(20), 50),
    "outcome": np.random.normal(0, 1, size=1000),
})

# Define the components
splitter = RandomSplitter(cluster_cols=["cluster_id"], treatment_col="treatment")
perturbator = ConstantPerturbator(average_effect=0.1)
analysis = TTestClusteredAnalysis(cluster_cols=["cluster_id"], treatment_col="treatment", target_col="outcome")

# Create the experiment
experiment = PowerAnalysis(
    perturbator=perturbator,
    splitter=splitter,
    analysis=analysis,
    target_col="outcome",
    treatment_col="treatment"
)

# Run the power analysis on the dataset (returns the estimated power)
results = experiment.power_analysis(df)
```
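
For experiment design you often want the Minimal Detectable Effect rather than the power of one fixed effect. Below is a rough sketch using `NormalPowerAnalysis` on the same `df`, treating the data as non-clustered purely to keep the example short; the method names mirror the README example, but double-check the config keys against the API reference.

```python
from cluster_experiments import NormalPowerAnalysis

# CLT-based power analysis: no simulation loop needed
npw = NormalPowerAnalysis.from_dict({
    "analysis": "ols",
    "splitter": "non_clustered",
    "target_col": "outcome",
    "n_simulations": 5,
})

power = npw.power_analysis(df, average_effect=0.1)            # power for a given effect
power_curve = npw.power_line(df, average_effects=[0.1, 0.2])  # power across several effects
mde = npw.mde(df, power=0.8)                                  # minimal detectable effect at 80% power
print(power, power_curve, mde)
```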

---

## Next Steps

- Explore the **Core Documentation** for detailed explanations of each component.
- Check out the **Usage Examples** for practical applications of the package.

@@ -0,0 +1,13 @@
/* Custom admonition styling */
.md-typeset .admonition {
    border-radius: 8px;
    border-left: 4px solid var(--md-primary-fg-color);
    box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
}

/* Code block styling */
.md-typeset pre {
    border-radius: 8px;
    background-color: var(--md-code-bg-color);
    box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
}

@@ -0,0 +1,9 @@
/* Apply text justification to all paragraphs in the documentation */
.md-content p {
    text-align: justify;
}

/* Optionally, justify lists or other specific elements */
.md-content ul, .md-content ol {
    text-align: justify;
}

> **Review comment:** I think we need to add more spaces to render inside of the above, or asterisks.