Documentation revamp #232
base: main

Changes from all commits: 5c6558c, b6babcc, 6363d80, aa04ca8, a9673c0, ba44999

@@ -170,3 +170,4 @@ todos.txt

#
experiments/
cluster-experiments.code-workspace

@@ -11,89 +11,40 @@ https://codecov.io/gh/david26694/cluster-experiments/branch/main/graph/badge.svg

![Downloads](https://pepy.tech/badge/cluster-experiments)
[![PyPI version](https://img.shields.io/pypi/v/cluster-experiments.svg)](https://pypi.python.org/pypi/cluster-experiments)

A Python library for end-to-end A/B testing workflows, featuring:
- Experiment analysis and scorecards
- Power analysis (simulation-based and normal approximation)
- Variance reduction techniques (CUPED, CUPAC)
- Support for complex experimental designs (cluster randomization, switchback experiments)

## Key Features

### 1. Power Analysis
- **Simulation-based**: Run Monte Carlo simulations to estimate power
- **Normal approximation**: Fast power estimation using CLT
- **Minimum Detectable Effect**: Calculate required effect sizes
- **Multiple designs**: Support for:
  - Simple randomization
  - Variance reduction techniques in power analysis
  - Cluster randomization
  - Switchback experiments
- **Dict config**: Easy to configure power analysis with a dictionary

### 2. Experiment Analysis
- **Analysis Plans**: Define structured analysis plans
- **Metrics**:
  - Simple metrics
  - Ratio metrics
- **Dimensions**: Slice results by dimensions
- **Statistical Methods**:
  - GEE
  - Mixed Linear Models
  - Clustered / regular OLS
  - T-tests
  - Synthetic Control
- **Dict config**: Easy to define analysis plans with a dictionary

### 3. Variance Reduction
- **CUPED** (Controlled-experiment Using Pre-Experiment Data):
  - Use historical outcome data to reduce variance, at any granularity
  - Support for several covariates
- **CUPAC** (Control Using Predictors as Covariates):
  - Use any scikit-learn compatible estimator to predict the outcome with pre-experiment data

## Quick Start

### Power Analysis Example

**`cluster-experiments`** is a comprehensive Python library for end-to-end A/B testing workflows.

```python
import numpy as np
import pandas as pd
from cluster_experiments import PowerAnalysis, NormalPowerAnalysis

# Create sample data
N = 1_000
df = pd.DataFrame({
    "target": np.random.normal(0, 1, size=N),
    "date": pd.to_datetime(
        np.random.randint(
            pd.Timestamp("2024-01-01").value,
            pd.Timestamp("2024-01-31").value,
            size=N,
        )
    ),
})

# Simulation-based power analysis
config = {
    "analysis": "ols",
    "perturbator": "constant",
    "splitter": "non_clustered",
    "n_simulations": 50,
}
pw = PowerAnalysis.from_dict(config)
power = pw.power_analysis(df, average_effect=0.1)

# Normal approximation (faster)
npw = NormalPowerAnalysis.from_dict({
    "analysis": "ols",
    "splitter": "non_clustered",
    "n_simulations": 5,
    "time_col": "date",
})
power_normal = npw.power_analysis(df, average_effect=0.1)
power_line_normal = npw.power_line(df, average_effects=[0.1, 0.2, 0.3])

---

## 🚀 Key Features

### 📌 Experiment Design & Planning
- **Power analysis** and **Minimal Detectable Effect (MDE)** estimation
  - **Normal Approximation (CLT-based)**: Fast, analytical formulas assuming approximate normality
    - Best for large sample sizes and standard A/B tests
  - **Monte Carlo Simulation**: Empirically estimate power or MDE by simulating many experiments
    - Ideal for complex or non-standard designs (e.g., clustering, non-normal outcomes)
- Supports complex **experimental designs** (a dict-config sketch follows below), including:
  - 🏢 **Cluster randomization**
  - 🔄 **Switchback experiments**
  - 📊 **Observational studies**, including **synthetic control**
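
For instance, a cluster-randomized power analysis can be configured in a few lines through the dictionary interface. This is a rough sketch reusing the dict-config pattern from the Quick Start example above; the toy data and column names are invented, and the exact config strings are worth double-checking against the API reference.

```python
import numpy as np
import pandas as pd

from cluster_experiments import PowerAnalysis

# Toy dataset: 20 clusters of 50 units each, with a numeric target
df = pd.DataFrame({
    "cluster": np.repeat([f"c{i}" for i in range(20)], 50),
    "target": np.random.normal(0, 1, size=1000),
})

# Dict config for a cluster-randomized design (GEE analysis, constant effect)
config = {
    "analysis": "gee",
    "perturbator": "constant",
    "splitter": "clustered",
    "cluster_cols": ["cluster"],
    "n_simulations": 50,
}
pw = PowerAnalysis.from_dict(config)
print(pw.power_analysis(df, average_effect=0.1))
```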

### 🧪 Statistical Methods for Analysis
- 📌 **Ordinary Least Squares (OLS)** and **Clustered OLS**, with support for covariates
- 🎯 **Variance Reduction Techniques**: **CUPED** and **CUPAC** (sketched below)
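
To make the variance reduction idea concrete: a CUPED-style adjustment boils down to adding a pre-experiment metric as a covariate in the regression. The snippet below is a minimal sketch with invented column names; the `covariates` argument of `OLSAnalysis` should be checked against the API reference. CUPAC follows the same pattern, except the covariate is the out-of-sample prediction of a scikit-learn estimator trained on pre-experiment data.

```python
import numpy as np
import pandas as pd

from cluster_experiments import OLSAnalysis

# Toy experiment data with a pre-experiment covariate correlated with the target
df = pd.DataFrame({
    "treatment": np.random.choice(["A", "B"], size=1000),
    "pre_target": np.random.normal(0, 1, size=1000),
})
df["target"] = 0.7 * df["pre_target"] + np.random.normal(0, 1, size=1000)

# CUPED-style variance reduction: the pre-experiment metric enters as a covariate
analysis = OLSAnalysis(covariates=["pre_target"])
print(analysis.get_pvalue(df))
```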

### 📈 Scalable Experiment Analysis with Scorecards
- Generate **Scorecards** to summarize experiment results, allowing analysis of multiple metrics at once (a sketch follows below)
- Include **confidence intervals, relative and absolute effect sizes, and p-values**
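
A rough sketch of what a dictionary-defined scorecard could look like is shown below. The constructor and method names used here (`AnalysisPlan.from_metrics_dict`, `analyze`, `to_dataframe`) are assumptions based on the dict-config pattern described above, and the column names are invented; check the experiment analysis documentation for the exact API.

```python
import numpy as np
import pandas as pd

from cluster_experiments import AnalysisPlan

# Hypothetical experiment-level data
exp_df = pd.DataFrame({
    "customer_id": range(1000),
    "treatment": np.random.choice(["control", "treatment_1"], size=1000),
    "order_value": np.random.normal(30, 5, size=1000),
})

# Assumed dict-based definition of an analysis plan / scorecard
plan = AnalysisPlan.from_metrics_dict({
    "metrics": [{"alias": "AOV", "name": "order_value"}],
    "variants": [
        {"name": "control", "is_control": True},
        {"name": "treatment_1", "is_control": False},
    ],
    "variant_col": "treatment",
    "alpha": 0.05,
    "analysis_type": "clustered_ols",
    "analysis_config": {"cluster_cols": ["customer_id"]},
})

scorecard = plan.analyze(exp_df)
print(scorecard.to_dataframe())
```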

`cluster-experiments` empowers analysts and data scientists with **scalable, reproducible, and statistically robust** A/B testing workflows.

🔗 **Get Started:** [Documentation Link]

> **Review comment:** missing a link?

📦 **Installation:**
```sh
pip install cluster-experiments
```

# MDE calculation
mde = npw.mde(df, power=0.8)

> **Review comment:** for the MDE example, I have to ask: it needs to be reproducible (so the dataframe needs to be created), and it should show the methods power_analysis, mde, power_line and mde_line. wdyt?

> **Review comment:** Definitely what we should do. I'm thinking that, for a first-time user, we should show the MDE calculation process and scorecard. wdyt?

> **Review comment:** yes! in the simplest set-up, but yes. I really think there's value in a reproducible hello-world example in what all the users see, which is the readme.

> **Review comment:** I think the variance reduction example can go to quickstart instead of readme.

@@ -106,7 +57,6 @@ mde_timeline = npw.mde_time_line(

print(power, power_line_normal, power_normal, mde, mde_timeline)
```

### Experiment Analysis Example

```python

@@ -0,0 +1,134 @@
# Quickstart

## Installation

You can install **Cluster Experiments** via pip:

> **Review comment:** I recommend adding simpler examples, like dictionary-based inputs in the quickstart. Not saying that we should remove what we have, but I'd also add the simplest use of the library.

```bash
pip install cluster-experiments
```

!!! info "Python Version Support"
    **Cluster Experiments** requires **Python 3.9 or higher**. Make sure your environment meets this requirement before proceeding with the installation.

> **Review comment:** it's 3.8 I think

---

## Usage

Designing and analyzing experiments can feel overwhelming at times. After formulating a testable hypothesis, you're faced with a series of routine tasks: from collecting and transforming raw data to measuring the statistical significance of your experiment results and constructing confidence intervals, it can quickly become a repetitive and error-prone process.
*Cluster Experiments* is here to change that. Built on top of well-known packages like `pandas`, `numpy`, `scipy` and `statsmodels`, it automates the core steps of an experiment, streamlining your workflow and saving you time and effort while maintaining statistical rigor.

> **Review comment:** I'd make the paragraph shorter and stress what it automates, namely MDE/power calculation and inference scorecards.

> **Review comment:** Given the next examples, I think it's worth mentioning that you're describing the simulation-based power analysis, and there are other pipelines like power analysis based on normal approximation and scorecard generation.

> **Review comment:** I like the explanation style, maybe you could write a similar thing for NormalPowerAnalysis and AnalysisPlan.
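
Before looking at the individual building blocks, here is roughly the simplest, dictionary-configured use of the library: a non-clustered A/B test analyzed with OLS, whose power is estimated by simulation. This is a minimal sketch in the spirit of the README example; the simulation-based pipeline shown here is only one of the available workflows (normal-approximation power analysis and scorecard generation are others).

```python
import numpy as np
import pandas as pd

from cluster_experiments import PowerAnalysis

# Toy data: a single numeric target column
df = pd.DataFrame({"target": np.random.normal(0, 1, size=1_000)})

pw = PowerAnalysis.from_dict({
    "analysis": "ols",
    "perturbator": "constant",
    "splitter": "non_clustered",
    "n_simulations": 50,
})
print(pw.power_analysis(df, average_effect=0.1))
```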

## Key Features
- **Modular Design**: Each component (`Splitter`, `Perturbator`, and `Analysis`) is independent, reusable, and can be combined in any way you need.
- **Flexibility**: Whether you're conducting a simple A/B test or a complex clustered experiment, Cluster Experiments adapts to your needs.
- **Statistical Rigor**: Built-in support for advanced statistical methods, including clustered standard errors and variance reduction techniques like CUPED and CUPAC, ensures that your experiments maintain high standards.

The core functionality of *Cluster Experiments* revolves around several intuitive, self-contained classes and methods:

- **Splitter**: Define how your control and treatment groups are split.
- **Perturbator**: Specify the type of effect you want to test.
- **Analysis**: Perform statistical inference to measure the impact of your experiment.

---

### `Splitter`: Defining Control and Treatment Groups

The `Splitter` classes are responsible for dividing your data into control and treatment groups. The way you split your data depends on the **metric** (e.g., simple, ratio) you want to observe and the unit of observation (e.g., users, sessions, time periods).

#### Features:

- **Randomized Splits**: Simple random assignment of units to control and treatment groups.
- **Stratified Splits**: Ensure balanced representation of key segments (e.g., geographic regions, user cohorts).
- **Time-Based Splits**: Useful for switchback experiments or time-series data.

```python
from cluster_experiments import RandomSplitter

splitter = RandomSplitter(
    cluster_cols=["cluster_id"],  # Split by clusters
    treatment_col="treatment",    # Name of the treatment column
)
```
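
To see what the splitter actually does, you can apply it to a small DataFrame. This continues the snippet above and assumes the splitter exposes an `assign_treatment_df` method; the toy data and column values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10 clusters of 10 units each
df = pd.DataFrame({
    "cluster_id": np.repeat(range(10), 10),
    "outcome": np.random.normal(0, 1, size=100),
})

# Returns a copy of the data with the treatment column filled in,
# keeping every unit of a cluster in the same group
treated_df = splitter.assign_treatment_df(df)
print(treated_df["treatment"].value_counts())
```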

---

### `Perturbator`: Simulating the Treatment Effect

The `Perturbator` classes define the type of effect you want to test. A perturbator simulates the treatment effect on your data, allowing you to evaluate the impact of your experiment.

#### Features:

- **Absolute Effects**: Add a fixed uplift to the treatment group.
- **Relative Effects**: Apply a percentage-based uplift to the treatment group.
- **Custom Effects**: Define your own effect size or distribution.

```python
from cluster_experiments import ConstantPerturbator

perturbator = ConstantPerturbator(
    average_effect=5.0  # Simulate a constant additive uplift of 5 units
)
```
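
Applying the perturbator is a one-liner once the data has a treatment assignment. A minimal sketch, assuming the `perturbate` method and using invented toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical data that already has a treatment column (e.g. produced by a splitter)
df = pd.DataFrame({
    "treatment": np.random.choice(["A", "B"], size=100),
    "target": np.random.normal(0, 1, size=100),
})

# Returns a copy of the data where the configured effect has been
# added to the target of the treated rows
perturbed_df = perturbator.perturbate(df)
```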

---

### `Analysis`: Measuring the Impact

Once your data is split and the treatment effect is applied, the `Analysis` component helps you measure the statistical significance of the experiment results. It provides tools for calculating effects, confidence intervals, and p-values.

You can use it for both **experiment design** (pre-experiment phase) and **analysis** (post-experiment phase).

#### Features:

- **Statistical Tests**: Perform t-tests, OLS regression, and other hypothesis tests.
- **Effect Size**: Calculate both absolute and relative effects.
- **Confidence Intervals**: Construct confidence intervals for your results.

Example:

```python
from cluster_experiments import TTestClusteredAnalysis

analysis = TTestClusteredAnalysis(
    cluster_cols=["cluster_id"],  # Cluster-level analysis
    treatment_col="treatment",    # Name of the treatment column
    target_col="outcome"          # Metric to analyze
)
```

> **Review comment:** let's use ClusteredOLS, I think this analysis method is a bit weird
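
With an analysis object in hand, getting a p-value for the treatment effect is a single call. The sketch below continues the snippet above, assumes the `get_pvalue` method, and uses invented toy data in which the treatment is constant within each cluster:

```python
import numpy as np
import pandas as pd

# Hypothetical experiment data: 10 clusters of 20 units, treatment assigned per cluster
df = pd.DataFrame({
    "cluster_id": np.repeat(range(10), 20),
    "treatment": np.repeat(np.tile(["A", "B"], 5), 20),
    "outcome": np.random.normal(0, 1, size=200),
})

# Runs the configured test and returns the p-value of the treatment effect
p_value = analysis.get_pvalue(df)
print(p_value)
```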

---

### Putting It All Together for Experiment Design

You can combine all of these classes as inputs to the `PowerAnalysis` class, which lets you analyze different experiment settings, power lines, and Minimal Detectable Effects (MDEs).

```python
import numpy as np
import pandas as pd

from cluster_experiments import PowerAnalysis
from cluster_experiments import RandomSplitter, ConstantPerturbator, TTestClusteredAnalysis

# Create a small synthetic dataset: 20 clusters of 50 units each
df = pd.DataFrame({
    "cluster_id": np.repeat(range(20), 50),
    "outcome": np.random.normal(0, 1, size=1000),
})

# Define the components
splitter = RandomSplitter(cluster_cols=["cluster_id"], treatment_col="treatment")
perturbator = ConstantPerturbator(average_effect=0.1)
analysis = TTestClusteredAnalysis(cluster_cols=["cluster_id"], treatment_col="treatment", target_col="outcome")

# Create the experiment
experiment = PowerAnalysis(
    perturbator=perturbator,
    splitter=splitter,
    analysis=analysis,
    target_col="outcome",
    treatment_col="treatment"
)

# Run the power analysis on the dataset (returns the estimated power)
results = experiment.power_analysis(df)
```
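
For experiment design you often want the Minimal Detectable Effect rather than the power of one fixed effect. Below is a rough sketch using `NormalPowerAnalysis` on the same `df`, treating the data as non-clustered purely to keep the example short; the method names mirror the README example, but double-check the config keys against the API reference.

```python
from cluster_experiments import NormalPowerAnalysis

# CLT-based power analysis: no simulation loop needed
npw = NormalPowerAnalysis.from_dict({
    "analysis": "ols",
    "splitter": "non_clustered",
    "target_col": "outcome",
    "n_simulations": 5,
})

power = npw.power_analysis(df, average_effect=0.1)            # power for a given effect
power_curve = npw.power_line(df, average_effects=[0.1, 0.2])  # power across several effects
mde = npw.mde(df, power=0.8)                                  # minimal detectable effect at 80% power
print(power, power_curve, mde)
```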

---

## Next Steps

- Explore the **Core Documentation** for detailed explanations of each component.
- Check out the **Usage Examples** for practical applications of the package.

@@ -0,0 +1,13 @@
/* Custom admonition styling */
.md-typeset .admonition {
    border-radius: 8px;
    border-left: 4px solid var(--md-primary-fg-color);
    box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
}

/* Code block styling */
.md-typeset pre {
    border-radius: 8px;
    background-color: var(--md-code-bg-color);
    box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
}

@@ -0,0 +1,9 @@
/* Apply text justification to all paragraphs in the documentation */
.md-content p {
    text-align: justify;
}

/* Optionally, justify lists or other specific elements */
.md-content ul, .md-content ol {
    text-align: justify;
}

> **Review comment:** I think we need to add more spaces to render inside of the above, or asterisks.