
Commit e215ba8

merge with remote changes
2 parents: 4d680f0 + bc7f54f

File tree

13 files changed: +620 −73 lines


.github/workflows/deploy.yml

Lines changed: 2 additions & 2 deletions
@@ -14,10 +14,10 @@ jobs:
     steps:
       - name: Checkout
         uses: actions/checkout@v4
-
       - name: Deploy
         uses: peaceiris/actions-gh-pages@v4
         if: github.ref == 'refs/heads/master'
         with:
           github_token: ${{ secrets.GITHUB_TOKEN }}
-          publish_dir: ./public
+          publish_dir: ./public
+          force_orphan: true

README.md

Lines changed: 31 additions & 27 deletions
Large diffs are not rendered by default.

contribution.md

Lines changed: 59 additions & 31 deletions
@@ -2,21 +2,65 @@

 ## Table of Contents

-### Writing Code
-- [Environment Setup](#environment-setup)
-- [Install uv](#install-uv)
-- [Install Dependencies](#install-dependencies)
-- [Install Pre-commit Hooks](#install-pre-commit-hooks)
-- [Working with Marimo Notebooks](#working-with-marimo-notebooks)
-- [Creating a New Notebook](#creating-a-new-notebook)
-- [Publishing Notebooks](#publishing-notebooks)
-- [Pull Request Process](#pull-request-process)
-
-### Writing Blog
-- [Using HackMD](#using-hackmd)
-- [Writing Style Guidelines](#writing-style-guidelines)
-
-## Writing Code
+- [CodeCut Mission](#codecut-mission)
+- [Your Responsibility as a Writer](#your-responsibility-as-a-writer)
+- [Writing Checklist](#writing-checklist)
+- [Write Article Draft](#write-article-draft)
+- [Write Code](#write-code)
+
+## CodeCut Mission
+
+CodeCut exists to help data scientists stay productive and up-to-date by delivering short, focused, and practical code examples that showcase modern tools in action.
+
+We strive to:
+
+- Help readers quickly understand what a tool does
+- Show how it fits into real-world data science workflows
+- Provide just enough to empower readers to try it on their own
+
+## Your Responsibility as a Writer
+
+As a writer for CodeCut, your role is to:
+
+- Break down complex tools and workflows into clear, digestible pieces
+- Focus on practical value over theoretical depth
+- Maintain a tone that is approachable, confident, and helpful
+- Write only about topics you are genuinely interested in
+- Enjoy the writing process—we want this to be fun for you, too
+
+## Writing Checklist
+
+To check off an item, replace `[ ]` with `[x]`.
+
+You can check off these items directly in your IDE (such as VS Code, PyCharm, or others).
+
+### Writing Style Checklist
+
+- [ ] Use action verbs instead of passive voice
+- [ ] Limit paragraphs to 2-4 sentences
+- [ ] For every major code block, provide a clear explanation of what it does and why it matters.
+- [ ] Structure content for quick scanning with clear headings and bullet points
+
+### Data Science-Focused Writing Checklist
+
+- [ ] Write for data scientists comfortable with Python but unfamiliar with this specific tool or library.
+- [ ] Use examples that align with common data science workflows or problems
+- [ ] Highlight **only** the features that matter to a data science audience
+
+### Structure Checklist
+
+- [ ] Start with a real, practical data science problem
+- [ ] Explain how each tool solves the problem
+- [ ] Use diagrams or charts to explain complex ideas, when appropriate.
+- [ ] Define new concepts and terminology
+- [ ] Only include the essential setup steps needed to run the examples. For anything beyond that, link to the official documentation.
+
+## Write Article Draft
+
+1. Create your blog post in [HackMD](https://hackmd.io)
+2. Follow [these instructions](https://hackmd.io/c/tutorials/%2F%40docs%2Finvite-others-to-a-private-note-en) to share your draft with khuyentran@codecut.ai for review
+
+## Write Code

 ### Environment Setup

@@ -94,19 +138,3 @@ The exported HTML files will be automatically deployed to GitHub Pages through t
 3. Make your changes
 4. Submit a pull request with a clear description of changes

-## Writing Blog
-
-### Using HackMD
-
-1. Create your blog post in [HackMD](https://hackmd.io)
-2. Follow [these instructions](https://hackmd.io/c/tutorials/%2F%40docs%2Finvite-others-to-a-private-note-en) to share your draft with khuyentran@codecut.ai for review
-
-### Writing Style Guidelines
-
-When writing content, please follow these guidelines:
-
-- Assume readers are data scientists who have basic programming knowledge but may be new to specific tools
-- Use direct, conversational language
-- Keep paragraphs short (2-4 sentences maximum)
-- Prioritize comprehensive but concise explanations without repetition
-- Maintain a balanced ratio of explanation to code (approximately 50/50)

data_science_tools/marimo_examples/interactive_notebook.py

Lines changed: 14 additions & 6 deletions
@@ -1,22 +1,30 @@
+# /// script
+# requires-python = ">=3.11"
+# dependencies = [
+#     "marimo",
+# ]
+# ///
+
 import marimo

-__generated_with = "0.13.0"
+__generated_with = "0.13.7"
 app = marimo.App(width="medium")


 @app.cell
 def _():
+    import marimo as mo
     from marimo import ui

-    multiplier = ui.slider(1, 10, 3, label="Multiplier")
+    multiplier = ui.slider(1, 10, 1, label="Multiplier")
     multiplier
-    return (multiplier,)
+    return mo, multiplier


 @app.cell
-def _(multiplier):
-    result = [x * multiplier.value for x in range(5)]
-    print(result)
+def _(mo, multiplier):
+    stars = "⭐" * multiplier.value
+    mo.md(stars)
     return


data_science_tools/narwhals.py

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@ def _(mo):
         - It doesn't return to the user the same class they started with.
         - It kills lazy execution.
         - It kills GPU acceleration.
-        - If forces pandas as a required dependency.
+        - If forces pandas as a required dependency
         """
     )
     return

New file

Lines changed: 207 additions & 0 deletions
@@ -0,0 +1,207 @@
+# /// script
+# requires-python = ">=3.11"
+# dependencies = [
+#     "duckdb==1.3.0",
+#     "marimo",
+#     "narwhals==1.40.0",
+#     "pandas==2.2.3",
+#     "polars==1.30.0",
+#     "pyarrow==20.0.0",
+# ]
+# ///
+
+import marimo
+
+__generated_with = "0.13.7"
+app = marimo.App(width="medium")
+
+
+@app.cell
+def _():
+    import marimo as mo
+    return (mo,)
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""# Eager vs Lazy DataFrames: One Fix to Make Your Code Work Anywhere""")
+    return
+
+
+@app.cell
+def _(mo):
+    mo.md(r"""## Motivation""")
+    return
+
+
+@app.cell
+def _():
+    from datetime import datetime
+
+    import pandas as pd
+    import polars as pl
+
+    data1 = {"store": [1, 1, 2], "date_id": [4, 5, 6]}
+    data2 = {"store": [1, 2], "sales": [7, 8]}
+
+    pandas_df1 = pd.DataFrame(data1)
+    pandas_df2 = pd.DataFrame(data2)
+
+    # The outputs are the same
+    for _ in range(5):
+        # Left join
+        pandas_df = pd.merge(pandas_df1, pandas_df2, on="store", how="left")
+
+        # Cumulative sum of sales within each store
+        pandas_df["cumulative_sales"] = pandas_df.groupby("store")["sales"].cumsum()
+
+        print(pandas_df)
+    return data1, data2, datetime, pd, pl
+
+
+@app.cell
+def _(data1, data2, pl):
+    polars_df1 = pl.DataFrame(data1).lazy()
+    polars_df2 = pl.DataFrame(data2).lazy()
+
+    # The outputs are not the same
+    for _ in range(5):
+        print(
+            polars_df1.join(polars_df2, on="store", how="left")
+            .with_columns(cumulative_sales=pl.col("sales").cum_sum().over("store"))
+            .collect(engine="streaming")
+        )
+    return
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""## Eager-only solution""")
+    return
+
+
+@app.cell
+def _(datetime, pd):
+    data = {
+        "sale_date": [
+            datetime(2025, 5, 22),
+            datetime(2025, 5, 23),
+            datetime(2025, 5, 24),
+            datetime(2025, 5, 22),
+            datetime(2025, 5, 23),
+            datetime(2025, 5, 24),
+        ],
+        "store": [
+            "Thimphu",
+            "Thimphu",
+            "Thimphu",
+            "Paro",
+            "Paro",
+            "Paro",
+        ],
+        "sales": [1100, None, 1450, 501, 500, None],
+    }
+
+    pdf = pd.DataFrame(data)
+    print(pdf)
+    return (data,)
+
+
+@app.cell
+def _():
+    import narwhals as nw
+    from narwhals.typing import IntoFrameT
+
+
+    def agnostic_ffill_by_store(df_native: IntoFrameT) -> IntoFrameT:
+        # Supports pandas and Polars.DataFrame, but not lazy ones.
+        return (
+            nw.from_native(df_native)
+            .with_columns(
+                nw.col("sales").fill_null(strategy="forward").over("store")
+            )
+            .to_native()
+        )
+    return IntoFrameT, agnostic_ffill_by_store, nw
+
+
+@app.cell
+def _(agnostic_ffill_by_store, data, pd):
+    # pandas.DataFrame
+    df_pandas = pd.DataFrame(data)
+    agnostic_ffill_by_store(df_pandas)
+    return (df_pandas,)
+
+
+@app.cell
+def _(agnostic_ffill_by_store, data, pl):
+    # polars.DataFrame
+    df_polars = pl.DataFrame(data)
+    agnostic_ffill_by_store(df_polars)
+    return (df_polars,)
+
+
+@app.cell
+def _():
+    import duckdb
+
+    duckdb_rel = duckdb.table("df_polars")
+    duckdb_rel
+    return (duckdb_rel,)
+
+
+@app.cell
+def _():
+    # agnostic_ffill_by_store(duckdb_rel)
+    # Error: narwhals.exceptions.OrderDependentExprError: Order-dependent expressions are not supported for use in LazyFrame.
+    return
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""## Eager and lazy solution""")
+    return
+
+
+@app.cell
+def _(IntoFrameT, nw):
+    def agnostic_ffill_by_store_improved(df_native: IntoFrameT) -> IntoFrameT:
+        return (
+            nw.from_native(df_native)
+            .with_columns(
+                nw.col("sales")
+                .fill_null(strategy="forward")
+                # Note the `order_by` statement
+                .over("store", order_by="sale_date")
+            )
+            .to_native()
+        )
+    return (agnostic_ffill_by_store_improved,)
+
+
+@app.cell
+def _(agnostic_ffill_by_store_improved, duckdb_rel):
+    agnostic_ffill_by_store_improved(duckdb_rel)
+    return
+
+
+@app.cell
+def _(agnostic_ffill_by_store_improved, df_polars):
+    agnostic_ffill_by_store_improved(df_polars.lazy()).collect()
+    return
+
+
+@app.cell
+def _(agnostic_ffill_by_store_improved, df_pandas):
+    # Note that it still supports pandas
+    print(agnostic_ffill_by_store_improved(df_pandas))
+    return
+
+
+@app.cell
+def _():
+    return
+
+
+if __name__ == "__main__":
+    app.run()

export_notebook.sh

Lines changed: 3 additions & 0 deletions
@@ -24,6 +24,9 @@ uv run marimo export html "$notebook_name.py" -o "public/$notebook_name.html" --
 # Check if the export was successful
 if [ $? -eq 0 ]; then
     echo "Successfully exported $notebook_name.py to public/$notebook_name.html"
+    # Generate index.html
+    echo "Generating index.html..."
+    uv run scripts/generate_index.py
 else
     echo "Error: Failed to export notebook"
     exit 1
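
The three added lines hand off index generation to scripts/generate_index.py after a successful export. That script's contents are not rendered in this view, so the snippet below is only a minimal sketch of what such a generator could look like, assuming exported pages land under public/ as the export command above suggests; the function name and HTML layout are illustrative, not the repository's actual code.

# Hypothetical sketch only; not the scripts/generate_index.py shipped in this commit.
# Assumes exported notebooks live under public/ as *.html, matching export_notebook.sh above.
from pathlib import Path

PUBLIC_DIR = Path("public")


def generate_index() -> None:
    """Write public/index.html with a link to every exported notebook page."""
    pages = sorted(p for p in PUBLIC_DIR.rglob("*.html") if p.name != "index.html")
    items = "\n".join(
        f'<li><a href="{page.relative_to(PUBLIC_DIR)}">{page.stem}</a></li>'
        for page in pages
    )
    html = f"<html><body><h1>Notebooks</h1><ul>\n{items}\n</ul></body></html>"
    (PUBLIC_DIR / "index.html").write_text(html)


if __name__ == "__main__":
    generate_index()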

public/data_science_tools/narwhals.html

Lines changed: 3 additions & 3 deletions
Large diffs are not rendered by default.
