
Commit e215ba8

merge with remote changes
2 parents: 4d680f0 + bc7f54f

File tree

13 files changed: +620 −73 lines


.github/workflows/deploy.yml

Lines changed: 2 additions & 2 deletions
@@ -14,10 +14,10 @@ jobs:
     steps:
       - name: Checkout
         uses: actions/checkout@v4
-
       - name: Deploy
         uses: peaceiris/actions-gh-pages@v4
         if: github.ref == 'refs/heads/master'
         with:
           github_token: ${{ secrets.GITHUB_TOKEN }}
-          publish_dir: ./public
+          publish_dir: ./public
+          force_orphan: true

README.md

Lines changed: 31 additions & 27 deletions
Large diffs are not rendered by default.

contribution.md

Lines changed: 59 additions & 31 deletions
@@ -2,21 +2,65 @@

 ## Table of Contents

-### Writing Code
-- [Environment Setup](#environment-setup)
-- [Install uv](#install-uv)
-- [Install Dependencies](#install-dependencies)
-- [Install Pre-commit Hooks](#install-pre-commit-hooks)
-- [Working with Marimo Notebooks](#working-with-marimo-notebooks)
-- [Creating a New Notebook](#creating-a-new-notebook)
-- [Publishing Notebooks](#publishing-notebooks)
-- [Pull Request Process](#pull-request-process)
-
-### Writing Blog
-- [Using HackMD](#using-hackmd)
-- [Writing Style Guidelines](#writing-style-guidelines)
-
-## Writing Code
+- [CodeCut Mission](#codecut-mission)
+- [Your Responsibility as a Writer](#your-responsibility-as-a-writer)
+- [Writing Checklist](#writing-checklist)
+- [Write Article Draft](#write-article-draft)
+- [Write Code](#write-code)
+
+## CodeCut Mission
+
+CodeCut exists to help data scientists stay productive and up-to-date by delivering short, focused, and practical code examples that showcase modern tools in action.
+
+We strive to:
+
+- Help readers quickly understand what a tool does
+- Show how it fits into real-world data science workflows
+- Provide just enough to empower readers to try it on their own
+
+## Your Responsibility as a Writer
+
+As a writer for CodeCut, your role is to:
+
+- Break down complex tools and workflows into clear, digestible pieces
+- Focus on practical value over theoretical depth
+- Maintain a tone that is approachable, confident, and helpful
+- Write only about topics you are genuinely interested in
+- Enjoy the writing process—we want this to be fun for you, too
+
+## Writing Checklist
+
+To check off an item, replace `[ ]` with `[x]`.
+
+You can check off these items directly in your IDE (such as VS Code, PyCharm, or others).
+
+### Writing Style Checklist
+
+- [ ] Use action verbs instead of passive voice
+- [ ] Limit paragraphs to 2-4 sentences
+- [ ] For every major code block, provide a clear explanation of what it does and why it matters.
+- [ ] Structure content for quick scanning with clear headings and bullet points
+
+### Data Science-Focused Writing Checklist
+
+- [ ] Write for data scientists comfortable with Python but unfamiliar with this specific tool or library.
+- [ ] Use examples that align with common data science workflows or problems
+- [ ] Highlight **only** the features that matter to a data science audience
+
+### Structure Checklist
+
+- [ ] Start with a real, practical data science problem
+- [ ] Explain how each tool solves the problem
+- [ ] Use diagrams or charts to explain complex ideas, when appropriate.
+- [ ] Define new concepts and terminology
+- [ ] Only include the essential setup steps needed to run the examples. For anything beyond that, link to the official documentation.
+
+## Write Article Draft
+
+1. Create your blog post in [HackMD](https://hackmd.io)
+2. Follow [these instructions](https://hackmd.io/c/tutorials/%2F%40docs%2Finvite-others-to-a-private-note-en) to share your draft with khuyentran@codecut.ai for review
+
+## Write Code

 ### Environment Setup

@@ -94,19 +138,3 @@ The exported HTML files will be automatically deployed to GitHub Pages through t
 3. Make your changes
 4. Submit a pull request with a clear description of changes

-## Writing Blog
-
-### Using HackMD
-
-1. Create your blog post in [HackMD](https://hackmd.io)
-2. Follow [these instructions](https://hackmd.io/c/tutorials/%2F%40docs%2Finvite-others-to-a-private-note-en) to share your draft with khuyentran@codecut.ai for review
-
-### Writing Style Guidelines
-
-When writing content, please follow these guidelines:
-
-- Assume readers are data scientists who have basic programming knowledge but may be new to specific tools
-- Use direct, conversational language
-- Keep paragraphs short (2-4 sentences maximum)
-- Prioritize comprehensive but concise explanations without repetition
-- Maintain a balanced ratio of explanation to code (approximately 50/50)

data_science_tools/marimo_examples/interactive_notebook.py

Lines changed: 14 additions & 6 deletions
@@ -1,22 +1,30 @@
+# /// script
+# requires-python = ">=3.11"
+# dependencies = [
+#     "marimo",
+# ]
+# ///
+
 import marimo

-__generated_with = "0.13.0"
+__generated_with = "0.13.7"
 app = marimo.App(width="medium")


 @app.cell
 def _():
+    import marimo as mo
     from marimo import ui

-    multiplier = ui.slider(1, 10, 3, label="Multiplier")
+    multiplier = ui.slider(1, 10, 1, label="Multiplier")
     multiplier
-    return (multiplier,)
+    return mo, multiplier


 @app.cell
-def _(multiplier):
-    result = [x * multiplier.value for x in range(5)]
-    print(result)
+def _(mo, multiplier):
+    stars = "⭐" * multiplier.value
+    mo.md(stars)
     return


data_science_tools/narwhals.py

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@ def _(mo):
         - It doesn't return to the user the same class they started with.
         - It kills lazy execution.
         - It kills GPU acceleration.
-        - If forces pandas as a required dependency.
+        - If forces pandas as a required dependency
         """
     )
     return

New file

Lines changed: 207 additions & 0 deletions
@@ -0,0 +1,207 @@
+# /// script
+# requires-python = ">=3.11"
+# dependencies = [
+#     "duckdb==1.3.0",
+#     "marimo",
+#     "narwhals==1.40.0",
+#     "pandas==2.2.3",
+#     "polars==1.30.0",
+#     "pyarrow==20.0.0",
+# ]
+# ///
+
+import marimo
+
+__generated_with = "0.13.7"
+app = marimo.App(width="medium")
+
+
+@app.cell
+def _():
+    import marimo as mo
+    return (mo,)
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""# Eager vs Lazy DataFrames: One Fix to Make Your Code Work Anywhere""")
+    return
+
+
+@app.cell
+def _(mo):
+    mo.md(r"""## Motivation""")
+    return
+
+
+@app.cell
+def _():
+    from datetime import datetime
+
+    import pandas as pd
+    import polars as pl
+
+    data1 = {"store": [1, 1, 2], "date_id": [4, 5, 6]}
+    data2 = {"store": [1, 2], "sales": [7, 8]}
+
+    pandas_df1 = pd.DataFrame(data1)
+    pandas_df2 = pd.DataFrame(data2)
+
+    # The outputs are the same
+    for _ in range(5):
+        # Left join
+        pandas_df = pd.merge(pandas_df1, pandas_df2, on="store", how="left")
+
+        # Cumulative sum of sales within each store
+        pandas_df["cumulative_sales"] = pandas_df.groupby("store")["sales"].cumsum()
+
+        print(pandas_df)
+    return data1, data2, datetime, pd, pl
+
+
+@app.cell
+def _(data1, data2, pl):
+    polars_df1 = pl.DataFrame(data1).lazy()
+    polars_df2 = pl.DataFrame(data2).lazy()
+
+    # The outputs are not the same
+    for _ in range(5):
+        print(
+            polars_df1.join(polars_df2, on="store", how="left")
+            .with_columns(cumulative_sales=pl.col("sales").cum_sum().over("store"))
+            .collect(engine="streaming")
+        )
+    return
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""## Eager-only solution""")
+    return
+
+
+@app.cell
+def _(datetime, pd):
+    data = {
+        "sale_date": [
+            datetime(2025, 5, 22),
+            datetime(2025, 5, 23),
+            datetime(2025, 5, 24),
+            datetime(2025, 5, 22),
+            datetime(2025, 5, 23),
+            datetime(2025, 5, 24),
+        ],
+        "store": [
+            "Thimphu",
+            "Thimphu",
+            "Thimphu",
+            "Paro",
+            "Paro",
+            "Paro",
+        ],
+        "sales": [1100, None, 1450, 501, 500, None],
+    }
+
+    pdf = pd.DataFrame(data)
+    print(pdf)
+    return (data,)
+
+
+@app.cell
+def _():
+    import narwhals as nw
+    from narwhals.typing import IntoFrameT
+
+
+    def agnostic_ffill_by_store(df_native: IntoFrameT) -> IntoFrameT:
+        # Supports pandas and Polars.DataFrame, but not lazy ones.
+        return (
+            nw.from_native(df_native)
+            .with_columns(
+                nw.col("sales").fill_null(strategy="forward").over("store")
+            )
+            .to_native()
+        )
+    return IntoFrameT, agnostic_ffill_by_store, nw
+
+
+@app.cell
+def _(agnostic_ffill_by_store, data, pd):
+    # pandas.DataFrame
+    df_pandas = pd.DataFrame(data)
+    agnostic_ffill_by_store(df_pandas)
+    return (df_pandas,)
+
+
+@app.cell
+def _(agnostic_ffill_by_store, data, pl):
+    # polars.DataFrame
+    df_polars = pl.DataFrame(data)
+    agnostic_ffill_by_store(df_polars)
+    return (df_polars,)
+
+
+@app.cell
+def _():
+    import duckdb
+
+    duckdb_rel = duckdb.table("df_polars")
+    duckdb_rel
+    return (duckdb_rel,)
+
+
+@app.cell
+def _():
+    # agnostic_ffill_by_store(duckdb_rel)
+    # Error: narwhals.exceptions.OrderDependentExprError: Order-dependent expressions are not supported for use in LazyFrame.
+    return
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""## Eager and lazy solution""")
+    return
+
+
+@app.cell
+def _(IntoFrameT, nw):
+    def agnostic_ffill_by_store_improved(df_native: IntoFrameT) -> IntoFrameT:
+        return (
+            nw.from_native(df_native)
+            .with_columns(
+                nw.col("sales")
+                .fill_null(strategy="forward")
+                # Note the `order_by` statement
+                .over("store", order_by="sale_date")
+            )
+            .to_native()
+        )
+    return (agnostic_ffill_by_store_improved,)
+
+
+@app.cell
+def _(agnostic_ffill_by_store_improved, duckdb_rel):
+    agnostic_ffill_by_store_improved(duckdb_rel)
+    return
+
+
+@app.cell
+def _(agnostic_ffill_by_store_improved, df_polars):
+    agnostic_ffill_by_store_improved(df_polars.lazy()).collect()
+    return
+
+
+@app.cell
+def _(agnostic_ffill_by_store_improved, df_pandas):
+    # Note that it still supports pandas
+    print(agnostic_ffill_by_store_improved(df_pandas))
+    return
+
+
+@app.cell
+def _():
+    return
+
+
+if __name__ == "__main__":
+    app.run()

export_notebook.sh

Lines changed: 3 additions & 0 deletions
@@ -24,6 +24,9 @@ uv run marimo export html "$notebook_name.py" -o "public/$notebook_name.html" --
 # Check if the export was successful
 if [ $? -eq 0 ]; then
     echo "Successfully exported $notebook_name.py to public/$notebook_name.html"
+    # Generate index.html
+    echo "Generating index.html..."
+    uv run scripts/generate_index.py
 else
     echo "Error: Failed to export notebook"
     exit 1
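
The three added lines hand off index generation to scripts/generate_index.py after a successful export. That script's contents are not rendered in this view, so the snippet below is only a minimal sketch of what such a generator could look like, assuming exported pages land under public/ as the export command above suggests; the function name and HTML layout are illustrative, not the repository's actual code.

# Hypothetical sketch only; not the scripts/generate_index.py shipped in this commit.
# Assumes exported notebooks live under public/ as *.html, matching export_notebook.sh above.
from pathlib import Path

PUBLIC_DIR = Path("public")


def generate_index() -> None:
    """Write public/index.html with a link to every exported notebook page."""
    pages = sorted(p for p in PUBLIC_DIR.rglob("*.html") if p.name != "index.html")
    items = "\n".join(
        f'<li><a href="{page.relative_to(PUBLIC_DIR)}">{page.stem}</a></li>'
        for page in pages
    )
    html = f"<html><body><h1>Notebooks</h1><ul>\n{items}\n</ul></body></html>"
    (PUBLIC_DIR / "index.html").write_text(html)


if __name__ == "__main__":
    generate_index()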

public/data_science_tools/narwhals.html

Lines changed: 3 additions & 3 deletions
Large diffs are not rendered by default.
