Dagster Essentials dg #96

Merged
Changes from 7 commits
2 changes: 1 addition & 1 deletion .devcontainer/dagster-essentials/Dockerfile
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
FROM mcr.microsoft.com/devcontainers/python:0-3.11-bullseye
ENV PYTHONUNBUFFERED 1

COPY --from=ghcr.io/astral-sh/uv:0.4.7 /uv /bin/uv
COPY --from=ghcr.io/astral-sh/uv:0.6.10 /uv /bin/uv

COPY dagster_university/dagster_essentials/pyproject.toml .
RUN uv pip install -r pyproject.toml --system
1 change: 0 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -130,7 +130,6 @@ celerybeat.pid
*.sage.py

# Environments
.env
.venv
env/
venv/
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ For example:

```python
import dagster as dg
from dagster_essentials.assets import metrics
from dagster_essentials.defs.assets import metrics

metric_assets = dg.load_assets_from_modules(
    modules=[metrics],
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,8 +26,8 @@ Let’s add metadata to the `taxi_trips_file` asset to demonstrate further. This
```python
import dagster as dg
import requests
from dagster_essentials.assets import constants
from dagster_essentials.partitions import monthly_partition
from dagster_essentials.defs.assets import constants
from dagster_essentials.defs.partitions import monthly_partition

@dg.asset(
    partitions_def=monthly_partition,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,6 @@ The metadata you built should look similar to the code contained in the **View a
```python {% obfuscated="true" %}
import dagster as dg


@dg.asset(
    group_name="raw_files",
)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ In Lesson 9, you created the `adhoc_request` asset. During materialization, the

import matplotlib.pyplot as plt

from dagster_essentials.assets import constants
from dagster_essentials.defs.assets import constants

class AdhocRequestConfig(dg.Config):
    filename: str
Expand Down Expand Up @@ -138,7 +138,7 @@ from dagster_duckdb import DuckDBResource
import matplotlib.pyplot as plt
import base64

from dagster_essentials.assets import constants
from dagster_essentials.defs.assets import constants

class AdhocRequestConfig(dg.Config):
    filename: str
Expand Down
37 changes: 15 additions & 22 deletions course/pages/dagster-essentials/lesson-2/1-set-up-local.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@ lesson: '2'

# Set up local

This guide sets up Dagster on your local machine. If you would prefer to do this course in GitHub Codespaces, please follow [that guide](/dagster-essentials/lesson-2/2-set-up-codespace).

- **To install git.** Refer to the [Git documentation](https://github.com/git-guides/install-git) if you don’t have this installed.
- **To have Python installed.** Dagster supports Python 3.9 - 3.12.
- **To install a package manager**. To manage the Python packages, we recommend [`uv`](https://docs.astral.sh/uv/), which Dagster uses internally.
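
To confirm that your interpreter falls within the Python range listed above, you can compare `sys.version_info` against it. This is a minimal sketch: the helper name `is_supported_python` is hypothetical, and the bounds are simply the 3.9 to 3.12 range quoted in this guide.

```python
import sys

# Supported range as quoted in this guide (inclusive); adjust if the docs change.
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 12)


def is_supported_python(version=None):
    """Return True if (major, minor) lies within the supported range."""
    if version is None:
        version = sys.version_info[:2]
    major_minor = tuple(version[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported_python() else "outside the supported range"
    print(f"Python {current} is {status}")
```

Comparing tuples keeps the check correct for two-digit minor versions, which a string comparison such as `"3.10" < "3.9"` would get wrong.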
Expand Down Expand Up @@ -34,46 +36,37 @@ After cloning the Dagster University project, you’ll want to navigate to speci
cd dagster_university/dagster_essentials
```

## Install the dependencies
## Install uv and dg

**uv**
Now we want to install `dg`. This is the command line interface that makes it easy to interact with Dagster. Throughout the course we will use `dg` to scaffold our project and streamline the development process.

To install the python dependencies with [uv](https://docs.astral.sh/uv/).
In order to best use `dg` we will need the Python package manager [`uv`](https://docs.astral.sh/uv/). `uv` will allow us to install `dg` globally and more easily build our virtual environments.

If you do not have `uv` installed already, you can install it with:
```bash
uv sync
brew install uv
```

This will create a virtual environment that you can now use.

Now you can use `uv` to install `dg` globally:
```bash
source .venv/bin/activate
uv tool install dagster-dg
```

**pip**

Create the virtual environment.
## Install the dependencies

With `uv` and `dg` installed, we can create the virtual environment specific to this course. All of the dependencies are maintained in the `pyproject.toml` (you will not need to edit anything in that file for this course). To create the virtual environment, run:
```bash
python3 -m venv .venv
uv sync
```

Enter the virtual environment.
This will create a virtual environment and install all the necessary dependencies. To activate this virtual environment:

```bash
source .venv/bin/activate
```
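
If you want to double-check that the activation took effect, one option is to ask Python itself. The sketch below relies only on the standard library; the helper name `in_virtualenv` is hypothetical and not part of Dagster or `uv`.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"Virtual environment active: {in_virtualenv()}")
```

When the environment is active, the interpreter path printed should sit inside the project's `.venv` directory.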

Install the packages.

```bash
pip install -e ".[dev]"
```

## Create .env file

You will want to make a copy of the example file `.env.example` which will be used later on.
To ensure everything is working you can launch the Dagster UI.

```bash
cp .env.example .env
dg dev
```
Original file line number Diff line number Diff line change
Expand Up @@ -41,21 +41,13 @@ cd dagster_university/dagster_essentials
To ensure everything is working you can launch the Dagster UI.

```bash
dagster dev
dg dev
```

After Dagster starts running you will be prompted to open the Dagster UI within your browser. Click "Open in Browser".

![Codespace Launch](/images/shared/codespaces/codespaces-launch.png)

## Create .env file

You will want to make a copy of the example file `.env.example` which will be used later on.

```bash
cp .env.example .env
```

## Stopping your GitHub Codespace

Be sure to stop your Codespace when you are not using it. GitHub provides personal accounts with [120 core hours per month](https://docs.github.com/en/billing/managing-billing-for-your-products/managing-billing-for-github-codespaces/about-billing-for-github-codespaces#monthly-included-storage-and-core-hours-for-personal-accounts).
Expand Down
40 changes: 23 additions & 17 deletions course/pages/dagster-essentials/lesson-2/3-project-files.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,34 +10,40 @@ Let’s talk a bit about the files in the Dagster Essentials course. The `dagste

```bash
dagster_university/dagster_essentials
├── Makefile
├── README.md
.
├── dagster_cloud.yaml
├── dagster_essentials
│   ├── __init__.py
│   ├── assets
│   │   ├── __init__.py
│   │   ├── constants.py
│   │   ├── metrics.py
│   │   └── trips.py
│   ├── completed
│   │   └── ...
│   │   ├── lesson_3
│   │   ├── lesson_4
│   │   ├── lesson_5
│   │   ├── lesson_6
│   │   ├── lesson_7
│   │   ├── lesson_8
│   │   └── lesson_9
│   ├── definitions.py
│   ├── jobs.py
│   ├── partitions.py
│   ├── resources.py
│   ├── schedules.py
│   └── sensors.py
│   └── defs
│   ├── assets
│   │   ├── __init__.py
│   │   ├── constants.py
│   │   ├── metrics.py
│   │   └── trips.py
│   ├── jobs.py
│   ├── partitions.py
│   ├── resources.py
│   ├── schedules.py
│   └── sensors.py
├── dagster_essentials_tests
│   └── ...
├── data
│   ├── outputs
│   ├── raw
│   ├── requests
│   └── staging
├── env.example
├── .env
├── Makefile
├── pyproject.toml
├── pytest.ini
├── README.md
└── uv.lock
```

Expand Down Expand Up @@ -90,7 +96,7 @@ The columns in the following table are as follows:

---

- `.env.example`
- `.env`
- Python
- A text file containing pre-configured environment variables. We’ll talk more about this file in Lesson 6, when we cover connecting to external services.

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,13 +10,58 @@ In this course, you’ll use data from [NYC OpenData](https://opendata.cityofnew

Your first asset, which you’ll name `taxi_trips_file`, will retrieve the yellow taxi trip data for March 2023 and save it to a location on your local machine.

1. First, navigate to and open the `assets/trips.py` file in your Dagster project. This is where you’ll write your asset code.
## Project structure

2. At the top of the `trips.py` file, add the following imports:
Before we write our first asset, let's talk a little about project structure in Dagster. In the previous lesson, we mentioned `dg` and how it offers a lot of helpful functionality to quickstart a project. Commands like `dg scaffold project` can initialize a `uv` virtual environment for us, but we already took care of that when we set up the course in Lesson 2.

However, we can use `dg` to scaffold a file for our first asset. Run the following command to create the file that will contain it:

```bash
dg scaffold dagster.asset assets/trips.py
```

This will add a `trips.py` file to our Dagster project.

```
.
└── dagster_essentials
    └── defs
        └── assets
            ├── __init__.py
            ├── constants.py # already present
            └── trips.py
```

**Note:** If we were starting a project from scratch, we would use [`dg initialization`](https://docs.dagster.io/guides/labs/dg/scaffolding-a-project), which handles the creation of our virtual environment. However, since we already have a virtual environment defined, we can skip this step.


Using `dg` to scaffold your project ensures that files are placed in the correct location. We can also use `dg` to verify that everything is configured correctly.

```bash
> dg check defs
No definitions are defined for this project.
```

This command will confirm that our project is laid out correctly. Next we can use `dg` to list all the objects in our project.

```bash
> dg list defs
No definitions are defined for this project.
```

This makes sense because even though we created the file that will contain our asset, we have not yet included the code.

## Defining your first asset

With the file in place, we can now add our first asset.

1. Navigate to and open the newly created `defs/assets/trips.py` file in your Dagster project. This is where you’ll write your asset code.

2. Within the `trips.py` file, remove the generated code from the scaffolding and replace it with the following imports:

```python
import requests
from dagster_essentials.assets import constants
from dagster_essentials.defs.assets import constants
```

3. Below the imports, let's define a function that takes no inputs and returns nothing (type-annotated with `None`). Add the following code to create a function named `taxi_trips_file`:
Expand Down Expand Up @@ -47,7 +92,7 @@ Your first asset, which you’ll name `taxi_trips_file`, will retrieve the yello

```python
import requests
from dagster_essentials.assets import constants
from dagster_essentials.defs.assets import constants
import dagster as dg

@dg.asset
Expand All @@ -66,4 +111,22 @@ Your first asset, which you’ll name `taxi_trips_file`, will retrieve the yello

That’s it - you’ve created your first Dagster asset! Using the `@dg.asset` decorator, you can easily turn any existing Python function into a Dagster asset.

We can use `dg` again to check our asset:

```bash
> dg check defs
All definitions loaded successfully.
```

Now when we run `dg list defs`, our asset is listed:
```bash
> dg list defs
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Key             ┃ Group   ┃ Deps ┃ Kinds ┃ Description                                                  ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ taxi_trips_file │ default │      │       │ The raw parquet files for the taxi trips dataset. Sourced    │
│                 │         │      │       │ from the NYC Open Data portal.                               │
└─────────────────┴─────────┴──────┴───────┴──────────────────────────────────────────────────────────────┘
```

**Questions about the `-> None` bit?** That's a Python feature called **type annotation**. In this case, it's saying that the function returns nothing. You can learn more about type annotations in the [Python documentation](https://docs.python.org/3/library/typing.html). We highly recommend using type annotations in your code to make it easier to read and understand.
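
As a small standalone illustration of return annotations (unrelated to Dagster itself), the sketch below contrasts a function annotated `-> None` with one that returns a value. The function names `describe_trip` and `log_trip` are hypothetical and exist only for this example.

```python
def describe_trip(distance_miles: float, passengers: int = 1) -> str:
    """Annotated to return a string, and it does."""
    return f"{passengers} passenger(s), {distance_miles:.1f} miles"


def log_trip(distance_miles: float) -> None:
    """Annotated `-> None`: it performs a side effect and returns nothing."""
    print(describe_trip(distance_miles))


if __name__ == "__main__":
    result = log_trip(2.5)
    # A function with no return statement implicitly returns None.
    print(result is None)
    # Annotations are stored on the function and can be inspected.
    print(log_trip.__annotations__)
```

Python does not enforce annotations at runtime; they document intent and let tools such as type checkers and editors flag mismatches.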
Original file line number Diff line number Diff line change
Expand Up @@ -43,12 +43,43 @@ With the basics of materialization out of the way, let’s move on to actually m

---

## Materializing assets using dg

We can use `dg` to launch assets from the command line. To execute an asset, run the `dg launch` command and pass the asset you wish to execute.

```bash
dg launch --assets taxi_trips_file
```

You will then see the logs as Dagster executes our asset:
```
2025-04-09 13:43:50 -0500 - dagster - DEBUG - __ASSET_JOB - 5b735e96-e6d4-4f37-80f4-641c22ef896d - 15920 - RUN_START - Started execution of run for "__ASSET_JOB".
2025-04-09 13:43:50 -0500 - dagster - DEBUG - __ASSET_JOB - 5b735e96-e6d4-4f37-80f4-641c22ef896d - 15920 - ENGINE_EVENT - Executing steps using multiprocess executor: parent process (pid: 15920)
2025-04-09 13:43:50 -0500 - dagster - DEBUG - __ASSET_JOB - 5b735e96-e6d4-4f37-80f4-641c22ef896d - 15920 - taxi_trips_file - STEP_WORKER_STARTING - Launching subprocess for "taxi_trips_file".
2025-04-09 13:43:51 -0500 - dagster - DEBUG - __ASSET_JOB - 5b735e96-e6d4-4f37-80f4-641c22ef896d - 15925 - taxi_trips_file - STEP_WORKER_STARTED - Executing step "taxi_trips_file" in subprocess.
2025-04-09 13:43:51 -0500 - dagster - DEBUG - __ASSET_JOB - 5b735e96-e6d4-4f37-80f4-641c22ef896d - 15925 - taxi_trips_file - RESOURCE_INIT_STARTED - Starting initialization of resources [io_manager].
2025-04-09 13:43:51 -0500 - dagster - DEBUG - __ASSET_JOB - 5b735e96-e6d4-4f37-80f4-641c22ef896d - 15925 - taxi_trips_file - RESOURCE_INIT_SUCCESS - Finished initialization of resources [io_manager].
2025-04-09 13:43:51 -0500 - dagster - DEBUG - __ASSET_JOB - 5b735e96-e6d4-4f37-80f4-641c22ef896d - 15925 - LOGS_CAPTURED - Started capturing logs in process (pid: 15925).
2025-04-09 13:43:51 -0500 - dagster - DEBUG - __ASSET_JOB - 5b735e96-e6d4-4f37-80f4-641c22ef896d - 15925 - taxi_trips_file - STEP_START - Started execution of step "taxi_trips_file".
2025-04-09 13:43:59 -0500 - dagster - DEBUG - __ASSET_JOB - 5b735e96-e6d4-4f37-80f4-641c22ef896d - 15925 - taxi_trips_file - STEP_OUTPUT - Yielded output "result" of type "Nothing". (Type check passed).
2025-04-09 13:43:59 -0500 - dagster - DEBUG - __ASSET_JOB - 5b735e96-e6d4-4f37-80f4-641c22ef896d - 15925 - taxi_trips_file - ASSET_MATERIALIZATION - Materialized value taxi_trips_file.
2025-04-09 13:43:59 -0500 - dagster - DEBUG - __ASSET_JOB - 5b735e96-e6d4-4f37-80f4-641c22ef896d - 15925 - taxi_trips_file - STEP_SUCCESS - Finished execution of step "taxi_trips_file" in 7.71s.
2025-04-09 13:43:59 -0500 - dagster - DEBUG - __ASSET_JOB - 5b735e96-e6d4-4f37-80f4-641c22ef896d - 15920 - ENGINE_EVENT - Multiprocess executor: parent process exiting after 8.35s (pid: 15920)
2025-04-09 13:43:59 -0500 - dagster - DEBUG - __ASSET_JOB - 5b735e96-e6d4-4f37-80f4-641c22ef896d - 15920 - RUN_SUCCESS - Finished execution of run for "__ASSET_JOB".
```

You do not need to follow every line of the output. For now, the most important thing is the last line, which confirms that the execution was successful.

```
2025-04-09 13:43:59 -0500 - dagster - DEBUG - __ASSET_JOB - 5b735e96-e6d4-4f37-80f4-641c22ef896d - 15920 - RUN_SUCCESS - Finished execution of run for "__ASSET_JOB".
```
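
If you capture this output to a file or a variable, the same check can be scripted. The sketch below assumes the ` - ` separated layout seen in the sample output above, which is an observation about these logs rather than a documented contract; the helper name `run_succeeded` is hypothetical.

```python
def run_succeeded(log_text):
    """Return True if the last non-empty log line reports RUN_SUCCESS.

    Assumes fields are separated by " - ", as in the sample output above.
    """
    lines = [line for line in log_text.splitlines() if line.strip()]
    if not lines:
        return False
    # Split the final line into its fields and look for the event type.
    fields = lines[-1].split(" - ")
    return "RUN_SUCCESS" in fields


if __name__ == "__main__":
    sample = (
        "2025-04-09 13:43:59 -0500 - dagster - DEBUG - __ASSET_JOB - "
        "5b735e96-e6d4-4f37-80f4-641c22ef896d - 15920 - RUN_SUCCESS - "
        'Finished execution of run for "__ASSET_JOB".'
    )
    print(run_succeeded(sample))
```

Matching on a whole field rather than a substring avoids a false positive if the text `RUN_SUCCESS` ever appears inside a message.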

## Materializing assets using the Dagster UI

If you don’t still have the Dagster UI running from Lesson 2, use the command line to run the following command in the root of your Dagster project (the top-level `dagster-university/dagster_essentials` directory):

```bash
dagster dev
dg dev
```

Navigate to [`localhost:3000`](http://localhost:3000/) in your browser. The page should look like the following - if it doesn’t, click **Overview** in the top navigation bar:
Expand Down Expand Up @@ -138,3 +169,9 @@ The page is empty for now, but it’ll look more interesting shortly. Let’s ge
{% /table %}

That’s it! You’ve successfully materialized your first asset! 🎉

## When to use `dg launch` vs `dg dev`

You now know two different ways to launch your asset, and you may be wondering which one to use. Luckily, there is no wrong answer. `dg launch` is convenient when you need to quickly test something, while `dg dev` becomes more useful as your Dagster project grows more sophisticated.

For the majority of this course we will use `dg dev` to showcase more of the features of Dagster.
Original file line number Diff line number Diff line change
Expand Up @@ -8,11 +8,11 @@ lesson: '3'

Now that you’re familiar with how assets are materialized and where to find details about their execution, let’s focus on how to troubleshoot issues. To demonstrate how to troubleshoot, you’ll intentionally cause the `taxi_trips_file` asset to fail.

In the `assets/trips.py` file, comment out the `from dagster_essentials.assets import constants` line so it looks like this:
In the `assets/trips.py` file, comment out the `from dagster_essentials.defs.assets import constants` line so it looks like this:

```python
import requests
# from dagster_essentials.assets import constants # <---- Import commented out here
# from dagster_essentials.defs.assets import constants # <---- Import commented out here
import dagster as dg

@dg.asset
Expand Down Expand Up @@ -74,7 +74,7 @@ To home in on what went wrong, let’s take a closer look at the logs. We’ll u

At this point, you can use the stacktrace to identify and fix the cause of the error. In this case, it’s because we didn’t import `constants`, leaving it undefined.
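
The failure mode can be reproduced in plain Python, outside of Dagster. In this sketch, `build_url` and the attribute `EXAMPLE_URL_TEMPLATE` are hypothetical stand-ins; the point is only that a missing import surfaces as a `NameError` when the function runs, not when it is defined.

```python
def build_url():
    # `constants` is never imported in this file, so the name is undefined.
    # Defining the function is fine; the lookup only happens when it runs.
    return constants.EXAMPLE_URL_TEMPLATE  # hypothetical attribute


if __name__ == "__main__":
    try:
        build_url()
    except NameError as err:
        # The message names the missing identifier, just like the stacktrace.
        print(f"Caught: {err}")
```

This is why the asset still appears in the project but fails only once it is materialized.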

To fix this, uncomment the `from dagster_essentials.assets import constants` line in the `trips.py` file and save it.
To fix this, uncomment the `from dagster_essentials.defs.assets import constants` line in the `trips.py` file and save it.

In the Dagster UI, click **OK** to close the popover window from the run logs.

Expand Down