Mega Pipeline App

🎙️ → 📝 → 🗒️ → [🔊🇫🇷] → 🔊

The goal of this tutorial is to build an AI-assisted podcast generator that works across multiple languages. Starting from a recorded draft, we’ll transcribe it, enrich it with an LLM, translate it, and synthesize the result back into audio.

The key idea is to simulate a microservice architecture, where each component runs as its own containerized service. The full pipeline is shown below.

  • Pavlos recorded a draft podcast in English, which serves as our starting point.
  • The audio file is transcribed using the Google Cloud Speech-to-Text API.
  • The resulting text is sent to an LLM to generate an expanded version of the podcast.
  • The generated text is synthesized into audio with Google Cloud Text-to-Speech.
  • The text is also translated into French (or another language) using Google Translation services.
  • The translated text is synthesized into audio again with Google Cloud Text-to-Speech.
  • Bonus step: The translated text can also be synthesized with ElevenLabs to recreate Pavlos’ voice.

The pipeline flow is illustrated below:


What You’ll Learn

By completing this tutorial, you’ll gain experience with:

  • Containerizing AI/ML workflows step by step.
  • Using shared cloud storage (GCS) to connect independent services.
  • Securing applications with service account authentication.

Group Tasks for the Mega Pipeline

Students will form teams for this project. Each team will build the entire pipeline end to end. You won’t just handle one piece—you’ll containerize and connect every component.

All components and their step-by-step instructions are listed in the sections below so you can follow along.

By the end, every team will have built a complete pipeline that mirrors a real-world microservice architecture: multiple independent services, each containerized, working together to form a larger application.

The overall progress of the mega pipeline can be viewed here.


⚠️ IMPORTANT NOTE

When building your containers, make sure you update the group name inside your configuration.
This is how we track your progress and display it correctly on the leaderboard.

If you don’t change the group name, your work may overwrite someone else’s, or it won’t be visible under your team.
So please double-check before you push or run your containerized tasks!


Connecting the Pipeline Components

In a production pipeline, containerized services talk to each other through APIs, sending requests and responses directly between microservices.

Since we haven’t covered APIs yet, we’ll simplify. Instead of calling one another directly, components will communicate indirectly by writing their outputs to storage, which the next stage will then read as input.

In this tutorial, rather than just using your local disk, components will write to and read from a Google Cloud Storage (GCS) bucket. This shared bucket acts like a common drive for transcripts, generated text, and synthesized audio.

This setup gives you practical hands-on experience now, while preparing you for the API-driven systems we’ll tackle later.
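
To make the pattern concrete, here is a minimal sketch of the download → process → upload shape that every stage follows. The bucket name and the process() function are placeholders; each container implements its own version of this loop with the folders listed in the next section.

from google.cloud import storage

BUCKET_NAME = "your-mega-pipeline-bucket"  # placeholder: use the bucket provided in class

def run_stage(input_prefix, output_prefix, process):
    """Generic shape of a pipeline stage: read inputs, process them, write outputs."""
    bucket = storage.Client().bucket(BUCKET_NAME)

    for blob in bucket.list_blobs(prefix=input_prefix):
        if blob.name.endswith("/"):
            continue  # skip folder placeholder objects
        text = blob.download_as_text()                      # this stage's input
        result = process(text)                              # stage-specific work
        out_name = blob.name.replace(input_prefix, output_prefix, 1)
        bucket.blob(out_name).upload_from_string(result)    # hand off to the next stage

# Example: the "generate text" stage would read transcripts and write expanded paragraphs
# run_stage("text_prompts/", "text_paragraphs/", generate_with_llm)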


GCS Bucket Details

In Google Cloud, a bucket is like a shared online folder where files can be stored and retrieved. Instead of saving outputs locally, our pipeline components will read and write to this shared bucket so all stages can communicate easily.

  • input_audios/ — raw audio files (starting point).
  • text_prompts/ — transcripts generated from speech-to-text.
  • text_paragraphs/ — expanded text generated by the LLM.
  • text_translated/ — translated versions of the text.
  • text_audios/ — synthesized audio clips for each paragraph.
  • output_audios/ — final audio outputs in French (or another language).
  • output_audios_pp/ — French audio outputs in Pavlos’ voice.

Mega pipeline bucket
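
As a rough guide (inferred from the folder descriptions above, so double-check against each container's own instructions), the stages map onto these folders roughly as follows:

# Rough mapping of pipeline stages to the shared bucket folders (illustrative only)
STAGE_FOLDERS = {
    "transcribe_audio":           {"reads": "input_audios/",    "writes": "text_prompts/"},
    "generate_text":              {"reads": "text_prompts/",    "writes": "text_paragraphs/"},
    "synthesis_audio_en":         {"reads": "text_paragraphs/", "writes": "text_audios/"},
    "translate_text":             {"reads": "text_paragraphs/", "writes": "text_translated/"},
    "synthesis_audio_translated": {"reads": "text_translated/", "writes": "output_audios/"},
    "synthesis_audio_elevenlabs": {"reads": "text_translated/", "writes": "output_audios_pp/"},
}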

GCP Credentials File:

The last piece we need in order to access the GCP bucket is authentication. Buckets won’t let you read or write anything unless you are both authenticated (proving who you are) and authorized (having the right permissions).

In this course, you don’t need to authenticate yourself as a person. Instead, you’ll authenticate your app so it can talk to GCP securely. The way we do this is by using a Service Account—part of IAM in GCP.

To keep it simple, you’ll use a JSON credentials file that represents this Service Account. Just download the file and place it inside <app_folder>/secrets/:

mega-pipeline.json

Later in the course, we’ll revisit authentication in more depth, but for now, this file is all you need to let your containerized apps talk to the GCP bucket.
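
As a quick sanity check that the credentials file is wired up, you can point the Google client libraries at it explicitly and try listing the bucket. A minimal sketch, with the bucket name as a placeholder (the sample Dockerfile below sets the same environment variable for you inside the container):

import os
from google.cloud import storage

# Point the Google client libraries at the Service Account key
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "secrets/mega-pipeline.json"

client = storage.Client()  # picks up the credentials automatically
for blob in client.list_blobs("your-mega-pipeline-bucket", prefix="input_audios/", max_results=5):
    print(blob.name)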

🔑 Important Note on Secrets

We do not want to put this JSON file in GitHub — it is a secret, after all. Make sure the secrets/ folder containing the file is not part of your repo. For this tutorial, we’ve already added a .gitignore entry so the file won’t be pushed accidentally. The canonical (best) way to handle this is to keep your secrets folder outside the repo entirely. That’s what we’ll be moving toward later in the course.

Running the Pipeline Components

Transcribe Audio

python cli.py --download
python cli.py --transcribe
python cli.py --upload
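
Inside the transcribe step, the CLI wraps the Google Cloud Speech-to-Text API. A minimal sketch of that call, assuming a short English clip that has already been downloaded locally (the actual container may use different audio settings or a long-running request for longer files):

from google.cloud import speech

client = speech.SpeechClient()

# Read a locally downloaded audio file (path is a placeholder)
with open("input_audios/sample.flac", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(language_code="en-US")

# Synchronous recognition is fine for short clips; longer audio needs long_running_recognize
response = client.recognize(config=config, audio=audio)
transcript = " ".join(result.alternatives[0].transcript for result in response.results)
print(transcript)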

Generate Text

python cli.py --download
python cli.py --generate
python cli.py --upload
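
The generate step sends the transcript to an LLM and asks it to expand it into a full podcast script. The prompt and model used by the tutorial live inside the container; the sketch below is only illustrative and assumes the Gemini API via the google-generativeai package, with hypothetical file names:

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")            # placeholder; keep real keys in secrets/
model = genai.GenerativeModel("gemini-1.5-flash")  # model choice is an assumption

with open("text_prompts/sample.txt") as f:
    transcript = f.read()

prompt = ("Expand the following podcast transcript into a longer, "
          "well-structured episode:\n\n" + transcript)

response = model.generate_content(prompt)
with open("text_paragraphs/sample.txt", "w") as f:
    f.write(response.text)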

Synthesize Audio (English)

python cli.py --download
python cli.py --synthesis
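
Under the hood, the synthesis step calls Google Cloud Text-to-Speech. A minimal sketch, with the file names and voice settings as placeholders:

from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

with open("text_paragraphs/sample.txt") as f:
    synthesis_input = texttospeech.SynthesisInput(text=f.read())

voice = texttospeech.VoiceSelectionParams(
    language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL)
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3)

response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config)

with open("text_audios/sample.mp3", "wb") as out:
    out.write(response.audio_content)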

Translate Text

python cli.py --download
python cli.py --translate
python cli.py --upload
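
The translate step uses the Google Cloud Translation API. A minimal sketch using the v2 ("basic") client, with French as the target language and placeholder file names:

from google.cloud import translate_v2 as translate

client = translate.Client()

with open("text_paragraphs/sample.txt") as f:
    text = f.read()

result = client.translate(text, source_language="en", target_language="fr")

with open("text_translated/sample.txt", "w") as f:
    f.write(result["translatedText"])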

Synthesize Audio (Translated)

python cli.py --download
python cli.py --synthesis
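
This is the same Text-to-Speech call as the English step, just with a French voice selected. A minimal sketch (file names and voice settings are placeholders):

from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

with open("text_translated/sample.txt") as f:
    synthesis_input = texttospeech.SynthesisInput(text=f.read())

response = client.synthesize_speech(
    input=synthesis_input,
    voice=texttospeech.VoiceSelectionParams(
        language_code="fr-FR", ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3),
)

with open("output_audios/sample.mp3", "wb") as out:
    out.write(response.audio_content)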

Sample Code: Read/Write to GCS Bucket

  • Download from bucket
import os
from google.cloud import storage

# Initialize Storage client (gcp_project is your GCP project id)
storage_client = storage.Client(project=gcp_project)

# Get a reference to the bucket (bucket_name is the shared bucket's name)
bucket = storage_client.bucket(bucket_name)

# Find all content under a prefix in the bucket
blobs = bucket.list_blobs(prefix="input_audios/")
for blob in blobs:
    print(blob.name)
    if not blob.name.endswith("/"):
        # Make sure the matching local folder exists before downloading into it
        os.makedirs(os.path.dirname(blob.name), exist_ok=True)
        blob.download_to_filename(blob.name)

  • Upload to bucket
from google.cloud import storage

# Initialize Storage client (gcp_project is your GCP project id)
storage_client = storage.Client(project=gcp_project)

# Get a reference to the bucket
bucket = storage_client.bucket(bucket_name)

# Destination path in GCS
destination_blob_name = "input_audios/test.mp3"
blob = bucket.blob(destination_blob_name)

# Upload the file from its path on your local computer
blob.upload_from_filename("path/to/test.mp3")

Sample Dockerfile

# Use the official Debian-hosted Python image
FROM python:3.12-slim-bookworm

ARG DEBIAN_PACKAGES="build-essential curl"

# Prevent apt from showing prompts
ENV DEBIAN_FRONTEND=noninteractive

# Python wants UTF-8 locale
ENV LANG=C.UTF-8

# Tell Python to disable buffering so we don't lose any logs.
ENV PYTHONUNBUFFERED=1

# Tell uv to copy packages into site-packages instead of hardlinking from its cache
ENV UV_LINK_MODE=copy
ENV UV_PROJECT_ENVIRONMENT=/home/app/.venv

# This is done for the tutorial only
ENV GOOGLE_APPLICATION_CREDENTIALS=secrets/mega-pipeline.json

# Ensure we have an up to date baseline, install dependencies and
# create a user so we don't run the app as root
RUN set -ex; \
    for i in $(seq 1 8); do mkdir -p "/usr/share/man/man${i}"; done && \
    apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y --no-install-recommends $DEBIAN_PACKAGES && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* && \
    pip install --no-cache-dir --upgrade pip && \
    pip install uv && \
    useradd -ms /bin/bash app -d /home/app -u 1000 && \
    mkdir -p /app && \
    chown app:app /app

# Switch to the new user
USER app
WORKDIR /app

# Copy the source code
COPY --chown=app:app . ./

RUN uv sync

# Entry point
ENTRYPOINT ["/bin/bash"]
# Get into the uv virtual environment shell
CMD ["-c", "source /home/app/.venv/bin/activate && exec bash"]

Some notes for running on Windows

  • Docker installation on Windows 10 needs WSL2 or Hyper-V enabled: https://docs.docker.com/desktop/windows/install/
  • Use Git Bash to run the commands (it provides a lightweight Unix-like shell, similar to Cygwin).
  • Wrap $(pwd) in quotes to escape the spaces that are common in Windows directory paths.
  • Prefix docker run with winpty, otherwise you may get a "the input device is not a TTY" error:
  • winpty docker run --rm -ti --mount type=bind,source="$(pwd)",target=/app generate_text

Solutions

Solutions to this tutorial can be found here
