Below is a structured, 30-hour (1 hour/day for ~1 month) curriculum designed to get you from basic ONNX exports to deploying a Transformer-based TTS model in the browser. Each “stage” corresponds to a set of consecutive daily tasks. The tasks are broken down so that you can typically complete each in about one hour of focused effort.
- Stage 1 (Days 1–6): ONNX Fundamentals with a Simple MLP
- Stage 2 (Days 7–12): Convolutional Model Export (ResNet) and Optimization
- Stage 3 (Days 13–18): Transformer Basics (DistilBERT) and Dynamic Axes
- Stage 4 (Days 19–24): TTS Pipeline Components (Tacotron2/Glow-TTS & Vocoder)
- Stage 5 (Days 25–30): Deploying TTS in the Browser (ONNX Runtime Web)
By the end, you will have all the key skills—exporting various model types (MLP, CNN, Transformer, TTS) to ONNX, verifying in Python, optimizing, and finally running inference in a JavaScript browser environment.
Stage 1 (Days 1–6): ONNX Fundamentals with a Simple MLP

Goal
- Understand ONNX basics: how to export a simple model, load it in Python (ONNX Runtime), and verify correctness.
Suggested Model
- A small feed-forward MLP in PyTorch (2–3 Linear layers, ReLU activations).
- Install & Setup:
  - Install PyTorch, `onnx`, `onnxruntime`, and a visualization tool like Netron.
  - Skim the PyTorch ONNX Export Tutorial to see the basic workflow.
- Implement/Load an MLP:
  - In PyTorch, code a simple MLP model (e.g., input=10, hidden=20, output=2).
  - Optionally train briefly on random data to confirm it runs (or just define the model architecture).
- Export to ONNX:
  - Use `torch.onnx.export(...)` with a dummy input tensor of the correct shape (see the sketch below).
  - Save the exported model as `mlp.onnx`.
  - Pay attention to `opset_version`, `input_names`, `output_names`, and `dynamic_axes` if you plan to handle variable batch sizes.
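A minimal export sketch, assuming the layer sizes suggested above; the tensor names and opset version here are arbitrary choices, not requirements:

```python
import torch
import torch.nn as nn

# A small MLP matching the suggested sizes (input=10, hidden=20, output=2).
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(10, 20),
            nn.ReLU(),
            nn.Linear(20, 2),
        )

    def forward(self, x):
        return self.net(x)

model = MLP().eval()
dummy_input = torch.randn(1, 10)  # batch of 1, feature dim 10

torch.onnx.export(
    model,
    dummy_input,
    "mlp.onnx",
    opset_version=17,                     # pick an opset your runtime supports
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"},  # allow variable batch size
                  "output": {0: "batch"}},
)
```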
- Verify in ONNX Runtime:
  - Load `mlp.onnx` in Python with `onnxruntime.InferenceSession(...)`.
  - Run inference on random inputs and compare the output to your original PyTorch model.
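One way to do the comparison, reusing `model` and the `"input"` name from the export sketch above:

```python
import numpy as np
import onnxruntime as ort
import torch

# Same random batch through both frameworks.
x = torch.randn(4, 10)

with torch.no_grad():
    torch_out = model(x).numpy()  # `model` is the MLP from the export sketch

session = ort.InferenceSession("mlp.onnx")
onnx_out = session.run(None, {"input": x.numpy()})[0]  # "input" matches input_names at export

np.testing.assert_allclose(torch_out, onnx_out, rtol=1e-4, atol=1e-5)
print("PyTorch and ONNX Runtime outputs match within tolerance.")
```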
- Debug & Validate:
  - Use `onnx.checker.check_model(...)` to ensure validity.
  - Visualize the ONNX graph in Netron to see how your layers have been exported.
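A quick validity check before opening Netron:

```python
import onnx

model_proto = onnx.load("mlp.onnx")
onnx.checker.check_model(model_proto)                   # raises if the graph is invalid
print(onnx.helper.printable_graph(model_proto.graph))   # rough textual view; Netron gives a nicer one
```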
- Recap & Document:
  - Summarize any issues you encountered (e.g., shape mismatches).
  - Confirm you understand each step: define → export → load → verify.
End-of-Stage 1 Outcome
- You can confidently export a basic feed-forward network to ONNX and verify it in Python.
Stage 2 (Days 7–12): Convolutional Model Export (ResNet) and Optimization

Goal
- Learn to handle more complex ops (convolutions, batch norm, etc.) and explore basic ONNX optimization.
Suggested Model
- ResNet18 from TorchVision (pretrained on ImageNet).
- Compare your exported model to the ONNX Model Zoo’s ResNet for reference.
- Load Pretrained ResNet:
  - Load it with `resnet18 = torchvision.models.resnet18(pretrained=True)` in PyTorch.
  - Test a quick inference in Python to confirm it works.
- Export to ONNX:
  - Similar steps as before: `torch.onnx.export(...)` (sketch below).
  - Use dynamic axes for the batch dimension if desired.
  - Save as `resnet18.onnx`.
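A sketch of the export, assuming a recent TorchVision (which prefers the `weights=` argument over the deprecated `pretrained=True`):

```python
import torch
import torchvision

# Newer TorchVision prefers `weights=`; `pretrained=True` still works but is deprecated.
resnet18 = torchvision.models.resnet18(
    weights=torchvision.models.ResNet18_Weights.DEFAULT
).eval()

dummy = torch.randn(1, 3, 224, 224)  # standard ImageNet-sized input

torch.onnx.export(
    resnet18,
    dummy,
    "resnet18.onnx",
    opset_version=17,
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},
)
```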
- Compare to Official ONNX Model:
  - Download the official `resnet18` (or `resnet50`) from the ONNX Model Zoo.
  - In Netron, compare the structure/ops.
  - Verify outputs with the same test image to check for close numerical matches.
- Shape Inference & Simplification:
  - Install/use onnx-simplifier to reduce graph complexity.
  - `onnxruntime` can also do some optimizations; explore `onnx.shape_inference.infer_shapes(...)`.
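A sketch of both steps, assuming onnx-simplifier is installed (the pip package provides the `onnxsim` module):

```python
import onnx
from onnx import shape_inference
from onnxsim import simplify  # pip install onnx-simplifier

model = onnx.load("resnet18.onnx")

# Propagate tensor shapes through the graph (handy when inspecting in Netron).
inferred = shape_inference.infer_shapes(model)

# onnx-simplifier folds constants and removes redundant nodes where it can.
simplified, ok = simplify(model)
assert ok, "Simplified model failed the validation check"
onnx.save(simplified, "resnet18_simplified.onnx")
```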
- Optional: Quantization:
  - Investigate dynamic or static quantization.
  - Compare the performance difference in ONNX Runtime between the original and a quantized model.
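Dynamic quantization is the quickest to try; for convolution-heavy models like ResNet, static quantization (with calibration data) usually gives the larger speedup:

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic quantization: weights stored as int8, activations quantized at runtime.
quantize_dynamic(
    model_input="resnet18_simplified.onnx",
    model_output="resnet18_int8.onnx",
    weight_type=QuantType.QInt8,
)
```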
- Performance & Summary:
  - Measure inference speed in Python for the original vs. simplified vs. quantized (if done).
  - Document your findings.
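A simple timing loop for the comparison; the file names assume the earlier sketches, so adjust to whichever variants you actually produced:

```python
import time
import numpy as np
import onnxruntime as ort

def benchmark(path, runs=50):
    session = ort.InferenceSession(path)
    x = np.random.randn(1, 3, 224, 224).astype(np.float32)
    input_name = session.get_inputs()[0].name
    session.run(None, {input_name: x})  # warm-up
    start = time.perf_counter()
    for _ in range(runs):
        session.run(None, {input_name: x})
    return (time.perf_counter() - start) / runs

for path in ["resnet18.onnx", "resnet18_simplified.onnx", "resnet18_int8.onnx"]:
    print(path, f"{benchmark(path) * 1000:.1f} ms / inference")
```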
End-of-Stage 2 Outcome
- You can export a pretrained ResNet to ONNX, compare to a reference, simplify it, and potentially improve runtime performance.
Stage 3 (Days 13–18): Transformer Basics (DistilBERT) and Dynamic Axes

Goal
- Get comfortable exporting Transformer architectures.
- Handle dynamic sequence lengths and advanced ops like Multi-Head Attention.
Suggested Model
- DistilBERT from Hugging Face Transformers.
- Smaller and simpler than full BERT, but representative of typical Transformer ops.
- Set Up Hugging Face Transformers:
  - Install `transformers` via pip.
  - `from transformers import DistilBertModel, DistilBertTokenizer`
  - Load the pretrained `distilbert-base-uncased` model and tokenizer.
- Test Inference in Python:
  - Tokenize a sample sentence (e.g. `"Hello world!"`).
  - Pass it through DistilBERT in PyTorch.
  - Inspect output shapes (last hidden states, etc.).
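A minimal sketch of the setup and test-inference steps above:

```python
import torch
from transformers import DistilBertModel, DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertModel.from_pretrained("distilbert-base-uncased").eval()

inputs = tokenizer("Hello world!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch, sequence_length, 768)
```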
- Export to ONNX:
  - Use the `transformers.onnx` CLI (`python -m transformers.onnx`) or manually call `torch.onnx.export(...)`.
  - Ensure you specify dynamic axes for the token dimension so it can handle variable sentence lengths.
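If you go the manual route, a sketch might look like the following; the CLI handles much of this configuration for you, and the tensor names here are just conventional choices:

```python
import torch
from transformers import DistilBertModel, DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertModel.from_pretrained("distilbert-base-uncased").eval()
sample = tokenizer("Hello world!", return_tensors="pt")

torch.onnx.export(
    model,
    (sample["input_ids"], sample["attention_mask"]),
    "distilbert.onnx",
    opset_version=17,
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "last_hidden_state": {0: "batch", 1: "sequence"},
    },
)
```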
- Verify with ONNX Runtime:
  - Compare the DistilBERT ONNX inference outputs with the PyTorch outputs for the same inputs.
  - Check for numerical closeness (small floating-point differences are normal).
- Optimize:
  - Use `onnxruntime.transformers` to fuse attention and layer-norm subgraphs.
  - Measure speedups with `onnxruntime.InferenceSession(..., providers=["CPUExecutionProvider"])` or GPU if available.
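A sketch using the `onnxruntime.transformers` optimizer; the `bert` model type also covers DistilBERT, and the head count and hidden size below are the DistilBERT-base values:

```python
from onnxruntime.transformers import optimizer

# Fuse attention / layer-norm subgraphs into faster fused operators.
optimized = optimizer.optimize_model(
    "distilbert.onnx",
    model_type="bert",
    num_heads=12,
    hidden_size=768,
)
optimized.save_model_to_file("distilbert_optimized.onnx")
```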
- Wrap Up & Document:
  - Summarize the export process.
  - Note any important flags for Transformers (e.g., opset version, sequence length constraints).
End-of-Stage 3 Outcome
- You have a working DistilBERT ONNX model with dynamic axes, tested in Python, and optionally optimized.
Stage 4 (Days 19–24): TTS Pipeline Components (Tacotron2/Glow-TTS & Vocoder)

Goal
- Learn how TTS is often split into two or more models (text → mel spectrogram → waveform).
- Export and chain TTS models in ONNX.
Suggested Models
- Tacotron 2 or Glow-TTS for acoustic model (mel-spectrogram generation).
- WaveGlow or HiFi-GAN for the vocoder (mel-spectrogram → waveform).
You can find pretrained checkpoints and partial ONNX export references in the model authors' repositories (e.g., NVIDIA's DeepLearningExamples on GitHub).
- Choose & Download Pretrained Models:
  - For example, a pretrained Tacotron 2 model and a WaveGlow model from NVIDIA's GitHub.
  - Familiarize yourself with the TTS pipeline: text input → phoneme/text encoder → mel spectrogram → audio.
- Acoustic Model Export:
  - If using Tacotron 2, adapt a known ONNX export script or write your own `torch.onnx.export(...)`.
  - Pay attention to dynamic shapes (the length of text tokens, output mel frames).
- Vocoder Export:
  - Export WaveGlow or HiFi-GAN to ONNX.
  - Again, manage dynamic shapes if needed (the number of mel frames can vary).
- Chain in Python:
  - Run text → mel with your ONNX acoustic model.
  - Pass the resulting mel spectrograms into your ONNX vocoder to get waveforms (see the sketch below).
  - Save the waveforms to a `.wav` file to confirm correctness.
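A heavily simplified chaining sketch. The tensor names (`text_ids`, `mel`), file names, and single-session acoustic model are assumptions: real Tacotron 2 exports are usually split into encoder/decoder/postnet sub-graphs, so check your exported files in Netron and adapt accordingly.

```python
import numpy as np
import onnxruntime as ort
from scipy.io import wavfile

acoustic = ort.InferenceSession("tacotron2.onnx")  # or glow_tts.onnx
vocoder = ort.InferenceSession("hifigan.onnx")     # or waveglow.onnx

# Placeholder token IDs -- in practice, encode text with the model's own symbol table.
token_ids = np.array([[12, 7, 42, 3]], dtype=np.int64)

mel = acoustic.run(None, {"text_ids": token_ids})[0]  # assumed shape: (1, n_mels, n_frames)
audio = vocoder.run(None, {"mel": mel})[0]            # assumed shape: (1, n_samples)

# 22.05 kHz is the typical sample rate for these pretrained models.
wavfile.write("output.wav", 22050, audio.squeeze().astype(np.float32))
```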
- Compare with Original Models:
  - Use the same text input on the original PyTorch models.
  - Compare the mel spectrogram and final audio for any major discrepancies.
- Refine & Document:
  - Note any performance issues, shape alignment pitfalls, or memory usage considerations.
  - If the pipeline is stable, you have your TTS modules in ONNX!
End-of-Stage 4 Outcome
- You have a working two-step TTS pipeline (acoustic + vocoder) fully exported to ONNX and tested in Python.
Stage 5 (Days 25–30): Deploying TTS in the Browser (ONNX Runtime Web)

Goal
- Learn to run ONNX models entirely in JavaScript/TypeScript via WebAssembly/WebGL.
- Integrate TTS inference into a simple web app that can play audio.
Key Tools
- ONNX Runtime Web
- Web Audio API for playback.
- Set Up a Simple Web Project:
  - Initialize an npm project.
  - Install `onnxruntime-web` (its WebAssembly backend runs everywhere; WebGL is available as an alternative execution provider).
  - Start with a plain HTML + JS or a small React/Vue app.
- Test a Simple ONNX Model:
  - First, try loading the `resnet18.onnx` from Stage 2 in the browser to confirm your environment is configured.
  - Use an image input (or random data) to verify it returns an inference result.
- Load Acoustic & Vocoder Models:
  - Copy your TTS `.onnx` files from Stage 4 into the web app's static/public folder.
  - Load them with ONNX Runtime Web (`session = await InferenceSession.create('tacotron2.onnx')`, etc.).
- Implement the Text-to-Mel Step:
  - Tokenize/encode text in JavaScript (a simple approach or a precompiled JSON mapping) to pass into the acoustic model.
  - Receive the mel spectrogram as an output `Float32Array`.
- Waveform Generation & Audio Playback:
  - Pass the mel spectrogram to the vocoder session.
  - Convert the output waveform to a playable audio buffer.
  - Use the Web Audio API to play the generated audio in the browser (see the sketch below).
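A browser-side sketch of the same chain, in JavaScript to match the ONNX Runtime Web snippets above. The tensor names, file names, and 22050 Hz sample rate are assumptions carried over from the Python sketch:

```javascript
import * as ort from "onnxruntime-web";

async function speak(tokenIds) {
  const acoustic = await ort.InferenceSession.create("tacotron2.onnx");
  const vocoder = await ort.InferenceSession.create("hifigan.onnx");

  // int64 inputs must be passed as BigInt64Array in onnxruntime-web.
  const ids = new ort.Tensor(
    "int64",
    BigInt64Array.from(tokenIds.map(BigInt)),
    [1, tokenIds.length]
  );

  const { mel } = await acoustic.run({ text_ids: ids });  // output name is an assumption
  const { audio } = await vocoder.run({ mel });           // Float32Array waveform

  // Play via the Web Audio API (assuming a mono, 22050 Hz waveform).
  const ctx = new AudioContext();
  const buffer = ctx.createBuffer(1, audio.data.length, 22050);
  buffer.copyToChannel(Float32Array.from(audio.data), 0);
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  source.start();
}
```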
- Optimize & Finalize:
  - Measure performance. If it's slow, consider smaller models, quantization, or WebGPU.
  - Confirm end-to-end TTS (text → audio) runs within a reasonable time.
  - (Optional) Add a simple UI (text input box and a "Speak" button).
End-of-Stage 5 Outcome
- You have a functional in-browser TTS demo using ONNX Runtime Web. Users type in text, your acoustic + vocoder models generate audio, and the browser plays it back.
- Model-Specific Guides
  - The ONNX Model Zoo has many reference models and example code to compare against.
  - The Hugging Face Transformers ONNX docs offer additional tips on exporting advanced Transformers.
- Common Pitfalls
  - Operator support in the browser can be limited; check whether your TTS model uses any ops unsupported in WASM or WebGL.
  - Dynamic axes: define them carefully when exporting; TTS often needs variable sequence lengths.
  - Large graphs can be slow in JavaScript; investigate quantization or smaller architectures if real-time is desired.
- Time Management
  - Each day's task should be doable in roughly one hour, but if you find yourself needing more time for debugging, feel free to adapt.
  - Document each step so you can look back on your progress.
Following this daily plan (1 hour/day for about a month) will steadily build your expertise. You’ll start with simple ONNX exports, then tackle increasingly complex models, culminating in a Transformer-based TTS pipeline deployed in the browser. Good luck!