adrianlyjak/onnx-tutorial

Below is a structured, 30-hour (1 hour/day for ~1 month) curriculum designed to get you from basic ONNX exports to deploying a Transformer-based TTS model in the browser. Each “stage” corresponds to a set of consecutive daily tasks. The tasks are broken down so that you can typically complete each in about one hour of focused effort.


Overview of the Stages

  1. Stage 1 (Days 1–6): ONNX Fundamentals with a Simple MLP
  2. Stage 2 (Days 7–12): Convolutional Model Export (ResNet) and Optimization
  3. Stage 3 (Days 13–18): Transformer Basics (DistilBERT) and Dynamic Axes
  4. Stage 4 (Days 19–24): TTS Pipeline Components (Tacotron2/Glow-TTS & Vocoder)
  5. Stage 5 (Days 25–30): Deploying TTS in the Browser (ONNX Runtime Web)

By the end, you will have all the key skills—exporting various model types (MLP, CNN, Transformer, TTS) to ONNX, verifying in Python, optimizing, and finally running inference in a JavaScript browser environment.


Stage 1 (Days 1–6): ONNX Fundamentals with a Simple MLP

Goal

  • Understand ONNX basics: how to export a simple model, load it in Python (ONNX Runtime), and verify correctness.

Suggested Model

  • A small feed-forward MLP in PyTorch (2–3 Linear layers, ReLU activations).

Day 1

  • Set Up Your Environment:
    • Install torch, onnx, and onnxruntime via pip.
    • Skim the ONNX documentation to get a sense of what the format represents (a portable graph of operators).

Day 2

  • Implement/Load an MLP:
    • In PyTorch, code a simple MLP model (e.g., input=10, hidden=20, output=2).
    • Optionally train briefly on random data to confirm it runs (or just define the model architecture).
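
For concreteness, a minimal sketch of such an MLP in PyTorch (layer sizes follow the example above):

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim=10, hidden=20, out_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

model = MLP().eval()
print(model(torch.randn(1, 10)).shape)  # torch.Size([1, 2])
```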

Day 3

  • Export to ONNX:
    • Use torch.onnx.export(...) with a dummy input tensor of the correct shape.
    • Save the exported model as mlp.onnx.
    • Pay attention to opset_version, and to input_names, output_names, and dynamic_axes if you plan to handle variable batch sizes.
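
A minimal export sketch, assuming the Day 2 model instance is in scope as model; opset 17 is an arbitrary but widely supported choice:

```python
import torch

# `model` is the Day 2 MLP; the dummy input must match its expected shape.
dummy = torch.randn(1, 10)
torch.onnx.export(
    model,
    dummy,
    "mlp.onnx",
    opset_version=17,                 # any opset your runtime supports
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},  # variable batch size
)
```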

Day 4

  • Verify in ONNX Runtime:
    • Load mlp.onnx in Python with onnxruntime.InferenceSession(...).
    • Run inference on random inputs and compare output to your original PyTorch model.
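
A verification sketch along these lines, assuming the same Python session that exported mlp.onnx (so model is still in scope) and the input/output names from Day 3:

```python
import numpy as np
import onnxruntime as ort
import torch

x = torch.randn(4, 10)
with torch.no_grad():
    torch_out = model(x).numpy()  # reference output from the original model

sess = ort.InferenceSession("mlp.onnx", providers=["CPUExecutionProvider"])
onnx_out = sess.run(None, {"input": x.numpy()})[0]

# Small floating-point differences are expected; exact equality is not.
np.testing.assert_allclose(torch_out, onnx_out, rtol=1e-3, atol=1e-5)
print("PyTorch and ONNX Runtime outputs match.")
```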

Day 5

  • Debug & Validate:
    • Use onnx.checker.check_model(...) to ensure validity.
    • Visualize the ONNX graph in Netron to see how your layers have been exported.
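
Both checks fit in a few lines; Netron remains the better tool for visual inspection:

```python
import onnx

model_proto = onnx.load("mlp.onnx")
onnx.checker.check_model(model_proto)               # raises if the graph is invalid
print(onnx.helper.printable_graph(model_proto.graph))  # text dump of nodes and edges
```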

Day 6

  • Recap & Document:
    • Summarize any issues you encountered (e.g., shape mismatches).
    • Confirm you understand each step: define → export → load → verify.

End-of-Stage 1 Outcome

  • You can confidently export a basic feed-forward network to ONNX and verify it in Python.

Stage 2 (Days 7–12): Convolutional Model Export (ResNet) and Optimization

Goal

  • Learn to handle more complex ops (convolutions, batch norm, etc.) and explore basic ONNX optimization.

Suggested Model

  • ResNet18 from TorchVision (pretrained on ImageNet).
  • Compare your exported model to the ONNX Model Zoo’s ResNet for reference.

Day 7

  • Load Pretrained ResNet:
    • resnet18 = torchvision.models.resnet18(pretrained=True) in PyTorch (newer torchvision versions replace pretrained=True with a weights argument).
    • Test a quick inference in Python to confirm it works.

Day 8

  • Export to ONNX:
    • Similar steps as before: torch.onnx.export(...).
    • Use dynamic axes for the batch dimension if desired.
    • Save as resnet18.onnx.
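
An export sketch for ResNet18, using the newer torchvision weights API:

```python
import torch
import torchvision

resnet18 = torchvision.models.resnet18(
    weights=torchvision.models.ResNet18_Weights.DEFAULT
).eval()

dummy = torch.randn(1, 3, 224, 224)  # standard ImageNet input shape
torch.onnx.export(
    resnet18,
    dummy,
    "resnet18.onnx",
    opset_version=17,
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},
)
```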

Day 9

  • Compare to Official ONNX Model:
    • Download the official resnet18 (or resnet50) from the ONNX Model Zoo.
    • In Netron, compare the structure/ops.
    • Verify outputs with the same test image to check for close numerical matches.

Day 10

  • Shape Inference & Simplification:
    • Install/use onnx-simplifier to reduce graph complexity.
    • onnxruntime can also do some optimizations. Explore onnx.shape_inference.infer_shapes(...).
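
A sketch combining both steps, assuming onnx-simplifier is installed (pip install onnxsim):

```python
import onnx
from onnxsim import simplify

model_proto = onnx.load("resnet18.onnx")
model_proto = onnx.shape_inference.infer_shapes(model_proto)  # annotate tensor shapes

simplified, ok = simplify(model_proto)  # fold constants, remove redundant nodes
assert ok, "simplified model failed onnx-simplifier's built-in check"
onnx.save(simplified, "resnet18-sim.onnx")
```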

Day 11

  • Optional: Quantization:
    • Investigate dynamic or static quantization.
    • Compare the performance difference in ONNX Runtime between the original and a quantized model.
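
For the dynamic variant, ONNX Runtime ships a helper; a minimal sketch (note that static quantization, which requires calibration data, often suits convolution-heavy models like ResNet better):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    "resnet18-sim.onnx",          # input model
    "resnet18-int8.onnx",         # output model
    weight_type=QuantType.QInt8,  # store weights as int8
)
```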

Day 12

  • Performance & Summary:
    • Measure inference speed in Python for the original vs. simplified vs. quantized (if done).
    • Document your findings.
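
A simple latency-measurement sketch; the file names assume the earlier days' outputs, and absolute numbers will vary by hardware:

```python
import time
import numpy as np
import onnxruntime as ort

def bench(path, runs=50):
    sess = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    x = np.random.randn(1, 3, 224, 224).astype(np.float32)
    sess.run(None, {"input": x})  # warm-up run, excluded from timing
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, {"input": x})
    print(f"{path}: {(time.perf_counter() - start) / runs * 1000:.1f} ms/inference")

for path in ["resnet18.onnx", "resnet18-sim.onnx", "resnet18-int8.onnx"]:
    bench(path)
```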

End-of-Stage 2 Outcome

  • You can export a pretrained ResNet to ONNX, compare to a reference, simplify it, and potentially improve runtime performance.

Stage 3 (Days 13–18): Transformer Basics (DistilBERT) and Dynamic Axes

Goal

  • Get comfortable exporting Transformer architectures.
  • Handle dynamic sequence lengths and advanced ops like Multi-Head Attention.

Suggested Model

  • DistilBERT from Hugging Face Transformers.
  • Smaller and simpler than full BERT, but representative of typical Transformer ops.

Day 13

  • Set Up Hugging Face Transformers:
    • Install transformers via pip.
    • from transformers import DistilBertModel, DistilBertTokenizer
    • Load the pretrained distilbert-base-uncased model and tokenizer.

Day 14

  • Test Inference in Python:
    • Tokenize a sample sentence (e.g. "Hello world!").
    • Pass it through DistilBERT in PyTorch.
    • Inspect output shapes (last hidden states, etc.).
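
A quick sanity-check sketch along these lines:

```python
import torch
from transformers import DistilBertModel, DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertModel.from_pretrained("distilbert-base-uncased").eval()

inputs = tokenizer("Hello world!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, 768)
```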

Day 15

  • Export to ONNX:
    • Use the transformers.onnx CLI (python -m transformers.onnx; newer releases supersede this with the optimum exporters) or manually call torch.onnx.export(...).
    • Ensure you specify dynamic axes for the token dimension so it can handle variable sentence lengths.
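
A manual export sketch with dynamic batch and sequence axes (the CLI route automates roughly the same thing):

```python
import torch
from transformers import DistilBertModel, DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertModel.from_pretrained("distilbert-base-uncased").eval()
inputs = tokenizer("Hello world!", return_tensors="pt")

torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "distilbert.onnx",
    opset_version=17,
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "last_hidden_state": {0: "batch", 1: "sequence"},
    },
)
```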

Day 16

  • Verify with ONNX Runtime:
    • Compare the DistilBERT ONNX inference outputs with the PyTorch outputs for the same inputs.
    • Check for numerical closeness (small floating-point differences are normal).

Day 17

  • Optimize:
    • Use onnxruntime.transformers to fuse attention and layer norm subgraphs.
    • Measure speedups with onnxruntime.InferenceSession(..., providers=["CPUExecutionProvider"]) or GPU if available.
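
A sketch using ONNX Runtime's transformer optimizer; argument names can vary across versions, and DistilBERT uses BERT-style fusions:

```python
from onnxruntime.transformers import optimizer

optimized = optimizer.optimize_model(
    "distilbert.onnx",
    model_type="bert",   # DistilBERT shares BERT's attention structure
    num_heads=12,
    hidden_size=768,
)
optimized.save_model_to_file("distilbert-opt.onnx")
```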

Day 18

  • Wrap Up & Document:
    • Summarize the export process.
    • Note any important flags for Transformers (e.g., opset version, sequence length constraints).

End-of-Stage 3 Outcome

  • You have a working DistilBERT ONNX model with dynamic axes, tested in Python, and optionally optimized.

Stage 4 (Days 19–24): TTS Pipeline Components (Tacotron2/Glow-TTS & Vocoder)

Goal

  • Learn how TTS is often split into two or more models (text → mel spectrogram → waveform).
  • Export and chain TTS models in ONNX.

Suggested Models

  1. Tacotron 2 or Glow-TTS for acoustic model (mel-spectrogram generation).
  2. WaveGlow or HiFi-GAN for the vocoder (mel-spectrogram → waveform).

You can find pretrained checkpoints and partial ONNX export references in the model authors' repositories (e.g., NVIDIA's GitHub, referenced on Day 19).

Day 19

  • Choose & Download Pretrained Models:
    • For example, a pretrained Tacotron 2 model and a WaveGlow model from NVIDIA’s GitHub.
    • Familiarize yourself with the TTS pipeline: text input → phoneme/text encoder → mel spectrogram → audio.

Day 20

  • Acoustic Model Export:
    • If using Tacotron 2, adapt a known ONNX export script or write your own torch.onnx.export(...).
    • Pay attention to dynamic shapes (the length of text tokens, output mel frames).

Day 21

  • Vocoder Export:
    • Export WaveGlow or HiFi-GAN to ONNX.
    • Again, manage dynamic shapes if needed (the number of mel frames can vary).

Day 22

  • Chain in Python:
    • Run text → mel with your ONNX acoustic model.
    • Pass the resulting mel spectrograms into your ONNX vocoder to get waveforms.
    • Save the waveforms to a .wav file to confirm correctness.
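
A chaining sketch. The tensor names ("sequence", "mel") and file names here are hypothetical; check your exported graphs in Netron for the real ones. Real Tacotron 2 exports are also often split into several sub-graphs because of the autoregressive decoder, so treat this as the single-graph idealization:

```python
import numpy as np
import onnxruntime as ort
from scipy.io import wavfile

acoustic = ort.InferenceSession("tacotron2.onnx", providers=["CPUExecutionProvider"])
vocoder = ort.InferenceSession("waveglow.onnx", providers=["CPUExecutionProvider"])

# Hypothetical encoded text; real token IDs come from the model's text frontend.
tokens = np.array([[12, 5, 42, 7]], dtype=np.int64)

mel = acoustic.run(None, {"sequence": tokens})[0]      # text -> mel spectrogram
audio = vocoder.run(None, {"mel": mel})[0].squeeze()   # mel -> waveform

# 22050 Hz is a common TTS sample rate; use whatever your checkpoint was trained with.
wavfile.write("output.wav", 22050, audio.astype(np.float32))
```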

Day 23

  • Compare with Original Models:
    • Use the same text input on the original PyTorch models.
    • Compare the mel-spectrogram and final audio for any major discrepancies.

Day 24

  • Refine & Document:
    • Note any performance issues, shape alignment pitfalls, or memory usage considerations.
    • If the pipeline is stable, you have your TTS modules in ONNX!

End-of-Stage 4 Outcome

  • You have a working two-step TTS pipeline (acoustic + vocoder) fully exported to ONNX and tested in Python.

Stage 5 (Days 25–30): Deploying TTS in the Browser (ONNX Runtime Web)

Goal

  • Learn to run ONNX models entirely in JavaScript/TypeScript via WebAssembly/WebGL.
  • Integrate TTS inference into a simple web app that can play audio.

Key Tools

  • onnxruntime-web (npm package) for in-browser inference via WebAssembly/WebGL.
  • The Web Audio API for playback (used on Day 29).

Day 25

  • Set Up a Simple Web Project:
    • Initialize an npm project.
    • Install onnxruntime-web (its WebAssembly backend is the default; a WebGL backend is bundled in the same package).
    • Start with a plain HTML + JS or a small React/Vue app.

Day 26

  • Test a Simple ONNX Model:
    • First, try loading the resnet18.onnx from Stage 2 in the browser to confirm your environment is configured.
    • Use an image input (or random data) to verify it returns an inference result.

Day 27

  • Load Acoustic & Vocoder Models:
    • Copy over your TTS .onnx files from Stage 4 into the web app’s static/public folder.
    • Load them with ONNX Runtime Web (session = await InferenceSession.create('tacotron2.onnx'), etc.).

Day 28

  • Implement the Text-to-Mel Step:
    • Tokenize/encode text in JavaScript (a simple approach or a precompiled JSON mapping) to pass into the acoustic model.
    • Receive mel spectrogram as an output Float32Array.

Day 29

  • Waveform Generation & Audio Playback:
    • Pass the mel spectrogram to the vocoder session.
    • Convert the output (waveform) to a playable audio buffer.
    • Use the Web Audio API to play the generated audio in the browser.

Day 30

  • Optimize & Finalize:
    • Measure performance. If it’s slow, consider smaller models, quantization, or WebGPU.
    • Confirm end-to-end TTS (text → audio) runs within a reasonable time.
    • (Optional) Add a simple UI (text input box and “Speak” button).

End-of-Stage 5 Outcome

  • You have a functional in-browser TTS demo using ONNX Runtime Web. Users type in text, your acoustic + vocoder models generate audio, and the browser plays it back.

Final Notes & Additional References

  1. Model-Specific Guides

    • Each stage above names its source projects (TorchVision, Hugging Face Transformers, NVIDIA's TTS examples); consult those repositories' export documentation for model-specific flags and caveats.

  2. Common Pitfalls

    • Operator Support in the browser can be limited; check if your TTS model uses any unsupported ops in WASM or WebGL.
    • Dynamic Axes: Ensure you carefully define them when exporting; TTS often needs variable sequence lengths.
    • Large Graphs can be slow in JavaScript. Investigate quantization or smaller architectures if real-time is desired.
  3. Time Management

    • Each day’s task should be doable in roughly one hour, but if you find yourself needing more time for debugging, feel free to adapt.
    • Be sure to document each step so you can look back on your progress.

Following this daily plan (1 hour/day for about a month) will steadily build your expertise. You’ll start with simple ONNX exports, then tackle increasingly complex models, culminating in a Transformer-based TTS pipeline deployed in the browser. Good luck!
