Great library! I inspected the models with Netron and noticed some room for optimization. Using a tool called onnxslim, you can remove a bunch of redundant nodes (nothing major, but it can save some loading time). One benefit of the optimized models is that they no longer contain any Shape ops, which must run on the CPU when the model is run on WebGPU with ONNX Runtime Web. With no Shape ops left, there is less data transfer between CPU and GPU. I've uploaded the optimized models here. Feel free to bring them into this project too, if you'd like! 🤗
fp32
fp16
quint8
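For anyone who wants to check the Shape-op claim on their own copies, here is a minimal sketch that counts Shape nodes in an ONNX file. It assumes the `onnx` Python package is installed; the helper names are made up for illustration, and it only looks at the top-level graph, so Shape nodes nested inside `If`/`Loop` subgraphs would not be counted.

```python
from collections import Counter
from typing import Iterable


def count_op_types(op_types: Iterable[str]) -> Counter:
    """Tally how often each ONNX operator type appears (hypothetical helper)."""
    return Counter(op_types)


def shape_ops_in_model(path: str) -> int:
    """Return the number of Shape nodes in the top-level graph of an ONNX file."""
    # Imported lazily so count_op_types can be used without onnx installed.
    import onnx

    model = onnx.load(path)
    return count_op_types(node.op_type for node in model.graph.node)["Shape"]


if __name__ == "__main__":
    import sys

    # Usage: python count_shape_ops.py original.onnx slimmed.onnx
    # (the script and file names are placeholders, not files from this repo)
    for model_path in sys.argv[1:]:
        print(model_path, shape_ops_in_model(model_path))
```

Running it on a model before and after slimming (for example after something like `onnxslim original.onnx slimmed.onnx`, with placeholder file names) should show whether the Shape count drops to zero.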