ONNX (Open Neural Network Exchange)
An open format for exchanging ML models between frameworks – train in PyTorch, deploy with TensorRT, CoreML, or any other ONNX-compatible engine.
As the de facto universal exchange format for ML models, ONNX decouples training from deployment: train in PyTorch, deploy anywhere, with up to 5x faster inference via ONNX Runtime.
Explanation
ONNX defines a standard computational-graph representation for neural networks with over 150 operators. ONNX Runtime is Microsoft's highly optimized inference engine for this format, running on CPU, GPU, and NPU.
Marketing Relevance
ONNX eliminates framework lock-in: models can be moved freely between PyTorch, TensorFlow, and dedicated inference engines. ONNX Runtime typically accelerates inference by 2-5x.
Example
A sentiment model trained in PyTorch is exported to ONNX and deployed with ONNX Runtime – the result: roughly 3x faster inference and cross-platform compatibility.
Common Pitfalls
Not every custom operator is supported. Conversion can introduce small numerical deviations. Dynamic input shapes require explicit handling during export.
Origin & History
Facebook and Microsoft launched ONNX in 2017. ONNX Runtime was open-sourced in late 2018 and is now integrated into Windows, Azure, and Office. Version 1.15+ supports LLM inference.
Comparisons & Differences
ONNX (Open Neural Network Exchange) vs. TensorRT
TensorRT is NVIDIA's GPU-specific inference engine; ONNX is a vendor-neutral format that runs on CPU, GPU, and NPU via engines like ONNX Runtime. TensorRT can itself import ONNX models.
ONNX (Open Neural Network Exchange) vs. GGUF
GGUF is a quantized weight format for local LLM inference with llama.cpp; ONNX is a general-purpose format for all ML model types.