KServe
Kubernetes-native model serving framework (formerly KFServing) for standardized, scalable ML inference on Kubernetes.
KServe is the de facto standard framework for ML serving on Kubernetes, offering auto-scaling, scale-to-zero, and multi-framework support.
Explanation
KServe provides a standardized InferenceService custom resource (CRD) for Kubernetes with auto-scaling (including scale-to-zero via Knative), canary rollouts, support for multiple frameworks (scikit-learn, XGBoost, TensorFlow, PyTorch, ONNX, and more), and ModelMesh for high-density multi-model serving.
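As a sketch, a minimal InferenceService manifest might look like this (the resource name and storageUri are illustrative placeholders; `minReplicas: 0` enables scale-to-zero in the serverless deployment mode):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris          # illustrative name
spec:
  predictor:
    minReplicas: 0            # scale to zero replicas when idle (serverless mode)
    model:
      modelFormat:
        name: sklearn         # KServe selects a matching ServingRuntime
      storageUri: gs://my-bucket/models/sklearn-iris   # placeholder path
```

Applied with `kubectl apply -f`, this yields an HTTP/gRPC inference endpoint; canary rollouts are configured declaratively on the same resource, e.g. via `canaryTrafficPercent` on the predictor.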
Marketing Relevance
KServe is the standard for model serving in the Kubeflow and Kubernetes ecosystem.
Common Pitfalls
Requires a Kubernetes cluster and corresponding operational expertise. The serverless mode pulls in Knative (and typically Istio) as infrastructure dependencies. Debugging inference failures across multi-container pods (storage initializer, queue proxy, model server) can be tedious.
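For the multi-container debugging pitfall, a rough workflow is to inspect the InferenceService status first and then drill into the individual containers of the predictor pod (resource and pod names below are placeholders; the label selector assumes KServe's standard `serving.kserve.io/inferenceservice` pod label):

```shell
# Check overall readiness and the endpoint URL of the service
kubectl get inferenceservice sklearn-iris

# Find the predictor pods behind that InferenceService
kubectl get pods -l serving.kserve.io/inferenceservice=sklearn-iris

# Logs of the model-server container; sidecars in the same pod
# (e.g. queue-proxy, storage-initializer) are inspected the same way
kubectl logs <pod-name> -c kserve-container
```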
Origin & History
KFServing was released in 2019 as part of Kubeflow. In 2021 it was renamed to KServe and spun out as a standalone project. ModelMesh was integrated in 2022 for high-density multi-model serving.
Comparisons & Differences
KServe vs. Seldon Core
Seldon Core offers more enterprise features (explainers, multi-armed bandit routing); KServe is more lightweight, with stronger auto-scaling including scale-to-zero.
KServe vs. Triton Inference Server
Triton is an inference runtime (the server process that executes the model); KServe is an orchestration framework that can deploy Triton as one of its serving backends.
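A sketch of how the two layers combine: an InferenceService that delegates model execution to Triton (name and storageUri are placeholders; this assumes the `kserve-tritonserver` ServingRuntime shipped with KServe is installed in the cluster):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: torchscript-demo      # illustrative name
spec:
  predictor:
    model:
      modelFormat:
        name: triton          # model repository in Triton's layout
      runtime: kserve-tritonserver       # explicit Triton backend
      storageUri: gs://my-bucket/models/torchscript-demo   # placeholder
```

KServe handles the Kubernetes lifecycle (routing, scaling, rollout), while Triton performs the actual inference inside the pod.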