The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
Updated May 7, 2026 · Python
☸️ An easy-to-use, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
Local AI workflow orchestration and runtime management framework
AI inference platform architecture lab demonstrating admission control, fairness scheduling, bounded queues, and graceful degradation under burst traffic.
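The description above names bounded queues, admission control, and graceful degradation under burst traffic. A minimal Python sketch of how those pieces fit together — all class and method names here are illustrative, not taken from the repository itself:

```python
import queue


class AdmissionController:
    """Minimal sketch: a bounded request queue with admission control.

    Requests beyond the queue's capacity are rejected immediately
    (load shedding) instead of piling up, so queueing latency stays
    bounded under burst traffic.
    """

    def __init__(self, capacity: int):
        self._q = queue.Queue(maxsize=capacity)  # bounded queue
        self.rejected = 0

    def submit(self, request) -> bool:
        """Admit the request if there is room; shed it otherwise."""
        try:
            self._q.put_nowait(request)
            return True
        except queue.Full:
            # Graceful degradation: reject fast rather than block the caller.
            self.rejected += 1
            return False

    def next_request(self):
        """Pop the oldest admitted request (simple FIFO ordering)."""
        return self._q.get_nowait()


# A burst of 10 requests against a capacity-4 queue: 4 admitted, 6 shed.
ctrl = AdmissionController(capacity=4)
results = [ctrl.submit(f"req-{i}") for i in range(10)]
```

Real platforms layer fairness scheduling on top (e.g. per-tenant queues drained round-robin) instead of a single FIFO, but the shed-at-admission pattern is the same.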
A conceptual framework for a high-scale Agentic AI orchestrator, inspired by enterprise-grade inference platforms.