MII makes low-latency and high-throughput inference possible
A robust, efficient, low-latency speech-to-text library
FlashInfer: Kernel Library for LLM Serving
LiteRT is the new name for TensorFlow Lite (TFLite)
Low-latency machine code generation
Personal AI, On Personal Devices
A blazing fast AI Gateway with integrated guardrails
Optimizing inference proxy for LLMs
Faster Whisper transcription with CTranslate2
Lightweight, standalone C++ inference engine for Google's Gemma models
Build Vision Agents quickly with any model or video provider
AI gateway with token compression for Claude Code, Codex, and more
A high-performance inference system for large language models
Build and deploy AI Agents on Cloudflare
Moonshot's most powerful AI model
Graph-vector database for building unified AI backends fast
Towards Human-Sounding Speech
Machine learning on FPGAs using HLS
Foundational Models for State-of-the-Art Speech and Text Translation
Alibaba's high-performance LLM inference engine for diverse apps
Parallax is a distributed model serving framework
Deep learning optimization library: makes distributed training easy
Ultra-Efficient LLMs on End Devices
C++ library for high performance inference on NVIDIA GPUs
Low-latency REST API for serving text-embeddings