MII makes low-latency and high-throughput inference possible
A robust, efficient, low-latency speech-to-text library
FlashInfer: Kernel Library for LLM Serving
LiteRT is the new name for TensorFlow Lite (TFLite)
Personal AI, On Personal Devices
Low-latency machine code generation
Optimizing inference proxy for LLMs
A blazing fast AI Gateway with integrated guardrails
Lightweight, standalone C++ inference engine for Google's Gemma models
Faster Whisper transcription with CTranslate2
Build Vision Agents quickly with any model or video provider
AI gateway with token compression for Claude Code, Codex, and more
Build and deploy AI Agents on Cloudflare
A high-performance inference system for large language models
Towards Human-Sounding Speech
Graph-vector database for building unified AI backends fast
Moonshot's most powerful AI model
Alibaba's high-performance LLM inference engine for diverse apps
One CLAUDE.md file. Keeps Claude responses terse
Machine learning on FPGAs using HLS
Foundational Models for State-of-the-Art Speech and Text Translation
Parallax is a distributed model serving framework
Deep learning optimization library: makes distributed training easy
Ultra-Efficient LLMs on End Device
Fast multimodal LLM for real-time voice interaction and AI apps