Open speech-to-speech models and pipelines by Hugging Face
A high-quality rapid TTS voice cloning model
Code for openai.fm, a demo for the OpenAI Speech API
Qwen3-TTS is an open-source series of TTS models
A lightweight text-to-speech model with zero-shot voice cloning
A robust, efficient, low-latency speech-to-text library
Framework for building real-time voice and multimodal AI agents
Fast multimodal LLM for real-time voice interaction and AI apps
PersonaPlex code
The Python library for real-time communication
Real-time voice interactive digital human
Speakr is a personal, self-hosted web application
Tokenizer-Free TTS for Multilingual Speech Generation
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Controllable & emotion-expressive zero-shot TTS
Industrial-level controllable zero-shot text-to-speech system
Multi-lingual large voice generation model, providing inference
Foundational model for human-like, expressive TTS
Realtime AI Voice Agents with SoTA Multimodal AI models on Arduino ESP
Offline inference engine for art, real-time voice conversations
Open-source text-to-speech tool that supports extra-long text
Robust Speech Recognition via Large-Scale Weak Supervision
Long-form streaming TTS system for multi-speaker dialogue generation
Workflow and speech recognition app
TTS with Kokoro and ONNX Runtime