SOTA Open Source TTS
Open-source framework for intelligent speech interaction
LLM-based reinforcement learning audio editing model
Multilingual speech recognition and audio understanding model
Audio foundation model excelling in audio understanding
Repo of Qwen2-Audio chat & pretrained large audio language model
Towards Human-Sounding Speech
Controllable & emotion-expressive zero-shot TTS
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Tokenizer-Free TTS for Multilingual Speech Generation
A TTS model capable of generating ultra-realistic dialogue
Interface for OuteTTS models
Instant voice cloning by MIT and MyShell. Audio foundation model
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
SoTA open-source TTS
Multi-modal large language model designed for audio understanding
Maimaibot, a (more focused) multi-platform intelligent agent
VITS2 backbone with multilingual-bert
Amica is an open-source interface for interactive communication
Open-source implementation of Microsoft's VALL-E X zero-shot TTS model
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)