A text-to-speech, speech-to-text and speech-to-speech library
Open-source framework for intelligent speech interaction
Oobabooga - The definitive Web UI for local AI, with powerful features
Multi-modal large language model designed for audio understanding
Official Python inference and LoRA trainer package
Large Audio Language Model built for natural interactions
Transforming Multimodal Content into Captivating Multilingual Audio
Audiocraft is a library for audio processing and generation
Streaming Real-time Audio-Driven Avatar Generation
A Family of Open Sourced Music Foundation Models
Implementation of AudioLM audio generation model in PyTorch
AI video generator optimized for low-VRAM and older GPUs
Tokenizer-Free TTS for Multilingual Speech Generation
Open source AI model for generating full songs from lyrics prompts
Taming Stable Diffusion for Lip Sync
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Multimodal Diffusion with Representation Alignment
48 kHz stereo neural audio codec for general audio
A Python library for audio data augmentation
AI tool converting video/audio into structured documents instantly
HunyuanVideo: A Systematic Framework For Large Video Generation Model
A speech-text foundation model for real-time dialogue
Unofficial Python API and agentic skill for Google NotebookLM
Generate audiobooks from EPUBs, PDFs and text with captions
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD