Open-source multi-speaker long-form text-to-speech model
Qwen3-ASR is an open-source series of ASR models
Multi-modal large language model designed for audio understanding
Omnilingual ASR: Open-Source Multilingual Speech Recognition
An Open Source text-to-speech system built by inverting Whisper
SOTA discrete acoustic codec models with 40/75 tokens per second
48 kHz stereo neural audio codec for general audio
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Audio foundation model excelling in audio understanding
Spark-TTS Inference Code
A PyTorch-based Speech Toolkit
AudioMuse-AI is an Open Source Dockerized environment
Open Source Speech Language Model
Python Audio Analysis Library: Feature Extraction, Classification
kaldi-asr/kaldi is the official location of the Kaldi project
A subtitle generator for Japanese Adult Videos.
VITS2 backbone with multilingual-bert
Headphone Correction and Spatial Audio on Headphones
Intelligent Precision for Vibration Detection
Free, easy to use, lightweight soundboard for Windows
Open source software for calculating industrial noise in the environment
Open source implementation of Microsoft's VALL-E X zero-shot TTS model
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)