Automatic Speech Recognition with Word-level Timestamps
Faster Whisper transcription with CTranslate2
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Crowdsourcing platform for full text transcription and tagging
A Web UI for easy subtitle generation using the Whisper model
Comprehensive Gradio WebUI for audio processing
Qwen3-ASR is an open-source series of ASR models
Generate blog articles from video or audio
A lightweight audio-to-MIDI converter with pitch bend detection
A Family of Open-Source Music Foundation Models
Voice Recognition to Text Tool
AI-powered tool for generating, optimizing, and translating subtitles
Synchronized Translation for Videos
AI tool converting video/audio into structured documents instantly
A nearly-live implementation of OpenAI's Whisper
Cut videos with a text editor
The official Python Library for the Groq API
GLM-4-Voice | End-to-End Chinese-English Conversational Model
A robust, efficient, low-latency speech-to-text library
Multilingual speech recognition and audio understanding model
Translate the video from one language to another and embed dubbing
Get your documents ready for gen AI
A text-to-speech, speech-to-text and speech-to-speech library
Open source AI wearable platform for recording and summarizing speech
A Python tool that uses GPT-4, FFmpeg, and OpenCV