Offline inference engine for art, real-time voice conversations
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Towards Human-Sounding Speech
Workflow and speech recognition app
Large language model for livestream sales anchors
Virtual AI anchor that combines state-of-the-art technology
Python Audio Analysis Library: Feature Extraction, Classification
HivisionIDPhotos: a lightweight and efficient AI ID photo tool
Applying large models to scenario-based applications
Build Vision Agents quickly with any model or video provider
Full stack framework for building cross-platform mobile AI apps
Advanced AI Explainability for computer vision
Basic Machine Learning Natural Language Processing Roadmap
Refine and quantize messy AI pixel art into clean, perfect pixels
ProStack - a platform for image processing and analysis
SigPack - A signal processing library using Armadillo
Toolkit for working with and mapping geospatial data
Multi-Voice and Prompt-Controlled TTS Engine
Chinese text-to-speech engine
NaveGo: an open source MATLAB/GNU Octave toolbox for processing integr
We provide a PyTorch implementation of the paper Voice Separation
Separate audio recordings into individual sources
VGGFace2 Dataset for Face Recognition
Python crawler that downloads image galleries and analyzes titles