Uncommon Objects in 3D dataset
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Constrained Value Alignment via Safe Reinforcement Learning
Ensure consistency and alignment between different codebases
Qwen3 is the large language model series developed by the Qwen team
Automatic Speech Recognition with Word-level Timestamps
High-Performance Face Recognition Library on PaddlePaddle & PyTorch
Open source AI model for generating full songs from lyrics prompts
A dataset consisting of 15,140 ChatGPT prompts from Reddit
Video translation and dubbing tool powered by LLMs
One-stop AI digital human system with video voice synthesis tools
A tool to snap pixels to a perfect grid
Multimodal-Driven Architecture for Customized Video Generation
Analyze computation-communication overlap in V3/R1
Recipes to train reward models for RLHF
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Handwritten Text Recognition (HTR) system implemented with TensorFlow
Course to get into Large Language Models (LLMs)
Pluggable SOTA multi-object tracking modules for segmentation
Synchronized Translation for Videos
SOTA Open Source TTS
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
HivisionIDPhotos: a lightweight and efficient AI ID photo tool
Pretrained (Language) Models for Probabilistic Time Series Forecasting
Java interface to OpenCV, FFmpeg, and more