A GUI tool for extracting hard-coded subtitles (hardsubs) from videos
Real-World Centric Foundation GUI Agents
A state-of-the-art open visual language model
Framework and no-code GUI for fine-tuning LLMs
UI-TARS-desktop version that can operate on your local personal device
An open-source end-to-end VLM-based GUI Agent
Generate audiobooks from e-books, voice cloning & 1107+ languages
Agent framework and applications built upon Qwen>=3.0
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Agent S: an open agentic framework that uses computers like a human
Real-time behaviour synthesis with MuJoCo, using Predictive Control
All-in-one web-based IDE specialized for machine learning
A low-code unified framework for computer vision and deep learning