Hunyuan Translation Model Version 1.5
Multimodal embedding and reranking models built on Qwen3-VL
Implementation of "MobileCLIP", CVPR 2024
VMZ: Model Zoo for Video Modeling
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Inference script for Oasis 500M
A Production-ready Reinforcement Learning AI Agent Library
PyTorch code and models for DINOv2 self-supervised learning
Memory-efficient and performant finetuning of Mistral's models
Official implementation of DreamCraft3D
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Diversity-driven optimization and large-model reasoning ability
Chinese and English multimodal conversational language model
Repository of the Qwen2-Audio chat and pretrained large audio language models
Tongyi Deep Research, the leading open-source deep research agent
A multimodal world model for reconstruction, generation, and simulation
Renderer for the harmony response format to be used with gpt-oss
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Unified Multimodal Understanding and Generation Models
Code for Mesh R-CNN, ICCV 2019
Large Multimodal Models for Video Understanding and Editing
Genome modeling and design across all domains of life
Pretrained time-series foundation model developed by Google Research