Multimodal Diffusion with Representation Alignment
Inference code for scalable emulation of protein equilibrium ensembles
Pushing the Limits of Mathematical Reasoning in Open Language Models
Industrial-level controllable zero-shot text-to-speech system
Video understanding codebase from FAIR for reproducing video models
Tool for exploring and debugging transformer model behaviors
DeepSeek Coder: Let the Code Write Itself
Qwen-Image is a powerful image generation foundation model
HY-Motion model for 3D character animation generation
Official implementation of DreamCraft3D
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Open Source Speech Language Model
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
General-purpose image editing model that delivers high-fidelity
ICLR 2024 Spotlight: curation/training code, metadata, distribution
OCR expert VLM powered by Hunyuan's native multimodal architecture
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Language modeling in a sentence representation space
The ChatGPT Retrieval Plugin lets you easily find personal documents
A SOTA open-source image editing model
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Release for Improved Denoising Diffusion Probabilistic Models
High-Resolution Image Synthesis with Latent Diffusion Models