GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
OCR expert VLM powered by Hunyuan's native multimodal architecture
Qwen-Image is a powerful image generation foundation model
Open-source multi-speaker long-form text-to-speech model
Qwen3-TTS is an open-source series of TTS models
MedicalGPT: Training Your Own Medical GPT Model with a ChatGPT-style Training Pipeline
Open-source framework for intelligent speech interaction
Capable of understanding text, audio, vision, and video
tiktoken is a fast BPE tokeniser for use with OpenAI's models
A SOTA open-source image editing model
Diversity-driven optimization and large-model reasoning ability
OpenTinker is an RL-as-a-Service infrastructure for foundation models
GLM-4 series: Open Multilingual Multimodal Chat LMs
Long-form streaming TTS system for multi-speaker dialogue generation
Open-weight, large-scale hybrid-attention reasoning model
Qwen3-Omni is a natively end-to-end, omni-modal LLM
An AI-powered security review GitHub Action using Claude
Renderer for the harmony response format to be used with gpt-oss
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
State-of-the-art (SoTA) text-to-video pre-trained model
Block Diffusion for Ultra-Fast Speculative Decoding
Chat & pretrained large audio language model proposed by Alibaba Cloud
Tongyi Deep Research, the Leading Open-source Deep Research Agent
FAIR Sequence Modeling Toolkit 2
Qwen-Image-Layered: Layered Decomposition for Inherent Editability