A library for accelerating Transformer models on NVIDIA GPUs
Learn How LLM Transformer Models Work with Interactive Visualization
Tool for exploring and debugging transformer model behaviors
Implementation of Vision Transformer, a simple way to achieve SOTA
Julia Implementation of Transformer models
Fast inference engine for Transformer models
RF-DETR is a real-time object detection and segmentation model
Ongoing research training transformer models at scale
PyTorch library of curated Transformer models and their components
ReFT: Representation Finetuning for Language Models
Trained models & code to predict toxic comments
Fast State-of-the-Art Static Embeddings
The most powerful local music generation model
MoBA: Mixture of Block Attention for Long-Context LLMs
Repo for SeedVR2 & SeedVR
Image generation model with a single-stream diffusion transformer
Fast and memory-efficient exact attention
Hackable and optimized Transformers building blocks
Foundation Model for Tabular Data
Accelerate local LLM inference and finetuning
Diffusion Transformer with Fine-Grained Chinese Understanding
A multimodal model for brain response prediction
[NeurIPS 2025 Spotlight] Quantized Attention
A simple but complete full-attention transformer
BitNet: Scaling 1-bit Transformers for Large Language Models