layoutlm-base-uncased

layoutlm-base-uncased is a multimodal transformer model developed by Microsoft for document image understanding. It combines text features with layout (position) features, so it can process structured documents such as forms, invoices, and receipts, where the spatial arrangement of text carries meaning. The base version has 113 million parameters and was pre-trained on about 11 million scanned document images from the IIT-CDIP dataset. The architecture is BERT-like, with the input enriched by 2D positional embeddings derived from each token's bounding box. At release, it reported state-of-the-art results on form understanding and information extraction benchmarks such as FUNSD and SROIE. The model is suited to document AI applications such as document classification, question answering, and named entity recognition.
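The 2D positional embeddings are looked up from integer bounding-box coordinates, so OCR boxes in pixels have to be rescaled first. The sketch below assumes the 0 to 1000 coordinate scale described in the Hugging Face Transformers LayoutLM documentation; the helper name normalize_box and the sample page size are hypothetical, chosen only for illustration.

```python
def normalize_box(box, page_width, page_height):
    """Rescale a pixel box (x0, y0, x1, y1) to the 0-1000 range.

    Assumption: the model expects coordinates on a 0-1000 scale,
    independent of the scanned page's resolution.
    """
    x0, y0, x1, y1 = box

    def scale(value, size):
        # Clamp so that OCR boxes slightly outside the page stay in range.
        return max(0, min(1000, int(1000 * value / size)))

    return [
        scale(x0, page_width),
        scale(y0, page_height),
        scale(x1, page_width),
        scale(y1, page_height),
    ]


# Hypothetical word box on a 1000 x 500 pixel page.
print(normalize_box((100, 50, 300, 100), page_width=1000, page_height=500))
```

Each word's normalized box is repeated for all of its sub-word tokens before it is passed to the model next to the token IDs.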

Features

Combines text and layout (bounding box) embeddings
Pre-trained on 11 million scanned document images
Supports document image understanding and information extraction
Uses 12 transformer layers with 768 hidden units and 12 attention heads
Pre-trained on the IIT-CDIP Test Collection 1.0 for 2 epochs
Compatible with Hugging Face Transformers, PyTorch, and TensorFlow
Licensed under the permissive MIT license
Reported state-of-the-art results on FUNSD and SROIE at release
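As a rough check on the 113 million parameter figure, the count can be estimated from the layer sizes listed above. Only the 12 layers and 768 hidden units come from this page; the vocabulary size, sequence length, feed-forward width, and 2D position table size are assumptions taken from BERT-base-style defaults, and the estimate covers the encoder without any task-specific head.

```python
# From the feature list above.
LAYERS = 12
HIDDEN = 768

# Assumptions (BERT-base-style defaults, not stated on this page).
VOCAB_SIZE = 30522        # uncased WordPiece vocabulary
MAX_POSITIONS = 512       # 1D position embeddings
TYPE_VOCAB = 2            # segment (token type) embeddings
MAX_2D_POSITIONS = 1024   # rows in each 2D position table
INTERMEDIATE = 3072       # feed-forward inner width


def linear(n_in, n_out):
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(n):
    """Scale and shift vectors of a layer normalization."""
    return 2 * n


def count_parameters():
    # Four 2D tables are assumed: x, y, width, and height embeddings.
    embedding_rows = VOCAB_SIZE + MAX_POSITIONS + TYPE_VOCAB + 4 * MAX_2D_POSITIONS
    embeddings = embedding_rows * HIDDEN + layer_norm(HIDDEN)
    # Query, key, value, and output projections. The number of heads
    # splits these matrices but does not change their size.
    attention = 4 * linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN)
    feed_forward = (
        linear(HIDDEN, INTERMEDIATE)
        + linear(INTERMEDIATE, HIDDEN)
        + layer_norm(HIDDEN)
    )
    pooler = linear(HIDDEN, HIDDEN)
    return embeddings + LAYERS * (attention + feed_forward) + pooler


total = count_parameters()
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the estimate lands close to 113 million, and the four 2D position tables account for roughly 3 million of that on top of a plain BERT-base encoder.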

Additional Project Details

Registered

2025-07-02
