OmniParser is a method for parsing user interface screenshots into structured elements, which improves the ability of multimodal models such as GPT-4V to generate actions that are accurately grounded in the corresponding regions of the interface. It identifies interactable icons within a user interface and captures the semantics of the elements in a screenshot, so that an intended action can be associated with the correct screen region. To achieve this, OmniParser curates an interactable icon detection dataset of 67,000 unique screenshot images, labeled with bounding boxes of interactable icons derived from DOM trees. In addition, a collection of 7,000 icon-description pairs is used to fine-tune a caption model that extracts the functional semantics of the detected elements. Evaluations on benchmarks such as SeeClick, Mind2Web, and AITW show that OmniParser, using only screenshot input, outperforms GPT-4V baselines, including baselines that rely on additional information beyond the screenshot.
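
To make the idea of "structured elements" concrete, the following is a minimal sketch of how a parsed screenshot could be represented and how a chosen element could be mapped back to a screen position. The `ParsedElement` schema, the normalized bounding-box convention, and the helper names `format_for_prompt` and `click_point` are all assumptions made for illustration; they are not OmniParser's actual output format or API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ParsedElement:
    """One element of a parsed screenshot (hypothetical schema, not OmniParser's)."""
    element_id: int
    bbox: Tuple[float, float, float, float]  # assumed normalized (x1, y1, x2, y2) in [0, 1]
    description: str  # functional caption, e.g. produced by a caption model
    interactable: bool


def format_for_prompt(elements: List[ParsedElement]) -> str:
    """List interactable elements by ID so a model can answer with an ID."""
    lines = [f"[{e.element_id}] {e.description}" for e in elements if e.interactable]
    return "\n".join(lines)


def click_point(element: ParsedElement, width: int, height: int) -> Tuple[int, int]:
    """Map a chosen element back to the pixel at the center of its bounding box."""
    x1, y1, x2, y2 = element.bbox
    return round((x1 + x2) / 2 * width), round((y1 + y2) / 2 * height)


# Illustrative data only; these values are made up for the example.
elements = [
    ParsedElement(0, (0.10, 0.20, 0.30, 0.40), "Open settings menu", True),
    ParsedElement(1, (0.50, 0.50, 0.90, 0.60), "Page title text", False),
]
prompt = format_for_prompt(elements)
target = click_point(elements[0], width=1000, height=500)
```

In this sketch the model only has to name an element ID, and the grounding step (ID to pixel coordinates) is handled deterministically from the bounding box, which is one plausible way to read the description above.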

Features

  • Parse user interface screenshots into structured and easy-to-understand elements
  • Examples available
  • Enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface
  • Ensure you have the V2 weights downloaded in the weights folder
  • Model Weights License

License

Creative Commons Attribution License

Additional Project Details

Operating Systems

Windows

Programming Language

Python

Related Categories

Python Agentic AI Tool, Python AI Agent Frameworks, Python AI Agents

Registered

2025-02-18