vision free download - SourceForge

Showing 528 open source projects for "vision"

View related business solutions

Artificial Intelligence Clear Filters & Widen Search

Empower Your Workforce and Digitize Your Shop Floor
Benefits to Manufacturers

Easily connect to most tools and equipment on the shop floor, enabling efficient data collection and boosting productivity with vital insights. Turn information into action to generate new ideas and better processes.

Learn More
Build experiences that drive engagement and increase transactions
Connect your users - doctors, gamers, shoppers, or lovers - wherever they are.

Sendbird's chat, voice, and video APIs power conversations and communities in hundreds of the most innovative apps and products. Sendbird’s feature-rich platform, and pre-fab UI components make developers more productive. We take care of a ton of operational complexity under the hood, so you can power a rich chat service, and life-like voice, and video experiences, and not worry about features, edge cases, reliability, or scale.

Learn More
1

LLM Vision

Visual intelligence for your home.

LLM Vision is an open-source integration for Home Assistant that adds multimodal large language model capabilities to smart home environments. The project enables Home Assistant to analyze images, video files, and live camera feeds using vision-capable AI models. Instead of relying only on traditional object detection pipelines, it allows users to send prompts about visual content and receive contextual descriptions or answers about what is happening in camera footage. ...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
2

Raster Vision

Open source framework for deep learning satellite and aerial imagery

...The input to a Raster Vision pipeline is a set of images and training data, optionally with Areas of Interest (AOIs) that describe where the images are labeled. The output of a Raster Vision pipeline is a model bundle that allows you to easily utilize models in various deployment scenarios.

Downloads: 0 This Week

Last Update: 2024-08-30
See Project
3

AskUI Vision Agent

Enable AI to control your desktop, mobile and HMI devices

...The repository presents a feature overview, sample media, and frequent release notes, which show ongoing improvements such as CORS checks and other operational tweaks. The broader AskUI documentation covers the Python Vision Agent along with suite services and inference APIs, indicating a productized ecosystem rather than a single library. Community-curated lists also recognize Vision Agent as part of the broader “GUI agents” landscape, placing it among other computer-use agents.

Downloads: 10 This Week

Last Update: 1 day ago
See Project
4

Vision Transformer Pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA

...Because it stays close to vanilla PyTorch, you can integrate custom datasets and training loops without framework lock-in. It’s widely used as an educational reference for people learning transformers in vision and as a lightweight baseline for research prototypes. The project encourages experimentation—swap optimizers, change augmentations, or plug the transformer backbone into downstream tasks.

Downloads: 8 This Week

Last Update: 2026-02-11
See Project
The Apple Device Management and Security Platform
For IT teams at organizations that run on Apple

Achieve harmony across your Apple device fleet with Kandji's unmatched management and security capabilities.

Learn More
5

Computer Vision in Action

A computer vision closed-loop learning platform

Computer Vision in Action is a practical, example-rich repository that demonstrates real-world applications of computer vision techniques and algorithms in Python, often using OpenCV, deep learning models, and related tooling. It serves as a hands-on companion for learners and engineers who want to understand not just the theory, but how computer vision is actually implemented for tasks like object detection, image classification, feature tracking, optical flow, and image segmentation. ...

Downloads: 0 This Week

Last Update: 2026-02-17
See Project
6

Open Vision Agents by Stream

Build Vision Agents quickly with any model or video provider

Open Vision Agents by Stream is an open source framework from Stream for building real time, multimodal AI agents that watch, listen, and respond to live video streams. It focuses on combining video understanding models, such as YOLO and Roboflow based detectors, with real time large language models like OpenAI Realtime and Gemini Live to create interactive experiences.

Downloads: 6 This Week

Last Update: 1 day ago
See Project
7

OpenCV

Open Source Computer Vision Library

OpenCV (Open Source Computer Vision Library) is a comprehensive open-source library for computer vision, machine learning, and image processing. It enables developers to build real-time vision applications ranging from facial recognition to object tracking. OpenCV supports a wide range of programming languages including C++, Python, and Java, and is optimized for both CPU and GPU operations.

Downloads: 40 This Week

Last Update: 2025-12-31
See Project
8

Computer Vision Annotation Tool (CVAT)

Interactive video and image annotation tool for computer vision

Computer Vision Annotation Tool (CVAT) is a free and open source, interactive online tool for annotating videos and images for Computer Vision algorithms. It offers many powerful features, including automatic annotation using deep learning models, interpolation of bounding boxes between key frames, LDAP and more. It is being used by its own professional data annotation team to annotate millions of objects with different properties.

Downloads: 24 This Week

Last Update: 2026-04-02
See Project
9

Phi-3-MLX

Phi-3.5 for Mac: Locally-run Vision and Language Models

Phi-3-Vision-MLX is an Apple MLX (machine learning on Apple silicon) implementation of Phi-3 Vision, a lightweight multi-modal model designed for vision and language tasks. It focuses on running vision-language AI efficiently on Apple hardware like M1 and M2 chips.

Downloads: 6 This Week

Last Update: 2025-03-13
See Project
Award-winning proxy networks, AI-powered web scrapers, and business-ready datasets for download. 
How the world collects public web data

Bright Data is a leading data collection platform, enabling businesses to collect crucial structured and unstructured data from millions of websites through our proprietary technology. Our proxy networks give you access to sophisticated target sites using precise geo-targeting. You can also use our tools to unblock tough target sites, accomplish SERP-specific data collection tasks, manage and optimize your proxy performance as well as automating all of your data collection needs.

Learn More
10

Kornia

Open Source Differentiable Computer Vision Library

...With Kornia we fill the gap between classical and deep computer vision that implements standard and advanced vision algorithms for AI. Our libraries and initiatives are always according to the community needs.

Downloads: 5 This Week

Last Update: 2025-11-08
See Project
11

DeepSeek VL2

Mixture-of-Experts Vision-Language Models for Advanced Multimodal

DeepSeek-VL2 is DeepSeek’s vision + language multimodal model—essentially the next-gen successor to their first vision-language models. It combines image and text inputs into a unified embedding / reasoning space so that you can query with text and image jointly (e.g. “What’s going on in this scene?” or “Generate a caption appropriate to context”). The model supports both image understanding (vision tasks) and multimodal reasoning, and is likely used as a component in agent systems to process visual inputs as context for downstream tasks. ...

Downloads: 10 This Week

Last Update: 2025-10-03
See Project
12

LearnOpenCV

C++ and Python Examples

...The repository supports beginners and advanced practitioners by offering reproducible code that demonstrates real-world computer vision techniques. Many examples integrate popular frameworks like PyTorch, OpenCV, and ONNX to reflect modern AI workflows. Overall, LearnOpenCV functions as a comprehensive applied learning hub for developers building expertise in computer vision and AI.

Downloads: 4 This Week

Last Update: 2026-04-07
See Project
13

SAM 2

The repository provides code for running inference with SAM 2

...The updated model is optimized for faster inference and lower memory use, enabling real-time interactivity even on larger images or constrained hardware. SAM2 comes with pretrained weights and easy-to-use APIs, enabling developers and researchers to integrate promptable segmentation into annotation tools, vision pipelines, or downstream tasks. The project also includes scripts and notebooks to compare SAM2 against SAM on edge cases, benchmarks showing improvements, and evaluation suites to measure mask quality metrics like IoU and boundary error.

Downloads: 11 This Week

Last Update: 2025-10-06
See Project
14

MESHROOM

3D reconstruction software

...Photography is the projection of a 3D scene onto a 2D plane, losing depth information. The goal of photogrammetry is to reverse this process. The dense modeling of the scene is the result yielded by chaining two computer vision-based pipelines, “Structure-from-Motion” (SfM) and “Multi View Stereo” (MVS). Fusion of Multi-bracketing LDR images into HDR. Alignment of panorama images. Support for fisheye optics. Automatically estimate fisheye circle or manually edit it. Take advantage of motorized-head file. Easy to integrate in your Renderfarm System. Add specific rules to select the most suitable machines regarding CPU, RAM, GPU requirements of each Node.

1 Review

Downloads: 101 This Week

Last Update: 2025-08-19
See Project
15

CVPR 2026

Collection of CVPR 2026 Papers and Open Source Projects

...The project serves as a centralized index that makes it easier for practitioners to explore the latest advances presented at major computer vision conferences. In addition to the current CVPR cycle, the repository also references related lists covering earlier conferences such as ECCV and ICCV, creating a broader archive of vision research.

Downloads: 4 This Week

Last Update: 2026-03-10
See Project
16

R1-V

Witness the aha moment of VLM with less than $3

R1-V is an initiative aimed at enhancing the generalization capabilities of Vision-Language Models (VLMs) through Reinforcement Learning in Visual Reasoning (RLVR). The project focuses on building a comprehensive framework that emphasizes algorithm enhancement, efficiency optimization, and task diversity to achieve general vision-language intelligence and visual/GUI agents. The team's long-term goal is to contribute impactful open-source research in this domain.

Downloads: 0 This Week

Last Update: 2025-03-19
See Project
17

COLMAP

Structure-from-Motion and Multi-View Stereo

COLMAP is a general-purpose Structure-from-Motion (SfM) and Multi-View Stereo (MVS) pipeline with a graphical and command-line interface. It offers a wide range of features for the reconstruction of ordered and unordered image collections. The software is licensed under the new BSD license.

Downloads: 57 This Week

Last Update: 2026-04-06
See Project
18

OpenVINO

OpenVINO™ Toolkit repository

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. Boost deep learning performance in computer vision, automatic speech recognition, natural language processing and other common tasks. Use models trained with popular frameworks like TensorFlow, PyTorch and more. Reduce resource demands and efficiently deploy on a range of Intel® platforms from edge to cloud. This open-source version includes several components: namely Model Optimizer, OpenVINO™ Runtime, Post-Training Optimization Tool, as well as CPU, GPU, MYRIAD, multi device and heterogeneous plugins to accelerate deep learning inferencing on Intel® CPUs and Intel® Processor Graphics. ...

Downloads: 44 This Week

Last Update: 2026-03-25
See Project
19

X-AnyLabeling

Effortless data labeling with AI support from Segment Anything

X-AnyLabeling is an open-source data annotation platform designed to streamline the process of labeling datasets for computer vision and multimodal AI applications. The software integrates an AI-powered labeling engine that allows users to generate annotations automatically with the assistance of modern vision models such as Segment Anything and various object detection frameworks. It supports labeling tasks across images and videos and enables developers to prepare training datasets for tasks such as object detection, segmentation, classification, tracking, and pose estimation. ...

Downloads: 45 This Week

Last Update: 2026-03-26
See Project
20

TorchIO

Medical imaging toolkit for deep learning

...Transforms include typical computer vision operations such as random affine transformations and also domain-specific ones such as simulation of intensity artifacts due to MRI magnetic field inhomogeneity.

Downloads: 5 This Week

Last Update: 2026-04-01
See Project
21

SAHI

A lightweight vision library for performing large object detection

A lightweight vision library for performing large-scale object detection & instance segmentation. Object detection and instance segmentation are by far the most important fields of applications in Computer Vision. However, detection of small objects and inference on large images are still major issues in practical usage. Here comes the SAHI to help developers overcome these real-world problems with many vision utilities.

Downloads: 0 This Week

Last Update: 2025-09-28
See Project
22

hCaptcha Challenger

Gracefully face hCaptcha challenge with multimodal llms

...The framework includes support for multiple types of captcha challenges such as object selection, drag-and-drop puzzles, and image labeling tasks. It implements an agent-style workflow where the system interprets the challenge prompt, selects the appropriate vision model, and generates the required interaction automatically.

Downloads: 6 This Week

Last Update: 2026-03-06
See Project
23

DeepSeek VL

Towards Real-World Vision-Language Understanding

DeepSeek-VL is DeepSeek’s initial vision-language model that anchors their multimodal stack. It enables understanding and generation across visual and textual modalities—meaning it can process an image + a prompt, answer questions about images, caption, classify, or reason about visuals in context. The model is likely used internally as the visual encoder backbone for agent use cases, to ground perception in downstream tasks (e.g. answering questions about a screenshot).

Downloads: 11 This Week

Last Update: 2025-10-03
See Project
24

MIVisionX

Set of comprehensive computer vision & machine intelligence libraries

MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX delivers highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions along with Convolution Neural Net Model Compiler & Optimizer supporting ONNX, and Khronos NNEF™ exchange formats. The toolkit allows for rapid prototyping and deployment of optimized computer vision and machine learning inference workloads on a wide range of computer hardware, including small embedded x86 CPUs, APUs, discrete GPUs, and heterogeneous servers. ...

Downloads: 1 This Week

Last Update: 2026-02-06
See Project
25

GoogleTest

Google Testing and Mocking Framework

GoogleTest is Google's C++ mocking and test framework. It's used by many internal projects at Google, as well as a number of notable projects such as The Chromium projects, the OpenCV computer vision library, and the LLVM compiler. This GoogleTest project is actually a union of what used to be two separate projects: the old GoogleTest and GoogleMock, an extension of GoogleTest for writing and using C++ mock classes. Since they were so closely related, they were merged to create an even better GoogleTest. GoogleTest features an xUnit test framework, a rich set of assertions, user-defined assertions, death tests, among many others. ...

Downloads: 24 This Week

Last Update: 2025-04-30
See Project