Showing 18 open source projects for "video ocr"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 1
    Video-subtitle-extractor

    Video-subtitle-extractor

    A GUI tool for extracting hard-coded subtitle (hardsub) from videos

    Video hard subtitle extraction, generate srt file. There is no need to apply for a third-party API, and text recognition can be implemented locally. A deep learning-based video subtitle extraction framework, including subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsub) from videos and generating srt files.
    Downloads: 65 This Week
    Last Update:
    See Project
  • 2
    Paper2GUI

    Paper2GUI

    Convert AI papers to GUI

    ...让每个人都简单方便的使用前沿人工智能技术 Paper2GUI: An AI desktop APP toolbox for ordinary people. It can be used immediately without installation. It already supports 40+ AI models, covering AI painting, speech synthesis, video frame complementing, video super-resolution, object detection, and image stylization. , OCR recognition and other fields. Support Windows, Mac, Linux systems. Paper2GUI: 一款面向普通人的 AI 桌面 APP 工具箱,免安装即开即用,已支持 40+AI 模型,内容涵盖 AI 绘画、语音合成、视频补帧、视频超分、目标检测、图片风格化、OCR 识别等领域。支持 Windows、Mac、Linux 系统。
    Downloads: 10 This Week
    Last Update:
    See Project
  • 3
    HunyuanOCR

    HunyuanOCR

    OCR expert VLM powered by Hunyuan's native multimodal architecture

    ...Despite being fairly lightweight (about 1 billion parameters), it delivers state-of-the-art performance across a wide variety of OCR tasks, outperforming many traditional OCR systems and even other multimodal models on benchmark suites. HunyuanOCR handles complex documents: multi-column layouts, tables, mathematical formulas, mixed languages, handwritten or stylized fonts, receipts, tickets, and even video-frame subtitles. The project provides code, pretrained weights, and inference instructions, making it feasible to deploy locally or on a server, and to integrate with applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    DeepDetect

    DeepDetect

    Deep Learning API and Server in C++14 support for Caffe, PyTorch

    ...While the Open Source Deep Learning Server is the core element, with REST API, and multi-platform support that allows training & inference everywhere, the Deep Learning Platform allows higher level management for training neural network models and using them as if they were simple code snippets. Ready for applications of image tagging, object detection, segmentation, OCR, Audio, Video, Text classification, CSV for tabular data and time series. Neural network templates for the most effective architectures for GPU, CPU, and Embedded devices. Training in a few hours and with small data thanks to 25+ pre-trained models. Full Open Source, with an ecosystem of tools (API clients, video, annotation, ...) ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • 5
    Qwen3-VL

    Qwen3-VL

    Qwen3-VL, the multimodal large language model series by Alibaba Cloud

    ...Qwen3-VL is built for complex tasks such as GUI automation, multimodal coding (converting images or videos into HTML, CSS, JS, or Draw.io diagrams), long-context reasoning with support up to 1M tokens, and comprehensive video understanding. It also brings advanced perception capabilities, including spatial grounding, object recognition, OCR across 32 languages, and robust handling of challenging inputs like low-light or distorted text.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    MiniCPM-o

    MiniCPM-o

    A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming

    MiniCPM-o 2.6 is a cutting-edge multimodal large language model (MLLM) designed for high-performance tasks across vision, speech, and video. Capable of running on end-side devices such as smartphones and tablets, it provides powerful features like real-time speech conversation, video understanding, and multimodal live streaming. With 8 billion parameters, MiniCPM-o 2.6 surpasses its predecessors in versatility and efficiency, making it one of the most robust models available. It supports...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Tarsier

    Tarsier

    Vision utilities for web interaction agents

    ...Furthermore, we've developed an OCR algorithm to convert a page screenshot into a whitespace-structured string (almost like ASCII art) that an LLM even without vision can understand. Since current vision-language models still lack fine-grained representations needed for web interaction tasks, this is critical.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 8
    Qwen3-Omni

    Qwen3-Omni

    Qwen3-omni is a natively end-to-end, omni-modal LLM

    Qwen3-Omni is a natively end-to-end multilingual omni-modal foundation model that processes text, images, audio, and video and delivers real-time streaming responses in text and natural speech. It uses a Thinker-Talker architecture with a Mixture-of-Experts (MoE) design, early text-first pretraining, and mixed multimodal training to support strong performance across all modalities without sacrificing text or image quality. The model supports 119 text languages, 19 speech input languages, and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    MyBox

    MyBox

    Easy Tools of PDF, Image, File, Network, Data, and Medias

    javafx-desktop-apps pdf image ocr icc barcode color-palette text bytes markdown html archive compress digest video audio editor converter media https://github.com/Mararsh/MyBox Self-contain packages need not java env nor installation. Jar packages need Java 16 or higher.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Train ML Models With SQL You Already Know Icon
    Train ML Models With SQL You Already Know

    BigQuery automates data prep, analysis, and predictions with built-in AI assistance.

    Build and deploy ML models using familiar SQL. Automate data prep with built-in Gemini. Query 1 TB and store 10 GB free monthly.
    Try Free
  • 10
    VideoSubFinder
    The main purpose of this program is to provide functionality for extract hardcoded subtitles (hardsub) from video. It provides two main features: 1) Autodetection of frames with hardcoded text (hardsub) on video with saving info about timing positions. 2) Generation of cleared from background text images, which allows with usage of OCR programs (like FineReader, Subtitle Edit, Google Drive) to generate complete subtitles with original text and timing. ...
    Leader badge
    Downloads: 544 This Week
    Last Update:
    See Project
  • 11

    CCTV Frame Timestamp Extractor

    CCTV Footage Timestamp Search Tool

    ...Link to paper: https://link.springer.com/chapter/10.1007/978-3-031-10078-9_8 The project has been divided into four modules: Framextract.py- Extracts frames from video footages Reconstruct.py- Attempts to repair unplayable video by extracting the frames. framestitch.py- Attempts to construct video using frames extracted from unplayable video. OCR.py- Performs image preprocessing & OCR on the extracted frames.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    TagUI

    TagUI

    Free RPA tool by AI Singapore

    ...It's easy to setup and use, and works on Windows, macOS and Linux. Besides English, flows can be written in 22 other languages, so you can do RPA using your native language. Check out this demo video automating data collection in 4 different languages. With the new TagUI turbo mode, you can even run your automation 10X faster than normal human speed! In TagUI language, you use steps like click and type to interact with identifiers, which include web identifiers, image snapshots, screen coordinates, or even text using OCR.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 13
    Visual Novel OCR
    Visual Novel OCR help you to play visual novel in Japanese on PC. IF THIS LINK DOES NOT WORK OR STOPPED MIDWAY, USE GOOGLE LINK ON THE DEMO VIDEO DESCRIPTION: https://youtu.be/AdLwcU03230 If you have any questions, feel free to join our discord group: https://discord.gg/XFbWSjMHJh
    Leader badge
    Downloads: 16 This Week
    Last Update:
    See Project
  • 14
    Sugoi Manga OCR
    Sugoi Manga OCR help you to read manga in Japanese conveniently IF THIS LINK DOES NOT WORK OR STOPPED MIDWAY, USE GOOGLE LINK ON THE DEMO VIDEO DESCRIPTION: https://youtu.be/E4uwPSuCg1U If you have any questions, feel free to join our discord group: https://discord.gg/XFbWSjMHJh
    Downloads: 6 This Week
    Last Update:
    See Project
  • 15
    Linux-Intelligent-Ocr-Solution

    Linux-Intelligent-Ocr-Solution

    Easy-OCR solution and Tesseract trainer for GNU/Linux

    Linux-intelligent-ocr-solution Lios is a free and open source software for converting print in to text using either scanner or a camera, It can also produce text out of scanned images from other sources such as Pdf, Image, Folder containing Images or screenshot. Program is given total accessibility for visually impaired. A Tesseract Trainer GUI is also shipped with this package.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 16
    Subtitle Workshop

    Subtitle Workshop

    Free subtitle editor

    Subtitle Workshop is a free application for creating, editing, and converting text-based subtitle files. It supports all the subtitle formats you need and has all the features you would want.
    Leader badge
    Downloads: 1,256 This Week
    Last Update:
    See Project
  • 17

    Linux for Beagleboard-xm

    A Tailored Small Linux for Beagleboard-xm

    Beagleboard-xm is a powerful chip with a cortex-A8 CPU and a DSP. I have the plan to build an OCR gadget using it with Linux. As a by product I will post my tailored Linux kernel and u-boot, and all relevant stuff here, from now on. I was shocked by the blocking of Chinese citizens from accessing some of the contents on sourceforge. I deeply regret the outrageous action initiated, even though I fully understand the reasoning behind it.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    SubHub

    SubHub

    Post-OCR correction tool for SRT subtitles

    A post-OCR correction tool for SRT subtitle files.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB