Showing 142 open source projects for "pdf tool python"

  • 1
    py-pdf-parser

    A Python tool to help extract information from structured PDFs

    py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents.
    Downloads: 2 This Week
  • 2
    OpenDataLoader PDF

    PDF Parser for AI-ready data. Automate PDF accessibility

    OpenDataLoader PDF is an open-source document processing system designed to convert complex PDF files into structured, AI-ready formats such as Markdown, JSON, and HTML while preserving layout, hierarchy, and semantic meaning. It focuses on enabling downstream use cases like retrieval-augmented generation (RAG), knowledge extraction, and document intelligence pipelines by maintaining accurate reading order and spatial metadata through bounding boxes.
    Downloads: 10 This Week
  • 3
    pdfly

    CLI tool to extract (meta)data from PDF and manipulate PDF files

    A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.
    Downloads: 5 This Week
  • 4
    Nano PDF Editor

    Edit PDF files with Nano Banana

    Nano PDF Editor is a minimalist, portable PDF viewer and toolkit that focuses on simplicity, speed, and ease of integration for applications that need basic PDF rendering without heavy dependencies. It provides core functionality such as page navigation, zooming, text selection, and rendering directly to native graphics surfaces, making it suitable for lightweight PDF viewing scenarios on desktop or embedded platforms. Designed to be easily embedded into larger software projects, Nano-PDF...
    Downloads: 19 This Week
  • 5
    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 102 This Week
  • 6
    PDF4QT

    Open source PDF editor

    PDF4QT is an open-source PDF editor for Windows and Linux, based on the Qt framework. It is a modern solution for viewing, editing, and rendering PDF documents, for users and developers alike. For developers, it provides a C++ library and a command-line tool for use in scripts.
    Downloads: 83 This Week
  • 7
    Unredact

    A simple tool for reading in poorly redacted documents

    Unredact is a specialized tool that attempts to reconstruct redacted or obscured text in images, PDFs, or screenshots using a combination of image processing and generative AI inference to suggest plausible completions of blurred, black-boxed, or jumbled content. Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and...
    Downloads: 11 This Week
  • 8
    Pix2Text

    Open-Source Python3 tool for recognizing layouts, tables, and math

    Pix2Text (P2T) is an open-source Python 3 tool for recognizing layouts, tables, math formulas, and text in images and converting them into Markdown. More than 80 languages are supported. P2T aims to be a free alternative to Mathpix, and it can already accomplish Mathpix's core functionality.
    Downloads: 9 This Week
  • 9
    TikZ

    TikZ figures for concepts in physics/chemistry/ML

    Collection of 111 standalone TikZ figures for illustrating concepts in physics, chemistry, and machine learning. Check out janosh.github.io to search, sort, open in Overleaf, and download figures (PDF/SVG/PNG) from this collection.
    Downloads: 11 This Week
  • 10
    Pandoc

    The universal markup converter

    Pandoc is a universal document converter that converts files from one markup format into another. With Pandoc, you have a Swiss Army knife of a converter, able to convert practically any markup format into any other. Pandoc contains a Haskell library for conversions as well as a command-line tool that uses this library. It can convert to and from just about anything: lightweight markup formats, HTML formats, documentation formats, ebooks, TeX formats, word processor formats...
    Downloads: 254 This Week
  • 11
    RenderCV

    LaTeX CV generator from a YAML/JSON input file

    RenderCV is a LaTeX CV/resume framework. It allows you to create a high-quality CV as a PDF from a YAML file with full Markdown syntax support and complete control over the LaTeX code. RenderCV offers built-in LaTeX and Markdown templates ready to produce high-quality CVs. The templates are not fixed, however: you can easily modify them to build your own custom CV themes.
    Downloads: 8 This Week
  • 12
    Gotenberg

    A Docker-powered stateless API for PDF files

    Gotenberg provides a developer-friendly API to interact with powerful tools like Chromium and LibreOffice for converting numerous document formats (HTML, Markdown, Word, Excel, etc.) into PDF files, and more! Thanks to Docker, you don't have to install each tool in your environments; drop the Docker image in your stack, and you're good to go! The webhook feature allows you to upload the output file to the destination of your choice. There are many options to fit your requirements, from the custom HTTP headers sent to your webhook to the HTTP method used to call it. ...
    Downloads: 1 This Week
  • 13
    Memvid

    Video-based AI memory library. Store millions of text chunks in MP4

    Memvid encodes text chunks as QR codes within MP4 frames to build a portable “video memory” for AI systems. This innovative approach uses standard video containers and offers millisecond-level semantic search across large corpora with dramatically less storage than vector DBs. It's self-contained—no DB needed—and supports features like PDF indexing, chat integration, and cloud dashboards.
    Downloads: 28 This Week
  • 14
    Percollate

    A command-line tool to turn web pages into beautiful, readable PDF

    Percollate is a command-line tool that turns web pages into beautifully formatted PDF, EPUB, or HTML files. By default, percollate processes URLs in parallel. Use the --wait option to process them sequentially instead, with a pause between items. The delay is specified in seconds, and can be zero. By default, percollate bundles all web pages in a single file. Use the --individual flag to export each source to a separate file.
    Downloads: 1 This Week
  • 15
    JC

    CLI tool and Python library

    jc is a CLI tool and Python library that converts the output of popular command-line tools and file types to JSON or Python dictionaries. This makes the output easier to parse in automation scripts and allows further command-line processing by piping it to tools like jq or jello.
    Downloads: 3 This Week
  • 16
    PasDoc

    Documentation tool for ObjectPascal (Free Pascal, Lazarus, Delphi)

    PasDoc is a documentation tool for Pascal and Object Pascal source code. Documentation is generated from comments found in the source code or from external files. Many formatting @-tags are supported. Many output formats are supported, including HTML and LaTeX.
    Downloads: 5 This Week
  • 17
    dvisvgm

    A fast DVI, EPS, and PDF to SVG converter

    The command-line utility dvisvgm is a tool for TeX/LaTeX users. It converts DVI, EPS, and PDF files to the XML-based vector graphics format SVG. In contrast to bitmap graphics, vector graphics are arbitrarily scalable without loss of quality. All modern web browsers support a large subset of the SVG 1.1 standard. SVG files can also be displayed with the Java-based Squiggle SVG browser, which is part of the Apache Batik project, and with the free vector graphics editor Inkscape.
    Downloads: 0 This Week
  • 18
    JS-Beautify

    Beautifier for JavaScript

    js-beautify is a command-line and Python-based tool that beautifies and formats JavaScript, HTML, and CSS code. It helps improve code readability by enforcing consistent indentation and style rules. Widely used in development workflows and CI pipelines, it supports customization through config files and can process both single files and entire projects.
    Downloads: 20 This Week
  • 19
    autopep8

    A tool that automatically formats Python code to conform to the PEP 8 style guide

    autopep8 automatically formats Python code to conform to the PEP 8 style guide. It uses the pycodestyle utility to determine what parts of the code need to be formatted. autopep8 is capable of fixing most of the formatting issues that can be reported by pycodestyle. Correct deprecated or non-idiomatic Python code (via lib2to3). Use this for making Python 2.7 code more compatible with Python 3. Put a blank line between a class docstring and its first method declaration. Remove blank lines...
    Downloads: 2 This Week
  • 20
    PDF Split and Merge

    Split and merge PDF files on any platform

    Split and merge PDF files with PDFsam, an easy-to-use desktop tool with graphical, command line and web interface.
    Downloads: 306 This Week
  • 21
    jello

    CLI tool to filter JSON and JSON Lines data with Python syntax

    Filter JSON and JSON Lines data with Python syntax. jello is similar to jq in that it processes JSON and JSON Lines data except jello uses standard python dict and list syntax. JSON or JSON Lines can be piped into jello via STDIN or can be loaded from a JSON file or JSON Lines files (JSON Lines are automatically slurped into a list of dictionaries). Once loaded, the data is available as a python list or dictionary object named '_'. Processed data can be output as JSON, JSON Lines, bash array...
    Downloads: 3 This Week
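The idea described for jello above, parsed JSON exposed as a Python object named `_` and filtered with ordinary dict and list syntax, can be sketched in a few lines of plain Python. This is only an illustration of the concept and does not use jello or reflect its implementation; the `query` and `slurp_json_lines` helpers and the sample data are hypothetical, and `eval` is used purely for demonstration.

```python
import json


def slurp_json_lines(text: str) -> list:
    """Parse JSON Lines input into a list of objects, one per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def query(data, expression: str):
    """Evaluate a Python expression with the loaded data bound to the name '_'.

    Hypothetical helper for illustration only: eval() must never be used
    on expressions from untrusted sources.
    """
    return eval(expression, {"__builtins__": {}, "_": data})


# Sample input (made up for this sketch).
sample = '{"users": [{"name": "ada", "age": 36}, {"name": "bob", "age": 17}]}'

# A JSON document becomes a dict; filter it with normal Python syntax.
adults = query(json.loads(sample), "[u['name'] for u in _['users'] if u['age'] >= 18]")
print(json.dumps(adults))

# JSON Lines input becomes a list of dictionaries.
rows = slurp_json_lines('{"id": 1}\n{"id": 2}\n')
ids = query(rows, "[row['id'] for row in _]")
print(json.dumps(ids))
```

The data is passed to `eval` as a global so that the name `_` is visible inside comprehensions as well as in simple expressions.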
  • 22
    circuitikz

    CircuiTikZ TeX/LaTeX package for drawing circuits

    This package provides a set of macros on top of TikZ for naturally typesetting electrical and electronic networks. It was born mainly for writing Massimo Redaelli's exercise book and exam sheets for the Elettrotecnica courses at Politecnico di Milano, Italy. He wanted a tool that was easy to use, with a lean syntax, native to LaTeX, and supporting direct PDF output. circuitikz is included with the most common LaTeX systems, so it should work out of the box. Its main dependencies are TikZ/PGF, xstring, and siunitx.
    Downloads: 2 This Week
  • 23
    PdfBooklet
    PdfBooklet is a Python Gtk application that lets you make books or booklets from existing PDF files. It can also adjust margins, rotate, scale, merge files, or extract pages.
    Downloads: 191 This Week
  • 24
    Rapid LaTeX OCR

    Formula recognition based on LaTeX-OCR and ONNXRuntime

    Formula recognition based on LaTeX-OCR and ONNXRuntime. rapid_latex_ocr is a tool that converts formula images to LaTeX. The inference code in the repo is adapted from LaTeX-OCR: the models have all been converted to ONNX format and the inference code has been simplified, so inference is faster and easier to deploy. The repo contains only inference code for ONNX-format models, based on ONNXRuntime or OpenVINO, and does not contain model training code. If you want to train your own model, please move to...
    Downloads: 5 This Week
  • 25
    Skim

    A PDF Reader and Note-taker for OS X

    Skim is a PDF reader and note-taker for OS X. It is designed to help you read and annotate scientific papers in PDF, but is also great for viewing any PDF file. Skim requires OS X 10.10 or higher.
    Downloads: 10,522 This Week