Showing 114 open source projects for "pdf tool python"

  • 1
    py-pdf-parser

    A Python tool to help extract information from structured PDFs

    py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface for defining parsing rules and extracting data from PDF documents.
    Downloads: 2 This Week
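The description above does not show py-pdf-parser's actual API, so the following is only a hedged, stdlib-only sketch of what a positional parsing rule for a structured PDF can look like. `TextElement` (a text run with a bounding box, as a PDF text extractor might produce) and `value_right_of` are hypothetical names, and the sample coordinates are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextElement:
    """Hypothetical text run with its bounding box (x0, y0, x1, y1) in PDF points."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def value_right_of(elements: list[TextElement], label: str) -> Optional[str]:
    """Hypothetical rule: return the nearest text on the same line to the right of `label`."""
    anchor = next((e for e in elements if e.text.strip() == label), None)
    if anchor is None:
        return None
    # Candidates start to the right of the label and overlap it vertically.
    same_line = [
        e for e in elements
        if e is not anchor and e.x0 >= anchor.x1 and e.y0 < anchor.y1 and e.y1 > anchor.y0
    ]
    if not same_line:
        return None
    return min(same_line, key=lambda e: e.x0 - anchor.x1).text


page = [
    TextElement("Invoice No:", 50, 700, 120, 712),
    TextElement("INV-0042", 130, 700, 190, 712),
    TextElement("Total:", 50, 100, 90, 112),
    TextElement("$1,250.00", 400, 100, 460, 112),
]
print(value_right_of(page, "Total:"))  # $1,250.00
```

Rules like this depend on the layout staying consistent from one document to the next.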
  • 2
    pdfly

    CLI tool to extract (meta)data from PDF and manipulate PDF files

    A Python library for manipulating PDF files, with functions for extraction, transformation, and document generation.
    Downloads: 5 This Week
  • 3
    Nano PDF Editor

    Edit PDF files with Nano Banana

    Nano PDF Editor is a minimalist, portable PDF viewer and toolkit that focuses on simplicity, speed, and ease of integration for applications that need basic PDF rendering without heavy dependencies. It provides core functionality such as page navigation, zooming, text selection, and rendering directly to native graphics surfaces, making it suitable for lightweight PDF viewing scenarios on desktop or embedded platforms. Designed to be easily embedded into larger software projects, Nano-PDF...
    Downloads: 19 This Week
  • 4
    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 102 This Week
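OCRmyPDF is typically driven from the command line. As a minimal sketch, the hypothetical helper `ocrmypdf_command` below assembles an argument list using the `-l` (OCR language), `--deskew`, and `--skip-text` options; check them against `ocrmypdf --help` for your installed version. The file names are placeholders, and the command only runs if `ocrmypdf` is on the PATH and the input file exists.

```python
import os
import shlex
import shutil
import subprocess


def ocrmypdf_command(src: str, dst: str, language: str = "eng",
                     deskew: bool = False, skip_text: bool = False) -> list[str]:
    """Build an argument list for the ocrmypdf command-line tool (hypothetical helper)."""
    cmd = ["ocrmypdf", "-l", language]  # Tesseract language code(s), e.g. "eng" or "eng+deu"
    if deskew:
        cmd.append("--deskew")     # straighten crooked scans before OCR
    if skip_text:
        cmd.append("--skip-text")  # pass pages that already contain text through unchanged
    cmd += [src, dst]
    return cmd


cmd = ocrmypdf_command("scan.pdf", "scan_ocr.pdf", deskew=True)
print(shlex.join(cmd))  # ocrmypdf -l eng --deskew scan.pdf scan_ocr.pdf

# Only attempt the real conversion when the tool and the input are present.
if shutil.which("ocrmypdf") and os.path.exists("scan.pdf"):
    subprocess.run(cmd, check=True)
```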
  • 5
    PDF4QT

    Open source PDF editor

    PDF4QT is an open-source PDF editor for Windows and Linux, based on the Qt framework. It is a modern solution for viewing, editing, and rendering PDF documents, for users and developers alike. It contains a C++ library, applications for viewing and editing PDF documents, and a command-line tool for use in scripts.
    Downloads: 83 This Week
  • 6
    Unredact

    A simple tool for reading in poorly redacted documents

    Unredact is a specialized tool that attempts to reconstruct redacted or obscured text in images, PDFs, or screenshots using a combination of image processing and generative AI inference to suggest plausible completions of blurred, black-boxed, or jumbled content. Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and...
    Downloads: 11 This Week
  • 7
    Pix2Text

    Open-source Python 3 tool for recognizing layouts, tables, and math

    An open-source Python 3 tool for recognizing layouts, tables, math formulas, and text in images and converting them into Markdown. Pix2Text (P2T) aims to be a free, open-source Python alternative to Mathpix, enabling seamless conversion of visual content into text-based representations, and it can already accomplish Mathpix's core functionality. More than 80 languages are supported.
    Downloads: 9 This Week
  • 8
    TikZ

    TikZ figures for concepts in physics/chemistry/ML

    Collection of 111 standalone TikZ figures for illustrating concepts in physics, chemistry, and machine learning. Check out janosh.github.io to search, sort, open in Overleaf, and download figures (PDF/SVG/PNG) from this collection.
    Downloads: 11 This Week
  • 9
    Pandoc

    The universal markup converter

    Pandoc is a universal document converter that converts files from one markup format into another. It is a Swiss Army knife of a converter, able to convert practically any markup format into any other. Pandoc consists of a Haskell library for conversions and a command-line tool that uses this library. It can convert to and from just about anything: lightweight markup formats, HTML formats, documentation formats, ebooks, TeX formats, word processor formats...
    Downloads: 254 This Week
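As a hedged illustration of driving Pandoc from a script, the hypothetical helper `pandoc_command` below builds a command with Pandoc's `-f` (input format), `-t` (output format), `-s` (standalone document), and `-o` (output file) options. The file names are placeholders; the command is executed only if `pandoc` is installed and the input file exists.

```python
import os
import shutil
import subprocess


def pandoc_command(src: str, dst: str, from_fmt: str, to_fmt: str,
                   standalone: bool = True) -> list[str]:
    """Build a pandoc argument list (hypothetical helper)."""
    cmd = ["pandoc", "-f", from_fmt, "-t", to_fmt]
    if standalone:
        cmd.append("-s")  # complete document with header/footer instead of a fragment
    cmd += ["-o", dst, src]
    return cmd


cmd = pandoc_command("notes.md", "notes.html", "markdown", "html")
print(" ".join(cmd))  # pandoc -f markdown -t html -s -o notes.html notes.md

# Only run the conversion when pandoc and the input file are available.
if shutil.which("pandoc") and os.path.exists("notes.md"):
    subprocess.run(cmd, check=True)
```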
  • 10
    RenderCV

    LaTeX CV generator from a YAML/JSON input file

    RenderCV is a LaTeX CV/resume framework. It allows you to create a high-quality CV as a PDF from a YAML file with full Markdown syntax support and complete control over the LaTeX code. RenderCV offers built-in LaTeX and Markdown templates ready to produce high-quality CVs. However, the templates are entirely arbitrary and can easily be updated to leverage RenderCV's capabilities with your custom CV themes.
    Downloads: 8 This Week
  • 11
    Gotenberg

    A Docker-powered stateless API for PDF files

    Gotenberg provides a developer-friendly API to interact with powerful tools like Chromium and LibreOffice for converting numerous document formats (HTML, Markdown, Word, Excel, etc.) into PDF files, and more! Thanks to Docker, you don't have to install each tool in your environments; drop the Docker image in your stack, and you're good to go! The webhook feature allows you to upload the output file to the destination of your choice. There are many options to fit your requirements, from the custom HTTP headers sent to your webhook to the HTTP method used to call it. ...
    Downloads: 1 This Week
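A minimal sketch of calling Gotenberg from Python with only the standard library. It assumes a Gotenberg 7 or 8 container listening on its default port 3000 (for example started with `docker run --rm -p 3000:3000 gotenberg/gotenberg:8`) and uses the Chromium route `/forms/chromium/convert/url`, which takes the page address in a `url` form field; confirm the route against the documentation for your version. `gotenberg_url_request` is a hypothetical helper, and the request is only built here, not sent.

```python
import urllib.request
import uuid


def gotenberg_url_request(page_url: str,
                          base: str = "http://localhost:3000") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for Gotenberg's URL-to-PDF route."""
    boundary = uuid.uuid4().hex  # random separator between form parts
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="url"\r\n'
        "\r\n"
        f"{page_url}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base}/forms/chromium/convert/url",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


req = gotenberg_url_request("https://example.com")
print(req.get_method(), req.full_url)

# Sending it requires a running Gotenberg instance; the response body is the PDF:
# with urllib.request.urlopen(req) as resp, open("example.pdf", "wb") as out:
#     out.write(resp.read())
```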
  • 12
    Memvid

    Video-based AI memory library. Store millions of text chunks in MP4

    Memvid encodes text chunks as QR codes within MP4 frames to build a portable “video memory” for AI systems. This innovative approach uses standard video containers and offers millisecond-level semantic search across large corpora with dramatically less storage than vector DBs. It's self-contained—no DB needed—and supports features like PDF indexing, chat integration, and cloud dashboards.
    Downloads: 28 This Week
  • 13
    Percollate

    A command-line tool to turn web pages into beautiful, readable PDFs

    Percollate is a command-line tool that turns web pages into beautifully formatted PDF, EPUB, or HTML files. By default, percollate processes URLs in parallel. Use the --wait option to process them sequentially instead, with a pause between items. The delay is specified in seconds, and can be zero. By default, percollate bundles all web pages in a single file. Use the --individual flag to export each source to a separate file.
    Downloads: 1 This Week
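percollate itself is a Node.js tool; the Python sketch below only mirrors the processing behavior described above: items are handled concurrently by default, or sequentially with a pause of `wait` seconds (possibly zero) between them. `process_all` and the `fetch` callback are hypothetical stand-ins for downloading and rendering a page.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def process_all(urls: list[str], fetch: Callable[[str], str],
                wait: Optional[float] = None) -> list[str]:
    """Process URLs concurrently, or one by one with `wait` seconds between items."""
    if wait is None:
        # Default mode: run concurrently; map() still yields results in input order.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(fetch, urls))
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(wait)  # pause between items; wait=0 means no delay
        results.append(fetch(url))
    return results


pages = ["https://example.com/a", "https://example.com/b"]
print(process_all(pages, str.upper))          # concurrent
print(process_all(pages, str.upper, wait=0))  # sequential
```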
  • 14
    PasDoc

    Documentation tool for ObjectPascal (Free Pascal, Lazarus, Delphi)

    PasDoc is a documentation tool for Pascal and Object Pascal source code. Documentation is generated from comments found in the source code or from external files. Many formatting @-tags are supported. Many output formats are supported, including HTML and LaTeX.
    Downloads: 5 This Week
  • 15
    JC

    CLI tool and Python library

    jc is a CLI tool and Python library that converts ("JSONifies") the output of many popular command-line tools and file types to JSON or Python dictionaries for easier parsing in scripts. The output can then be piped to tools like jq or jello for further command-line processing, which simplifies automation scripts.
    Downloads: 3 This Week
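jc ships its own parsers for many commands; the snippet below is not one of them. It is a hypothetical, minimal parser that shows the basic idea of turning `KEY=value` lines, such as the output of `env`, into a dictionary that can be emitted as JSON and piped to jq or jello.

```python
import json


def parse_key_value_lines(output: str) -> dict[str, str]:
    """Turn `KEY=value` lines (like `env` output) into a dict (hypothetical parser)."""
    result = {}
    for line in output.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")  # split on the first '=' only
        result[key] = value
    return result


sample = "HOME=/home/alice\nSHELL=/bin/bash\nGREETING=a=b\n"
print(json.dumps(parse_key_value_lines(sample), indent=2))
```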
  • 16
    dvisvgm

    A fast DVI, EPS, and PDF to SVG converter

    The command-line utility dvisvgm is a tool for TeX/LaTeX users. It converts DVI, EPS, and PDF files to SVG, the XML-based vector graphics format. Unlike bitmap graphics, vector graphics can be scaled arbitrarily without loss of quality. All modern web browsers support a large part of the SVG 1.1 standard. SVG files can also be displayed with the Java-based Squiggle SVG browser, which is part of the Apache Batik project, and with the free vector graphics editor Inkscape.
    Downloads: 0 This Week
  • 17
    JS-Beautify

    Beautifier for JavaScript

    js-beautify is a command-line and Python-based tool that beautifies and formats JavaScript, HTML, and CSS code. It helps improve code readability by enforcing consistent indentation and style rules. Widely used in development workflows and CI pipelines, it supports customization through config files and can process both single files and entire projects.
    Downloads: 20 This Week
  • 18
    PDF Split and Merge

    Split and merge PDF files on any platform

    Split and merge PDF files with PDFsam, an easy-to-use desktop tool with graphical, command line and web interface.
    Downloads: 306 This Week
  • 19
    circuitikz

    CircuiTikZ TeX/LaTeX package for drawing circuits

    This package provides a set of macros on top of TikZ for naturally typesetting electrical and electronic networks. It was created mainly for Massimo Redaelli's exercise book and exam sheets for the Elettrotecnica courses at Politecnico di Milano, Italy. He wanted a tool that was easy to use, had a lean syntax, was native to LaTeX, and supported direct PDF output. circuitikz is included with the most common LaTeX distributions, so it should work out of the box. Its main dependencies are TikZ/PGF, xstring, and siunitx.
    Downloads: 2 This Week
  • 20
    jello

    CLI tool to filter JSON and JSON Lines data with Python syntax

    Filter JSON and JSON Lines data with Python syntax. jello is similar to jq in that it processes JSON and JSON Lines data, except that jello uses standard Python dict and list syntax. JSON or JSON Lines can be piped into jello via STDIN or loaded from a JSON or JSON Lines file (JSON Lines are automatically slurped into a list of dictionaries). Once loaded, the data is available as a Python list or dictionary object named '_'. Processed data can be output as JSON, JSON Lines, bash array...
    Downloads: 3 This Week
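The loading behavior described above can be approximated in a few lines of standard-library Python. This is a hedged sketch, not jello's implementation: `load_json_or_lines` and `query` are hypothetical names, and `eval` with builtins removed is used only for illustration. It is not a security sandbox, so never run it on untrusted expressions.

```python
import json


def load_json_or_lines(text: str):
    """Parse one JSON document, or slurp JSON Lines into a list of values."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Not a single document: treat each non-empty line as its own JSON value.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def query(text: str, expression: str):
    """Evaluate a Python expression with the loaded data bound to the name `_`."""
    data = load_json_or_lines(text)
    # Builtins are removed to discourage side effects; this is NOT a real sandbox.
    return eval(expression, {"__builtins__": {}, "_": data})


lines = '{"name": "a", "size": 3}\n{"name": "b", "size": 7}\n'
print(query(lines, '[d["name"] for d in _ if d["size"] > 5]'))  # ['b']
print(query('{"a": {"b": 1}}', '_["a"]["b"]'))                   # 1
```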
  • 21
    PdfBooklet
    PdfBooklet is a Python GTK application for making books or booklets from existing PDF files. It can also adjust margins, rotate, scale, merge files, or extract pages.
    Downloads: 191 This Week
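The central step in making a booklet is imposition: reordering pages so that, once sheets are printed two pages per side, stacked, and folded, the pages read in sequence. The sketch below computes that order for a saddle-stitched booklet. It is a generic illustration, not PdfBooklet's code; `booklet_order` is a hypothetical name, and blank padding pages are represented as `None`.

```python
from typing import Optional

Page = Optional[int]  # None marks a blank padding page


def booklet_order(num_pages: int) -> list[tuple[Page, Page]]:
    """Return (left, right) page pairs for each printed side of a saddle-stitched booklet.

    Pages are 1-based; pairs alternate front side, back side for each folded sheet.
    """
    total = (num_pages + 3) // 4 * 4  # each sheet carries 4 pages, so pad to a multiple of 4

    def page(n: int) -> Page:
        return n if n <= num_pages else None

    sides = []
    for i in range(total // 4):
        sides.append((page(total - 2 * i), page(2 * i + 1)))      # front of sheet i
        sides.append((page(2 * i + 2), page(total - 2 * i - 1)))  # back of sheet i
    return sides


print(booklet_order(8))  # [(8, 1), (2, 7), (6, 3), (4, 5)]
```

With 6 pages the document is padded to 8, so the outer sheet carries two blank pages: `[(None, 1), (2, None), (6, 3), (4, 5)]`.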
  • 22
    Rapid LaTeX OCR

    Formula recognition based on LaTeX-OCR and ONNXRuntime

    Formula recognition based on LaTeX-OCR and ONNXRuntime. rapid_latex_ocr is a tool that converts formula images to LaTeX. The inference code in the repo is adapted from LaTeX-OCR: all models have been converted to ONNX format and the inference code has been simplified, so inference is faster and easier to deploy. The repo contains only inference code for ONNX models based on ONNXRuntime or OpenVINO; it does not include model training code. If you want to train your own model, please move to...
    Downloads: 5 This Week
  • 23
    OSCAL

    Open Security Controls Assessment Language (OSCAL)

    NIST is developing the Open Security Controls Assessment Language (OSCAL), a set of hierarchical, XML-, JSON-, and YAML-based formats that provide a standardized representation of information pertaining to the publication, implementation, and assessment of security controls. OSCAL is being developed through a collaborative approach with the public. Public contributions to this project are welcome. With this effort, we are stressing the agile development of a set of minimal formats that are...
    Downloads: 15 This Week
  • 24
    arxiv_latex_cleaner

    arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper

    This tool allows you to easily clean the LaTeX code of your paper to submit to arXiv. From a folder containing all your code, e.g. /path/to/latex/, it creates a new folder /path/to/latex_arXiv/, that is ready to ZIP and upload to arXiv.
    Downloads: 0 This Week
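Comment removal is one typical cleaning step. The hypothetical sketch below copies `/path/to/latex/` to a sibling `/path/to/latex_arXiv/` folder, following the naming described above, and strips `%` comments from every `.tex` file. It is not arxiv_latex_cleaner's implementation; `strip_comments` and `clean_folder` are illustrative names, and the comment detection is deliberately simplified.

```python
import re
import shutil
from pathlib import Path

# '%' starts a comment unless escaped as '\%'. Simplifications: a comment that
# follows an escaped backslash ('\\%') is not detected, and verbatim
# environments are not treated specially.
COMMENT_START = re.compile(r"(?<!\\)%")


def strip_comments(tex: str) -> str:
    """Remove LaTeX comments without changing how the remaining text is typeset."""
    kept = []
    for line in tex.splitlines(keepends=True):
        match = COMMENT_START.search(line)
        if match is None:
            kept.append(line)
            continue
        before = line[:match.start()]
        if before.strip():
            # Keep a bare '%' so the line break still does not turn into a space.
            newline = line[len(line.rstrip("\r\n")):]
            kept.append(before + "%" + newline)
        # A line holding only a comment is dropped entirely; leaving it blank
        # would create an empty line, which LaTeX treats as a paragraph break.
    return "".join(kept)


def clean_folder(src: str) -> Path:
    """Copy `src` to a sibling '<name>_arXiv' folder and strip comments from its .tex files."""
    src_path = Path(src).resolve()
    dst_path = src_path.with_name(src_path.name + "_arXiv")
    shutil.copytree(src_path, dst_path)  # raises FileExistsError if dst_path exists
    for tex_file in dst_path.rglob("*.tex"):
        text = tex_file.read_text(encoding="utf-8")
        tex_file.write_text(strip_comments(text), encoding="utf-8")
    return dst_path


print(strip_comments("Hello % note\n% full line\n50\\% done\n"), end="")
```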
  • 25
    realwatermark

    A Python application to add watermarks (text or image) to PDF files

    A Python application to add text or image watermarks to PDF files. It converts the files to images and back to PDF, with options for OCR and compression.
    Downloads: 1 This Week