Showing 87 open source projects for "pdf metadata"

  • 1
    OpenDataLoader PDF

    PDF Parser for AI-ready data. Automate PDF accessibility

    OpenDataLoader PDF is an open-source document processing system designed to convert complex PDF files into structured, AI-ready formats such as Markdown, JSON, and HTML while preserving layout, hierarchy, and semantic meaning. It focuses on enabling downstream use cases like retrieval-augmented generation (RAG), knowledge extraction, and document intelligence pipelines by maintaining accurate reading order and spatial metadata through bounding boxes.
    Downloads: 10 This Week
  • 2
    PDF Signature

    Free web software for signing PDFs and organizing pages

    Free web software for signing, organizing, and compressing PDFs and editing their metadata.
    Downloads: 9 This Week
  • 3
    Zotero PDF Translate

    Translate PDF, EPub, webpage, metadata, annotations, notes

    Zotero PDF Translate is a plugin for Zotero that enhances the research workflow by enabling in-app translation of PDFs, EPUBs, webpages, and associated metadata directly within the Zotero interface. It integrates seamlessly with Zotero’s document reader, allowing users to select text and instantly receive translations in a pop-up or side panel without leaving the application.
    Downloads: 20 This Week
  • 4
    PDFPatcher

    A versatile toolkit for PDF manipulation

    ...Modify PDF metadata, page numbers, links, initial view mode, and remove open actions.
    Downloads: 37 This Week
  • 5
    PDFCraft

    PDFCraft is a free, privacy-focused PDF toolkit

    PDFCraft is an extensible toolkit for creating, editing, and transforming PDF documents with both a graphical interface and a scripting API, making it useful for users ranging from casual editors to automated document processors. At its core, the project provides a clean, modern UI where you can rearrange pages, annotate text, insert images, fill forms, and export to multiple formats, all without needing a heavyweight commercial PDF suite.
    Downloads: 14 This Week
  • 6
    pdfmake

    Client/server side PDF printing in pure JavaScript

    ...Provides a set of options to disable the font layout cache and to control when pages are flushed to the output file. pdfmake runs in the browser (client-side) and in Node.js (server-side). The PDF name can be defined only through the metadata title property. Browser add-ons can affect pdfmake's functionality (especially open() and print()). If pdfmake is not working, try disabling browser add-ons. The supported browsers are Internet Explorer 10+, Edge 12+, Firefox, Chrome, Opera and Safari.
    Downloads: 19 This Week
  • 7
    MinerU

    A high-quality tool for converting PDF to Markdown and JSON

    MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.
    Downloads: 15 This Week
  • 8
    Vanilla.PDF

    Cross-platform SDK for creating and modifying PDF documents

    ...Vanilla.PDF supports advanced PDF features such as adding CMS (PKCS#7) digital signatures, modifying content streams and metadata, and working with encryption and permissions based on standard PDF security models. It includes tools for parsing PDF internals like cross-reference tables and objects, providing fine-grained document analysis capabilities. The project is unit-tested with continuous integration pipelines, supporting sanitizers for enhanced code quality and stability.
    Downloads: 6 This Week
  • 9
    PyPDF

    A pure-python PDF library capable of splitting, merging, cropping

    pypdf is a pure Python library for working with PDF files, allowing developers to split, merge, rotate, encrypt, and extract content from PDFs. It is the actively maintained successor to PyPDF2, with improved performance, compatibility, and support for modern PDF standards. Suitable for both automation scripts and full-featured applications, pypdf handles PDFs without requiring external dependencies.
    Downloads: 6 This Week
  • 10
    pandoc-crossref filter

    Pandoc filter for cross-references

    pandoc-crossref is a pandoc filter for numbering figures, equations, tables and cross-references to them. The input file (like demo.md) can be converted into HTML, LaTeX, PDF, Markdown or other formats. Optionally, you can use cleveref for LaTeX/PDF output, e.g. cleveref PDF, cleveref LaTeX, and listings package, e.g. listings PDF, listings LaTeX. This package tries to use LaTeX labels and references if output type is LaTeX. It also tries to supplement rudimentary LaTeX configuration that should mimic metadata configuration by setting header-includes variable. ...
    Downloads: 5 This Week
  • 11
    Komga

    Media server for comics/mangas/BDs/magazines/eBooks with API and OPDS

    A media server for your comics, mangas, BDs, magazines, and eBooks. Organize your CBZ, CBR, PDF, and EPUB files in different libraries, collections, or reading lists. Use the integrated Webreader, the Tachiyomi extension, any OPDS reader, or other integrations.
    Downloads: 12 This Week
  • 12
    GROBID

    A machine learning software for extracting information

    ...The extraction here covers the usual bibliographical information (e.g. title, abstract, authors, affiliations, keywords, etc.). References extraction and parsing from articles in PDF format, with an F1-score of around 0.87 against an independent PubMed Central set of 1,943 PDFs containing 90,125 references, and around 0.89 on a similar bioRxiv set of 2,000 PDFs (using the Deep Learning citation model). All the usual publication metadata are covered (including DOI, PMID, etc.).
    Downloads: 7 This Week
  • 13
    Kavita

    Kavita is a fast, feature-rich, cross-platform reading server

    ...Quickly resume your reading from your homepage, and get to your reading lists and collections. Serve up Manga/Webtoons/Comics (cbr, cbz, zip/rar, 7zip, raw images) and Books (epub, pdf). First-class responsive readers that work great on any device (phone, tablet, desktop). Dark mode and customizable theming support. Provide hooks into metadata providers to fetch metadata for Comics, Manga, and Books. Metadata should allow for collections, want-to-read integration from 3rd party services, genres. Ability to manage users, access, and ratings.
    Downloads: 12 This Week
  • 14
    Pandoc

    The universal markup converter

    Pandoc is a universal document converter able to convert files from a multitude of markup formats into another. With Pandoc, you have a swiss-army knife of a converter, able to convert practically any markup format into any other. Pandoc contains a Haskell library for conversions as well as a command-line tool that uses this library. It can convert to and from just about anything-- lightweight markup formats, HTML formats, documentation formats, ebooks, TeX formats, word processor formats...
    Downloads: 221 This Week
  • 15
    KOReader

    An ebook reader application supporting PDF, DjVu, EPUB, FB2, etc.

    KOReader is a document viewer for E Ink devices. Supported file formats include EPUB, PDF, DjVu, XPS, CBT, CBZ, FB2, PDB, TXT, HTML, RTF, CHM, DOC, MOBI and ZIP files. It runs on embedded devices (Cervantes, Kindle, Kobo, PocketBook, reMarkable), Android and desktop Linux computers. Developers can run a KOReader emulator on Linux and macOS. Multi-lingual user interface with a highly customizable reader view and many typesetting...
    Downloads: 93 This Week
  • 16
    Image Toolbox

    Image Toolbox is a powerful picture editor, which can crop

    Image Toolbox is a powerful picture editor, which can crop, apply filters, add some drawings, erase background, edit EXIF, or even create a PDF file.
    Downloads: 25 This Week
  • 17
    Calibre-Web

    Web app for browsing, reading and downloading eBooks stored in Calibre

    ...User Interface in Brazilian, Czech, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Khmer, Polish, Russian, simplified and traditional Chinese, Spanish, Swedish, Turkish, Ukrainian. Filter and search by titles, authors, tags, series and language. Support for editing eBook metadata and deleting eBooks from Calibre library. Support for converting eBooks through Calibre binaries. Restrict eBook download to logged-in users. Support for public user registration. Send eBooks to Kindle devices with the click of a button. Support for reading eBooks directly in the browser (.txt, .epub, .pdf, .cbr, .cbt, .cbz, .djvu).
    Downloads: 27 This Week
  • 18
    PaperQA2

    High accuracy RAG for answering questions from scientific documents

    PaperQA2 is a package for doing high-accuracy retrieval augmented generation (RAG) on PDFs or text files, with a focus on the scientific literature. See our recent 2024 paper to see examples of PaperQA2's superhuman performance in scientific tasks like question answering, summarization, and contradiction detection. In this example we take a folder of research paper PDFs, magically get their metadata - including citation counts and a retraction check, then parse and cache PDFs into a...
    Downloads: 4 This Week
  • 19
    HDoujin Downloader

    An easy-to-use manga and dōjinshi downloader supporting 800+ websites

    HDoujin Downloader is a manga and dōjinshi download manager supporting 800+ websites across many different languages.
    Downloads: 24 This Week
  • 20
    Papermerge

    Open Source Document Management System for Digital Archives

    ...Instead of having piles of paper documents all over your desk, office or drawers - you can quickly scan them and configure your scanner to directly upload to Papermerge DMS. Store, organize and index scanned documents in PDF, JPEG and TIFF formats. Instantly find relevant information using full text, tags and metadata-based search. Papermerge is free and open-source software which means that transparency is the core value of our software development. Source code can be reviewed and improved by anyone from anywhere. Papermerge supports multiple users. ...
    Downloads: 20 This Week
  • 21
    kb

    A minimalist command line knowledge base manager

    kb is a minimalist command-line knowledge base manager that gives users a fast, organized way to collect, store, search, and retrieve notes, documents, cheatsheets, procedures, and other artifacts directly from the terminal. It was created to solve the common problem of having scattered text files or reference materials on disk that are hard to search or categorize, and it surfaces a simple CLI interface with intuitive commands for adding, viewing, editing, and deleting knowledge items. Each...
    Downloads: 7 This Week
  • 22
    Grimmory

    Grimmory is the successor to Booklore

    Grimmory is a self-hosted digital library management platform designed to help users organize, read, and manage their entire book collection in a centralized and fully controlled environment. As the successor to Booklore, it expands on the idea of personal knowledge ownership by allowing users to store and interact with books without relying on third-party cloud services. The platform supports a wide range of formats, including eBooks, PDFs, comics, and audiobooks, making it versatile for...
    Downloads: 3 This Week
  • 23
    shuyuan

    Reading book source

    ...It likely supports different input formats (text, HTML, PDF), and may integrate optional translation or text normalization tools.
    Downloads: 0 This Week
  • 24
    Extractous

    Fast and efficient unstructured data extraction

    Extractous is a Rust-based unstructured data extraction library focused on fast local parsing of documents and other content-heavy files. Its purpose is to extract text and metadata efficiently from formats such as PDF, Word, HTML, email archives, images, and more, without depending on external APIs or separate parsing servers. The project emphasizes performance and low memory usage, and its maintainers describe it as a local-first alternative to heavier extraction stacks. For broader format support, the system combines its Rust core with ahead-of-time compiled Apache Tika shared libraries, which allows it to extend parsing coverage while still avoiding traditional server-based overhead. ...
    Downloads: 1 This Week
  • 25
    ArXiv MCP Server

    A Model Context Protocol server for searching and analyzing arXiv

    arxiv-mcp-server bridges AI assistants and the arXiv repository through a clean MCP interface, enabling search, metadata retrieval, and content access without bespoke scraping. With simple tools like “search” and “fetch,” an agent can find papers, pull abstracts, and download PDFs for downstream summarization or analysis. The project includes packaging and CI to publish to PyPI, plus tests and linting for reliability. Issue threads show feature requests such as extracting embedded LaTeX and...
    Downloads: 2 This Week