pdf metadata free download

Showing 87 open source projects for "pdf metadata"

1

OpenDataLoader PDF

PDF Parser for AI-ready data. Automate PDF accessibility

OpenDataLoader PDF is an open-source document processing system designed to convert complex PDF files into structured, AI-ready formats such as Markdown, JSON, and HTML while preserving layout, hierarchy, and semantic meaning. It focuses on enabling downstream use cases like retrieval-augmented generation (RAG), knowledge extraction, and document intelligence pipelines by maintaining accurate reading order and spatial metadata through bounding boxes.

Downloads: 10 This Week

Last Update: 2026-04-03
See Project
2

PDF Signature

Free web software for signing PDFs and organizing pages

Free web software for signing, organizing, editing metadata, or compressing PDFs.

Downloads: 9 This Week

Last Update: 2025-12-03
See Project
3

Zotero PDF Translate

Translate PDF, EPub, webpage, metadata, annotations, notes

Zotero PDF Translate is a plugin for Zotero that enhances the research workflow by enabling in-app translation of PDFs, EPUBs, webpages, and associated metadata directly within the Zotero interface. It integrates seamlessly with Zotero’s document reader, allowing users to select text and instantly receive translations in a pop-up or side panel without leaving the application.

Downloads: 20 This Week

Last Update: 2026-03-20
See Project
4

PDFPatcher

A versatile toolkit for PDF manipulation

...Modify PDF metadata, page numbers, links, initial view mode, and remove open actions.

Downloads: 37 This Week

Last Update: 2025-08-14
See Project
5

PDFCraft

PDFCraft is a free, privacy-focused PDF toolkit

PDFCraft is an extensible toolkit for creating, editing, and transforming PDF documents with both a graphical interface and a scripting API, making it useful for users ranging from casual editors to automated document processors. At its core, the project provides a clean, modern UI where you can rearrange pages, annotate text, insert images, fill forms, and export to multiple formats, all without needing a heavyweight commercial PDF suite.

Downloads: 14 This Week

Last Update: 2026-04-07
See Project
6

pdfmake

Client/server side PDF printing in pure JavaScript

...Provides a set of options to disable font layout cache and to control when pages are flushed to the output file. pdfmake runs in the browser (client-side) and in Node.js (server-side). The PDF name can be defined only by using the metadata title property. Browser add-ons can affect the functionality of pdfmake (especially open() and print()). If pdfmake is not working, try disabling add-ons in the browser. The supported browsers are Internet Explorer 10+, Edge 12+, Firefox, Chrome, Opera and Safari.

Downloads: 19 This Week

Last Update: 2026-03-17
See Project
7

MinerU

A high-quality tool for converting PDF to Markdown and JSON

MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.

Downloads: 15 This Week

Last Update: 2026-04-07
See Project
8

Vanilla.PDF

Cross-platform SDK for creating and modifying PDF documents

...Vanilla.PDF supports advanced PDF features such as adding CMS (PKCS#7) digital signatures, modifying content streams and metadata, and working with encryption and permissions based on standard PDF security models. It includes tools for parsing PDF internals like cross-reference tables and objects, providing fine-grained document analysis capabilities. The project is unit-tested with continuous integration pipelines, supporting sanitizers for enhanced code quality and stability.

Downloads: 6 This Week

Last Update: 2026-03-17
See Project
9

PyPDF

A pure-Python PDF library capable of splitting, merging, cropping

pypdf is a pure Python library for working with PDF files, allowing developers to split, merge, rotate, encrypt, and extract content from PDFs. It’s an actively maintained fork of PyPDF2, improving performance, compatibility, and support for modern PDF standards. Suitable for both automation scripts and full-featured applications, pypdf handles PDFs without requiring external dependencies.

Downloads: 6 This Week

Last Update: 17 hours ago
See Project
10

pandoc-crossref filter

Pandoc filter for cross-references

pandoc-crossref is a pandoc filter for numbering figures, equations, tables and cross-references to them. The input file (like demo.md) can be converted into HTML, LaTeX, PDF, Markdown or other formats. Optionally, you can use cleveref for LaTeX/PDF output, e.g. cleveref PDF, cleveref LaTeX, and listings package, e.g. listings PDF, listings LaTeX. This package tries to use LaTeX labels and references if output type is LaTeX. It also tries to supplement rudimentary LaTeX configuration that should mimic metadata configuration by setting header-includes variable. ...

Downloads: 5 This Week

Last Update: 2026-02-08
See Project
11

Komga

Media server for comics/mangas/BDs/magazines/eBooks with API and OPDS

A media server for your comics, mangas, BDs, magazines, and eBooks. Organize your CBZ, CBR, PDF, and EPUB files in different libraries, collections, or reading lists. Use the integrated Webreader, the Tachiyomi extension, any OPDS reader, or other integrations.

Downloads: 12 This Week

Last Update: 2026-03-27
See Project
12

GROBID

A machine learning software for extracting information

...The extraction here covers the usual bibliographical information (e.g. title, abstract, authors, affiliations, keywords). Reference extraction and parsing from articles in PDF format reaches an F1-score of around .87 on an independent PubMed Central set of 1943 PDFs containing 90,125 references, and around .89 on a similar bioRxiv set of 2000 PDFs (using the Deep Learning citation model). All the usual publication metadata are covered (including DOI, PMID, etc.).
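For readers unfamiliar with the F1-score quoted above, the following is a minimal sketch of how such a figure is generally computed: it is the harmonic mean of precision and recall. The function names and the sample counts are hypothetical and chosen only for illustration; they are not GROBID's evaluation code or its actual results, and the rules GROBID uses to decide whether an extracted reference counts as a match are not described in this listing.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_from_counts(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Same metric expressed with raw counts: 2*TP / (2*TP + FP + FN)."""
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        return 0.0
    return 2 * true_pos / denominator


# Hypothetical counts for illustration only (not taken from any benchmark):
# 8 references extracted correctly, 2 spurious, 2 missed.
example = f1_from_counts(true_pos=8, false_pos=2, false_neg=2)
```

A score near .87 therefore means that precision and recall are both fairly high, since the harmonic mean is pulled toward the lower of the two.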

Downloads: 7 This Week

Last Update: 2026-04-07
See Project
13

Kavita

Kavita is a fast, feature rich, cross platform reading server

...Quickly resume your reading from your homepage, and get to your reading lists and collections. Serve up Manga/Webtoons/Comics (cbr, cbz, zip/rar, 7zip, raw images) and Books (epub, pdf). First-class responsive readers that work great on any device (phone, tablet, desktop). Dark mode and customizable theming support. Provide hooks into metadata providers to fetch metadata for Comics, Manga, and Books. Metadata should allow for collections, want-to-read integration from 3rd party services, genres. Ability to manage users, access, and ratings.

Downloads: 12 This Week

Last Update: 2026-01-18
See Project
14

Pandoc

The universal markup converter

Pandoc is a universal document converter able to convert files between a multitude of markup formats. With Pandoc, you have a Swiss Army knife of a converter, able to convert practically any markup format into any other. Pandoc contains a Haskell library for conversions as well as a command-line tool that uses this library. It can convert to and from just about anything: lightweight markup formats, HTML formats, documentation formats, ebooks, TeX formats, word processor formats...

Downloads: 221 This Week

Last Update: 2026-03-19
See Project
15

KOReader

An ebook reader application supporting PDF, DjVu, EPUB, FB2, etc.

KOReader is a document viewer for E Ink devices. Supported file formats include EPUB, PDF, DjVu, XPS, CBT, CBZ, FB2, PDB, TXT, HTML, RTF, CHM, DOC, MOBI and ZIP files. It’s available for Kindle, Kobo, PocketBook, Android and desktop Linux. Runs on embedded devices (Cervantes, Kindle, Kobo, PocketBook, reMarkable), Android and Linux computers. Developers can run a KOReader emulator in Linux and macOS. Multi-lingual user interface with a highly customizable reader view and many typesetting...

Downloads: 93 This Week

Last Update: 2026-03-17
See Project
16

Image Toolbox

Image Toolbox is a powerful picture editor, which can crop

Image Toolbox is a powerful picture editor, which can crop, apply filters, add some drawings, erase background, edit EXIF, or even create a PDF file.

Downloads: 25 This Week

Last Update: 7 days ago
See Project
17

Calibre-Web

Web app for browsing, reading and downloading eBooks stored in Calibre

...User Interface in Brazilian, Czech, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Khmer, Polish, Russian, simplified and traditional Chinese, Spanish, Swedish, Turkish, Ukrainian. Filter and search by titles, authors, tags, series and language. Support for editing eBook metadata and deleting eBooks from Calibre library. Support for converting eBooks through Calibre binaries. Restrict eBook download to logged-in users. Support for public user registration. Send eBooks to Kindle devices with the click of a button. Support for reading eBooks directly in the browser (.txt, .epub, .pdf, .cbr, .cbt, .cbz, .djvu).

Downloads: 27 This Week

Last Update: 2026-02-08
See Project
18

PaperQA2

High accuracy RAG for answering questions from scientific documents

PaperQA2 is a package for doing high-accuracy retrieval augmented generation (RAG) on PDFs or text files, with a focus on the scientific literature. See our recent 2024 paper to see examples of PaperQA2's superhuman performance in scientific tasks like question answering, summarization, and contradiction detection. In this example we take a folder of research paper PDFs, magically get their metadata - including citation counts and a retraction check, then parse and cache PDFs into a...

Downloads: 4 This Week

Last Update: 2026-03-18
See Project
19

HDoujin Downloader

An easy-to-use manga and dōjinshi downloader supporting 800+ websites

HDoujin Downloader is a manga and dōjinshi download manager supporting 800+ websites across many different languages.

Downloads: 24 This Week

Last Update: 3 days ago
See Project
20

Papermerge

Open Source Document Management System for Digital Archives

...Instead of having piles of paper documents all over your desk, office or drawers - you can quickly scan them and configure your scanner to directly upload to Papermerge DMS. Store, organize and index scanned documents in PDF, JPEG and TIFF formats. Instantly find relevant information using full text, tags and metadata-based search. Papermerge is free and open-source software which means that transparency is the core value of our software development. Source code can be reviewed and improved by anyone from anywhere. Papermerge supports multiple users. ...

Downloads: 20 This Week

Last Update: 2025-07-24
See Project
21

kb

A minimalist command line knowledge base manager

kb is a minimalist command-line knowledge base manager that gives users a fast, organized way to collect, store, search, and retrieve notes, documents, cheatsheets, procedures, and other artifacts directly from the terminal. It was created to solve the common problem of having scattered text files or reference materials on disk that are hard to search or categorize, and it surfaces a simple CLI interface with intuitive commands for adding, viewing, editing, and deleting knowledge items. Each...

Downloads: 7 This Week

Last Update: 2026-02-16
See Project
22

Grimmory

Grimmory is the successor to Booklore

Grimmory is a self-hosted digital library management platform designed to help users organize, read, and manage their entire book collection in a centralized and fully controlled environment. As the successor to Booklore, it expands on the idea of personal knowledge ownership by allowing users to store and interact with books without relying on third-party cloud services. The platform supports a wide range of formats, including eBooks, PDFs, comics, and audiobooks, making it versatile for...

Downloads: 3 This Week

Last Update: 2026-03-25
See Project
23

shuyuan

Reading book source

...It likely supports different input formats (text, HTML, PDF), and may integrate optional translation or text normalization tools.

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
24

Extractous

Fast and efficient unstructured data extraction

Extractous is a Rust-based unstructured data extraction library focused on fast local parsing of documents and other content-heavy files. Its purpose is to extract text and metadata efficiently from formats such as PDF, Word, HTML, email archives, images, and more, without depending on external APIs or separate parsing servers. The project emphasizes performance and low memory usage, and its maintainers describe it as a local-first alternative to heavier extraction stacks. For broader format support, the system combines its Rust core with ahead-of-time compiled Apache Tika shared libraries, which allows it to extend parsing coverage while still avoiding traditional server-based overhead. ...

Downloads: 1 This Week

Last Update: 2026-03-06
See Project
25

ArXiv MCP Server

A Model Context Protocol server for searching and analyzing arXiv

arxiv-mcp-server bridges AI assistants and the arXiv repository through a clean MCP interface, enabling search, metadata retrieval, and content access without bespoke scraping. With simple tools like “search” and “fetch,” an agent can find papers, pull abstracts, and download PDFs for downstream summarization or analysis. The project includes packaging and CI to publish to PyPI, plus tests and linting for reliability. Issue threads show feature requests such as extracting embedded LaTeX and...

Downloads: 2 This Week

Last Update: 2026-04-06
See Project