web site scraper free download

Showing 6174 open source projects for "web site scraper"

1

shot-scraper

A command-line utility for taking automated screenshots of websites

shot-scraper is a command-line utility for taking automated screenshots of web pages using a headless browser engine. After installation, a single command can capture a full-page screenshot of a URL and save it to a file, making it ideal for documentation, monitoring, and visual regression tasks. Under the hood it uses a modern browser (installed via a one-time shot-scraper install step) and exposes options for viewport size, full-page versus clipped screenshots, and device emulation. ...

Downloads: 0 This Week

Last Update: 2026-02-01
2

TEAMMATES Developer Web Site

This is the project website for the TEAMMATES feedback management tool

TEAMMATES is a free online tool for managing peer evaluations and other feedback paths of your students. It is provided as a cloud-based service for educators/students and is currently used by hundreds of universities across the world.

Downloads: 0 This Week

Last Update: 2024-04-23
3

Web-Check

All-in-one OSINT tool for analysing any website

Comprehensive, on-demand open source intelligence for any website. Get an insight into the inner-workings of a given website: uncover potential attack vectors, analyse server architecture, view security configurations, and learn what technologies a site is using. Currently the dashboard will show: IP info, SSL chain, DNS records, cookies, headers, domain info, search crawl rules, page map, server location, redirect ledger, open ports, traceroute, DNS security extensions, site performance,...

Downloads: 3 This Week

Last Update: 2024-09-06
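Web-Check's "search crawl rules" refers to a site's robots.txt. As a minimal Python sketch (not Web-Check's own code), the standard library's urllib.robotparser can answer which paths a given crawler may fetch. The robots.txt content, the "BadBot" agent, and the example.com URLs are made up for illustration, and the file is parsed from a string instead of being downloaded.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real check would download
# https://<site>/robots.txt instead of using an inline string.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /

User-agent: BadBot
Disallow: /
"""

def crawl_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried per user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

rules = crawl_rules(ROBOTS_TXT)
for agent, path in [("*", "/blog/post"), ("*", "/admin/login"), ("BadBot", "/blog/post")]:
    # can_fetch() picks the record matching the agent, falling back to "User-agent: *".
    print(agent, path, rules.can_fetch(agent, "https://example.com" + path))
```

Rules are matched by path prefix in file order, so the generic agent is blocked only under /admin/, while the BadBot record blocks everything.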
4

Markdown Site

An open-source publishing framework built for AI agents and developers

Markdown Site is an open-source publishing framework built to help developers and AI agents quickly ship content-driven websites, blogs, or documentation directly from Markdown files with a seamless sync workflow. It is built on modern web technologies such as React, Convex, and Vite, and integrates real-time syncing so that local changes to Markdown content propagate instantly to live views without a rebuild or redeploy.

Downloads: 0 This Week

Last Update: 2026-03-21
5

Site Kit for WordPress

Site Kit is a one-stop solution for WordPress users

Site Kit is a first-party WordPress plugin that brings key Google services into a single dashboard so site owners can see how their content performs and fix issues without leaving wp-admin. After a guided setup and verification flow, it connects properties to Search Console, Analytics, AdSense, PageSpeed Insights, and other services, surfacing the most relevant metrics per page and per site. The plugin focuses on clarity: traffic sources, search queries, top pages, and monetization signals...

Downloads: 3 This Week

Last Update: 2026-04-01
6

CommunityScrapers

This is a public repository containing scrapers

Stash Community Scrapers is a large open-source collection of metadata extraction tools designed to work with the Stash media management platform, enabling automated scraping of content information from various online sources. The repository contains hundreds of scraper definitions written primarily in YAML and Python, each tailored to extract structured metadata such as titles, performers, tags, and media details from specific websites. These scrapers integrate directly into Stash, allowing...

Downloads: 0 This Week

Last Update: 2026-04-06
7

Scraper of Death

Scraper of Death is a web scraper that supports multiple scraping methods: Requests + BeautifulSoup (fast, lightweight) and Selenium (JavaScript support, dynamic content).

Downloads: 3 This Week

Last Update: 2026-02-19
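The Requests + BeautifulSoup method that Scraper of Death describes boils down to fetching a page's HTML and pulling elements out of the parsed tree. The sketch below shows only the parsing step, using the standard library's html.parser as a swapped-in replacement for BeautifulSoup so it runs without extra dependencies; the HTML is an inline string rather than a live request, and `LinkExtractor` / `extract_links` are hypothetical names, not Scraper of Death's API.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the URL of the page being parsed.
                self.links.append(urljoin(self.base_url, href))

def extract_links(html: str, base_url: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links

page = '<p><a href="/about">About</a> <a href="https://example.org/x">X</a> <a>no href</a></p>'
print(extract_links(page, "https://example.com/index.html"))
```

Pages that build their content with JavaScript would not expose those links in the raw HTML, which is where a browser-driven approach such as Selenium comes in.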
8

CyberScraper 2077

A powerful web scraper powered by LLMs | OpenAI, Gemini & Ollama

CyberScraper 2077 is not just another web scraping tool; it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI, Gemini, and local LLM models to slice through the web's defenses, extracting the data you need with unparalleled precision and style.

Downloads: 0 This Week

Last Update: 2026-01-20
9

html-metadata

Metadata HTML scraper and parser for Node.js (supports Promises)

The aim of this library is to be a comprehensive source for extracting all HTML-embedded metadata. Currently, it supports Schema.org microdata using a third-party library, a native BEPress, Dublin Core, Highwire Press, JSON-LD, Open Graph, Twitter, EPrints, PRISM, and COinS implementation, and some general metadata that doesn't belong to a particular standard (for instance, the content of the title tag, or meta description tags). Planned is support for RDFa, AGLS, and other yet unheard-of...

Downloads: 1 This Week

Last Update: 2025-04-30
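html-metadata itself is a Node.js library; the Python sketch below is not its API and only illustrates the general idea behind a few of the formats it lists: reading the title tag, the meta description, Open Graph tags (which use `property="og:..."`), and Twitter tags (which use `name="twitter:..."`). The class and function names and the sample page are hypothetical.

```python
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Collects the <title> text plus <meta name=...> / <meta property=...> values."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta: dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph uses property="og:*"; description and Twitter tags use name="..."
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def extract_metadata(html: str) -> dict[str, str]:
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return {"title": collector.title.strip(), **collector.meta}

sample = """<html><head><title> Example Page </title>
<meta name="description" content="A demo page.">
<meta property="og:title" content="Example OG Title">
<meta name="twitter:card" content="summary">
</head><body></body></html>"""
print(extract_metadata(sample))
```

Formats such as JSON-LD or microdata would need their own handlers; this sketch covers only flat meta tags.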
10

JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.

An automated tool for scraping job postings into a .csv file. You can search for jobs with YAML configuration files or by passing command-line arguments. By scraping and reviewing regularly, you can cut through the noise of even the busiest job markets. Run funnel with your settings YAML to populate your master CSV file with jobs from available providers. JobFunnel can easily be automated to run nightly with crontab. If you...

Downloads: 0 This Week

Last Update: 2024-09-29
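The listing doesn't say how JobFunnel detects duplicates, so the following is a hypothetical Python sketch of the general idea only: merge newly scraped postings into a master CSV, keyed by a job ID, and skip any ID already present. The column names (`id`, `title`, `company`) and the `merge_jobs` helper are assumptions; CSV text is passed as strings so the example runs without touching files.

```python
import csv
import io

FIELDS = ["id", "title", "company"]  # hypothetical master-CSV columns

def merge_jobs(master_csv: str, scraped: list[dict]) -> str:
    """Return master CSV text with new postings appended, skipping duplicate ids."""
    rows = list(csv.DictReader(io.StringIO(master_csv))) if master_csv.strip() else []
    seen = {row["id"] for row in rows}
    for job in scraped:
        if job["id"] not in seen:  # drops repeats from the master file and within this batch
            rows.append({key: job[key] for key in FIELDS})
            seen.add(job["id"])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

master = "id,title,company\n1,Data Engineer,Acme\n"
scraped = [
    {"id": "1", "title": "Data Engineer", "company": "Acme"},
    {"id": "2", "title": "Backend Developer", "company": "Globex"},
    {"id": "2", "title": "Backend Developer", "company": "Globex"},
]
print(merge_jobs(master, scraped))
```

Keying on an ID is the simplest rule; real job boards often repost the same role under new IDs, which would need fuzzier matching on title and company.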
11

MeshCentral

A complete web-based remote monitoring and management web site

The open-source, multi-platform, self-hosted, feature-packed web site for remote device management. MeshCentral is a full computer management web site. With MeshCentral, you can run your own web server to remotely manage and control computers on a local network or anywhere on the internet. Once the server is running, create a device group, then download and install an agent on each computer you want to manage.

Downloads: 136 This Week

Last Update: 2026-03-25
12

Free ChatGPT Site List

It collects and organizes a wide variety of ChatGPT resources

Free ChatGPT Site List is an open-source aggregation project that collects and organizes a wide variety of ChatGPT and AI web resources into a single navigable directory. The repository functions primarily as a curated navigation hub where users can discover free AI tools, websites, and services in one place. It was designed to reduce friction for users trying to locate working AI endpoints or utilities across the rapidly changing ecosystem.

Downloads: 0 This Week

Last Update: 2026-03-24
13

ScrapeGraphAI

Python scraper based on AI

Extract content from websites and local documents using LLMs. ScrapeGraphAI is a Python web scraping library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.

Downloads: 12 This Week

Last Update: 5 days ago
14

dude uncomplicated data extraction

dude uncomplicated data extraction: A simple framework

Dude is a very simple framework for writing web scrapers using Python decorators. Its Flask-inspired design makes it easy to build a web scraper in just a few lines of code, with an easy-to-learn syntax. Dude is currently in pre-alpha, so expect breaking changes. You can run your scraper from the terminal by passing the URLs, an output filename of your choice, and the paths to your Python scripts to the dude scrape command.

Downloads: 0 This Week

Last Update: 2024-03-02
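Dude's actual decorators aren't shown in this listing, so the following is only a hypothetical, Flask-style illustration of the decorator pattern it describes: a decorator registers an extraction function against a selector, and a runner applies every registered function to a page. The names `select` and `run`, and the use of a bare tag name as the "selector", are invented for this sketch and are not Dude's API.

```python
from html.parser import HTMLParser
from typing import Callable

# Registry of (tag name, extractor) pairs, filled in by the decorator below.
_handlers: list[tuple[str, Callable[[str], dict]]] = []

def select(tag: str):
    """Hypothetical decorator: run the wrapped function on the text of every <tag> element."""
    def decorator(func):
        _handlers.append((tag, func))
        return func
    return decorator

class _TextByTag(HTMLParser):
    """Collects the text of each element whose tag is in `tags`."""

    def __init__(self, tags):
        super().__init__()
        self.tags = set(tags)
        self.texts: dict[str, list[str]] = {tag: [] for tag in self.tags}
        self._open: list[tuple[str, list[str]]] = []  # registered elements currently open

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self._open.append((tag, []))

    def handle_data(self, data):
        for _, buffer in self._open:
            buffer.append(data)

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            name, buffer = self._open.pop()
            self.texts[name].append("".join(buffer).strip())

def run(html: str) -> list[dict]:
    """Apply every registered extractor, in registration order, to the matching elements."""
    parser = _TextByTag(tag for tag, _ in _handlers)
    parser.feed(html)
    parser.close()
    return [func(text) for tag, func in _handlers for text in parser.texts[tag]]

@select("h2")
def heading(text):
    return {"heading": text}

@select("li")
def item(text):
    return {"item": text}

print(run("<h2>Jobs</h2><ul><li>Python dev</li><li>Rust dev</li></ul>"))
```

The appeal of the pattern is that each extractor stays a plain function; the framework owns fetching, parsing, and output.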
15

Ulixee Hero

The web browser built for scraping

It's the first modern headless browser designed specifically for scraping instead of just automated testing. Hero provides access to the W3C DOM specification without the need for Puppeteer's complicated evaluate callbacks and multi-context switching. We've recreated a fully compliant DOM directly in NodeJS, allowing you to bypass the headaches of previous scraper tools. The powerful Chrome engine sits under the hood, allowing for lightning-fast rendering. Emulators make it easy to disguise...

Downloads: 5 This Week

Last Update: 2025-09-08
16

Spider

High-performance Rust web crawler and scraper for large-scale data

Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large...

Downloads: 14 This Week

Last Update: 2026-03-31
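Spider is written in Rust; the Python sketch below only illustrates the general concurrency pattern the description mentions, fetching many pages at once through a bounded worker pool, and is not Spider's API. The `fetch` function is a hypothetical stand-in that sleeps to simulate network latency and reads from an in-memory dict so the example runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical in-memory "site" so the sketch runs without network access.
PAGES = {f"https://example.com/p{i}": f"<title>Page {i}</title>" for i in range(20)}

def fetch(url: str) -> str:
    """Stand-in for an HTTP GET: simulates latency, then returns the page body."""
    time.sleep(0.05)
    return PAGES[url]

def fetch_concurrently(urls: list[str], workers: int = 10) -> dict[str, str]:
    """Fetch many URLs with a bounded thread pool; results are keyed by URL."""
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):  # yields each future as soon as it finishes
            results[futures[future]] = future.result()
    return results

start = time.perf_counter()
pages = fetch_concurrently(list(PAGES))
print(len(pages), f"{time.perf_counter() - start:.2f}s")
```

Because the simulated fetches mostly wait, ten workers overlap them and the batch finishes in a fraction of the sequential time; capping `workers` also limits how hard the target server is hit.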
17

Crawl4AI

Open-source LLM Friendly Web Crawler & Scraper

Crawl4AI is a high-performance, AI‑ready web crawler tailored for LLM data ingestion and RAG pipelines. It supports adaptive crawling heuristics (stopping when enough info is gathered), structured markdown output, and high-speed parallel execution. Designed to operate at scale with optional Docker deployment and framework integrations.

Downloads: 1 This Week

Last Update: 2026-03-18
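Crawl4AI's adaptive stopping rule isn't spelled out in this listing. The Python sketch below is a hypothetical stand-in for the idea of "stopping when enough info is gathered": it stops once `patience` consecutive pages each contribute fewer than `min_new_terms` previously unseen words. Both parameter names, the word-level novelty measure, and the sample pages are assumptions, not Crawl4AI's API or algorithm.

```python
from typing import Iterable

def adaptive_crawl(pages: Iterable[tuple[str, str]], min_new_terms: int = 3, patience: int = 2):
    """Consume (url, text) pairs in order; stop once `patience` consecutive pages
    each add fewer than `min_new_terms` previously unseen words (hypothetical rule)."""
    seen: set[str] = set()
    low_gain_streak = 0
    consumed: list[str] = []
    for url, text in pages:
        words = set(text.lower().split())
        new_terms = words - seen
        seen |= words
        consumed.append(url)
        if len(new_terms) < min_new_terms:
            low_gain_streak += 1
            if low_gain_streak >= patience:
                break  # recent pages are mostly repeating what we already have
        else:
            low_gain_streak = 0
    return consumed, seen

sample_pages = [
    ("p1", "python web scraping tutorial"),
    ("p2", "python scraping with requests library"),
    ("p3", "python web scraping"),
    ("p4", "web scraping tutorial"),
    ("p5", "never reached"),
]
visited, vocabulary = adaptive_crawl(sample_pages)
print(visited)
```

Passing a lazy generator instead of a list would mean the pages after the stopping point are never fetched at all, which is the point of stopping early.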
18

Fluent UI Web

Collection of utilities and components for building web applications

A collection of UX frameworks for creating beautiful, cross-platform apps that share code, design, and interaction behavior. Build for one platform or for all. Everything you need is here. Build your own apps using the same open source components we do, with accessibility, internationalization, and performance included. From tutorials to a fun collection of API references, find what you need to design and develop your own Fluent experience. From Word and Excel to PowerBI and Teams, many...

Downloads: 0 This Week

Last Update: 2026-02-26
19

MDCx

Movie metadata scraper and organizer for media libraries and NFO

MDCx is an open source media metadata scraping and organization tool designed to automate the process of collecting detailed information for movie files. It retrieves metadata from multiple online sources and applies it to local media collections, helping users maintain structured and well-organized libraries. MDCx can download information such as titles, cast data, artwork, and other metadata, then generate standardized NFO files compatible with media management systems. It also supports...

Downloads: 11 This Week

Last Update: 2026-03-10
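The listing doesn't show MDCx's exact output, so the following is a sketch under stated assumptions: it writes a minimal Kodi-style movie NFO, an XML file with a `<movie>` root and common child tags such as `<title>`, `<year>`, `<plot>`, `<genre>`, and `<actor>`. The `build_movie_nfo` helper and the metadata values are made up; artwork and other fields MDCx may write are omitted.

```python
import xml.etree.ElementTree as ET

def build_movie_nfo(meta: dict) -> str:
    """Build a minimal Kodi-style movie NFO document from a metadata dict (hypothetical layout)."""
    movie = ET.Element("movie")
    for tag in ("title", "year", "plot"):
        if tag in meta:
            ET.SubElement(movie, tag).text = str(meta[tag])
    for genre in meta.get("genres", []):
        ET.SubElement(movie, "genre").text = genre
    for person in meta.get("cast", []):
        actor = ET.SubElement(movie, "actor")
        ET.SubElement(actor, "name").text = person["name"]
        if "role" in person:
            ET.SubElement(actor, "role").text = person["role"]
    ET.indent(movie)  # pretty-print child elements (Python 3.9+)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(movie, encoding="unicode")

meta = {
    "title": "Example Movie",
    "year": 2001,
    "plot": "A placeholder plot.",
    "genres": ["Drama"],
    "cast": [{"name": "Jane Doe", "role": "Lead"}],
}
print(build_movie_nfo(meta))
```

Writing the result next to the video file (for example, a movie.nfo beside movie.mkv) is what lets a media manager pick it up; the exact naming rules depend on the media manager.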
20

crwlr

Library for Rapid (Web) Crawler and Scraper Development

This library provides a framework of sorts plus many ready-to-use, so-called steps that you can use as building blocks to build your own crawlers and scrapers. Before diving into the library, let's look at the terms crawling and scraping. For most real-world use cases, the two go hand in hand, which is why this library helps with and combines both. A (web) crawler is a program that (down)loads documents and follows the links in them to load those as well. A crawler...

Downloads: 9 This Week

Last Update: 2026-01-05
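The crawler definition in the crwlr description (load documents, follow their links, load those too) can be sketched as a breadth-first traversal. crwlr is a PHP library and this Python sketch is not its step API: the in-memory `SITE` dict and the `load` function are hypothetical stand-ins for downloading, and the crawl is restricted to the start URL's host and capped by `max_pages`, both assumptions for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

# Hypothetical in-memory site so the sketch runs offline.
SITE = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<a href="/b">B</a> <a href="https://other.org/">external</a>',
    "https://example.com/b": '<a href="/a#top">A</a> <a href="/c">C</a>',
    "https://example.com/c": "no links here",
}

class _Links(HTMLParser):
    """Collects raw href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

def load(url: str) -> str:
    """Stand-in for downloading a document."""
    return SITE.get(url, "")

def crawl(start: str, max_pages: int = 50) -> list[str]:
    """Breadth-first crawl of same-host pages; returns URLs in the order they were loaded."""
    host = urlparse(start).netloc
    queue, seen, order = deque([start]), {start}, []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        parser = _Links()
        parser.feed(load(url))
        for href in parser.hrefs:
            link, _ = urldefrag(urljoin(url, href))  # absolute URL without the #fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return order

print(crawl("https://example.com/"))
```

The `seen` set is what keeps the crawler from looping when pages link back to each other; scraping would then run on each loaded document.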
21

goclone

Fast CLI tool for cloning entire websites for local browsing offline

goclone is a command-line utility designed to download and mirror complete websites to a local directory for offline access. It retrieves HTML pages, stylesheets, JavaScript files, images, and other assets from a target site and stores them on the user’s computer. It preserves the original site’s structure by maintaining relative links between pages, so the mirrored copy behaves much like the live version when opened locally. Once a site has been cloned, users can browse the pages offline and navigate between them as if they were viewing the site online. goclone is written in Go and uses goroutines to download files concurrently. It can also optionally start a local web server to serve the mirrored files for a more realistic browsing experience. ...

Downloads: 26 This Week

Last Update: 2026-03-11
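Keeping a mirror browsable offline depends on rewriting each link as a path relative to the page that contains it. goclone is written in Go and its exact file layout isn't described here, so this Python sketch assumes a hypothetical mapping: a URL path becomes a file path inside the mirror directory, and any path ending in "/" becomes index.html. `local_path` and `relative_link` are illustrative names, not goclone's code.

```python
import posixpath
from urllib.parse import urlparse

def local_path(url: str) -> str:
    """Map a same-site URL to a file path inside the mirror (hypothetical layout)."""
    path = urlparse(url).path or "/"
    if path.endswith("/"):
        path += "index.html"  # directory URLs are saved as index.html
    return path.lstrip("/")

def relative_link(from_url: str, to_url: str) -> str:
    """Link to write into the page at from_url so it resolves to to_url's local copy."""
    src_dir = posixpath.dirname(local_path(from_url))
    return posixpath.relpath(local_path(to_url), start=src_dir or ".")

print(relative_link("https://example.com/blog/post.html", "https://example.com/css/site.css"))
print(relative_link("https://example.com/", "https://example.com/about/"))
```

Relative links keep working wherever the mirror directory is copied, and also when it is served by a local web server.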
22

Heimdall

An Application dashboard and launcher

As the name suggests, Heimdall Application Dashboard is a dashboard for all your web applications. It doesn't need to be limited to applications, though; you can add links to anything you like. Heimdall is an elegant solution to organize all your web applications. It's dedicated to this purpose, so you won't lose your links in a sea of bookmarks. Why not use it as your browser start page? It can even include a search bar using Google, Bing, or DuckDuckGo. ...

Downloads: 69 This Week

Last Update: 2025-09-15
23

Plausible Analytics

Simple, open-source, lightweight and privacy-friendly web analytics

Web analytics went from a simple, fun and useful practice for site owners to a data-grabbing machine for surveillance capitalism. Google Analytics is frustrating to use, difficult to understand, slow to load and privacy-invasive too. Plausible Analytics is built for privacy-conscious site owners. You get valuable and actionable stats to help you improve your efforts while your visitors keep having a nice and enjoyable experience.

Downloads: 8 This Week

Last Update: 2026-01-16
24

Jekyll RDF

A Jekyll plugin to include RDF data in your static site

Transform your RDF Knowledge Graph into static websites and blogs.

Downloads: 5 This Week

Last Update: 2024-06-12
25

blogdown

Create Blogs and Websites with R Markdown

blogdown is an R package that enables the creation and maintenance of static websites and blogs using R Markdown and Hugo (or other static-site generators). Developed by Yihui Xie and team, it provides functions to initialize sites, write posts, manage themes, and deploy with minimal fuss. It seamlessly blends R code chunks and web content, ideal for data storytellers and technical bloggers.

Downloads: 0 This Week

Last Update: 2026-01-18