Showing 6174 open source projects for "web site scraper"

  • 1
    shot-scraper

    A command-line utility for taking automated screenshots of websites

    shot-scraper is a command-line utility for taking automated screenshots of web pages using a headless browser engine. After installation, a single command can capture a full-page screenshot of a URL and save it to a file, making it ideal for documentation, monitoring, and visual regression tasks. Under the hood it uses a modern browser (installed via a one-time shot-scraper install step) and exposes options for viewport size, full-page versus clipped screenshots, and device emulation. ...
    Downloads: 0 This Week
  • 2
    TEAMMATES Developer Web Site

    This is the project website for the TEAMMATES feedback management tool

    TEAMMATES is a free online tool for managing peer evaluations and other feedback paths of your students. It is provided as a cloud-based service for educators/students and is currently used by hundreds of universities across the world.
    Downloads: 0 This Week
  • 3
    Web-Check

    All-in-one OSINT tool for analysing any website

    Comprehensive, on-demand open source intelligence for any website. Get an insight into the inner workings of a given website: uncover potential attack vectors, analyse server architecture, view security configurations, and learn what technologies a site is using. Currently, the dashboard shows: IP info, SSL chain, DNS records, cookies, headers, domain info, search crawl rules, page map, server location, redirect ledger, open ports, traceroute, DNS security extensions, site performance,...
    Downloads: 3 This Week
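
    Web-Check's "search crawl rules" check concerns a site's robots.txt file. The sketch below is not Web-Check's own code; it assumes Python's standard-library urllib.robotparser and a hypothetical robots.txt string (a real check would download https://<site>/robots.txt) to show how such rules decide whether a crawler may fetch a given URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real check would fetch it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawl_rules(robots_txt: str, user_agent: str, urls: list[str]) -> dict[str, bool]:
    """Report, for each URL, whether robots.txt lets user_agent fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


if __name__ == "__main__":
    checks = crawl_rules(
        ROBOTS_TXT,
        "MyScraper",  # hypothetical user-agent name
        ["https://example.com/index.html", "https://example.com/private/data"],
    )
    for url, allowed in checks.items():
        print("allowed" if allowed else "blocked", url)
```

    Because the only group is "User-agent: *", it applies to MyScraper as well, so /private/data is reported as blocked while /index.html is allowed.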
  • 4
    Markdown Site

    An open-source publishing framework built for AI agents and developers

    Markdown Site is an open-source publishing framework built to help developers and AI agents quickly ship content-driven websites, blogs, or documentation directly from Markdown files with a seamless sync workflow. It is built on modern web technologies such as React, Convex, and Vite, and its real-time syncing propagates local changes to Markdown content to live views instantly, without rebuilding or redeploying.
    Downloads: 0 This Week
  • 5
    Site Kit for WordPress

    Site Kit is a one-stop solution for WordPress users

    Site Kit is a first-party WordPress plugin that brings key Google services into a single dashboard so site owners can see how their content performs and fix issues without leaving wp-admin. After a guided setup and verification flow, it connects properties to Search Console, Analytics, AdSense, PageSpeed Insights, and other services, surfacing the most relevant metrics per page and per site. The plugin focuses on clarity: traffic sources, search queries, top pages, and monetization signals...
    Downloads: 3 This Week
  • 6
    CommunityScrapers

    This is a public repository containing scrapers

    Stash Community Scrapers is a large open-source collection of metadata extraction tools designed to work with the Stash media management platform, enabling automated scraping of content information from various online sources. The repository contains hundreds of scraper definitions written primarily in YAML and Python, each tailored to extract structured metadata such as titles, performers, tags, and media details from specific websites. These scrapers integrate directly into Stash, allowing...
    Downloads: 0 This Week
  • 7
    Scraper of Death
    Scraper of Death is a web scraper that supports multiple scraping methods: Requests + BeautifulSoup (fast, lightweight) and Selenium (JavaScript support, dynamic content).
    Downloads: 3 This Week
  • 8
    CyberScraper 2077

    A powerful web scraper powered by LLMs: OpenAI, Gemini & Ollama

    CyberScraper 2077 is not just another web scraping tool: it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI, Gemini, and local LLM models to slice through the web's defenses and extract the data you need with precision and style.
    Downloads: 0 This Week
  • 9
    html-metadata

    Metadata HTML scraper and parser for Node.js (supports Promises)

    The aim of this library is to be a comprehensive source for extracting all HTML-embedded metadata. Currently, it supports Schema.org microdata using a third-party library, a native BEPress, Dublin Core, Highwire Press, JSON-LD, Open Graph, Twitter, EPrints, PRISM, and COinS implementation, and some general metadata that doesn't belong to a particular standard (for instance, the content of the title tag, or meta description tags). Planned is support for RDFa, AGLS, and other yet unheard-of...
    Downloads: 1 This Week
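
    html-metadata itself is a Node.js library, and its API is not shown here. As a language-neutral illustration of what extracting HTML-embedded metadata involves, the following Python sketch uses only the standard-library html.parser to pull out three of the kinds listed above: the title tag, the meta description, and Open Graph properties. The class and function names and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect <title> text, <meta name="description">, and Open Graph (og:*) tags."""

    def __init__(self) -> None:
        super().__init__()
        self.general: dict[str, str] = {}
        self.open_graph: dict[str, str] = {}
        self._in_title = False
        self._title_parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = {name: value or "" for name, value in attrs}
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attributes.get("name", "").lower()
            prop = attributes.get("property", "").lower()
            content = attributes.get("content", "")
            if name == "description":
                self.general["description"] = content
            elif prop.startswith("og:"):
                # Open Graph tags use the property attribute, e.g. property="og:title".
                self.open_graph[prop[3:]] = content

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.general["title"] = "".join(self._title_parts).strip()

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_metadata(html: str) -> dict[str, dict[str, str]]:
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return {"general": parser.general, "openGraph": parser.open_graph}


SAMPLE = """<html><head>
<title>Example page</title>
<meta name="description" content="A short summary.">
<meta property="og:title" content="Example OG title">
<meta property="og:type" content="article" />
</head><body></body></html>"""

if __name__ == "__main__":
    print(extract_metadata(SAMPLE))
```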
  • 10
    JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

    JobFunnel is an automated tool for scraping job postings into a .csv file. You can search for jobs with YAML configuration files or by passing command-line arguments. By scraping and reviewing regularly, you can cut through the noise of even the busiest job markets. Run funnel with your settings YAML to populate your master CSV file with jobs from available providers. JobFunnel can easily be automated to run nightly with crontab. If you...
    Downloads: 0 This Week
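
    The core of "no duplicates" is a merge step: newly scraped postings are appended to the master CSV only if they are not already in it. This is a minimal sketch of that idea, not JobFunnel's implementation; it assumes a hypothetical column layout (id, title, company, url) and treats the id column as the deduplication key.

```python
import csv
import io

FIELDS = ["id", "title", "company", "url"]  # hypothetical column layout


def merge_jobs(master_csv: str, new_jobs: list[dict[str, str]]) -> str:
    """Return master_csv with new_jobs appended, skipping ids already present."""
    rows = [
        {field: row.get(field, "") for field in FIELDS}
        for row in csv.DictReader(io.StringIO(master_csv))
    ]
    seen = {row["id"] for row in rows}
    for job in new_jobs:
        if job["id"] not in seen:
            rows.append({field: job.get(field, "") for field in FIELDS})
            seen.add(job["id"])

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    job_1 = {"id": "1", "title": "Developer", "company": "A", "url": "https://example.com/1"}
    job_2 = {"id": "2", "title": "QA Engineer", "company": "B", "url": "https://example.com/2"}
    master = merge_jobs("", [job_1])
    master = merge_jobs(master, [job_1, job_2])  # job_1 is a duplicate and is skipped
    print(master)
```

    Because rows with a known id are skipped, re-running the merge on a schedule (for example from crontab) does not fill the file with repeats.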
  • 11
    MeshCentral

    A complete web-based remote monitoring and management web site

    The open source, multi-platform, self-hosted, feature-packed web site for remote device management. MeshCentral is a full computer management web site. With MeshCentral, you can run your own web server to remotely manage and control computers on a local network or anywhere on the internet. Once the server is running, create a device group, then download and install an agent on each computer you want to manage.
    Downloads: 136 This Week
  • 12
    Free ChatGPT Site List

    It collects and organizes a wide variety of ChatGPT resources

    Free ChatGPT Site List is an open-source aggregation project that collects and organizes a wide variety of ChatGPT and AI web resources into a single navigable directory. The repository functions primarily as a curated navigation hub where users can discover free AI tools, websites, and services in one place. It was designed to reduce friction for users trying to locate working AI endpoints or utilities across the rapidly changing ecosystem.
    Downloads: 0 This Week
  • 13
    ScrapeGraphAI

    Python scraper based on AI

    ScrapeGraphAI extracts content from websites and local documents using LLMs. It is a Python web scraping library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.
    Downloads: 12 This Week
  • 14
    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. Its Flask-inspired design makes it easy to build a web scraper in just a few lines of code, and its syntax is easy to learn. Dude is currently in pre-alpha, so please expect breaking changes. You can run your scraper from the terminal by passing URLs, an output filename of your choice, and the paths to your Python scripts to the dude scrape command.
    Downloads: 0 This Week
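
    To illustrate the Flask-style decorator pattern described above, here is a standard-library sketch in which a decorator registers extraction functions for a tag name and a runner applies them to a page. The names select, run, and HANDLERS are hypothetical and are not Dude's actual API.

```python
from html.parser import HTMLParser
from typing import Callable

# Hypothetical registry mapping a tag name to the handlers registered for it.
HANDLERS: dict[str, list[Callable[[dict], dict]]] = {}


def select(tag: str):
    """Flask-style decorator: register the wrapped function for every <tag> element."""
    def register(func):
        HANDLERS.setdefault(tag, []).append(func)
        return func
    return register


class _ElementCollector(HTMLParser):
    """Record text and attributes of elements whose tag has a registered handler."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: list[tuple[str, dict]] = []
        self._open: list[tuple[str, dict, list[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag in HANDLERS:
            self._open.append((tag, dict(attrs), []))

    def handle_data(self, data):
        for _tag, _attrs, parts in self._open:
            parts.append(data)

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            tag, attrs, parts = self._open.pop()
            self.elements.append((tag, {"text": "".join(parts).strip(), "attrs": attrs}))


def run(html: str) -> list[dict]:
    """Parse html and pass each matching element to its registered handlers."""
    collector = _ElementCollector()
    collector.feed(html)
    collector.close()
    return [handler(element) for tag, element in collector.elements for handler in HANDLERS[tag]]


@select("a")
def link(element: dict) -> dict:
    return {"text": element["text"], "href": element["attrs"].get("href")}


if __name__ == "__main__":
    print(run('<p><a href="/one">One</a> and <a href="/two">Two</a></p>'))
```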
  • 15
    Ulixee Hero

    The web browser built for scraping

    It is the first modern headless browser designed specifically for scraping instead of just automated testing. Hero provides access to the W3C DOM specification without the need for Puppeteer's complicated evaluate callbacks and multi-context switching. We've recreated a fully compliant DOM directly in NodeJS, allowing you to bypass the headaches of previous scraper tools. The powerful Chrome engine sits under the hood, allowing for lightning-fast rendering. Emulators make it easy to disguise...
    Downloads: 5 This Week
  • 16
    Spider

    High-performance Rust web crawler and scraper for large-scale data

    Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large...
    Downloads: 14 This Week
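
    Spider is written in Rust, but the concurrency idea it describes (many pages in flight at once, with a shared set preventing revisits) can be sketched in Python with asyncio. This is not Spider's API; fetch_links and FAKE_WEB are hypothetical stand-ins for an HTTP client and link extraction, and max_concurrency caps how many simulated requests run at the same time.

```python
import asyncio

# Hypothetical in-memory "web": each URL maps to the links found on that page.
FAKE_WEB = {
    "/": ["/a", "/b"],
    "/a": ["/", "/c"],
    "/b": ["/c", "/d"],
    "/c": [],
    "/d": ["/a"],
}


async def fetch_links(url: str) -> list[str]:
    """Stand-in for an HTTP request followed by link extraction."""
    await asyncio.sleep(0.01)  # simulated network latency
    return FAKE_WEB.get(url, [])


async def crawl(start: str, max_concurrency: int = 3) -> set[str]:
    """Crawl from start with at most max_concurrency fetches in flight."""
    seen = {start}
    queue = asyncio.Queue()
    queue.put_nowait(start)

    async def worker() -> None:
        while True:
            url = await queue.get()
            try:
                for link in await fetch_links(url):
                    # No await between the check and the add, so workers cannot race here.
                    if link not in seen:
                        seen.add(link)
                        queue.put_nowait(link)
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(max_concurrency)]
    await queue.join()  # returns once every queued URL has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return seen


if __name__ == "__main__":
    print(sorted(asyncio.run(crawl("/"))))
```

    A fixed pool of workers pulling from one queue is a common way to bound concurrency; a semaphore around each fetch would be an equally valid choice.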
  • 17
    Crawl4AI

    Open-source LLM Friendly Web Crawler & Scraper

    Crawl4AI is a high-performance, AI‑ready web crawler tailored for LLM data ingestion and RAG pipelines. It supports adaptive crawling heuristics (stopping when enough info is gathered), structured markdown output, and high-speed parallel execution. Designed to operate at scale with optional Docker deployment and framework integrations.
    Downloads: 1 This Week
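
    The description does not say how Crawl4AI decides that "enough info is gathered", so the following is only a hypothetical stopping rule that illustrates the idea, not Crawl4AI's heuristic: keep crawling while pages contribute new vocabulary, and stop after patience consecutive pages whose share of new words falls below min_gain. SAMPLE_PAGES is a hypothetical list standing in for fetched page texts.

```python
import re

WORD = re.compile(r"[a-z0-9]+")


def words(text: str) -> set[str]:
    return set(WORD.findall(text.lower()))


def adaptive_crawl(pages: list[str], min_gain: float = 0.2, patience: int = 2) -> int:
    """Consume pages in order and return how many were read before stopping.

    A page is "low gain" when less than min_gain of its distinct words are new;
    crawling stops after `patience` low-gain pages in a row.
    """
    vocabulary: set[str] = set()
    low_gain_streak = 0
    for count, text in enumerate(pages, start=1):
        page_words = words(text)
        gain = len(page_words - vocabulary) / len(page_words) if page_words else 0.0
        vocabulary |= page_words
        low_gain_streak = low_gain_streak + 1 if gain < min_gain else 0
        if low_gain_streak >= patience:
            return count
    return len(pages)


SAMPLE_PAGES = [
    "alpha beta gamma",
    "delta epsilon zeta",
    "alpha beta delta",
    "gamma zeta alpha",
    "entirely new words here",
]

if __name__ == "__main__":
    print(adaptive_crawl(SAMPLE_PAGES))  # prints 4
```

    The third and fourth sample pages only repeat earlier words, so the rule stops before reaching the fifth page.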
  • 18
    Fluent UI Web

    Collection of utilities and components for building web applications

    A collection of UX frameworks for creating beautiful, cross-platform apps that share code, design, and interaction behavior. Build for one platform or for all. Everything you need is here. Build your own apps using the same open source components we do, with accessibility, internationalization, and performance included. From tutorials to a fun collection of API references, find what you need to design and develop your own Fluent experience. From Word and Excel to PowerBI and Teams, many...
    Downloads: 0 This Week
  • 19
    MDCx

    Movie metadata scraper and organizer for media libraries and NFO

    MDCx is an open source media metadata scraping and organization tool designed to automate the process of collecting detailed information for movie files. It retrieves metadata from multiple online sources and applies it to local media collections, helping users maintain structured and well-organized libraries. MDCx can download information such as titles, cast data, artwork, and other metadata, then generate standardized NFO files compatible with media management systems. It also supports...
    Downloads: 11 This Week
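
    Movie NFO files, as read by media centers such as Kodi, are small XML documents with a <movie> root element. The sketch below builds one with Python's xml.etree.ElementTree; the chosen fields (title, year, plot, actor/name) and the sample values are illustrative, and this is not MDCx's exact output format.

```python
import xml.etree.ElementTree as ET


def build_movie_nfo(title: str, year: int, plot: str, actors: list[str]) -> str:
    """Return a minimal movie NFO (XML) document as a string."""
    movie = ET.Element("movie")
    ET.SubElement(movie, "title").text = title
    ET.SubElement(movie, "year").text = str(year)
    ET.SubElement(movie, "plot").text = plot
    for name in actors:
        actor = ET.SubElement(movie, "actor")
        ET.SubElement(actor, "name").text = name
    ET.indent(movie)  # pretty-print with two-space indentation (Python 3.9+)
    declaration = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    return declaration + ET.tostring(movie, encoding="unicode")


if __name__ == "__main__":
    print(build_movie_nfo("Example Film", 2020, "A short plot.", ["Actor One", "Actor Two"]))
```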
  • 20
    crwlr

    Library for Rapid (Web) Crawler and Scraper Development

    This library provides a kind of framework and many ready-to-use, so-called steps that you can use as building blocks to build your own crawlers and scrapers. Before diving into the library, let's look at the terms crawling and scraping. For most real-world use cases, the two go hand in hand, which is why this library helps with and combines both. A (web) crawler is a program that (down)loads documents and follows the links in them to load those as well. A crawler...
    Downloads: 9 This Week
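
    crwlr is a PHP library, so the following Python code is not its API. It is a minimal sketch of the crawler definition above: load a document, extract its links, and queue the unseen ones on the same host, breadth-first. The PAGES dictionary is a hypothetical stand-in for HTTP responses, and fetch can be any function that maps a URL to HTML (or None on failure).

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


# Hypothetical pages standing in for real HTTP responses.
PAGES = {
    "https://example.com/": '<a href="/docs">Docs</a> <a href="https://other.org/">Elsewhere</a>',
    "https://example.com/docs": '<a href="intro#top">Intro</a> <a href="/">Home</a>',
    "https://example.com/intro": "<p>No links here.</p>",
}


def crawl(start_url: str, fetch: Callable[[str], Optional[str]], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl: load each document, then follow its same-host links."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        html = fetch(url)
        if html is None:  # e.g. a failed download
            continue
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.hrefs:
            # Resolve relative links and drop #fragments so each page is queued once.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


if __name__ == "__main__":
    for page in crawl("https://example.com/", PAGES.get):
        print(page)
```

    Scraping would then be a separate step applied to each loaded document, which is why the description treats crawling and scraping as going hand in hand.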
  • 21
    goclone

    Fast CLI tool for cloning entire websites for local browsing offline

    goclone is a command-line utility designed to download and mirror complete websites to a local directory for offline access. It retrieves HTML pages, stylesheets, JavaScript files, images, and other assets from a target site and stores them on the user’s computer. It preserves the original site’s structure by maintaining relative links between pages, allowing the mirrored copy to function much like the live version when opened locally. Once a site has been cloned, users can browse the pages offline and navigate between them as if they were viewing the site online. goclone is written in Go and uses goroutines to perform downloads concurrently. goclone can also optionally start a local web server to serve the mirrored files for a more realistic browsing experience. ...
    Downloads: 26 This Week
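
    Keeping relative links intact comes down to two mappings: each URL becomes a file path inside the mirror directory, and each link is rewritten relative to the page that contains it. The sketch below shows that idea in Python; it is not goclone's Go implementation, and the convention of saving directory-style URLs as index.html is an assumption.

```python
import posixpath
from urllib.parse import urlparse


def local_path(url: str) -> str:
    """Map a URL to a file path inside the mirror directory (hypothetical convention)."""
    path = urlparse(url).path or "/"
    if path.endswith("/"):
        path += "index.html"  # directory-style URLs are saved as index.html
    return path.lstrip("/")


def relative_link(from_url: str, to_url: str) -> str:
    """Link to write into the local copy of from_url so it points at to_url's copy."""
    from_dir = posixpath.dirname(local_path(from_url)) or "."
    return posixpath.relpath(local_path(to_url), start=from_dir)


if __name__ == "__main__":
    print(relative_link("https://example.com/blog/post.html", "https://example.com/css/site.css"))
```

    Since every rewritten link is relative, the mirror works from any directory, whether opened straight from disk or served by a local web server.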
  • 22
    Heimdall

    An Application dashboard and launcher

    As the name suggests, Heimdall Application Dashboard is a dashboard for all your web applications. It doesn't need to be limited to applications, though; you can add links to anything you like. Heimdall is an elegant solution for organizing all your web applications. It’s dedicated to this purpose, so you won’t lose your links in a sea of bookmarks. Why not use it as your browser start page? It can even include a search bar using Google, Bing, or DuckDuckGo. ...
    Downloads: 69 This Week
  • 23
    Plausible Analytics

    Simple, open-source, lightweight and privacy-friendly web analytics

    Web analytics went from a simple, fun and useful practice for site owners to a data-grabbing machine for surveillance capitalism. Google Analytics is frustrating to use, difficult to understand, slow to load and privacy-invasive too. Plausible Analytics is built for privacy-conscious site owners. You get valuable and actionable stats to help you improve your efforts while your visitors keep having a nice and enjoyable experience.
    Downloads: 8 This Week
  • 24
    Jekyll RDF

    A Jekyll plugin to include RDF data in your static site

    Transform your RDF Knowledge Graph into static websites and blogs.
    Downloads: 5 This Week
  • 25
    blogdown

    Create Blogs and Websites with R Markdown

    blogdown is an R package that enables the creation and maintenance of static websites and blogs using R Markdown and Hugo (or other static-site generators). Developed by Yihui Xie and team, it provides functions to initialize sites, write posts, manage themes, and deploy with minimal fuss. It seamlessly blends R code chunks and web content, ideal for data storytellers and technical bloggers.
    Downloads: 0 This Week