Search Results for "data scraper website"

Showing 723 open source projects for "data scraper website"

1

LLM Scraper

Extract structured data from webpages using LLM-powered scraping

LLM Scraper is a TypeScript library designed to extract structured data from webpages using large language models. Instead of relying on fragile HTML selectors or manual parsing rules, the tool interprets webpage content with language models and converts it into structured data according to a defined schema. Developers can specify the data structure using tools such as Zod or JSON Schema, enabling the model to extract relevant information directly into typed objects. ...

Downloads: 6 This Week

Last Update: 11 hours ago
2

Linkedin Scraper

A library that scrapes LinkedIn for user data

Linkedin Scraper is a library that scrapes LinkedIn for user data. Versions 2.0.0 and earlier are called linkedin_user_scraper and can be installed via pip3 install --user linkedin_user_scraper. LinkedIn has recently blocked people from viewing certain profiles without first signing in, so setting scrape=False prevents the profile from being scraped automatically, although Chrome will still open the LinkedIn page.

Downloads: 9 This Week

Last Update: 5 days ago
3

dude uncomplicated data extraction

dude uncomplicated data extraction: A simple framework

Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, makes it easy to build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha, so expect breaking changes. You can run your scraper from the terminal by supplying URLs, an output filename of your choice, and the paths to your Python scripts to the dude scrape command.

Downloads: 0 This Week

Last Update: 2024-03-02
4

JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.

...Run funnel with your settings YAML to populate your master CSV file with jobs from available providers. JobFunnel can be easily automated to run nightly with crontab. If there is a job website you'd like to write a scraper for, you are welcome to implement it; review the Base Scraper for implementation details. JobFunnel supports scraping jobs from the same job website across locales and domains. If you are interested in adding support, you may only need to define session headers and domain strings; again, the Base Scraper has further implementation details.

Downloads: 0 This Week

Last Update: 2024-09-29
5

Colly

Elegant Scraper and Crawler Framework for Golang

Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving. Features: clean API; fast (>1k requests/sec on a single core); manages request delays and maximum concurrency per domain; automatic cookie and session handling; sync/async/parallel scraping.

Downloads: 10 This Week

Last Update: 2025-03-27
6

Spider

High-performance Rust web crawler and scraper for large-scale data

Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large...

Downloads: 13 This Week

Last Update: 2026-03-31
7

CyberScraper 2077

A Powerful web scraper powered by LLM | OpenAI, Gemini & Ollama

CyberScraper 2077 is not just another web scraping tool – it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI, Gemini and LocalLLM Models to slice through the web's defenses, extracting the data you need with unparalleled precision and style.

Downloads: 1 This Week

Last Update: 2026-01-20
8

ScrapeGraphAI

Python scraper based on AI

Extract content from websites and local documents using LLMs. ScrapeGraphAI is a web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.

Downloads: 15 This Week

Last Update: 6 days ago
9

Automa

A Chrome extension for automating your browser by connecting blocks

Automa is a browser extension for browser automation. From auto-filling forms, performing repetitive tasks, and taking screenshots to scraping data from websites, it's up to you what you do with this extension. Automa provides various kinds of blocks that help you automate, and all you need to do is connect them. Want your workflow to run every day or every time you visit a specific website? You can set the workflow trigger on the trigger block. Try a workflow from the marketplace. ...

Downloads: 18 This Week

Last Update: 2025-08-11
10

Matomo

Alternative to Google Analytics that gives you full control over data

Google Analytics alternative that protects your data and your customers' privacy. Take back control with Matomo – a powerful web analytics platform that gives you 100% data ownership. You could lose your customers’ trust and risk damaging your reputation if people learn their data is used for Google’s “own purposes”. By choosing the ethical alternative, Matomo, you won’t make privacy sacrifices or compromise your site.

Downloads: 12 This Week

Last Update: 2026-03-04
11

Books.jl

Create books with Julia

In a nutshell, this package is meant to generate books (or reports or dashboards) with embedded Julia output. Via Pandoc, the package can live serve a website and build various outputs including a website and PDF. For many standard output types, such as DataFrames and plots, the package can run your code and will automatically handle proper embedding in the output documents, and also try to guess suitable captions and labels. Also, it is possible to work via the live server, which shows...

Downloads: 8 This Week

Last Update: 2024-08-18
12

watercrawl

AI-ready web crawler that extracts and structures website content

WaterCrawl is an open source web crawling and data extraction platform designed to transform website content into structured data suitable for machine learning and AI workflows. It enables developers and researchers to crawl web pages, extract meaningful information, and convert it into formats that are easier to process and analyze. It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website. ...

Downloads: 9 This Week

Last Update: 2026-03-11
13

Crawl4AI

Open-source LLM Friendly Web Crawler & Scraper

Crawl4AI is a high-performance, AI‑ready web crawler tailored for LLM data ingestion and RAG pipelines. It supports adaptive crawling heuristics (stopping when enough info is gathered), structured markdown output, and high-speed parallel execution. Designed to operate at scale with optional Docker deployment and framework integrations.

Downloads: 1 This Week

Last Update: 2026-03-18
14

Chart.js

Simple yet flexible JavaScript charting for designers & developers

Chart.js is a JavaScript library that allows designers and developers to draw all kinds of charts using the HTML5 canvas element. Chart.js offers a great array of simple, clean charts, including animated and interactive versions. Chart.js is an easy way to include beautiful and engaging charts in your website for free.

3 Reviews

Downloads: 58 This Week

Last Update: 2025-12-15
15

Simple.css

Simple.css is a classless CSS template to make a good website

A classless CSS framework that makes semantic HTML look good. By classless I mean that there are no CSS classes anywhere in the CSS or the HTML. So your website can look just like this using plain old vanilla HTML. When starting a new project, I wanted a CSS framework that would get me up and running quickly, and give me something I could hack on. I got sick of all these giant frameworks that include everything but the kitchen sink, 90% of which I’ll never use. For example, the minified CSS...

Downloads: 10 This Week

Last Update: 2025-05-29
16

Firecrawl

Turn entire websites into LLM-ready markdown or structured data

Crawl and convert any website into LLM-ready markdown or structured data. Built by Mendable.ai and the Firecrawl community. Includes powerful scraping, crawling, and data extraction capabilities. Firecrawl is an API service that takes a URL, crawls it, and converts it into clean markdown or structured data. We crawl all accessible subpages and give you clean data for each.

Downloads: 20 This Week

Last Update: 5 days ago
17

crwlr

Library for Rapid (Web) Crawler and Scraper Development

This library provides a kind of framework and many ready-to-use building blocks, called steps, for building your own crawlers and scrapers. Before diving into the library, let's have a look at the terms crawling and scraping. For most real-world use cases, those two things go hand in hand, which is why this library helps with and combines both. A (web) crawler is a program that (down)loads documents and follows the links in them to load those as well. A crawler...

Downloads: 11 This Week

Last Update: 2026-01-05
18

changedetection.io

The best free open source website change detection and restock service

Loved by smart shoppers, data journalists, research engineers, data scientists, security researchers, and more. From simply monitoring website pages that have a change (such as watching prices, and restocking notifications), to deep inspection such as PDF text support, JSON and XML monitoring, and extensive text triggers. Monitor out-of-stock products and get alerts when those products are back in stock, get restock alerts via Discord, Slack, email, and many other platforms. ...

Downloads: 13 This Week

Last Update: 1 day ago
19

Ghostery

Ghostery Browser Extension for Firefox, Chrome, Opera and Edge

Ghostery helps you browse smarter by giving you control over ads and tracking technologies to speed up page loads, eliminate clutter, and protect your data. This is the unified code repository for the Ghostery browser extensions in Chrome, Firefox, Opera and Edge. Browse the web safer, faster & with fewer annoying ads. Equipped with award-winning AI anti-tracking technology to browse the web safely and quickly. Ghostery helps you stay informed about what companies are tracking you by listing the trackers on each website you visit. ...

Downloads: 26 This Week

Last Update: 2026-04-07
20

Laravel Sharp

Laravel 10+ Content management framework

Sharp is a content management framework, a toolset that helps build a CMS section in a website, with some rules in mind. The public website should not have any knowledge of the CMS; the CMS is a part of the system, not the center of it. In fact, removing the CMS should not have any effect on the project. Content administrators should work with their data and terminology, not CMS terms. I mean, if the project is about spaceships, space travels, and pilots, why would the CMS talk about articles, categories, and tags? ...

Downloads: 14 This Week

Last Update: 2026-04-01
21

sakura

A minimal CSS framework/theme

Just drop sakura.css into any webpage and go from an ugly-looking 1900s website to a pretty modern website in literally 0 seconds. Sakura is easy to customize and build on top of. It supports extremely easy theming using variables for duotone color scheming. It comes with several existing themes, which can be found in the CSS folder of this repository. Don't want to develop using sakura, but instead want to use it on websites with outdated 90's design (i.e. no CSS)? Quick prototyping,...

Downloads: 4 This Week

Last Update: 2025-06-24
22

LaTeX.CSS

LaTeX.css is a library that makes your website look like a LaTeX doc

This almost class-less CSS library turns your HTML document into a website that looks like a LaTeX document. Write semantic HTML, and you are good to go. The source code can be found on GitHub. LaTeX.css is a minimal, almost class-less CSS library that makes any website look like a LaTeX document. Add any optional classes to elements with special styles (author subtitle, abstract, lemmas, theorems, etc.). The labels of theorems, definitions, lemmas and proofs can be changed to other...

Downloads: 6 This Week

Last Update: 2025-05-15
23

AI-Crawler

Crawl a website starting from a URL, find relevant pages

AI Crawler is an experimental AI-powered web crawling and data extraction tool that uses natural language prompts to guide the discovery and retrieval of relevant information across websites. Unlike traditional web scrapers that rely on static selectors and manual scripting, it uses AI to dynamically identify and prioritize pages based on user intent, making it more flexible and resilient to changes in website structure.

Downloads: 1 This Week

Last Update: 2026-04-02
24

FinMind

Open data: more than 50 kinds of financial data

In the era of big data, data is the foundation of everything. We collect more than 50 kinds of Taiwan stock-related information and provide downloads, online analysis, and backtesting. Regardless of the program, you can download data through the API provided by FinMind, or you can download data directly from the website. Once the data is available, statistical analysis, regression analysis, time series analysis, machine learning, and deep learning can be performed. ...

Downloads: 9 This Week

Last Update: 23 hours ago
25

101-0250-00

ETH course - Solving PDEs in parallel on GPUs

This course aims to cover state-of-the-art methods in modern parallel Graphics Processing Unit (GPU) computing, supercomputing and code development with applications to natural sciences and engineering.

Downloads: 8 This Week

Last Update: 2026-01-05