Showing 2 open source projects for "java crawler"

View related business solutions
  • Endpoint Protection Software for Businesses | HYPERSECURE Icon
    Endpoint Protection Software for Businesses | HYPERSECURE

    DriveLock protects systems, data, end devices from data loss and misuse.

    The HYPERSECURE endpoint protection platform is a comprehensive suite of products and services enhanced by European third-party solutions. It ensures our customers’ IT security, regulatory compliance, and digital sovereignty.
    Learn More
  • Apify is a full-stack web scraping and automation platform helping anyone get value from the web. Icon
    Apify is a full-stack web scraping and automation platform helping anyone get value from the web.

    Get web data. Build automations.

    Actors are serverless cloud programs that extract data, automate web tasks, and run AI agents. Developers build them using JavaScript, Python, or Crawlee, Apify's open-source library. Build once, publish to Store, and earn when others use it. Thousands of developers do this - Apify handles infrastructure, billing, and monthly payouts.
    Learn More
  • 1
    Heritrix

    Heritrix

    Internet Archive's open-source, web-scale, web crawler project

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt. Heritrix is designed to respect the robots.txt exclusion directives† and...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 2
    File System Crawler for Elasticsearch

    File System Crawler for Elasticsearch

    Elasticsearch File System Crawler (FS Crawler)

    This crawler helps to index binary documents such as PDF, Open Office, MS Office. Local file system (or a mounted drive) crawling and indexing new files, updating existing ones, and removing old ones. Remote file system over SSH/FTP crawling. REST interface to let you “upload” your binary documents to elastic search.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB