Showing 8 open source projects for "heritrix"

View related business solutions
  • Top Corporate LMS for Training | Best Learning Management Software Icon
    Top Corporate LMS for Training | Best Learning Management Software

    Deliver and Track Online Training and Stay Compliant - with Axis LMS!

    Axis LMS enables you to deliver online and virtual learning and training through a scalable, easy-to-use LMS that is designed to enhance your training, automate your workflows, engage your learners and keep you compliant.
    Learn More
  • Contractor Foreman is the most affordable all-in-one construction management software for contractors and is trusted by contractors in more than 75 countries. Icon
    Contractor Foreman is the most affordable all-in-one construction management software for contractors and is trusted by contractors in more than 75 countries.

    For Residential, Commercial and Public Works Contractors

    Starting at $49/m for the WHOLE company, Contractor Foreman is the most affordable all-in-one construction management system for contractors. Our customers in 75+ countries and industry awards back it up. And it's all backed by a 100 day guarantee.
    Learn More
  • 1
    Heritrix

    Heritrix

    Internet Archive's open-source, web-scale, web crawler project

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 2

    Offnet

    Program that saves complete web pages retaining multiple timestamps

    ...Project goals: - Web page downloads for less experienced users, including easy setup - Project based page maintanance - Not too plain functions that include also multiple snapshots per project - Iterative, understandable and storage efficient data structure to enable more manual control over stored pages (meta files editable with Easy Folder Morpher) - Retain archived files and query links as original, altering links only during query Current status: - Alpha stadium, archivation quality below Heritrix...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    ARCOMEM

    ARCOMEM

    Semantic and social web crawling

    ...Throughout the project a large number of components have been developed to collect content from Web and Social Web, to analyse it from semantic and social perspectives and to enable Web archive access by different facets. The whole system based on the Heritrix crawler is released as open source to the public. Since many components or composite tools are of interest also for other areas and usage scenarios, the ARCOMEM consortium defined a number of pre-packaged tools which can be used independently from each other. By combining all packages the full ARCOMEM system can be build. The following major packages will be released in the coming weeks as pre-compiled packages with source code. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    The archive-crawler project is building Heritrix: a flexible, extensible, robust, and scalable web crawler capable of fetching, archiving, and analyzing the full diversity and breadth of internet-accesible content.
    Downloads: 10 This Week
    Last Update:
    See Project
  • Professional Email Hosting for Small Business | Greatmail Icon
    Professional Email Hosting for Small Business | Greatmail

    Ready to switch to a more reliable and secure email hosting solution?

    Dependable cloud based email hosting with spam filtering, antivirus protection, generous storage and webmail. Compatible with Outlook and all other POP3/IMAP clients. High volume SMTP service for responsible senders. Outbound relay service for transactional messages, email marketing campaigns, newsletters and other applications. Dedicated email servers, clustering and multiple IP load balancing for high volume senders. Fixed monthly cost with unlimited sending and reputation monitoring. Greatmail is an email service provider (ESP) specializing in business class email hosting, SMTP hosting and email servers. For ISPs, application programmers and cloud developers, we also provide custom solutions including dedicated IP servers and process specific, load balanced configurations with multiple servers.
    Learn More
  • 5
    Web-as-corpus tools in Java. * Simple Crawler (and also integration with Nutch and Heritrix) * HTML cleaner to remove boiler plate code * Language recognition * Corpus builder
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    The DeDuplicator is an add-on module (plug-in) for the web crawler Heritrix. It offers a means to reduce the amount of duplicate data collected in a series of snapshot crawls.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 7
    Crawl-By-Example runs a crawl, which classifies the processed pages by subjects and finds the best pages according to examples provided by the operator. Crawl-By-Example is a plugin to the Heritrix crawler, and was done as a part of GSoC06 program.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Heritrix expand project
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB