Showing 3 open source projects for "mazilla html parser"

View related business solutions
  • Optimize every aspect of hiring with Greenhouse Recruiting Icon
    Optimize every aspect of hiring with Greenhouse Recruiting

    Hire for what's next.

    What’s next for many of us is changing. Your company’s ability to hire great talent is as important as ever – so you’ll be ready for whatever’s ahead. Whether you need to scale your team quickly or improve your hiring process, Greenhouse gives you the right technology, know-how and support to take on what’s next.
    Learn More
  • Papirfly: Best user-friendly DAM and Content Creation Software Icon
    Papirfly: Best user-friendly DAM and Content Creation Software

    The #1 solution to create and manage content. On‑brand. At scale.

    Papirfly provides a single online destination for all your employees and other stakeholders who are engaging with your brand, ensuring consistency in all aspects of their communications. Teams can produce infinite studio-standard marketing materials from bespoke templates, store, share and adapt them for their own markets and stay firmly educated on the brand’s purpose, guidelines and evolution – with no specialist skills or agency help necessary.
    Learn More
  • 1
    BudouX

    BudouX

    Standalone, small, language-neutral

    Standalone. Small. Language-neutral. BudouX is the successor to Budou, the machine learning-powered line break organizer tool. It is standalone. It works with no dependency on third-party word segmenters such as Google cloud natural language API. It is small. It takes only around 15 KB including its machine learning model. It's reasonable to use it even on the client-side. It is language-neutral. You can train a model for any language by feeding a dataset to BudouX’s training...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 2
    LlamaParse

    LlamaParse

    Parse files for optimal RAG

    LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load in 160+ data sources and data formats, from unstructured, and semi-structured, to structured data (API's, PDFs, documents, SQL, etc.) Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL db providers.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    pdf-extractor

    pdf-extractor

    Node.js module for rendering pdf pages to images, svgs and HTML files

    Pdf-extractor is a wrapper around pdf.js to generate images, svgs, html files, text files and json files from a pdf on node.js. A DOM Canvas is used to render and export the graphical layer of the pdf. Canvas exports *.png as a default but can be extended to export to other file types like .jpg. Pdf objects are converted to svg using the SVGGraphics parser of pdf.js. Pdf text is converted to HTML. This can be used as a (transparent) layer over the image to enable text selection. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB