Showing 33 open source projects for "spark"

  • 1
    Spark NLP

    State of the Art Natural Language Processing

    Experience the power of large language models like never before, unleashing the full potential of Natural Language Processing (NLP) with Spark NLP, the open source library that delivers scalable LLMs. The full code base is open under the Apache 2.0 license, including pre-trained models and pipelines. The only NLP library built natively on Apache Spark. The most widely used NLP library in the enterprise. Spark ML provides a set of machine learning applications that can be built using two main components, estimators and transformers. ...
    Downloads: 5 This Week
  • 2
    Spark TTS

    Spark-TTS Inference Code

    Spark TTS is an open-source, PyTorch-based text-to-speech inference system that leverages large language models to produce highly natural, intelligible speech from text input. It uses an efficient single-stream architecture where speech tokens are directly reconstructed from the predictions of an LLM, removing the need for external acoustic models or complex vocoders and making the generation pipeline cleaner and faster.
    Downloads: 2 This Week
  • 3
    .NET for Apache Spark

    A free, open-source, and cross-platform big data analytics framework

    .NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular DataFrame and Spark SQL aspects of Apache Spark for working with structured data, and Spark Structured Streaming for working with streaming data. .NET for Apache Spark is compliant with .NET Standard, a formal specification of .NET APIs that are common across .NET implementations.
    Downloads: 5 This Week
  • 4
    fugue

    A unified interface for distributed computing

    Fugue is a unified interface for distributed computing that lets users execute Python, Pandas, and SQL code on Spark, Dask, and Ray with minimal rewrites.
    Downloads: 6 This Week
  • 5
    Synapse Machine Learning

    Simple and distributed Machine Learning

    ...These tools enable powerful and highly-scalable predictive and analytical models for a variety of data sources. SynapseML also brings new networking capabilities to the Spark Ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. For production-grade deployment, the Spark Serving project enables high throughput, sub-millisecond latency web services, backed by your Spark cluster.
    Downloads: 0 This Week
  • 6
    Daft

    Distributed DataFrame for Python designed for the cloud

    ...Underneath its Python API, Daft is built in blazing fast Rust code. Rust powers Daft’s vectorized execution and async I/O, allowing Daft to outperform frameworks such as Spark.
    Downloads: 19 This Week
  • 7
    Bytewax

    Python Stream Processing

    Bytewax is a Python framework that simplifies event and stream processing. Because Bytewax couples the stream and event processing capabilities of Flink, Spark, and Kafka Streams with the friendly and familiar interface of Python, you can re-use the Python libraries you already know and love. Connect data sources, run stateful transformations, and write to various downstream systems with built-in connectors or existing Python libraries. Bytewax is a Python framework and Rust distributed processing engine that uses a dataflow computational model to provide parallelizable stream processing and event processing capabilities similar to Flink, Spark, and Kafka Streams. ...
    Downloads: 7 This Week
  • 8
    mlforecast

    Scalable machine learning for time series forecasting

    ...It supports multi-series forecasting, meaning you can train one model that forecasts many time series at once (common in retail, demand forecasting, etc.), rather than one model per series. The library is built to scale: behind the scenes, it can leverage distributed computing frameworks (Spark, Dask, Ray) when datasets or the number of series grow large.
    Downloads: 6 This Week
  • 9
    dtreeviz

    Python library for decision tree visualization & model interpretation

    A Python library for decision tree visualization and model interpretation. Decision trees are the fundamental building block of gradient boosting machines and Random Forests™, probably the two most popular machine learning models for structured data. Visualizing decision trees is a tremendous aid when learning how these models work and when interpreting models. The visualizations are inspired by an educational animation by R2D3, "A visual introduction to machine learning." Please see How to...
    Downloads: 5 This Week
  • 10
    ChatALL

    Concurrently chat with ChatGPT, Bing Chat, Bard, Alpaca, Vicuna, etc.

    Concurrently chat with ChatGPT, Bing Chat, Bard, Alpaca, Vicuna, Claude, ChatGLM, MOSS, iFlytek Spark, ERNIE, and more to discover the best answers. AI bots based on Large Language Models (LLMs) are amazing. However, their behavior can be random, and different bots excel at different tasks. If you want the best experience, don't try them one by one. ChatALL (Chinese name: 齐叨) can send a prompt to several AI bots concurrently, helping you discover the best results.
    Downloads: 10 This Week
  • 11
    Angel

    A Flexible and Powerful Parameter Server for large-scale ML

    ...With a model-centered core design concept, Angel partitions the parameters of complex models into multiple parameter-server nodes and implements a variety of machine learning algorithms and graph algorithms using efficient model-updating interfaces and functions, as well as a flexible consistency model for synchronization. Angel is developed with Java and Scala. It supports running on Yarn. With PS Service abstraction, it supports Spark on Angel.
    Downloads: 7 This Week
  • 12
    MLflow

    Open source platform for the machine learning lifecycle

    MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models. MLflow offers a set of lightweight APIs that can be used with any existing machine learning application or library (TensorFlow, PyTorch, XGBoost, etc), wherever you currently run ML code (e.g. in notebooks, standalone applications or the cloud).
    Downloads: 8 This Week
  • 13
    DoWhy

    DoWhy is a Python library for causal inference

    ...DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks. Much like machine learning libraries have done for prediction, DoWhy is a Python library that aims to spark causal thinking and analysis. DoWhy provides a wide variety of algorithms for effect estimation, causal structure learning, diagnosis of causal structures, root cause analysis, interventions and counterfactuals. DoWhy builds on two of the most powerful frameworks for causal inference: graphical causal models and potential outcomes. ...
    Downloads: 4 This Week
  • 14
    StatsForecast

    Fast forecasting with statistical and econometric models

    StatsForecast is a Python library for time-series forecasting that delivers a suite of classical statistical and econometric forecasting models optimized for high performance and scalability. It is designed not just for academic experiments but for production-level time-series forecasting, meaning it handles forecasting for many series at once, efficiently, reliably, and with minimal overhead. The library implements a broad set of models, including AutoARIMA, ETS, CES, Theta, plus a battery...
    Downloads: 12 This Week
  • 15
    omegaml

    MLOps simplified. From ML Pipeline ⇨ Data Product without the hassle

    omega|ml is the innovative Python-native MLOps platform that provides a scalable development and runtime environment for your Data Products. Works from laptop to cloud.
    Downloads: 0 This Week
  • 16
    BentoML

    Unified Model Serving Framework

    BentoML simplifies ML model deployment and serves your models at production scale. Supports multiple ML frameworks natively: TensorFlow, PyTorch, XGBoost, Scikit-Learn, and many more! Define custom serving pipelines with pre-processing, post-processing, and ensemble models. Standard .bento format for packaging code, models, and dependencies for easy versioning and deployment. Integrate with any training pipeline or ML experimentation platform. Parallelize compute-intense model inference...
    Downloads: 1 This Week
  • 17
    Smile

    Statistical machine intelligence and learning engine

    Smile is a fast and comprehensive machine learning engine. With advanced data structures and algorithms, Smile delivers state-of-the-art performance. In a third-party benchmark, Smile significantly outperforms R, Python, Spark, H2O, and xgboost. Smile is a couple of times faster than the closest competitor. The memory usage is also very efficient. If we can train advanced machine learning models on a PC, why buy a cluster? Write applications quickly in Java, Scala, or any JVM language. Data scientists and developers can speak the same language now! ...
    Downloads: 4 This Week
  • 18
    Horovod

    Distributed training framework for TensorFlow, Keras, PyTorch, etc.

    ...With Horovod, an existing training script can be scaled up to run on hundreds of GPUs in just a few lines of Python code. Horovod can be installed on-premise or run out-of-the-box in cloud platforms, including AWS, Azure, and Databricks. Horovod can additionally run on top of Apache Spark, making it possible to unify data processing and model training into a single pipeline. Once Horovod has been configured, the same infrastructure can be used to train models with any framework, making it easy to switch between TensorFlow, PyTorch, MXNet, and future frameworks as machine learning tech stacks continue to evolve. Start scaling your model training with just a few lines of Python code. ...
    Downloads: 5 This Week
  • 19
    aqueduct LLM

    Aqueduct allows you to run LLM and ML workloads on any infrastructure

    ...You can connect Aqueduct to your existing cloud infrastructure, and Aqueduct will seamlessly move your code from your laptop to the cloud or between different cloud infrastructure layers. Aqueduct provides a single interface for running machine learning tasks on your existing cloud infrastructure: Kubernetes, Spark, Lambda, etc. From the same Python API, you can run code across any or all of these systems seamlessly and gain visibility into how your code is performing.
    Downloads: 2 This Week
  • 20
    Elephas

    Distributed Deep learning with Keras & Spark

    Elephas is an extension of Keras, which allows you to run distributed deep learning models at scale with Spark. Elephas currently supports a number of applications. Elephas brings deep learning with Keras to Spark. Elephas intends to keep the simplicity and high usability of Keras, thereby allowing for fast prototyping of distributed models, which can be run on massive data sets. Elephas implements a class of data-parallel algorithms on top of Keras, using Spark's RDDs and data frames. ...
    Downloads: 0 This Week
  • 21
    TensorFlowOnSpark

    TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters

    By combining salient features from the TensorFlow deep learning framework with Apache Spark and Apache Hadoop, TensorFlowOnSpark enables distributed deep learning on a cluster of GPU and CPU servers. It enables both distributed TensorFlow training and inferencing on Spark clusters, with a goal to minimize the amount of code changes required to run existing TensorFlow programs on a shared grid.
    Downloads: 0 This Week
  • 22
    SparrowRecSys

    A Deep Learning Recommender System

    ...SparrowRecSys supports a wide range of state-of-the-art recommendation algorithms, including models for click-through rate prediction and user behavior modeling that are widely used in advertising and content recommendation systems. The system is designed as a modular platform combining technologies such as Spark, TensorFlow, and web server components to represent the full lifecycle of recommendation pipelines.
    Downloads: 0 This Week
  • 23
    TransmogrifAI

    TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library

    TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library written in Scala that runs on top of Apache Spark. It was developed with a focus on accelerating machine learning developer productivity through machine learning automation, and an API that enforces compile-time type-safety, modularity, and reuse. Through automation, it achieves accuracies close to hand-tuned models with almost 100x reduction in time.
    Downloads: 4 This Week
  • 24
    Weld

    High-performance runtime for data analytics applications

    ...This approach reduces data movement between libraries and enables the system to generate highly optimized machine code for parallel execution. Weld is particularly useful for workloads involving large-scale data processing in frameworks such as NumPy, Spark, and TensorFlow. The language includes built-in constructs for expressing data-parallel operations, enabling efficient execution on modern hardware architectures. By combining operations from multiple libraries into a single optimized execution plan, Weld can significantly improve performance in analytics and machine learning pipelines.
    Downloads: 0 This Week
  • 25
    spark-ml-source-analysis

    Analysis of Spark ML algorithm principles and source code

    spark-ml-source-analysis is a technical repository that analyzes the internal implementation of machine learning algorithms within Apache Spark’s MLlib library. The project aims to help developers and data scientists understand how distributed machine learning algorithms are implemented and optimized inside the Spark ecosystem. Instead of providing a runnable software system, the repository focuses on explaining algorithm principles and examining the underlying source code used in Spark’s machine learning package. ...
    Downloads: 0 This Week