spark free download - SourceForge

Showing 43 open source projects for "spark"


1

Apache Spark

A unified analytics engine for large-scale data processing

...With Spark Streaming (microbatches) and Structured Streaming, it delivers low-latency event processing suitable for real-time analytics. The built-in MLlib library provides scalable machine learning algorithms, while GraphX enables graph computations integrated with data pipelines. Spark supports multiple languages—Scala, Java, Python, R—and connects with many storage systems like HDFS, S3, Cassandra, and streaming platforms like Kafka, making it a versatile choice for big data workloads in analytics, ETL, and data science.

Downloads: 3 This Week

Last Update: 2026-04-06
See Project
2

.NET for Apache Spark

A free, open-source, and cross-platform big data analytics framework

.NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular DataFrame and Spark SQL aspects of Apache Spark, for working with structured data, and Spark Structured Streaming, for working with streaming data. .NET for Apache Spark is compliant with .NET Standard, a formal specification of .NET APIs that are common across .NET implementations.

Downloads: 5 This Week

Last Update: 2026-02-13
See Project
3

SageMaker Spark Container

Docker image used to run data processing workloads

...The SageMaker Spark Container is a Docker image used to run batch data processing workloads on Amazon SageMaker using the Apache Spark framework. The container images in this repository are used to build the pre-built container images that are used when running Spark jobs on Amazon SageMaker using the SageMaker Python SDK. The pre-built images are available in the Amazon Elastic Container Registry (Amazon ECR), and this repository serves as a reference for those wishing to build their own customized Spark containers for use in Amazon SageMaker.

Downloads: 3 This Week

Last Update: 2025-12-04
See Project
4

Synapse Machine Learning

Simple and distributed Machine Learning

...These tools enable powerful and highly-scalable predictive and analytical models for a variety of data sources. SynapseML also brings new networking capabilities to the Spark Ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. For production-grade deployment, the Spark Serving project enables high throughput, sub-millisecond latency web services, backed by your Spark cluster.

Downloads: 0 This Week

Last Update: 2026-04-04
See Project
5

Deequ

Deequ is a library built on top of Apache Spark

Deequ is a library built atop Apache Spark that enables defining “unit tests for data” — that is, formal constraints or checks on datasets to ensure data quality along dimensions such as completeness, uniqueness, value ranges, correlations, etc. It can scale to large datasets (billions of rows) by translating those data checks into Spark jobs. Deequ supports advanced features like a metrics repository for storing computed statistics over time, anomaly detection of data quality metrics, and the suggestion of likely constraints automatically for new datasets. ...

Downloads: 9 This Week

Last Update: 2026-03-30
See Project
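
The "unit tests for data" idea in the Deequ entry above can be illustrated with a minimal sketch in plain Python. This is not Deequ's API and does not use Spark: the `completeness`, `uniqueness`, and `run_checks` helpers and the sample rows are hypothetical names invented for illustration, and the checks run in memory on a small list rather than as distributed jobs.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def completeness(rows: List[Row], column: str) -> float:
    """Fraction of rows whose value in `column` is not None (hypothetical helper)."""
    if not rows:
        return 1.0
    return sum(r.get(column) is not None for r in rows) / len(rows)


def uniqueness(rows: List[Row], column: str) -> float:
    """Fraction of non-null values in `column` that occur exactly once (hypothetical helper)."""
    values = [r[column] for r in rows if r.get(column) is not None]
    if not values:
        return 1.0
    counts = Counter(values)
    return sum(1 for v in values if counts[v] == 1) / len(values)


def run_checks(rows: List[Row], checks: Dict[str, Callable[[List[Row]], bool]]) -> Dict[str, bool]:
    """Evaluate each named constraint and report whether it passed."""
    return {name: check(rows) for name, check in checks.items()}


# Made-up sample data: one missing email and one duplicated id.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]

checks = {
    "id is complete": lambda rs: completeness(rs, "id") == 1.0,
    "email is at least 70% complete": lambda rs: completeness(rs, "email") >= 0.7,
    "id is unique": lambda rs: uniqueness(rs, "id") == 1.0,
}

results = run_checks(rows, checks)
for name, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}: {name}")
```

A library such as Deequ expresses comparable constraints declaratively and evaluates them as Spark jobs, which is what lets the same kind of check scale to very large datasets.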
6

Alire

Command-line tool from the Alire project and supporting library

Alire is a source-based package manager for the Ada and SPARK programming languages. It facilitates the building and sharing of projects within the Ada community, allowing developers to easily manage dependencies and publish their own libraries or programs. Alire aims to streamline the development process for Ada and SPARK by providing a standardized approach to package management.

Downloads: 10 This Week

Last Update: 2025-04-22
See Project
7

SageMaker Spark

A Spark library for Amazon SageMaker

SageMaker Spark depends on hadoop-aws-2.8.1. To run Spark applications that depend on SageMaker Spark, you need to build Spark with Hadoop 2.8. However, if you are running Spark applications on EMR, you can use Spark built with Hadoop 2.7.

Downloads: 0 This Week

Last Update: 2024-02-22
See Project
8

Apache Sedona

Cluster computing framework for processing large-scale geospatial data

Apache Sedona™ is a cluster computing system for processing large-scale spatial data. Sedona extends existing cluster computing systems, such as Apache Spark and Apache Flink, with a set of out-of-the-box distributed Spatial Datasets and Spatial SQL that efficiently load, process, and analyze large-scale spatial data across machines. According to our benchmark and third-party research papers, Sedona runs 2X - 10X faster than other Spark-based geospatial data systems on computation-intensive query workloads. ...

Downloads: 1 This Week

Last Update: 2026-01-05
See Project
9

XGBoost

Scalable and Flexible Gradient Boosting

...It also offers parallel tree boosting (GBDT, GBRT or GBM) that can quickly and accurately solve many data science problems. XGBoost can be used for Python, Java, Scala, R, C++ and more. It can run on a single machine, Hadoop, Spark, Dask, Flink and most other distributed environments, and is capable of solving problems beyond billions of examples.

Downloads: 14 This Week

Last Update: 2026-02-10
See Project
10

Volcano

A Cloud Native Batch System (Project under CNCF)

...It provides a suite of mechanisms that are commonly required by many classes of batch and elastic workloads, including machine learning/deep learning, bioinformatics/genomics, and other "big data" applications. These types of applications typically run on generalized domain frameworks like TensorFlow, Spark, Ray, PyTorch, MPI, etc., which Volcano integrates with. Volcano builds upon a decade and a half of experience running a wide variety of high-performance workloads at scale using several systems and platforms, combined with best-of-breed ideas and practices from the open-source community. As of June 2021, Volcano had been widely used around the world in a variety of industries such as Internet, Cloud, Finance, Manufacturing, and Medical. ...

Downloads: 174 This Week

Last Update: 2026-03-30
See Project
11

Laravel Lang

List of 126 languages for Laravel Framework, Laravel Jetstream, etc.

List of 126 languages for Laravel Framework, Laravel Jetstream, Laravel Fortify, Laravel Breeze, Laravel Cashier, Laravel Nova, Laravel Spark and Laravel UI. It is recommended to use this particular package as it will allow you to very quickly update all the necessary dependencies that ensure application localization.

Downloads: 5 This Week

Last Update: 2026-03-21
See Project
12

Soot

Soot - A Java optimization framework

Soot is a Java optimization framework. It provides four intermediate representations for analyzing and transforming Java bytecode. Baf: a streamlined representation of bytecode which is simple to manipulate. Jimple: a typed 3-address intermediate representation suitable for optimization. Shimple: an SSA variation of Jimple. Grimp: an aggregated version of Jimple suitable for decompilation and code inspection.

Downloads: 7 This Week

Last Update: 2026-02-23
See Project
13

Serverless Java container

A Java wrapper to run Spring, Spring Boot, Jersey, and other apps

The AWS Serverless Java Container library is a framework that allows developers to run existing or new Java web applications—built with frameworks such as Spring, Jersey, Spark, Struts—inside AWS Lambda with minimal modifications. It bridges the gap between traditional servlet or web-framework models and serverless functions by mapping HTTP events from API Gateway into requests your framework understands and routing responses back appropriately. This means you can keep much of your familiar Java-based architecture (controllers, filters, dependency injection) and deploy it in a serverless environment without rewriting everything from scratch. ...

Downloads: 0 This Week

Last Update: 2026-03-16
See Project
14

Apache Beam

Unified programming model for Batch and Streaming

Apache Beam is an open source, unified programming model to define both batch and streaming data-parallel processing pipelines, as well as certain language-specific SDKs for constructing pipelines and Runners. These pipelines are executed on one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Beam is especially useful for Embarrassingly Parallel data processing tasks, and caters to the different needs and backgrounds of end users, SDK writers and runner writers.

Downloads: 1 This Week

Last Update: 2026-03-30
See Project
15

Numba

NumPy aware dynamic Python compiler using LLVM

...Special decorators can create universal functions that broadcast over NumPy arrays just like NumPy functions do. Numba also works great with Jupyter notebooks for interactive computing, and with distributed execution frameworks, like Dask and Spark.

Downloads: 11 This Week

Last Update: 2026-03-31
See Project
16

Apache Bigtop

Bigtop is an Apache Foundation project for Infrastructure Engineers

Apache Bigtop is a project focused on building and packaging the Hadoop ecosystem and related big data components. It provides a consistent framework for testing, packaging, and deploying Hadoop distributions, including tools like HDFS, YARN, Spark, Hive, HBase, and more. By maintaining cross-platform builds (RPMs, DEBs, Docker images, and Kubernetes support), Bigtop makes it easier for organizations to deploy big data stacks in different environments. It also includes a set of integration tests and smoke tests to ensure compatibility and stability between ecosystem components. ...

Downloads: 9 This Week

Last Update: 2025-09-03
See Project
17

go-chart

go-chart is a basic charting library in Go

Package chart is a very simple golang native charting library that supports time-series and continuous line charts. Master should now be on the v3.x codebase, which overhauls the API significantly. Per usual, see examples for more information. Actual chart configurations and examples can be found in the ./examples/ directory. They are simple CLI programs that write to output.png (they are also updated with go generate). Everything on the chart.Chart object has defaults that can be overridden....

Downloads: 2 This Week

Last Update: 2024-08-23
See Project
18

Kedro

A Python framework for creating reproducible, maintainable code

Kedro is an open-source Python framework for creating maintainable and modular data science code. It provides the scaffolding to build more complex data and machine-learning pipelines. In addition, there's a focus on spending less time on the tedious "plumbing" required to maintain data science code; this means that you have more time to solve new problems. It standardises team workflows; the modular structure of Kedro facilitates a higher level of collaboration when teams solve problems...

Downloads: 9 This Week

Last Update: 2026-04-07
See Project
19

BentoML

Unified Model Serving Framework

BentoML simplifies ML model deployment and serves your models at production scale. It supports multiple ML frameworks natively: TensorFlow, PyTorch, XGBoost, Scikit-Learn and many more! Define a custom serving pipeline with pre-processing, post-processing and ensemble models. Use the standard .bento format for packaging code, models and dependencies for easy versioning and deployment. Integrate with any training pipeline or ML experimentation platform. Parallelize compute-intensive model inference...

Downloads: 1 This Week

Last Update: 2026-04-02
See Project
20

SQL Formatter

A whitespace formatter for different query languages

...It supports various SQL dialects: GCP BigQuery, IBM DB2, Apache Hive, MariaDB, MySQL, Couchbase N1QL, Oracle PL/SQL, PostgreSQL, Amazon Redshift, SingleStoreDB, Snowflake, Spark, SQL Server Transact-SQL, Trino/Presto. See language option docs for more details. The CLI tool will be installed under sql-formatter and may be invoked via npx sql-formatter. If you don't use a module bundler, clone the repository, run npm install and grab a file from /dist directory to use inside a script tag. This makes SQL Formatter available as a global variable window.sqlFormatter.

Downloads: 0 This Week

Last Update: 2026-03-28
See Project
21

Smallpond

A lightweight data processing framework built on DuckDB and 3FS

...The idea is to preserve DuckDB’s fast analytics engine but lift it from single-node to multi-node settings, giving you the ability to operate on large datasets (e.g. petabyte scale) without moving to a heavyweight system like Spark. Users write Python-like code (via DataFrame APIs or SQL strings) to express their transformations; behind the scenes, tasks are scheduled (often via Ray) and pushed into DuckDB instances operating on partitioned data. Because the storage layer (3FS) is optimized for random access and high throughput, smallpond can shuffle data, repartition, and manage intermediate results across nodes.

Downloads: 0 This Week

Last Update: 2025-10-04
See Project
22

Horovod

Distributed training framework for TensorFlow, Keras, PyTorch, etc.

...With Horovod, an existing training script can be scaled up to run on hundreds of GPUs in just a few lines of Python code. Horovod can be installed on-premise or run out-of-the-box in cloud platforms, including AWS, Azure, and Databricks. Horovod can additionally run on top of Apache Spark, making it possible to unify data processing and model training into a single pipeline. Once Horovod has been configured, the same infrastructure can be used to train models with any framework, making it easy to switch between TensorFlow, PyTorch, MXNet, and future frameworks as machine learning tech stacks continue to evolve. Start scaling your model training with just a few lines of Python code. ...

Downloads: 6 This Week

Last Update: 2023-06-12
See Project
23

FiloDB

Distributed Prometheus time series database

FiloDB is an open-source distributed, real-time, in-memory, massively scalable, multi-schema time series / event / operational database with Prometheus query support and some Spark support as well. The normal configuration for real-time ingestion is deployment as stand-alone processes in a cluster, ingesting directly from Apache Kafka. The processes form a cluster using peer-to-peer Akka Cluster technology. Designed to ingest many millions of entities, sharded across multiple processes, with distributed querying built in. ...

Downloads: 5 This Week

Last Update: 2022-10-15
See Project
24

WTFJS

A list of funny and tricky JavaScript examples

...It’s designed as both a fun read and a serious learning aid, helping developers build an intuition for how JavaScript evaluates expressions. By highlighting common misconceptions, it encourages safer coding patterns and more reliable mental models. Teachers, interviewers, and learners use it to spark discussion and deepen understanding of JavaScript’s semantics.

Downloads: 8 This Week

Last Update: 2025-09-05
See Project
25

osm4scala

Reading OpenStreetMap Pbf files.

Scala and polyglot Spark library (Scala, PySpark, SparkSQL, ... ) focused on reading OpenStreetMap Pbf files.

Downloads: 15 This Week

Last Update: 2022-12-26
See Project