Alternatives to Apache Airflow
Compare Apache Airflow alternatives for your business or organization using the curated list below. SourceForge ranks the best alternatives to Apache Airflow in 2026. Compare features, ratings, user reviews, pricing, and more from Apache Airflow competitors and alternatives in order to make an informed decision for your business.
1
JAMS
JAMS Software
JAMS is an automation orchestration and job scheduling solution that works across applications, APIs, and scripting languages. Run, monitor, and manage critical IT processes, from simple batch jobs to cross-platform workflows, from a single pane of glass. JAMS can automate jobs on any platform (Windows, Linux, UNIX, IBM i, z/OS, and OpenVMS) and includes native application integrations to run jobs specific to databases, BI tools, and ERP systems. Its extensive automation features let you run jobs on any schedule, as well as trigger them on the completion of other events. JAMS centrally monitors the status of all jobs, sends notifications of failure (or success), and maintains a detailed audit trail and log of every execution.
2
Dataiku
Dataiku
Dataiku is an enterprise AI platform designed to help organizations move from fragmented AI efforts to scalable, governed AI. It brings together people, data, and technology in a single system that enables collaboration between domain experts and technical teams. The platform lets users build, deploy, and manage AI models, analytics workflows, and AI agents. Dataiku emphasizes orchestration by connecting data sources, applications, and machine learning processes into unified pipelines. It also provides governance capabilities that help organizations monitor performance, control costs, and reduce risk across AI initiatives. Businesses across industries use Dataiku to modernize analytics, automate workflows, and scale machine learning across teams, with the aim of faster innovation and measurable ROI.
3
JS7 JobScheduler
SOS GmbH
JS7 JobScheduler is an open-source workload automation system designed for performance, resilience, and security. It provides unlimited performance for parallel execution of jobs and workflows. JS7 offers cross-platform job execution, managed file transfer, complex no-code job dependencies, and a real REST API.
Platforms
- Cloud scheduling from containers for Docker®, Kubernetes®, OpenShift®, etc.
- True multi-platform scheduling on premises for Windows®, Linux®, AIX®, Solaris®, macOS®, etc.
- Hybrid use for cloud and on premises
User Interface
- Modern, no-code GUI for inventory management, monitoring, and control with web browsers
- Near-real-time information brings immediate visibility of status changes and log output of jobs and workflows
- Multi-client capability, role-based access management
High Availability
- Redundancy and resilience based on asynchronous design and autonomous Agents
- Clustering for all JS7 products, automatic fail-over and manual switch-over
4
dbt
dbt Labs
dbt helps data teams transform raw data into trusted, analysis-ready datasets faster. With dbt, data analysts and data engineers can collaborate on version-controlled SQL models, enforce testing and documentation standards, lean on detailed metadata to troubleshoot and optimize pipelines, and deploy transformations reliably at scale. Built on modern software engineering best practices, dbt brings transparency and governance to every step of the data transformation workflow. Thousands of companies, from startups to Fortune 500 enterprises, rely on dbt to improve data quality and trust as well as drive efficiencies and reduce costs as they deliver AI-ready data across their organization. Whether you’re scaling data operations or just getting started, dbt empowers your team to move from raw data to actionable analytics with confidence.
5
ActiveBatch Workload Automation
ActiveBatch by Redwood
ActiveBatch by Redwood makes setting up and launching automation easy, with no custom scripting required. With a low-code Super REST API adapter, over 100 pre-built job steps, and a user-friendly drag-and-drop workflow designer, you can integrate across any system, application, and data source, on premises, in the cloud, or in hybrid environments. Maintain complete control and visibility, and meet SLAs by monitoring all automation from a single pane of glass, with custom alerts via email or SMS. Managed Smart Queues dynamically scale resources for high-volume workloads, reducing process times, while the self-service portal enables business users to run and monitor workflows independently. ActiveBatch meets security and compliance standards, with ISO 27001 and SOC 2 Type II certifications, encrypted connections, and regular third-party tests. Along with ongoing product advancements, customers get 24x7 support and on-site training.
6
Union Cloud
Union.ai
Union.ai is an award-winning, Flyte-based data and ML orchestrator for scalable, reproducible ML pipelines. With Union.ai, you can write your code locally and easily deploy pipelines to remote Kubernetes clusters.
“Flyte’s scalability, data lineage, and caching capabilities enable us to train hundreds of models on petabytes of geospatial data, giving us an edge in our business.” — Arno, CTO at Blackshark.ai
“With Flyte, we want to give the power back to biologists. We want to stand up something that they can play around with different parameters for their models because not every … parameter is fixed. We want to make sure we are giving them the power to run the analyses.” — Krishna Yeramsetty, Principal Data Scientist at Infinome
“Flyte plays a vital role as a key component of Gojek's ML Platform by providing exactly that.” — Pradithya Aria Pura, Principal Engineer at Gojek
Starting Price: Free (Flyte)
7
Moxo
Moxo
Moxo’s service orchestration platform transforms complex B2B relationships into seamless experiences. Business processes often fragment across departments, clients, vendors, and partners, creating inefficiency and risk. Our platform streamlines these workflows, turning disjointed experiences into smooth, efficient operations that reduce costs and enhance client satisfaction. Moxo accelerates critical processes including client onboarding, document collection, and exception handling. The results: faster completion times, reduced compliance risks, and superior client experiences. Leading institutions across financial services, consulting, legal, healthcare, and real estate, including Citibank and BNP Paribas, trust Moxo to orchestrate their mission-critical business relationships.
8
AWS Glue
Amazon
AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. AWS Glue provides all the capabilities needed for data integration, so you can start analyzing your data and putting it to use in minutes instead of months. Data integration is the process of preparing and combining data for analytics, machine learning, and application development. It involves multiple tasks, such as discovering and extracting data from various sources; enriching, cleaning, normalizing, and combining data; and loading and organizing data in databases, data warehouses, and data lakes. These tasks are often handled by different types of users who each use different products. AWS Glue runs in a serverless environment: there is no infrastructure to manage, and AWS Glue provisions, configures, and scales the resources required to run your data integration jobs.
9
AWS Step Functions
Amazon
AWS Step Functions is a serverless function orchestrator that makes it easy to sequence AWS Lambda functions and multiple AWS services into business-critical applications. Through its visual interface, you can create and run a series of checkpointed, event-driven workflows that maintain the application state. The output of one step acts as the input to the next, and each step executes in the order defined by your business logic. Orchestrating a series of individual serverless applications, managing retries, and debugging failures can be challenging, and the difficulty grows as distributed applications become more complex. With its built-in operational controls, Step Functions manages sequencing, error handling, retry logic, and state, removing a significant operational burden from your team. Its visual workflows also make it faster to translate business requirements into technical requirements.
Starting Price: $0.000025 per state transition
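To make the "output of one step is the input to the next" and retry behavior concrete, here is a minimal Python sketch of the pattern. It is not the Step Functions API (real state machines are defined in Amazon States Language and run by AWS); `Step`, `run_sequence`, and the retry and backoff parameters are hypothetical names chosen for illustration.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Step:
    """One step in a hypothetical sequential workflow (illustrative, not an AWS API)."""
    name: str
    fn: Callable[[Any], Any]
    max_attempts: int = 3         # total tries, including the first one
    backoff_seconds: float = 0.0  # initial delay between tries; doubled after each failure


def run_sequence(steps: list[Step], payload: Any) -> Any:
    """Run steps in order, feeding each step's output to the next step as input."""
    for step in steps:
        delay = step.backoff_seconds
        for attempt in range(1, step.max_attempts + 1):
            try:
                payload = step.fn(payload)
                break  # success: continue with the next step
            except Exception:
                if attempt == step.max_attempts:
                    raise  # retries exhausted: surface the error to the caller
                time.sleep(delay)
                delay *= 2  # exponential backoff


    return payload


# Usage: the second step fails once with a transient error, then succeeds on retry.
calls = {"n": 0}


def flaky_double(x: int) -> int:
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient error")
    return x * 2


result = run_sequence([Step("add_one", lambda x: x + 1), Step("double", flaky_double)], 4)
print(result)  # 10
```

Keeping retry settings on each step, rather than in one global policy, mirrors the idea that different steps can tolerate different failure modes.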
10
Amazon CloudWatch
Amazon
Amazon CloudWatch is a monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers. CloudWatch provides you with data and actionable insights to monitor your applications, respond to system-wide performance changes, optimize resource utilization, and get a unified view of operational health. CloudWatch collects monitoring and operational data in the form of logs, metrics, and events, providing a unified view of AWS resources, applications, and services that run on AWS and on-premises servers. You can use CloudWatch to detect anomalous behavior in your environments, set alarms, visualize logs and metrics side by side, take automated actions, troubleshoot issues, and discover insights to keep your applications running smoothly. CloudWatch alarms watch your metric values against thresholds that you specify or that CloudWatch creates using ML models to detect anomalous behavior.
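The idea of an alarm that watches a metric against a threshold can be sketched in a few lines. This simplified Python model assumes an "M out of N" rule (alarm when at least M of the last N datapoints breach the threshold); `ThresholdAlarm` and its parameters are illustrative, and the sketch does not reproduce CloudWatch's full evaluation semantics, such as missing-data handling or anomaly-detection bands.

```python
from collections import deque


class ThresholdAlarm:
    """Simplified M-out-of-N threshold alarm (illustrative, not the CloudWatch API)."""

    def __init__(self, threshold: float, evaluation_periods: int, datapoints_to_alarm: int):
        if not 1 <= datapoints_to_alarm <= evaluation_periods:
            raise ValueError("need 1 <= datapoints_to_alarm <= evaluation_periods")
        self.threshold = threshold
        self.datapoints_to_alarm = datapoints_to_alarm
        self.window = deque(maxlen=evaluation_periods)  # breach flags for the last N datapoints
        self.state = "INSUFFICIENT_DATA"

    def observe(self, value: float) -> str:
        """Record one datapoint and return the resulting alarm state."""
        self.window.append(value > self.threshold)  # True means this datapoint breaches
        if len(self.window) < self.window.maxlen:
            return self.state  # not enough datapoints yet to evaluate
        breaching = sum(self.window)
        self.state = "ALARM" if breaching >= self.datapoints_to_alarm else "OK"
        return self.state


# Usage: alarm when at least 3 of the last 5 CPU readings exceed 80%.
alarm = ThresholdAlarm(threshold=80.0, evaluation_periods=5, datapoints_to_alarm=3)
states = [alarm.observe(v) for v in [50, 90, 95, 60, 85, 40, 30]]
print(states)  # 4 x INSUFFICIENT_DATA, then ALARM, ALARM, OK
```

Requiring M of N breaches, rather than a single breach, is a common way to keep one noisy datapoint from flipping the alarm state.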
11
UiPath
UiPath
Become a fully automated enterprise™ with the UiPath Platform. A fully automated enterprise is a digitally transformed enterprise. Create business resilience, speed, and agility, and unburden people from mundane work with the automation platform that has it all. Use the data from your business applications (such as ERP and CRM) to get a detailed understanding of complex business processes. You’ll know what to automate and how to do it best, and be able to prove impact, too. UiPath is a Robotic Process Automation (RPA) and process mining enterprise platform that helps organizations automate business processes efficiently, become digital businesses faster, and gain an advantage on their path to AI. Scalable, extensible, and sustainable, UiPath lets users design their own workflows visually, with no scripting or coding required. The platform also features full auditing capabilities, advanced analytical reporting, and customizable dashboards.
Starting Price: $3990.00/year/user
12
Astro by Astronomer
Astronomer
For data teams looking to increase the availability of trusted data, Astronomer provides Astro, a modern data orchestration platform, powered by Apache Airflow, that enables the entire data team to build, run, and observe data pipelines-as-code. Astronomer is the commercial developer of Airflow, the de facto standard for expressing data flows as code, used by hundreds of thousands of teams across the world.
13
Beamer
Beamer
Update and engage users effortlessly. Announce your latest updates and get powerful feedback with an in-app notification center, widgets, and a changelog. Install it in-app or on your website so users get announcements in context, and publish a public page on your own domain with a custom appearance and SEO optimization.
Share your important news and updates: create and schedule posts to keep your users and site visitors in the know. Use visual content like images, videos, and GIFs to get even more engagement.
Use segmentation to send targeted notifications: create custom segments by industry, product, role, location, language, behavior, and more, so you send more relevant notifications and get better results.
Use push notifications to bring users back: send web push notifications to users or website visitors so they get your announcements even if they aren’t on your site. Get feedback on your latest updates and news.
Starting Price: $49 per month
14
Dataplane
Dataplane
The concept behind Dataplane is to make it quicker and easier to construct a data mesh with robust data pipelines and automated workflows for businesses and teams of all sizes. In addition to being more user-friendly, Dataplane emphasizes scaling, resilience, performance, and security.
Starting Price: Free
15
Datavolo
Datavolo
Capture all your unstructured data for all your LLM needs. Datavolo replaces single-use, point-to-point code with fast, flexible, reusable pipelines, freeing you to focus on what matters most, doing incredible work. Datavolo is the dataflow infrastructure that gives you a competitive edge. Get fast, unencumbered access to all of your data, including the unstructured files that LLMs rely on, and power up your generative AI. Get pipelines that grow with you, in minutes, not days, without custom coding. Instantly configure from any source to any destination at any time. Trust your data because lineage is built into every pipeline. Make single-use pipelines and expensive configurations a thing of the past. Harness your unstructured data and unleash AI innovation with Datavolo, powered by Apache NiFi and built specifically for unstructured data. Our founders have spent a lifetime helping organizations make the most of their data.
Starting Price: $36,000 per year
16
Control-M
BMC Software
Control-M is an end-to-end workflow orchestration platform that simplifies how organizations build, schedule, and manage application and data workflows across hybrid environments. It provides a single, unified view that eliminates complexity and ensures critical processes run reliably and on time. With built-in integrations for cloud, mainframe, DevOps tools, and leading data platforms, teams can orchestrate everything from batch jobs to modern data pipelines. Control-M enhances operational efficiency through proactive monitoring, SLA insights, and predictive analytics that prevent delays before they impact the business. Developers and operations teams gain shared visibility and self-service controls, enabling faster delivery cycles and reduced manual effort. By consolidating workflow management into one system, Control-M improves reliability, accelerates innovation, and reduces operational costs.
17
Dagster
Dagster Labs
Dagster is a next-generation orchestration platform for the development, production, and observation of data assets. Unlike other data orchestration solutions, Dagster provides you with an end-to-end development lifecycle. Dagster gives you control over your disparate data tools and empowers you to build, test, deploy, run, and iterate on your data pipelines. It makes you and your data teams more productive, your operations more robust, and puts you in complete control of your data processes as you scale. Dagster brings a declarative approach to the engineering of data pipelines: your team defines the data assets required, quickly assesses their status, and resolves any discrepancies. An asset-based model is clearer than a task-based one and becomes a unifying abstraction across the whole workflow.
Starting Price: $0
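One way to see how an asset-based model differs from a task-based one: you declare each data asset and the assets it depends on, and the orchestrator derives the execution order. The sketch below is plain Python using the standard library's `graphlib`; `register_asset`, `ASSETS`, and `materialize_all` are hypothetical helpers, not Dagster's API.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical registry: asset name -> (upstream asset names, compute function)
ASSETS: dict[str, tuple[list[str], Callable[..., object]]] = {}


def register_asset(*deps: str):
    """Register the decorated function as an asset that depends on `deps` (illustrative only)."""
    def decorator(fn: Callable[..., object]) -> Callable[..., object]:
        ASSETS[fn.__name__] = (list(deps), fn)
        return fn
    return decorator


@register_asset()
def raw_orders():
    return [{"id": 1, "amount": 30}, {"id": 2, "amount": 70}]


@register_asset("raw_orders")
def order_totals(raw_orders):
    return sum(order["amount"] for order in raw_orders)


@register_asset("order_totals")
def revenue_report(order_totals):
    return f"total revenue: {order_totals}"


def materialize_all() -> dict[str, object]:
    """Compute every asset after all of its upstream assets (topological order)."""
    graph = {name: deps for name, (deps, _) in ASSETS.items()}
    values: dict[str, object] = {}
    for name in TopologicalSorter(graph).static_order():
        deps, fn = ASSETS[name]
        values[name] = fn(*(values[d] for d in deps))
    return values


print(materialize_all()["revenue_report"])  # total revenue: 100
```

Because the order is derived from declared dependencies, adding a new downstream asset only requires declaring what it reads; no task graph has to be rewired by hand.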
18
Azure Data Factory
Microsoft
Integrate data silos with Azure Data Factory, a service built for all data integration needs and skill levels. Easily construct ETL and ELT processes code-free within the intuitive visual environment, or write your own code. Visually integrate data sources using more than 90 natively built and maintenance-free connectors at no added cost. Focus on your data; the serverless integration service does the rest. Data Factory provides a data integration and transformation layer that works across your digital transformation initiatives. It can help independent software vendors (ISVs) enrich their SaaS apps with integrated hybrid data to deliver data-driven user experiences. Pre-built connectors and integration at scale let you focus on your users while Data Factory takes care of the rest.
19
Azure Logic Apps
Microsoft
Azure Logic Apps is built on a containerized runtime that increases scale and portability while automating business-critical workflows anywhere. Modernize your BizTalk Server applications by moving them to Logic Apps using the BizTalk migration tool. Connect logic apps to your virtual networks to integrate cloud-based and on-premises solutions securely. Containerize your workflows to deploy and run them anywhere: in the cloud, on premises, or on the infrastructure of your choice. Apply CI/CD best practices to your workflows and take advantage of built-in tools for secure deployments. Deploy and run logic apps in Azure, in any container, and on premises. Enable private endpoints, simplified virtual network access, and deployment slots. Develop, debug, and test on Windows, macOS, and Linux using Visual Studio Code. Deploy multiple workflows to a single logic app, simplifying automated deployments and CI/CD pipelines.
20
IBM watsonx.data integration
IBM
IBM watsonx.data integration is a data integration platform designed to help organizations transform raw data into AI-ready data at scale. The platform enables data teams to build, manage, and optimize data pipelines across multiple environments, including on-premises systems and hybrid or multi-cloud infrastructures. With a unified control plane, watsonx.data integration supports multiple integration styles, such as batch processing, real-time streaming, and data replication, within a single solution. The platform also offers no-code, low-code, and pro-code development options, allowing both technical and non-technical users to design and manage data pipelines efficiently. By simplifying data integration workflows and reducing reliance on multiple tools, watsonx.data integration helps organizations deliver reliable data for analytics and AI applications.
21
Kestra
Kestra
Kestra is an open-source, event-driven orchestrator that simplifies data operations and improves collaboration between engineers and business users. By bringing Infrastructure as Code best practices to data pipelines, Kestra allows you to build reliable workflows and manage them with confidence. Thanks to the declarative YAML interface for defining orchestration logic, everyone who benefits from analytics can participate in the data pipeline creation process. The UI automatically adjusts the YAML definition any time you make changes to a workflow from the UI or via an API call. Therefore, the orchestration logic is defined declaratively in code, even if some workflow components are modified in other ways.
22
Flyte
Union.ai
The workflow automation platform for complex, mission-critical data and ML processes at scale. Flyte makes it easy to create concurrent, scalable, and maintainable workflows for machine learning and data processing. Flyte is used in production at Lyft, Spotify, Freenome, and others. At Lyft, Flyte has been serving production model training and data processing for over four years, becoming the de facto platform for teams such as pricing, locations, ETA, mapping, autonomous, and more. In fact, Flyte manages over 10,000 unique workflows at Lyft, totaling over 1,000,000 executions every month, 20 million tasks, and 40 million containers. It is entirely open source under the Apache 2.0 license, under the Linux Foundation with a cross-industry overseeing committee. Configuring machine learning and data workflows can get complex and error-prone with YAML.
Starting Price: Free
23
Kedro
Kedro
Kedro is the foundation for clean data science code. It borrows concepts from software engineering and applies them to machine-learning projects. A Kedro project provides scaffolding for complex data and machine-learning pipelines, so you spend less time on tedious "plumbing" and more time solving new problems. Kedro standardizes how data science code is created and makes it easier for teams to collaborate on solving problems. Move seamlessly from development to production by turning exploratory code into reproducible, maintainable, and modular experiments. A series of lightweight data connectors is used to save and load data across many different file formats and file systems.
Starting Price: Free
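The "lightweight data connectors" idea amounts to addressing datasets by name while a catalog decides how to read and write each file format. A minimal standard-library sketch, assuming only CSV and JSON files; `DataCatalog` here is a hypothetical class and not Kedro's own `DataCatalog` API.

```python
import csv
import json
import tempfile
from pathlib import Path


class DataCatalog:
    """Hypothetical name-to-file catalog (illustrative, not Kedro's DataCatalog API)."""

    def __init__(self, entries: dict[str, str]):
        self.entries = {name: Path(path) for name, path in entries.items()}

    def save(self, name: str, data) -> None:
        path = self.entries[name]
        if path.suffix == ".json":
            path.write_text(json.dumps(data))
        elif path.suffix == ".csv":
            # Assumes a non-empty list of dicts that all share the same keys.
            with path.open("w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=list(data[0]))
                writer.writeheader()
                writer.writerows(data)
        else:
            raise ValueError(f"unsupported format: {path.suffix}")

    def load(self, name: str):
        path = self.entries[name]
        if path.suffix == ".json":
            return json.loads(path.read_text())
        if path.suffix == ".csv":
            with path.open(newline="") as f:
                return list(csv.DictReader(f))  # CSV values come back as strings
        raise ValueError(f"unsupported format: {path.suffix}")


# Usage: pipeline code refers to datasets by name only; the catalog picks the format.
with tempfile.TemporaryDirectory() as tmp:
    catalog = DataCatalog({"orders": f"{tmp}/orders.csv", "summary": f"{tmp}/summary.json"})
    catalog.save("orders", [{"id": 1, "amount": 30}, {"id": 2, "amount": 70}])
    orders = catalog.load("orders")
    catalog.save("summary", {"total": sum(int(row["amount"]) for row in orders)})
    summary = catalog.load("summary")

print(summary)  # {'total': 100}
```

Switching `orders` from CSV to JSON would only change the catalog entry, not the code that reads and writes it.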
24
KNIME Analytics Platform
KNIME
One enterprise-grade software platform, two complementary tools: the open-source KNIME Analytics Platform for creating data science, and the commercial KNIME Server for productionizing data science. KNIME Analytics Platform is the open-source software for creating data science. Intuitive, open, and continuously integrating new developments, KNIME makes understanding data and designing data science workflows and reusable components accessible to everyone. KNIME Server is the enterprise software for team-based collaboration, automation, management, and deployment of data science workflows as analytical applications and services. Non-experts are given access to data science via KNIME WebPortal or can use REST APIs. Do even more with your data using extensions for KNIME Analytics Platform. Some are developed and maintained by us at KNIME, others by the community and our trusted partners. We also have integrations with many open-source projects.
25
Flowable
Flowable
Grow your company and attract new customers through outstanding customer experience and operational excellence. In today’s competitive environment, leading organizations around the world use Intelligent Business Automation solutions from Flowable to change the way they do business:
- Driving customer retention and acquisition by delivering outstanding customer experience
- Increasing operational excellence by driving business efficiencies and reducing working costs
- Delivering increased business agility to adapt to changing market conditions
- Enforcing business compliance to ensure business continuity
Flowable’s conversational engagement capabilities let you deliver a compelling mix of automated and personal service via popular chat platforms such as WhatsApp, even in highly regulated industries. Flowable is lightning fast, with many years of real-world use. It fully supports process, case, and decision modeling, and easily handles complex case management scenarios.
26
Mage
Mage
Mage is a tool that transforms your data into predictions. Build, train, and deploy predictive models in minutes, with no AI experience required. Increase user engagement by ranking content on your users’ home feeds. Increase conversion by showing the most relevant products for a user to buy. Increase retention by predicting which users will stop using your app. Increase conversion by matching users in a marketplace. Data is the most important part of building AI, and Mage guides you through this process with suggestions on how to improve your data. AI predictions can be difficult to understand, so Mage explains every metric in depth, showing you how your AI model thinks. Get real-time predictions with a few lines of code; Mage makes it easy to integrate your AI model into any application.
Starting Price: Free
27
Meltano
Meltano
Meltano provides the ultimate flexibility in deployment options. Own your data stack, end to end. Its ever-growing library of 300+ connectors has been running in production for years. Run workflows in isolated environments, execute end-to-end tests, and version-control everything. Open source gives you the power to build your ideal data stack. Define your entire project as code and collaborate confidently with your team. The Meltano CLI lets you create your project rapidly, making it easy to start replicating data. Meltano is designed to be the best way to run dbt to manage your transformations. Your entire data stack is defined in your project, making it simple to deploy to production. Validate your changes in development before moving to CI, and in staging before moving to production.
28
Apache Flink
Apache Software Foundation
Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale. Any kind of data is produced as a stream of events: credit card transactions, sensor measurements, machine logs, and user interactions on a website or mobile application are all generated as a stream. Apache Flink excels at processing both unbounded and bounded data sets. Precise control of time and state enables Flink’s runtime to run any kind of application on unbounded streams. Bounded streams are processed internally by algorithms and data structures that are specifically designed for fixed-size data sets, yielding excellent performance. Flink integrates with common cluster resource managers such as Hadoop YARN and Kubernetes, and can also run as a standalone cluster.
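A stateful computation over a stream keeps per-key state that is updated as each event arrives. This plain-Python generator sketch (not PyFlink) shows that the same logic can consume a bounded list or an unbounded generator; it assumes no event time, windows, or fault-tolerant checkpoints, all of which Flink manages in a real deployment. `running_totals` and `sensor_stream` are hypothetical names.

```python
import itertools
from collections import defaultdict
from typing import Iterable, Iterator


def running_totals(events: Iterable[tuple[str, float]]) -> Iterator[tuple[str, float]]:
    """Keyed, stateful running sum: emit (key, total_so_far) for every incoming event."""
    state: dict[str, float] = defaultdict(float)  # per-key state kept across events
    for key, amount in events:
        state[key] += amount
        yield key, state[key]


# Bounded input: a finite list of (card_id, amount) transactions.
bounded = list(running_totals([("card-1", 20.0), ("card-2", 5.0), ("card-1", 7.5)]))
print(bounded)  # [('card-1', 20.0), ('card-2', 5.0), ('card-1', 27.5)]


# Unbounded input: an endless generator of sensor readings; take the first four results.
def sensor_stream() -> Iterator[tuple[str, float]]:
    for i in itertools.count():
        yield f"sensor-{i % 2}", 1.0


first_four = list(itertools.islice(running_totals(sensor_stream()), 4))
print(first_four)  # [('sensor-0', 1.0), ('sensor-1', 1.0), ('sensor-0', 2.0), ('sensor-1', 2.0)]
```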
29
Hevo
Hevo Data
Hevo Data is a no-code, bi-directional data pipeline platform built specifically for modern ETL, ELT, and Reverse ETL needs. It helps data teams streamline and automate org-wide data flows, saving ~10 hours of engineering time per week and enabling 10x faster reporting, analytics, and decision-making. The platform supports 100+ ready-to-use integrations across databases, SaaS applications, cloud storage, SDKs, and streaming services. Over 500 data-driven companies across 35+ countries trust Hevo for their data integration needs. Try Hevo today and get fully managed data pipelines up and running in just a few minutes.
Starting Price: $249/month
30
SnapLogic
SnapLogic
Quickly ramp up, learn, and use SnapLogic to create multi-point, enterprise-wide app and data integrations. Easily expose and manage pipeline APIs that extend your world. Eliminate slower, manual, error-prone methods and deliver faster results for business processes such as customer onboarding, employee onboarding and off-boarding, quote to cash, ERP SKU forecasting, support ticket creation, and more. Monitor, manage, secure, and govern your data pipelines, application integrations, and API calls, all from a single pane of glass. Launch automated workflows for any department across your enterprise in minutes, not days. To deliver superior employee experiences, the SnapLogic platform can bring together employee data across all your enterprise HR apps and data stores. Learn how SnapLogic can help you quickly set up seamless experiences powered by automated processes.
31
StackStorm
StackStorm
StackStorm connects all your apps, services, and workflows. From simple if/then rules to complicated workflows, StackStorm lets you automate DevOps your way. There is no need to change your existing processes or workflows: StackStorm connects what you already have. Community is what makes a good product great. StackStorm is used by many people around the world, and you can always count on getting answers to your questions. StackStorm can be used to automate and streamline nearly any part of your business. Here are some of the most common applications. When failures happen, StackStorm can act as Tier 1 support: it troubleshoots, fixes known problems, and escalates to humans when needed. Continuous deployment can get complex, beyond what Jenkins or other specialized, opinionated tools handle; automate advanced CI/CD pipelines your way. ChatOps brings automation and collaboration together, transforming DevOps teams to get things done better, faster, and with style.
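An if/then rule pairs a trigger with criteria and an action. The following Python sketch of that pattern models the Tier 1 support case, fixing a known problem and escalating anything else; `Rule`, `dispatch`, and the event fields are hypothetical and do not reflect StackStorm's actual rule format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Hypothetical if/then rule (illustrative, not StackStorm's rule format)."""
    trigger: str                      # event type the rule listens for
    criteria: Callable[[dict], bool]  # the "if": evaluated against the event payload
    action: Callable[[dict], str]     # the "then": returns a description of what was done


def dispatch(event_type: str, payload: dict, rules: list[Rule]) -> list[str]:
    """Run the action of every rule whose trigger and criteria match the event."""
    return [r.action(payload) for r in rules if r.trigger == event_type and r.criteria(payload)]


# Usage: fix a known problem automatically, escalate anything else to a human.
rules = [
    Rule("monitoring.alert",
         criteria=lambda e: e.get("check") == "disk_full",
         action=lambda e: f"clean temp files on {e['host']}"),
    Rule("monitoring.alert",
         criteria=lambda e: e.get("check") != "disk_full",
         action=lambda e: f"escalate {e['check']} on {e['host']} to on-call"),
]
print(dispatch("monitoring.alert", {"host": "web-1", "check": "disk_full"}, rules))
# ['clean temp files on web-1']
```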
32
Lyftrondata
Lyftrondata
Whether you want to build a governed delta lake or data warehouse, or simply migrate from your traditional database to a modern cloud data warehouse, you can do it all with Lyftrondata. Create and manage all of your data workloads on one platform by automatically building your pipeline and warehouse. Analyze data instantly with ANSI SQL and BI/ML tools, and share it without writing any custom code. Boost the productivity of your data professionals and shorten your time to value. Define, categorize, and find all data sets in one place. Share these data sets with other experts without writing code, and drive data-driven insights. This data sharing ability suits companies that want to store their data once, share it with other experts, and use it multiple times, now and in the future. Define datasets, apply SQL transformations, or simply migrate your SQL data processing logic to any cloud data warehouse.
33
Metaflow
Netflix
Successful data science projects are delivered by data scientists who can build, improve, and operate end-to-end workflows independently, focusing more on data science and less on engineering. Use Metaflow with your favorite data science libraries, such as TensorFlow or scikit-learn, and write your models in idiomatic Python code with little new to learn. Metaflow also supports the R language. Metaflow helps you design your workflow, run it at scale, and deploy it to production. It versions and tracks all your experiments and data automatically, and lets you inspect results easily in notebooks. Metaflow comes packaged with tutorials, so getting started is easy: you can copy all the tutorials into your current directory using the Metaflow command-line interface.
34
Oracle Data Integrator
Oracle
Oracle Data Integrator is a comprehensive data integration platform that covers all data integration requirements: from high-volume, high-performance batch loads, to event-driven, trickle-feed integration processes, to SOA-enabled data services. Oracle Data Integrator (ODI) 12c, the latest version of Oracle’s strategic Data Integration offering, provides superior developer productivity and improved user experience with a redesigned flow-based declarative user interface and deeper integration with Oracle GoldenGate. ODI12c further builds on its flexible and high-performance architecture with comprehensive big data support and added parallelism when executing data integration processes. It includes interoperability with Oracle Warehouse Builder (OWB) for a quick and simple migration for OWB customers to ODI12c. Additionally, ODI can be monitored from a single solution along with other Oracle technologies and applications through the integration with Oracle Enterprise Manager 12c.
35
Orchestra
Orchestra
Orchestra is a Unified Control Plane for Data and AI Operations, designed to help data teams build, deploy, and monitor workflows with ease. It offers a declarative framework that combines code and GUI, allowing users to implement workflows 10x faster and reduce maintenance time by 50%. With real-time metadata aggregation, Orchestra provides full-stack data observability, enabling proactive alerting and rapid recovery from pipeline failures. It integrates seamlessly with tools like dbt Core, dbt Cloud, Coalesce, Airbyte, Fivetran, Snowflake, BigQuery, Databricks, and more, ensuring compatibility with existing data stacks. Orchestra's modular architecture supports AWS, Azure, and GCP, making it a versatile solution for enterprises and scale-ups aiming to streamline their data operations and build trust in their AI initiatives.
36
Prefect
Prefect
Prefect is a workflow orchestration and automation platform designed for the modern context-driven era. It enables teams to turn Python functions into production-ready workflows with minimal effort. Prefect provides open-source foundations alongside managed platforms for enterprise-scale automation. The platform supports building and orchestrating data pipelines, workflows, and AI applications with full observability. Prefect Cloud offers managed orchestration with autoscaling, enterprise authentication, and built-in governance. Prefect Horizon extends automation to AI infrastructure by enabling deployment of MCP servers for AI agents. Trusted by leading organizations, Prefect helps teams scale automation without operational complexity.
37
Stitch
Qlik
Stitch is a cloud-based platform for ETL – extract, transform, and load. More than a thousand companies use Stitch to move billions of records every day from SaaS applications and databases into data warehouses and data lakes. -
38
IBM StreamSets
IBM
IBM® StreamSets enables users to create and manage smart streaming data pipelines through an intuitive graphical interface, facilitating seamless data integration across hybrid and multicloud environments. Leading global companies rely on IBM StreamSets to support millions of data pipelines for modern analytics, intelligent applications, and hybrid integration. Decrease data staleness and enable real-time data at scale, handling millions of records across thousands of pipelines within seconds. Insulate data pipelines from change and unexpected shifts with drag-and-drop, prebuilt processors designed to automatically identify and adapt to data drift. Create streaming pipelines to ingest structured, semi-structured, or unstructured data and deliver it to a wide range of destinations. Starting Price: $1000 per month -
39
ZenML
ZenML
Simplify your MLOps pipelines. Manage, deploy, and scale on any infrastructure with ZenML. ZenML is completely free and open-source. See the magic with just two simple commands. Set up ZenML in a matter of minutes, and start with all the tools you already use. ZenML's standard interfaces ensure that your tools work together seamlessly. Gradually scale up your MLOps stack by switching out components whenever your training or deployment requirements change. Keep up with the latest changes in the MLOps world and easily integrate any new developments. Define simple and clear ML workflows without wasting time on boilerplate tooling or infrastructure code. Write portable ML code and switch from experimentation to production in seconds. Manage all your favorite MLOps tools in one place with ZenML's plug-and-play integrations. Prevent vendor lock-in by writing extensible, tooling-agnostic, and infrastructure-agnostic code. Starting Price: Free -
40
n8n
n8n
Build complex automations 10x faster, without fighting APIs. Your days spent slogging through a spaghetti of scripts are over. Use JavaScript when you need flexibility and the UI for everything else. n8n allows you to build flexible workflows focused on deep data integration. And with shareable templates and a user-friendly UI, the less technical people on your team can collaborate on them too. Unlike with other tools, complexity is not a limitation, so you can build whatever you want without stressing over budget. Connect APIs with no code to automate basic tasks, or write vanilla JavaScript when you need to manipulate complex data. You can implement multiple triggers, branch and merge your workflows, and even pause flows to wait for external events. Interface easily with any API or service through custom HTTP requests. Avoid breaking live workflows by separating dev and prod environments with unique sets of auth data. Starting Price: $20 per month -
41
Windmill
Windmill
Windmill is an open source developer platform and workflow engine that transforms scripts into auto-generated UIs, APIs, and cron jobs, enabling the composition of workflows or data pipelines for building complex, data-intensive applications with ease. Supporting various languages, Windmill allows users to write and deploy software up to ten times faster, operating with high reliability and observability on a self-hostable job orchestrator. It features auto-generated user interfaces based on script parameters, a low-code app editor for creating custom UIs, and a flow editor for constructing workflows using a drag-and-drop interface. Windmill manages dependencies automatically, offers robust permissioning and monitoring, and provides various triggers including webhooks, schedules, CLI, Slack, and emails. Users can develop scripts locally with their preferred code editors, preview them, and deploy using the CLI. Starting Price: $120 per month -
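The idea of an auto-generated UI "based on script parameters" can be sketched in Python by reading a function's signature. This is a hypothetical illustration, not Windmill's actual schema format or implementation: the `input_schema` helper, the annotation-to-field-type mapping, and the example `main` script are all assumptions made for the example.

```python
import inspect

# Hypothetical mapping from Python type annotations to form field types.
FIELD_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def input_schema(func):
    """Derive a simple form description from a function's parameters."""
    fields = {}
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unknown types fall back to a plain text field.
        field = {"type": FIELD_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            field["required"] = True  # no default, so the form must ask for it
        else:
            field["default"] = param.default  # prefill the form with the default
        fields[name] = field
    return fields


# Hypothetical script whose parameters would drive the generated form.
def main(table: str, limit: int = 100, dry_run: bool = False):
    ...


print(input_schema(main))
# {'table': {'type': 'string', 'required': True}, 'limit': {'type': 'integer', 'default': 100}, 'dry_run': {'type': 'boolean', 'default': False}}
```

A front end could render one input per entry: a text box for `table`, a number field prefilled with 100 for `limit`, and a checkbox for `dry_run`.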
42
Pipedream
Pipedream
The fastest way to integrate APIs and run code. Pipedream is a serverless integration and compute platform that makes it easy to connect apps and develop event-driven workflows. Event sources turn any API into a real-time event stream. Create event sources to listen for new Tweets, GitHub events, Airtable records, RSS items, webhook events, and more. Inspect events in a human-friendly way, trigger Node.js workflows on every event, or consume events in your own app via API. Workflows are composed of Node.js code steps that run on every event. Write your own Node.js (and use any npm package) or reuse actions that scaffold popular APIs. Trigger workflows via sources, a custom URL, an email address, SDK code, or a schedule. Authenticate apps once, then connect to them in any workflow. Pipedream supports OAuth and key-based auth, and handles the OAuth flow and token refresh for you. Just link accounts to steps and reference the relevant auth info in code. Starting Price: Free -
43
Activiti
Activiti
Helping businesses solve automation challenges in distributed, highly scalable, and cost-effective infrastructures. Activiti is the leading lightweight, Java-centric open-source BPMN engine supporting real-world process automation needs. Activiti Cloud is the new generation of the business automation platform, offering a set of cloud-native building blocks designed to run on distributed infrastructures. Immutable, scalable, and pain-free process and decision runtimes designed to integrate with your cloud-native infrastructure. Scalable, storage-independent, and extensible audit service. Scalable, storage-independent, and extensible query service. Simplified system-to-system interactions that can scale in distributed environments. Distributed and scalable application aggregation layer. Cloud-ready, secure WebSocket and subscription handling as part of GraphQL integration. -
44
Airbyte
Airbyte
Airbyte is an open-source data integration platform designed to help businesses synchronize data from various sources to their data warehouses, lakes, or databases. The platform provides over 550 pre-built connectors and enables users to easily create custom connectors using low-code or no-code tools. Airbyte's solution is optimized for large-scale data movement, enhancing AI workflows by seamlessly integrating unstructured data into vector databases like Pinecone and Weaviate. It offers flexible deployment options, ensuring security, compliance, and governance across all deployment models. Starting Price: $2.50 per credit -
45
Alooma
Google
Alooma gives data teams visibility and control. It brings data from your various data silos together into BigQuery, all in real time. Set up and flow data in minutes, or customize, enrich, and transform data on the stream before it even hits the data warehouse. Never lose an event: Alooma's built-in safety nets ensure easy error handling without pausing your pipeline. Whatever the number of data sources, from low to high volume, Alooma’s infrastructure scales to your needs. -
46
Apache Gobblin
Apache Software Foundation
A distributed data integration framework that simplifies common aspects of Big Data integration such as data ingestion, replication, organization, and lifecycle management for both streaming and batch data ecosystems. Gobblin supports several deployment modes. It runs as a standalone application on a single box and also supports an embedded mode. It runs as a MapReduce application on multiple Hadoop versions, and also supports Azkaban for launching MapReduce jobs. It runs as a standalone cluster with primary and worker nodes; this mode supports high availability and can also run on bare metal. It runs as an elastic cluster on public cloud; this mode also supports high availability. Gobblin as it exists today is a framework that can be used to build different data integration applications, such as ingestion and replication. Each of these applications is typically configured as a separate job and executed through a scheduler like Azkaban. -
47
Apache Spark
Apache Software Foundation
Apache Spark™ is a unified analytics engine for large-scale data processing. Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine. Spark offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala, Python, R, and SQL shells. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming, and you can combine these libraries seamlessly in the same application. You can run Spark using its standalone cluster mode, on Hadoop YARN, on Mesos, on Kubernetes, or in the cloud (for example, on EC2). It can access diverse data sources, including HDFS, Alluxio, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of others. -
48
Argo
Argo
Open-source tools for Kubernetes to run workflows, manage clusters and do GitOps right. Kubernetes-native workflow engine supporting DAG and step-based workflows. Declarative continuous delivery with a fully-loaded UI. Advanced Kubernetes deployment strategies such as Canary and Blue-Green made easy. Argo Workflows is an open-source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD. Model multi-step workflows as a sequence of tasks or capture the dependencies between tasks using a graph (DAG). Easily run compute-intensive jobs for machine learning or data processing in a fraction of the time using Argo Workflows on Kubernetes. Run CI/CD pipelines natively on Kubernetes without configuring complex software development products. Designed from the ground up for containers without the overhead and limitations of legacy VM and server-based environments. -
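Argo Workflows itself is configured through Kubernetes YAML manifests; the Python sketch below is not Argo's API or implementation, only an illustration of how a DAG of task dependencies determines which steps can run in parallel. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+); the task names and the `parallel_stages` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def parallel_stages(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into stages; every task's dependencies finish in earlier stages.

    `deps` maps each task to the set of tasks it depends on.
    Raises CycleError if the dependencies do not form a DAG.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # detects cycles before any task is scheduled
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all tasks whose dependencies are done
        stages.append(ready)                # these could run concurrently
        sorter.done(*ready)                 # mark them finished to unlock successors
    return stages


# Hypothetical diamond-shaped workflow: B and C depend on A; D depends on B and C.
workflow = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
print(parallel_stages(workflow))  # [['A'], ['B', 'C'], ['D']]

# A circular dependency is rejected instead of being scheduled.
try:
    parallel_stages({"A": {"B"}, "B": {"A"}})
except CycleError:
    print("not a DAG")
```

Grouping ready tasks into stages shows the benefit of a DAG over a plain sequence: `B` and `C` share no dependency on each other, so an engine is free to run them at the same time.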
49
Amazon MWAA
Amazon
Amazon Managed Workflows for Apache Airflow (MWAA) is a managed orchestration service for Apache Airflow that makes it easier to set up and operate end-to-end data pipelines in the cloud at scale. Apache Airflow is an open-source tool used to programmatically author, schedule, and monitor sequences of processes and tasks referred to as “workflows.” With Managed Workflows, you can use Airflow and Python to create workflows without having to manage the underlying infrastructure for scalability, availability, and security. Managed Workflows automatically scales its workflow execution capacity to meet your needs, and is integrated with AWS security services to help provide you with fast and secure access to data. Starting Price: $0.49 per hour -
50
Google Cloud Composer
Google
Cloud Composer's managed nature and Apache Airflow compatibility allow you to focus on authoring, scheduling, and monitoring your workflows rather than provisioning resources. End-to-end integration with Google Cloud products, including BigQuery, Dataflow, Dataproc, Datastore, Cloud Storage, Pub/Sub, and AI Platform, gives users the freedom to fully orchestrate their pipelines. Author, schedule, and monitor your workflows through a single orchestration tool, whether your pipeline lives on-premises, in multiple clouds, or fully within Google Cloud. Ease your transition to the cloud or maintain a hybrid data environment by orchestrating workflows that cross between on-premises and the public cloud. Create workflows that connect data, processing, and services across clouds to give you a unified data environment. Starting Price: $0.074 per vCPU hour