Showing 272 open source projects for "data transformation"

  • 1
    Data Formulator

    Create rich visualizations with AI

    To create rich visualizations, data analysts often need to iterate back and forth between data processing and chart specification. Doing so requires not only proficiency in data transformation and visualization tools but also effort to manage a branching history made up of many different versions of data and charts. Recent LLM-powered AI systems have greatly improved the visualization authoring experience, for example by lowering manual data transformation barriers through LLMs' code generation ability. ...
    Downloads: 13 This Week
  • 2
    Data-Juicer

    Data processing for and with foundation models

    Data-Juicer is an open-source data processing and augmentation framework designed to enhance the quality and diversity of datasets for machine learning tasks. It includes a modular pipeline for scalable data transformation.
    Downloads: 4 This Week
  • 3
    The Data Engineering Handbook

    Links to everything you'd ever want to learn about data engineering

    ...It includes beginner and intermediate boot camps, interview guides, data cleaning and transformation resources, and curated lists of newsletters and industry communities, making it useful both for self-study and technical interview preparation. The repository is actively maintained and widely starred, reflecting its role as a go-to reference for newcomers and experienced practitioners alike.
    Downloads: 3 This Week
  • 4
    Polyhedra

    Polyhedral Computation Interface

    Polyhedra provides a unified interface for Polyhedral Computation Libraries such as CDDLib.jl. This manipulation notably includes the transformation from (resp. to) an inequality representation of a polyhedron to (resp. from) its generator representation (convex hull of points + conic hull of rays), and the projection/elimination of a variable with, e.g., Fourier-Motzkin.
    Downloads: 7 This Week
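
The variable elimination mentioned in the Polyhedra description can be illustrated with a minimal Fourier-Motzkin sketch. It is written in Python for illustration only and is not Polyhedra's API; the function name `eliminate` and the row format `(coefficients, bound)`, meaning `a . x <= b`, are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product


def eliminate(rows, k):
    """Project {x : a . x <= b for (a, b) in rows} onto all coordinates except k.

    Each row is (coefficients, bound). Exact rational arithmetic avoids
    floating-point drift when rows are scaled and added.
    """
    pos, neg, out = [], [], []
    for a, b in rows:
        a = [Fraction(c) for c in a]
        b = Fraction(b)
        if a[k] > 0:
            pos.append((a, b))      # gives an upper bound on x_k
        elif a[k] < 0:
            neg.append((a, b))      # gives a lower bound on x_k
        else:
            out.append((a[:k] + a[k + 1:], b))  # x_k absent: keep the row

    # Pair every upper bound with every lower bound. Scaling both rows so the
    # coefficients of x_k are +1 and -1 makes x_k cancel when they are added.
    for (ap, bp), (an, bn) in product(pos, neg):
        sp, sn = 1 / ap[k], 1 / -an[k]
        combined = [sp * cp + sn * cn for cp, cn in zip(ap, an)]
        out.append((combined[:k] + combined[k + 1:], sp * bp + sn * bn))
    return out


# Triangle: x >= 0, y >= 0, x + y <= 1, written as a . x <= b.
triangle = [([-1, 0], 0), ([0, -1], 0), ([1, 1], 1)]
projected = eliminate(triangle, 1)  # eliminate y, leaving constraints on x
```

For the triangle, the result is the pair of constraints `-x <= 0` and `x <= 1`, that is, the interval 0 <= x <= 1. The sketch does not remove redundant constraints, which a production library would need to do because the number of rows can grow quickly with each eliminated variable.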
  • 5
    Chain.jl

    A Julia package for piping a value through transformation expressions

    A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.
    Downloads: 7 This Week
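
The idea of piping a value through a series of transformation expressions can be shown with a small Python sketch. This is not Chain.jl's syntax or implementation; the helper name `pipe` is a hypothetical stand-in used only to illustrate the concept.

```python
from functools import reduce


def pipe(value, *steps):
    """Feed `value` through each step in order, passing each result to the next."""
    return reduce(lambda acc, step: step(acc), steps, value)


result = pipe(
    [3, 1, 2, None, 5],
    lambda xs: [x for x in xs if x is not None],  # drop missing values
    sorted,                                       # order the remaining values
    lambda xs: [x * 10 for x in xs],              # scale each value
    sum,                                          # reduce to a single number
)
```

Each step receives the previous step's output, so the pipeline reads top to bottom in the order the transformations are applied.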
  • 6
    Malli

    High-performance data-driven data specification library

    Malli is a powerful, data-driven schema library for Clojure and ClojureScript, offering rich support for specification, validation, parsing, error reporting, and generative testing. Designed for performance, Malli leverages efficient runtime representations and code generation, seamlessly integrating with Clojure’s data-oriented architecture. It supports function schemas, JSON transformation, and OpenAPI generation for strong API contracts.
    Downloads: 9 This Week
  • 7
    collapse

    Advanced and Fast Data Transformation in R

    collapse is a high-performance R package designed for fast and efficient data transformation, aggregation, reshaping, and statistical computation. Built to offer a more performant alternative to dplyr and data.table, it is particularly well-suited for large datasets and econometric applications. It operates on base R data structures like data frames and vectors and uses highly optimized C++ code under the hood to deliver significant speed improvements. collapse also includes tools for grouped operations, weighted statistics, and time series manipulation, making it a compact yet powerful utility for data scientists and researchers working in R.
    Downloads: 0 This Week
  • 8
    Pixeltable

    Data Infrastructure providing an approach to multimodal AI workloads

    Pixeltable is an open-source Python data infrastructure framework designed to support the development of multimodal AI applications. The system provides a declarative interface for managing the entire lifecycle of AI data pipelines, including storage, transformation, indexing, retrieval, and orchestration of datasets. Unlike traditional architectures that require multiple tools such as databases, vector stores, and workflow orchestrators, Pixeltable unifies these functions within a table-based abstraction. ...
    Downloads: 6 This Week
  • 9
    AI Data Science Team

    An AI-powered data science team of agents

    AI Data Science Team is a Python library and agent ecosystem designed to accelerate and automate common data science workflows by modeling them as specialized AI “agents” that can be orchestrated to perform tasks like data cleaning, transformation, analysis, visualization, and machine learning. It provides a modular agent framework where each agent focuses on a step in the typical data science pipeline — for example, loading data from CSV/Excel files, cleaning and wrangling messy datasets, engineering predictive features, building models with AutoML, connecting to SQL databases, and producing visual outputs — all driven by natural language or programmatic instructions. ...
    Downloads: 1 This Week
  • 10
    Addax

    Addax is a versatile open-source ETL tool

    Addax is a data integration and ETL (Extract, Transform, Load) tool designed for high-performance data migration tasks. It simplifies the process of moving data between different systems and formats.
    Downloads: 14 This Week
  • 11
    PeerDB

    A fast, simple, and cost-effective tool to replicate data from Postgres

    PeerDB is an open-source platform for real-time replication and transformation of data from PostgreSQL to analytical warehouses like BigQuery and Snowflake. It supports Change Data Capture (CDC) and provides seamless syncing and transformation logic with low latency. PeerDB is ideal for teams building real-time data pipelines without relying on expensive proprietary solutions.
    Downloads: 0 This Week
  • 12
    sttr

    Cross-platform CLI app to perform various operations on strings

    sttr is command-line software that lets you quickly run various transformation operations on strings.
    Downloads: 26 This Week
  • 13
    Datacap

    DataCap is integrated software for data transformation

    Datacap is an open-source data catalog and governance tool that helps organizations manage and document their data assets. It provides metadata management, lineage tracking, and collaboration features to ensure data transparency and quality. Datacap is designed for teams that need a lightweight, self-hosted solution to organize and govern their data ecosystems.
    Downloads: 10 This Week
  • 14
    Typia

    Super-fast/easy runtime validations and serializations

    Super-fast/easy runtime validations and serializations through transformation.
    Downloads: 7 This Week
  • 15
    SQL Notebook

    SQL Notebook — Casual data exploration in SQL

    SQL Notebook is a free Windows application for querying and analyzing data across multiple sources, including SQLite, PostgreSQL, Excel, and CSV files. It combines a SQL editor with a notebook interface, allowing for data exploration, transformation, and visualization in one place. SQL Notebook is ideal for analysts and data enthusiasts.
    Downloads: 13 This Week
  • 16
    Kapacitor

    Open source framework for processing, monitoring, and alerting

    Open source framework for processing, monitoring, and alerting on time series data. Kapacitor is a real-time data processing engine for monitoring and alerting, specifically designed to work with time-series data from InfluxDB.
    Downloads: 8 This Week
  • 17
    Greenmask

    PostgreSQL database anonymization and synthetic data generation tool

    Greenmask is a powerful open-source utility designed for logical database backup dumping, obfuscation, and restoration. It offers extensive functionality for backup, anonymization, and data masking. Greenmask is written in pure Go and includes ported PostgreSQL libraries that allow for platform independence. The tool is stateless and does not require any changes to your database schema. It is designed to be highly customizable and backward-compatible with existing PostgreSQL...
    Downloads: 7 This Week
  • 18
    NeuralOperators.jl

    DeepONets, Neural Operators, Physics-Informed Neural Ops in Julia

    A neural operator is a novel deep learning architecture that learns an operator, that is, a mapping between infinite-dimensional function spaces. It can be used to solve partial differential equations (PDEs). Instead of applying the finite element method, a PDE problem can be solved by training a neural network to learn an operator mapping from infinite-dimensional space (u, t) to infinite-dimensional space f(u, t). Neural operator learns a continuous function between two continuous function...
    Downloads: 8 This Week
  • 19
    Tidier.jl

    Meta-package for data analysis in Julia, modeled after the R tidyverse

    Tidier.jl is a Julia package that brings tidyverse-style data manipulation and analysis to Julia, inspired by R's dplyr and tidyverse. It allows users to write expressive and concise data transformation code using chaining (|>) and intuitive syntax. Built on top of DataFrames.jl, Tidier.jl aims to make data wrangling more accessible to users familiar with R or looking for cleaner data pipelines in Julia.
    Downloads: 8 This Week
  • 20
    pdfly

    CLI tool to extract (meta)data from PDF and manipulate PDF files

    A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.
    Downloads: 5 This Week
  • 21
    DataFramesMeta.jl

    Metaprogramming tools for DataFrames

    Metaprogramming tools for DataFrames.jl objects to provide more convenient syntax. DataFrames.jl has the functions select, transform, and combine, as well as the in-place select! and transform! for manipulating data frames. DataFramesMeta.jl provides the macros @select, @transform, @combine, @select!, and @transform! to mirror these functions with more convenient syntax. Inspired by dplyr in R and LINQ in C#.
    Downloads: 8 This Week
  • 22
    GraphRAG

    A modular graph-based Retrieval-Augmented Generation (RAG) system

    The GraphRAG project is a data pipeline and transformation suite that is designed to extract meaningful, structured data from unstructured text using the power of LLMs.
    Downloads: 6 This Week
  • 23
    Blue Whale Configuration Platform

    Blue Whale smart cloud configuration platform

    Built on experience supporting hundreds of Tencent businesses, the platform is compatible with a wide range of complex system architectures and is rooted in operations and maintenance (O&M). It covers the full cycle of business operations: configuration management, job execution, task scheduling, and monitoring with self-healing, plus O&M big data analysis to support operational decisions. The open PaaS offers a powerful development framework and scheduling engine, along with a complete O&M development training system, which helps teams rapidly transform and upgrade their O&M practices. ...
    Downloads: 0 This Week
  • 24
    CoordinateTransformations.jl

    A fresh approach to coordinate transformations

    CoordinateTransformations is a Julia package to manage simple or complex networks of coordinate system transformations. Transformations can be easily applied, inverted, composed, and differentiated (both with respect to the input coordinates and with respect to transformation parameters such as rotation angle). Transformations are designed to be light-weight and efficient enough for, e.g., real-time graphical applications, while support for both explicit and automatic differentiation makes...
    Downloads: 6 This Week
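
Applying, composing, and inverting transformations, as the CoordinateTransformations description puts it, can be sketched for 2D affine maps. The Python below is an illustration only, not the package's API: the class `Affine2D` and the helpers `rotation` and `translation` are hypothetical names, and differentiation is left out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Affine2D:
    """The map p -> M p + t with M = [[a, b], [c, d]] and t = (tx, ty)."""
    a: float
    b: float
    c: float
    d: float
    tx: float
    ty: float

    def __call__(self, p):
        x, y = p
        return (self.a * x + self.b * y + self.tx,
                self.c * x + self.d * y + self.ty)

    def compose(self, other):
        """Return the transform that applies `other` first, then `self`."""
        return Affine2D(
            self.a * other.a + self.b * other.c,
            self.a * other.b + self.b * other.d,
            self.c * other.a + self.d * other.c,
            self.c * other.b + self.d * other.d,
            self.a * other.tx + self.b * other.ty + self.tx,
            self.c * other.tx + self.d * other.ty + self.ty,
        )

    def inverse(self):
        """Return the inverse map p -> M^-1 p - M^-1 t (requires det != 0)."""
        det = self.a * self.d - self.b * self.c
        ia, ib = self.d / det, -self.b / det
        ic, id_ = -self.c / det, self.a / det
        return Affine2D(ia, ib, ic, id_,
                        -(ia * self.tx + ib * self.ty),
                        -(ic * self.tx + id_ * self.ty))


def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return Affine2D(c, -s, s, c, 0.0, 0.0)


def translation(dx, dy):
    return Affine2D(1.0, 0.0, 0.0, 1.0, dx, dy)


# Shift by (1, 0) first, then rotate a quarter turn counterclockwise.
shift_then_rotate = rotation(math.pi / 2).compose(translation(1.0, 0.0))
p = shift_then_rotate((1.0, 0.0))
back = shift_then_rotate.inverse()(p)
```

Up to floating-point rounding, `p` is (0, 2) and `back` recovers the starting point (1, 0). Storing the composed transform as a single object means a chain of transformations costs one matrix-vector product per point, which is the kind of lightweight representation the description alludes to.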
  • 25
    Databend

    Cloud-native open source data warehouse for analytics and AI queries

    ...Databend provides a unified engine capable of handling analytics, vector search, and full-text search within a single platform. Databend supports SQL-based workflows and enables real-time data ingestion, transformation, and analysis through streaming and task orchestration features. With its cloud-native design and distributed architecture, Databend can run either as a self-hosted system or within managed environments to power data analytics, AI workloads, and large-scale data processing.
    Downloads: 21 This Week