Search Results for "data warehouse projects"

Showing 1302 open source projects for "data warehouse projects"

  • 1
    Databend

    Cloud-native open source data warehouse for analytics and AI queries

    Databend is an open source cloud-native data warehouse designed for large-scale analytics and modern data workloads. Built in Rust, the system focuses on high performance, scalability, and efficient data processing for analytical queries. It is designed with a separation of compute and storage, allowing compute nodes to scale independently while storing data in object storage systems.
    Downloads: 15 This Week
  • 2
    Jitsu

    Jitsu is an open-source Segment alternative

    Jitsu is a fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days. Installing Jitsu is a matter of selecting your framework and adding a few lines of code to your app. Jitsu is built to be framework agnostic, so regardless of your stack, we have a solution that'll work for your team. Connect a data warehouse (Snowflake, ClickHouse, BigQuery, S3, Redshift, or Postgres) and query your data instantly. ...
    Downloads: 0 This Week
  • 3
    rudderstack

    Privacy and Security focused Segment-alternative, in Golang

    ...Our SDKs track anonymous and known users at the source and reconcile users in your warehouse and SaaS tools. Go beyond event streaming and control all of your customer data on your own terms. Learn how we can help you build a customer data platform. RudderStack treats your data warehouse as a first-class citizen among destinations, with advanced features and configurable, near real-time sync. RudderStack is built API-first.
    Downloads: 6 This Week
  • 4
    YAO

    Yao: a low-code engine to create web services and dashboards

    Yao allows developers to create web services using processes. Yao is a low-code engine that creates database models, writes API services, and describes dashboard interfaces using only JSON, for web and hardware, with no code and 10x productivity. Yao is based on the idea of flow-based programming, is developed in Go, and supports multiple ways to extend the data stream processor. This makes Yao extremely versatile: it can replace programming languages in most scenarios, and is 10 times more...
    Downloads: 7 This Week
  • 5
    Data Science Articles from CodeCut

    Collection of useful data science topics along with articles

    The Data-science repository from CodeCutTech is a curated collection of educational content focused on practical tools and workflows used in modern data science projects. Instead of providing a single software package, the repository aggregates articles, tutorials, and examples covering many topics within the data science ecosystem. The materials address areas such as MLOps, data management, project organization, testing practices, visualization techniques, and productivity tools used by data scientists. ...
    Downloads: 0 This Week
  • 6
    AWS Data Wrangler

    Pandas on AWS, easy integration with Athena, Glue, Redshift, etc.

    An AWS Professional Services open-source Python initiative that extends the power of the Pandas library to AWS, connecting DataFrames and AWS data-related services. Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server and S3 (Parquet, CSV, JSON, and Excel). Built on top of other open-source projects like Pandas, Apache Arrow and Boto3, it offers abstracted functions to execute common ETL tasks such as loading and unloading data from data lakes, data warehouses, and databases. ...
    Downloads: 16 This Week
  • 7
    Conduit

    Conduit streams data between data stores. Kafka Connect replacement

    ...Eliminate the multi-step process you go through today. Just download the binary and start building. Conduit connectors give you the ability to pull and push data to any production datastore you need. If a datastore is missing, the simple SDK allows you to extend Conduit where you need it. Conduit pipelines listen for changes to a database, data warehouse, etc., and allow your data applications to act upon those changes in real time. Run it in a way that works for you; use it as a standalone service or orchestrate it within your infrastructure.
    Downloads: 15 This Week
  • 8
    PostHog

    PostHog provides open-source web & product analytics

    PostHog is an all‑in‑one open‑source platform for product and web analytics—offering event-based analytics, session recording, feature flagging, A/B testing, cohorts, and more—that you can self‑host, with full support for data privacy and enterprise compliance. Sync data from external tools like Stripe, Hubspot, your data warehouse, and more. Query it alongside your product data. Run custom filters and transformations on your incoming data. Send it to 25+ tools or any webhook in real time or batch export large amounts to your warehouse. Capture traces, generations, latency, and cost for your LLM-powered app.
    Downloads: 12 This Week
  • 9
    BigQuery Utils

    Useful scripts, UDFs, views, and other utilities for migration

    BigQuery Utils is a large utility repository focused on helping users operate, optimize, and migrate workloads in BigQuery through reusable assets rather than a single application. It brings together scripts, user-defined functions, views, stored procedures, dashboards, notebooks, and supporting tools that address common data warehouse and analytics tasks. The repository is especially useful for organizations that need practical building blocks for migration from other database systems, since it includes compatibility-oriented utilities and migration-focused UDFs that mimic behavior from platforms such as Oracle, Redshift, and Netezza. It also supports day-to-day operational work by offering optimization scripts, billing queries, performance testing examples, and dashboards built on INFORMATION_SCHEMA metadata so teams can better understand slot usage, reservations, job execution, and errors.
    Downloads: 0 This Week
  • 10
    Machine Learning and Data Science Apps

    A curated list of applied machine learning and data science notebooks

    ...Most examples are written in Python and frequently use Jupyter notebooks to present practical implementations and experiments. The project encourages contributions from data scientists and domain experts who want to share applied analytics projects and techniques that address real business challenges.
    Downloads: 0 This Week
  • 11
    Best-of Python

    A ranked list of awesome Python open-source libraries

    ...Ranked list of awesome python libraries for web development. Correctly generate plurals, ordinals, indefinite articles; convert numbers. Libraries for loading, collecting, and extracting data from a variety of data sources and formats. Libraries for data batch- and stream-processing, workflow automation, job scheduling, and other data pipeline tasks.
    Downloads: 1 This Week
  • 12
    Countries, Languages & Continents data

    Countries, Languages & Continents data (capital and currency)

    ...Continents & countries: ISO 3166-1 alpha-2 code, name, ISO 639-1 language, capital and currency, native name, calling codes. Lists are available in JSON, CSV and SQL formats. It also contains separate JSON files with additional country emoji flag data. This version changes a lot in the data structures and placement of the files, so if your projects depend on the old structure, specify a previous version, <2.0.0. The languages field of a Country item is an Array in JSON files, to make it easy to count and match items with a Language item. But currency and phone calling codes may be a comma-separated String. ...
    Downloads: 2 This Week
  • 13
    DataEase

    Data visualization analysis tool

    ...Supports rich chart types (Apache ECharts / AntV) and drag-and-drop creation of dashboards. Supports direct connection mode and local mode (based on an Apache Doris / Kettle implementation). Supports various data sources such as data warehouses and data lakes, OLAP databases, OLTP databases, Excel data files, APIs, etc. Open source and open: zero threshold, quick access and installation online; quick access to user feedback, new versions released monthly. Supports multiple data sharing methods to ensure data security.
    Downloads: 13 This Week
  • 14
    DataChain

    AI-data warehouse to enrich, transform and analyze unstructured data

    Datachain enables multimodal API calls and local AI inferences to run in parallel over many samples as chained operations. The resulting datasets can be saved, versioned, and sent directly to PyTorch and TensorFlow for training. Datachain can persist features of Python objects returned by AI models, and enables vectorized analytical operations over them. The typical use cases are data curation, LLM analytics and validation, image segmentation, pose detection, and GenAI alignment. Datachain...
    Downloads: 4 This Week
  • 15
    Spice.ai OSS

    A self-hostable CDN for databases

    Spice is a portable runtime offering developers a unified SQL interface to materialize, accelerate, and query data from any database, data warehouse, or data lake. Spice connects, fuses, and delivers data to applications, machine-learning models, and AI backends, functioning as an application-specific, tier-optimized Database CDN. The Spice runtime, written in Rust, is built with industry-leading technologies such as Apache DataFusion, Apache Arrow, Apache Arrow Flight, SQLite, and DuckDB. ...
    Downloads: 25 This Week
  • 16
    ConcourseDB

    Distributed database warehouse for transactions, search and analytics

    ConcourseDB is a distributed, self-tuning database designed for real-time applications, offering strong consistency and ACID compliance without requiring complex configurations. It provides dynamic schema support and automatic indexing.
    Downloads: 2 This Week
  • 17
    MentDB Projects

    Generalized Interoperability and Strong AI

    MentDB is an open-source platform driving research into next-generation AI and universal data exchange. Our architecture is built around the revolutionary Mentalese Query Language (MQL). MentDB Weak (Generalized Interoperability): A unified data layer enabling seamless data exchange and application integration (SOA, ETL, Data Quality). We eliminate data silos through a single, generalized data language. MentDB Strong (Strong AI / AGI): The framework for exploring and building Machine...
    Downloads: 8 This Week
  • 18
    Metaflow

    A framework for real-life data science

    Metaflow is a human-friendly Python library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning.
    Downloads: 7 This Week
  • 19
    dasel

    Select, put and delete data from JSON, TOML, YAML, XML and CSV files

    ...Dasel can be imported and used just like any other Go package. This can be very useful if you need to manipulate data from your own applications. From then on the rest of the docs and comments should be enough to get you going. Uses a standard query/selector syntax across all data formats. Zero runtime dependencies. Available on Linux, Mac and Windows. Available to import and use in your own projects. Run via Docker.
    Downloads: 17 This Week
  • 20
    PHP Debug Bar

    Debug bar for PHP

    The DebugBar integrates easily into any project and can display profiling data from any part of your application. It comes with built-in data collectors for standard PHP features and popular projects. The DebugBar has two parts: the main DebugBar object with data collectors and the renderer. Data collectors are objects collecting a specific set of data. To make things easy, the StandardDebugBar has all the built-in collectors activated. ...
    Downloads: 6 This Week
  • 21
    GeoNode

    GeoNode is an open source platform for geospatial data

    GeoNode is a geospatial content management system, a platform for the management and publication of geospatial data. It brings together mature and stable open-source software projects under a consistent and easy-to-use interface allowing non-specialized users to share data and create interactive maps. Data management tools built into GeoNode allow for integrated creation of data, metadata, and map visualization. Each dataset in the system can be shared publicly or restricted to allow access to only specific users. ...
    Downloads: 11 This Week
  • 22
    Apache Hudi

    Upserts, Deletes And Incremental Processing on Big Data

    Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage). Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow old-school batch data processing with a powerful new incremental processing framework for low latency minute-level analytics. Hudi provides efficient upserts, by mapping a given hoodie key (record key + partition path) consistently to a file id, via an indexing mechanism. ...
    Downloads: 0 This Week
  • 23
    Bootstrap

    HTML, CSS and JS framework for developing responsive websites and apps

    Bootstrap is an HTML, CSS, and JS framework designed for developing responsive, mobile-first projects on the web. It's one of the most popular front-end frameworks, and for good reason: it simply makes web development faster and easier. People of all skill levels can create projects of all sizes with Bootstrap, and for all types of devices too. With Bootstrap, you get a host of nifty features such as precompiled CSS, impressive scalability, dozens of custom HTML and CSS components, and more.
    Downloads: 36 This Week
  • 24
    Gigapipe

    The Open-Source Polyglot Observability Warehouse

    Gigapipe is an open-source, polyglot observability platform designed to unify logs, metrics, traces, and profiling data into a single, lightweight system. It serves as an all-in-one alternative to traditional observability stacks by implementing compatibility with widely used standards such as Loki, Prometheus, Tempo, and Pyroscope, allowing it to integrate seamlessly with existing tools and workflows. The platform supports ingestion from multiple sources, including OpenTelemetry and various...
    Downloads: 5 This Week
  • 25
    Data Annotator for Machine Learning

    Data annotator for machine learning

    Data Annotator for Machine Learning (DAML) is an application that lets machine learning teams centrally create, manage, and administer annotation projects. It supports active learning with uncertainty sampling to query unlabeled data.
    Downloads: 1 This Week