Vortex is a high-performance toolkit for working with compressed Apache Arrow arrays in memory, on disk, and over the wire. It aims to succeed Apache Parquet, offering much faster random access reads and scans while maintaining similar compression ratios. Its modular design is extensible, so developers can implement custom encodings for efficient data management, particularly for large-scale columnar datasets.
Features
- Zero-copy integration with Arrow: Enables efficient conversion between Vortex and Apache Arrow arrays
- Extensible encoding: Supports a range of highly data-parallel encodings, including FastLanes and FSST
- Cascading compression: Allows data to be recursively compressed with nested encodings for optimized storage
- Pluggable compression strategies: The default strategy is based on BtrBlocks but can be replaced with custom approaches
- Random access performance: Delivers 100-200x faster random access reads compared to traditional formats like Parquet
- Statistics and compute: Includes compute kernels and lazily computed statistics for enhanced query performance
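The cascading compression feature above can be illustrated with a minimal sketch in plain Python. This does not use the Vortex API. The function names (`dict_encode`, `rle_encode`, `cascade_compress`, and so on) are hypothetical and exist only for illustration. The idea shown is that the output of one encoding (integer dictionary codes) becomes the input to a second, nested encoding (run-length encoding).

```python
from itertools import groupby


def dict_encode(values):
    """Replace each value with an integer code into a table of distinct values."""
    table = sorted(set(values))
    index = {value: code for code, value in enumerate(table)}
    return table, [index[value] for value in values]


def rle_encode(codes):
    """Collapse runs of equal codes into (code, run_length) pairs."""
    return [(code, len(list(run))) for code, run in groupby(codes)]


def rle_decode(runs):
    """Expand (code, run_length) pairs back into the flat list of codes."""
    return [code for code, length in runs for _ in range(length)]


def cascade_compress(values):
    """Dictionary-encode the values, then run-length-encode the resulting codes."""
    table, codes = dict_encode(values)
    return table, rle_encode(codes)


def cascade_decompress(table, runs):
    """Undo the cascade in reverse order: RLE first, then the dictionary lookup."""
    return [table[code] for code in rle_decode(runs)]


# A low-cardinality column with long runs, where both layers pay off.
column = ["us", "us", "us", "de", "de", "fr", "fr", "fr", "fr"]
table, runs = cascade_compress(column)
restored = cascade_decompress(table, runs)
```

Here nine strings shrink to a three-entry table plus three runs, and `cascade_decompress` restores the original column. A real compressor would choose encodings per column, typically by sampling the data, rather than hard-coding one cascade.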
Categories
Data Formats
License
Apache License V2.0