Optimized Analytics Package for Spark Platform (OAP)


25 repositories

velox
Public
A new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
C++
•
Apache License 2.0
•1.4k•28•10•21•Updated Aug 10, 2025
raydp
Public
RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.
ray spark
Python
•
Apache License 2.0
•74•343•40•13•Updated Jul 13, 2025
oap-mllib
Public
Optimized Spark package to accelerate machine learning algorithms in Apache Spark MLlib.
Scala
•
Apache License 2.0
•12•22•36•1•Updated Jun 20, 2025
.github
Public
Other
•0•0•0•0•Updated Aug 19, 2024
vllm-fork
Public
A high-throughput and memory-efficient inference and serving engine for LLMs
Python
•
Apache License 2.0
•9.3k•0•0•0•Updated Jul 23, 2024
text2sql-gluten
Public archive
Python
•4•5•0•0•Updated Jul 11, 2024
pyspark-ai
Public
English SDK for Apache Spark
Python
•
Apache License 2.0
•135•1•0•1•Updated Jul 9, 2024
libhdfs3
Public
HDFS file read access for ClickHouse
C++
•
Apache License 2.0
•59•2•0•0•Updated Jul 5, 2024
oap-tools
Public archive
Tools for building and packaging OAP, and for integrating it with public cloud platforms such as AWS EMR, Google Dataproc and K8S.
Jupyter Notebook
•
Apache License 2.0
•13•17•9•2•Updated Mar 27, 2024
spark-ai-kit
Public
Gluten: Plugin to Double SparkSQL's Performance
Scala
•
Apache License 2.0
•564•0•0•0•Updated Mar 26, 2024
remote-shuffle
Public archive
Spark* shuffle plugin that supports shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local disks.
Scala
•
Apache License 2.0
•13•21•3•0•Updated Mar 15, 2024
protobuf
Public
An Intel-customized Protocol Buffers - Google's data interchange format
C++
•
Other
•16k•0•0•1•Updated Nov 21, 2023
Gluten-Trino
Public archive
Gluten: Plugin to Boost Trino's Performance
Java
•
Apache License 2.0
•19•74•6•1•Updated Oct 25, 2023
cloudtik
Public archive
Cloud Scale Platform for Distributed Analytics and AI
machine-learning cloud ai spark deep-learning analytics alibabacloud kubernetes aws azure
Python
•
Apache License 2.0
•5•24•1•1•Updated Oct 12, 2023
pmem-shuffle
Public archive
Spark* shuffle plugin that supports shuffling through remote persistent memory over fabrics. It leverages the RDMA network and remote persistent memory (for reads) to provide a high-performance, low-latency shuffle solution for Spark*.
C++
•
Apache License 2.0
•9•14•15•1•Updated Sep 18, 2023
recdp
Public archive
Python
•
Apache License 2.0
•5•2•1•0•Updated Sep 18, 2023
oap-project.github.io
Public archive
The OAP project web site
HTML
•
Apache License 2.0
•4•0•0•0•Updated Sep 5, 2023
arrow
Public
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication…
C++
•
Apache License 2.0
•3.8k•6•0•21•Updated May 18, 2023
gazelle_plugin
Public archive
Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
spark arrow native-sql-engine vectorized-simd-optimizations native-kernels
Scala
•
Apache License 2.0
•74•257•191•24•Updated Feb 21, 2023
solution-navigator
Public archive
Example solutions or code for using OAP features.
Jupyter Notebook
•
Apache License 2.0
•3•0•0•0•Updated Jan 25, 2023
sql-ds-cache
Public archive
Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer.
Scala
•
Apache License 2.0
•26•37•15•4•Updated Jan 3, 2023
libhdfs3-downstream
Public archive
A native C/C++ HDFS client (downstream fork from apache-hawq)
C++
•
Apache License 2.0
•51•0•0•0•Updated Jan 3, 2023
arrow-data-source
Public archive
Spark DataSource plugin for reading files in various formats, such as Parquet, into Arrow-compatible columnar vectors.
Scala
•
Apache License 2.0
•10•6•3•0•Updated Jan 3, 2023
pmem-spill
Public archive
Spark plug-in package for accelerating Spark runtime spill functions using PMem, such as the RDD cache PMem extension.
Scala
•
Apache License 2.0
•5•7•11•1•Updated Dec 15, 2021
pmem-common
Public archive
Common library for accessing PMem native library functions, including memkind and vmemcache.
Java
•
Apache License 2.0
•7•3•3•1•Updated Dec 14, 2021