
    Repositories list

    • velox

      Public
      A new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
      C++
      1.4k281021 Updated Aug 10, 2025
    • raydp

      Public
      RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.
      Python
      743434013 Updated Jul 13, 2025
    • oap-mllib

      Public
      Optimized Spark package to accelerate machine learning algorithms in Apache Spark MLlib.
      Scala
      1222361 Updated Jun 20, 2025
    • .github

      Public
      0000 Updated Aug 19, 2024
    • vllm-fork

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      9.3k000 Updated Jul 23, 2024
    • text2sql-gluten

      Public archive
      Python
      4500 Updated Jul 11, 2024
    • English SDK for Apache Spark
      Python
      135101 Updated Jul 9, 2024
    • libhdfs3

      Public
      HDFS file read access for ClickHouse
      C++
      59200 Updated Jul 5, 2024
    • oap-tools

      Public archive
      Tools for building and packaging OAP and for integrating it with public cloud platforms such as AWS EMR, Google Dataproc, and K8S.
      Jupyter Notebook
      131792 Updated Mar 27, 2024
    • Gluten: Plugin to Double SparkSQL's Performance
      Scala
      564000 Updated Mar 26, 2024
    • remote-shuffle

      Public archive
      Spark* shuffle plugin that supports shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local disks.
      Scala
      132130 Updated Mar 15, 2024
    • protobuf

      Public
      An Intel-customized Protocol Buffers - Google's data interchange format
      C++
      16k001 Updated Nov 21, 2023
    • Gluten-Trino

      Public archive
      Gluten: Plugin to Boost Trino's Performance
      Java
      197461 Updated Oct 25, 2023
    • cloudtik

      Public archive
      Cloud Scale Platform for Distributed Analytics and AI
      Python
      52411 Updated Oct 12, 2023
    • pmem-shuffle

      Public archive
      Spark* shuffle plugin that supports shuffling through remote persistent memory over fabrics, leveraging the RDMA network and remote persistent memory (for read) to provide an extremely high-performance, low-latency shuffle solution for Spark*.
      C++
      914151 Updated Sep 18, 2023
    • recdp

      Public archive
      Python
      5210 Updated Sep 18, 2023
    • oap-project.github.io

      Public archive
      The OAP project web site
      HTML
      4000 Updated Sep 5, 2023
    • arrow

      Public
      Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication…
      C++
      3.8k6021 Updated May 18, 2023
    • gazelle_plugin

      Public archive
      Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
      Scala
      7425719124 Updated Feb 21, 2023
    • solution-navigator

      Public archive
      Example solutions or code for using OAP features.
      Jupyter Notebook
      3000 Updated Jan 25, 2023
    • sql-ds-cache

      Public archive
      Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer.
      Scala
      2637154 Updated Jan 3, 2023
    • libhdfs3-downstream

      Public archive
      A native C/C++ HDFS client (downstream fork from apache-hawq)
      C++
      51000 Updated Jan 3, 2023
    • arrow-data-source

      Public archive
      Spark DataSource plugin for reading files in various formats, such as Parquet, into Arrow-compatible columnar vectors.
      Scala
      10630 Updated Jan 3, 2023
    • pmem-spill

      Public archive
      Spark plug-in package for accelerating Spark runtime spill functions using PMem, such as the RDD cache PMem extension.
      Scala
      57111 Updated Dec 15, 2021
    • pmem-common

      Public archive
      Common library for accessing PMEM native library functions including memkind, vmemcache and so on.
      Java
      7331 Updated Dec 14, 2021