    Repositories list

    • spark

      Public
      Apache Spark - A unified analytics engine for large-scale data processing
      Scala
      29k · 0 · 0 · 0 · Updated Mar 17, 2025
    • Open, Multi-modal Catalog for Data & AI
      Python
      503 · 0 · 0 · 0 · Updated Mar 12, 2025
    • juicefs

      Public
      JuiceFS is a distributed POSIX file system built on top of Redis and S3.
      Go
      1.1k · 0 · 0 · 0 · Updated Nov 20, 2024
    • delta

      Public
      An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
      Scala
      1.9k · 0 · 0 · 0 · Updated Nov 20, 2024
    • nixpkgs

      Public
      Nix Packages collection & NixOS
      Nix
      16k · 0 · 0 · 0 · Updated Oct 10, 2024
    • Apache YuniKorn Core
      Go
      251 · 0 · 0 · 0 · Updated Jun 20, 2024
    • rill

      Public
      Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.
      Go
      144 · 0 · 0 · 0 · Updated Apr 16, 2024
    • volcano

      Public
      A Cloud Native Batch System (Project under CNCF)
      Go
      1.2k · 0 · 0 · 0 · Updated Jan 24, 2024
    • Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
      TypeScript
      1.4k · 0 · 0 · 1 · Updated Jan 5, 2024
    • kyuubi

      Public
      Apache Kyuubi is a distributed multi-tenant JDBC server for large-scale data processing and analytics, built on top of Apache Spark
      Scala
      955 · 0 · 0 · 0 · Updated Nov 21, 2023
    • Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
      Go
      1.4k · 0 · 0 · 0 · Updated Jun 5, 2023
    • Smarty
      95 · 0 · 0 · 0 · Updated May 16, 2023
    • The Confluent Platform Helm charts enable you to deploy Confluent Platform services on Kubernetes for development, test, and proof of concept environments.
      Mustache
      850 · 0 · 0 · 0 · Updated Apr 20, 2023
    • Smarty
      32 · 0 · 0 · 0 · Updated Jan 30, 2023
    • presidio

      Public
      Context aware, pluggable and customizable data protection and de-identification SDK for text and images
      Python
      706 · 0 · 0 · 0 · Updated Jan 17, 2023
    • submarine

      Public
      Submarine is Cloud Native Machine Learning Platform.
      Java
      254 · 0 · 0 · 0 · Updated Dec 18, 2022
    • JuiceFS CSI Driver
      Go
      94 · 0 · 0 · 0 · Updated Aug 5, 2022
    • hadoop

      Public
      Apache Hadoop
      Java
      9.1k · 0 · 0 · 0 · Updated May 17, 2022
    • Scala
      412 · 0 · 0 · 0 · Updated Feb 26, 2022