Implement Durable Graph Database Provider Components for golem:graph WIT Interface #22

Open
@jdegoes

I have attached to this ticket a WIT file that describes a generic interface for graph database operations. This interface can be implemented by various providers, either by emulating features not present in a given provider, utilizing the provider's native support for a feature, or indicating an error if a particular combination is not natively supported by a provider.

The intent of this WIT specification is to allow developers of WASM components (on wasmCloud, Spin, or Golem) to leverage graph database capabilities to build graph-powered applications, knowledge graphs, and relationship analysis services in a portable and provider-agnostic fashion.

This ticket involves constructing implementations of this WIT interface for the following providers:

  • Neo4j: The leading property graph database with comprehensive Cypher query language support, ACID transactions, and rich schema management capabilities.
  • ArangoDB: A multi-model database with strong graph capabilities, collection-based data organization, and AQL query language support.
  • JanusGraph: A distributed, highly scalable graph database with pluggable storage backends and comprehensive Gremlin traversal support.

These implementations must be written in Rust and compile to WASM Components (WASI 0.2 only, since Golem does not yet support WASI 0.3). The standard Rust toolchain for WASM component development can be used (see cargo component and the Rust component examples in this and other Golem repositories).

Additionally, these implementations should incorporate custom durability semantics using the Golem durability API and the Golem host API. This approach ensures that durability is managed at the level of individual graph operations (create-vertex, create-edge, query execution, transaction commit/rollback), providing a higher-level and clearer operation log, which aids in debugging and monitoring. See golem:llm and golem:embed for more details and durable implementations in this same repository.
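
To illustrate what durability at the level of individual graph operations could look like, here is a minimal, self-contained sketch. It does not use the Golem durability API: `OpLog`, `LogEntry`, and `durable_call` are hypothetical names, and the log is an in-memory stand-in. The point is only the pattern: one log entry per graph operation, and on replay the recorded result is returned without contacting the database again.

```rust
use std::collections::VecDeque;

/// One entry per graph operation: the operation name and its serialized result.
#[derive(Debug, Clone, PartialEq)]
struct LogEntry {
    operation: String,
    result: String,
}

/// Hypothetical in-memory stand-in for a durable operation log (not the Golem API).
#[derive(Default)]
struct OpLog {
    entries: Vec<LogEntry>,
    /// Entries still to be replayed after a restart; empty in live mode.
    replay: VecDeque<LogEntry>,
}

impl OpLog {
    /// Simulates a restart: every persisted entry is queued for replay.
    fn restart(&mut self) {
        self.replay = self.entries.iter().cloned().collect();
    }

    /// In live mode, runs `op` once and records its result. During replay,
    /// returns the recorded result and skips the side effect entirely.
    fn durable_call(&mut self, operation: &str, op: impl FnOnce() -> String) -> String {
        if let Some(entry) = self.replay.pop_front() {
            assert_eq!(entry.operation, operation, "replay diverged from the log");
            return entry.result;
        }
        let result = op();
        self.entries.push(LogEntry {
            operation: operation.to_string(),
            result: result.clone(),
        });
        result
    }
}

fn main() {
    let mut log = OpLog::default();
    let mut remote_calls = 0;

    // Live execution: the (simulated) database call happens and is logged.
    let id = log.durable_call("create-vertex", || {
        remote_calls += 1;
        "vertex:1".to_string()
    });
    assert_eq!(id, "vertex:1");

    // After a restart the same call is answered from the log.
    log.restart();
    let replayed = log.durable_call("create-vertex", || {
        remote_calls += 1;
        "vertex:2".to_string()
    });
    assert_eq!(replayed, "vertex:1");
    assert_eq!(remote_calls, 1);
}
```

Because each entry corresponds to a named graph operation (create-vertex, commit, and so on), the log reads as a sequence of graph-level steps, which is the debugging benefit described above.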

The final deliverables associated with this ticket are:

  • Neo4j implementation: A WASM Component (WASI 0.2), named graph-neo4j.wasm, with a full test suite and custom durability implementation at the level of graph operations.
  • ArangoDB implementation: A WASM Component (WASI 0.2), named graph-arangodb.wasm, with a full test suite and custom durability implementation at the level of graph operations.
  • JanusGraph implementation: A WASM Component (WASI 0.2), named graph-janusgraph.wasm, with a full test suite and custom durability implementation at the level of graph operations.

Note: If you have a strong recommendation to swap out one or two of these with other popular / common graph databases, then as long as you get permission beforehand, that's okay with me. However, we definitely need Neo4j and ArangoDB.

These components will require runtime configuration, notably connection strings, authentication credentials, database names, and endpoint URLs. For configuring this information, the components can use environment variables for now (in the future, they will use wasi-runtime-config, but Golem does not support this yet, whereas Golem has good support for environment variables).
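
As a rough sketch of environment-based configuration, the following reads connection settings into a struct that mirrors part of the `connection-config` record. The `GRAPH_*` variable names are hypothetical (the ticket does not prescribe names), and the lookup is passed in as a function so the parsing logic can be tested without touching the real environment.

```rust
use std::env;

/// Mirrors the fields of `connection-config` that are read from the environment here.
#[derive(Debug, PartialEq)]
struct ConnectionConfig {
    hosts: Vec<String>,
    port: Option<u16>,
    database_name: Option<String>,
    username: Option<String>,
    password: Option<String>,
}

/// Builds the config from a lookup function.
/// All GRAPH_* variable names are assumptions made for this example.
fn config_from(lookup: impl Fn(&str) -> Option<String>) -> Result<ConnectionConfig, String> {
    // GRAPH_HOSTS is a comma-separated list; at least one host is required.
    let hosts: Vec<String> = lookup("GRAPH_HOSTS")
        .ok_or("GRAPH_HOSTS is not set")?
        .split(',')
        .map(|h| h.trim().to_string())
        .filter(|h| !h.is_empty())
        .collect();
    if hosts.is_empty() {
        return Err("GRAPH_HOSTS contains no host".to_string());
    }

    // The port is optional, but if present it must be a valid u16.
    let port = match lookup("GRAPH_PORT") {
        Some(raw) => Some(
            raw.parse::<u16>()
                .map_err(|e| format!("invalid GRAPH_PORT: {e}"))?,
        ),
        None => None,
    };

    Ok(ConnectionConfig {
        hosts,
        port,
        database_name: lookup("GRAPH_DATABASE"),
        username: lookup("GRAPH_USERNAME"),
        password: lookup("GRAPH_PASSWORD"),
    })
}

fn main() {
    // In a component, the lookup would be the process environment.
    match config_from(|key| env::var(key).ok()) {
        Ok(config) => println!("connecting to {:?}", config.hosts),
        Err(message) => eprintln!("configuration error: {message}"),
    }
}
```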

Moreover, the Rust components need to be tested within Golem to ensure compatibility with Golem 1.2.x.

This WIT has been designed by examining and comparing the APIs of Neo4j, ArangoDB, NebulaGraph, Amazon Neptune, TigerGraph, and JanusGraph. However, given that no implementations exist yet, it is possible the provided WIT is not the optimal abstraction across all these providers. Therefore, deviations from the proposed design can be made. However, to be accepted, any deviation must be fully justified and deemed by Golem core contributors to be an improvement over the original specification.

Implementation Guidelines

Each provider implementation should handle the following key mapping considerations:

  • Vertex Types: Map the vertex-type field appropriately (to labels for Neo4j, collections for ArangoDB, vertex labels for JanusGraph, etc.)
  • Transaction Semantics: Implement native transactions where supported, or emulate via sequential operations with appropriate error handling
  • Schema Management: Utilize native schema capabilities where available, or return unsupported-operation errors for unsupported schema operations
  • Query Language Support: Route queries through the generic query interface using each provider's native query language (Cypher, AQL, Gremlin, etc.)
  • Error Mapping: Map provider-specific errors to the unified graph-error enumeration
  • Property Type Conversion: Handle conversion between the unified property type system and provider-specific type systems
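
As one illustration of the error-mapping guideline, the sketch below maps a few Neo4j status codes to a Rust enum mirroring part of `graph-error`. The selection of codes is illustrative, not exhaustive, and treating other transient errors as `service-unavailable` is an assumption of this example rather than something the specification prescribes.

```rust
/// Subset of the `graph-error` variant from the WIT, mirrored as a Rust enum.
#[derive(Debug, PartialEq)]
enum GraphError {
    AuthenticationFailed(String),
    AuthorizationFailed(String),
    ConstraintViolation(String),
    InvalidQuery(String),
    DeadlockDetected,
    ServiceUnavailable(String),
    InternalError(String),
}

/// Maps a Neo4j status code and message to the unified error type.
fn map_neo4j_error(code: &str, message: &str) -> GraphError {
    let msg = message.to_string();
    match code {
        "Neo.ClientError.Security.Unauthorized" => GraphError::AuthenticationFailed(msg),
        "Neo.ClientError.Security.Forbidden" => GraphError::AuthorizationFailed(msg),
        "Neo.ClientError.Schema.ConstraintValidationFailed" => GraphError::ConstraintViolation(msg),
        "Neo.ClientError.Statement.SyntaxError" => GraphError::InvalidQuery(msg),
        "Neo.TransientError.Transaction.DeadlockDetected" => GraphError::DeadlockDetected,
        // Assumption: any other transient error is reported as a retryable outage.
        c if c.starts_with("Neo.TransientError.") => GraphError::ServiceUnavailable(msg),
        // Unknown codes keep the original code in the message to aid debugging.
        _ => GraphError::InternalError(format!("{code}: {message}")),
    }
}

fn main() {
    let err = map_neo4j_error(
        "Neo.ClientError.Statement.SyntaxError",
        "Invalid input 'MATHC'",
    );
    println!("{err:?}");
}
```

Keeping the original code in the fallback case means unmapped provider errors remain diagnosable even though they collapse into `internal-error`.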

Testing Requirements

Each implementation must include comprehensive test suites covering:

  • Basic CRUD operations (vertex/edge creation, retrieval, update, deletion)
  • Transaction lifecycle (begin, commit, rollback)
  • Schema operations (type definition, index creation, constraint management)
  • Query execution with various complexity levels
  • Traversal operations (pathfinding, neighborhood exploration)
  • Error handling for unsupported operations
  • Connection management and configuration
  • Durability semantics verification

package golem:graph@1.0.0;

/// Core data types and structures unified across graph databases
interface types {
    /// Universal property value types that can be represented across all graph databases
    variant property-value {
        null-value,
        boolean(bool),
        int8(s8),
        int16(s16), 
        int32(s32),
        int64(s64),
        uint8(u8),
        uint16(u16),
        uint32(u32),
        uint64(u64),
        float32(f32),
        float64(f64),
        %string(string),
        bytes(list<u8>),
        
        // Temporal types (unified representation)
        date(date),
        time(time),
        datetime(datetime),
        duration(duration),
        
        // Geospatial types (unified GeoJSON-like representation)
        point(point),
        linestring(linestring),
        polygon(polygon),
        
        // Collection types
        %list(list<property-value>),
        map(list<tuple<string, property-value>>),
        set(list<property-value>),
    }

    /// Temporal types with unified representation
    record date {
        year: u32,
        month: u8,  // 1-12
        day: u8,    // 1-31
    }

    record time {
        hour: u8,        // 0-23
        minute: u8,      // 0-59
        second: u8,      // 0-59
        nanosecond: u32, // 0-999,999,999
    }

    record datetime {
        date: date,
        time: time,
        timezone-offset-minutes: option<s16>, // UTC offset in minutes
    }

    record duration {
        seconds: s64,
        nanoseconds: u32,
    }

    /// Geospatial types (WGS84 coordinates)
    record point {
        longitude: f64,
        latitude: f64,
        altitude: option<f64>,
    }

    record linestring {
        coordinates: list<point>,
    }

    record polygon {
        exterior: list<point>,
        holes: option<list<list<point>>>,
    }

    /// Universal element ID that can represent various database ID schemes
    variant element-id {
        %string(string),
        int64(s64),
        uuid(string),
        composite(list<property-value>),
    }

    /// Property map - consistent with insertion format
    type property-map = list<tuple<string, property-value>>;

    /// Vertex representation
    record vertex {
        id: element-id,
        vertex-type: string,         // Primary type (collection/tag/label)
        additional-labels: list<string>, // Secondary labels (Neo4j-style)
        properties: property-map,
    }

    /// Edge representation
    record edge {
        id: element-id,
        edge-type: string,           // Edge type/relationship type
        from-vertex: element-id,
        to-vertex: element-id,
        properties: property-map,
    }

    /// Path through the graph
    record path {
        vertices: list<vertex>,
        edges: list<edge>,
        length: u32,
    }

    /// Direction for traversals
    enum direction {
        outgoing,
        incoming,
        both,
    }

    /// Comparison operators for filtering
    enum comparison-operator {
        equal,
        not-equal,
        less-than,
        less-than-or-equal,
        greater-than,
        greater-than-or-equal,
        contains,
        starts-with,
        ends-with,
        regex-match,
        in-list,
        not-in-list,
    }

    /// Filter condition for queries
    record filter-condition {
        property: string,
        operator: comparison-operator,
        value: property-value,
    }

    /// Sort specification
    record sort-spec {
        property: string,
        ascending: bool,
    }
}

/// Error handling unified across all graph database providers
interface errors {
    use types.{element-id};

    /// Comprehensive error types that can represent failures across different graph databases
    variant graph-error {
        // Feature/operation not supported by current provider
        unsupported-operation(string),
        
        // Connection and authentication errors
        connection-failed(string),
        authentication-failed(string),
        authorization-failed(string),
        
        // Data and schema errors
        element-not-found(element-id),
        duplicate-element(element-id),
        schema-violation(string),
        constraint-violation(string),
        invalid-property-type(string),
        invalid-query(string),
        
        // Transaction errors
        transaction-failed(string),
        transaction-conflict,
        transaction-timeout,
        deadlock-detected,
        
        // System errors
        timeout,
        resource-exhausted(string),
        internal-error(string),
        service-unavailable(string),
    }
}

/// Connection management and graph instance creation
interface connection {
    use errors.{graph-error};
    use transactions.{transaction};

    /// Configuration for connecting to graph databases
    record connection-config {
        // Connection parameters
        hosts: list<string>,
        port: option<u16>,
        database-name: option<string>,
        
        // Authentication
        username: option<string>,
        password: option<string>,
        
        // Connection behavior
        timeout-seconds: option<u32>,
        max-connections: option<u32>,
        
        // Provider-specific configuration as key-value pairs
        provider-config: list<tuple<string, string>>,
    }

    /// Main graph database resource
    resource graph {
        /// Create a new transaction for performing operations
        begin-transaction: func() -> result<transaction, graph-error>;
        
        /// Create a read-only transaction (may be optimized by provider)
        begin-read-transaction: func() -> result<transaction, graph-error>;
        
        /// Test connection health
        ping: func() -> result<_, graph-error>;
        
        /// Close the graph connection
        close: func() -> result<_, graph-error>;
        
        /// Get basic graph statistics if supported
        get-statistics: func() -> result<graph-statistics, graph-error>;
    }

    /// Basic graph statistics
    record graph-statistics {
        vertex-count: option<u64>,
        edge-count: option<u64>,
        label-count: option<u32>,
        property-count: option<u64>,
    }

    /// Connect to a graph database with the specified configuration
    connect: func(config: connection-config) -> result<graph, graph-error>;
}

/// All graph operations performed within transaction contexts
interface transactions {
    use types.{vertex, edge, path, element-id, property-map, property-value, filter-condition, sort-spec, direction};
    use errors.{graph-error};

    /// Transaction resource - all operations go through transactions
    resource transaction {
        // === VERTEX OPERATIONS ===
        
        /// Create a new vertex
        create-vertex: func(vertex-type: string, properties: property-map) -> result<vertex, graph-error>;
        
        /// Create vertex with additional labels (for multi-label systems like Neo4j)
        create-vertex-with-labels: func(vertex-type: string, additional-labels: list<string>, properties: property-map) -> result<vertex, graph-error>;
        
        /// Get vertex by ID
        get-vertex: func(id: element-id) -> result<option<vertex>, graph-error>;
        
        /// Update vertex properties (replaces all properties)
        update-vertex: func(id: element-id, properties: property-map) -> result<vertex, graph-error>;
        
        /// Update specific vertex properties (partial update)
        update-vertex-properties: func(id: element-id, updates: property-map) -> result<vertex, graph-error>;
        
        /// Delete vertex (and optionally its edges)
        delete-vertex: func(id: element-id, delete-edges: bool) -> result<_, graph-error>;
        
        /// Find vertices by type and optional filters
        find-vertices: func(
            vertex-type: option<string>,
            filters: option<list<filter-condition>>,
            sort: option<list<sort-spec>>,
            limit: option<u32>,
            offset: option<u32>
        ) -> result<list<vertex>, graph-error>;

        // === EDGE OPERATIONS ===
        
        /// Create a new edge
        create-edge: func(
            edge-type: string,
            from-vertex: element-id,
            to-vertex: element-id,
            properties: property-map
        ) -> result<edge, graph-error>;
        
        /// Get edge by ID
        get-edge: func(id: element-id) -> result<option<edge>, graph-error>;
        
        /// Update edge properties
        update-edge: func(id: element-id, properties: property-map) -> result<edge, graph-error>;
        
        /// Update specific edge properties (partial update)
        update-edge-properties: func(id: element-id, updates: property-map) -> result<edge, graph-error>;
        
        /// Delete edge
        delete-edge: func(id: element-id) -> result<_, graph-error>;
        
        /// Find edges by type and optional filters
        find-edges: func(
            edge-types: option<list<string>>,
            filters: option<list<filter-condition>>,
            sort: option<list<sort-spec>>,
            limit: option<u32>,
            offset: option<u32>
        ) -> result<list<edge>, graph-error>;

        // === TRAVERSAL OPERATIONS ===
        
        /// Get adjacent vertices through specified edge types
        get-adjacent-vertices: func(
            vertex-id: element-id,
            direction: direction,
            edge-types: option<list<string>>,
            limit: option<u32>
        ) -> result<list<vertex>, graph-error>;
        
        /// Get edges connected to a vertex
        get-connected-edges: func(
            vertex-id: element-id,
            direction: direction,
            edge-types: option<list<string>>,
            limit: option<u32>
        ) -> result<list<edge>, graph-error>;

        // === BATCH OPERATIONS ===
        
        /// Create multiple vertices in a single operation
        create-vertices: func(vertices: list<vertex-spec>) -> result<list<vertex>, graph-error>;
        
        /// Create multiple edges in a single operation
        create-edges: func(edges: list<edge-spec>) -> result<list<edge>, graph-error>;
        
        /// Upsert vertex (create or update)
        upsert-vertex: func(
            id: option<element-id>,
            vertex-type: string,
            properties: property-map
        ) -> result<vertex, graph-error>;
        
        /// Upsert edge (create or update)
        upsert-edge: func(
            id: option<element-id>,
            edge-type: string,
            from-vertex: element-id,
            to-vertex: element-id,
            properties: property-map
        ) -> result<edge, graph-error>;

        // === TRANSACTION CONTROL ===
        
        /// Commit the transaction
        commit: func() -> result<_, graph-error>;
        
        /// Rollback the transaction
        rollback: func() -> result<_, graph-error>;
        
        /// Check if transaction is still active
        is-active: func() -> bool;
    }

    /// Vertex specification for batch creation
    record vertex-spec {
        vertex-type: string,
        additional-labels: option<list<string>>,
        properties: property-map,
    }

    /// Edge specification for batch creation
    record edge-spec {
        edge-type: string,
        from-vertex: element-id,
        to-vertex: element-id,
        properties: property-map,
    }
}

/// Schema management operations (optional/emulated for schema-free databases)
interface schema {
    use types.{property-value};
    use errors.{graph-error};
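
    /// Edge type definition for structural databases (ArangoDB-style).
    /// The fields below are an assumption: the record is referenced by
    /// `define-edge-type` but is not defined in the `connection` interface.
    record edge-type-definition {
        collection: string,
        from-collections: list<string>,
        to-collections: list<string>,
    }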

    /// Property type definitions for schema
    enum property-type {
        boolean,
        int32,
        int64,
        float32,
        float64,
        %string,
        bytes,
        date,
        datetime,
        point,
        %list,
        map,
    }

    /// Index types
    enum index-type {
        exact,      // Exact match index
        range,      // Range queries (>, <, etc.)
        text,       // Text search
        geospatial, // Geographic queries
    }

    /// Property definition for schema
    record property-definition {
        name: string,
        %type: property-type,
        required: bool,
        unique: bool,
        default-value: option<property-value>,
    }

    /// Vertex label schema
    record vertex-label-schema {
        label: string,
        properties: list<property-definition>,
        /// Container/collection this label maps to (for container-based systems)
        container: option<string>,
    }

    /// Edge label schema
    record edge-label-schema {
        label: string,
        properties: list<property-definition>,
        from-labels: option<list<string>>, // Allowed source vertex labels
        to-labels: option<list<string>>,   // Allowed target vertex labels
        /// Container/collection this label maps to (for container-based systems)
        container: option<string>,
    }

    /// Index definition
    record index-definition {
        name: string,
        label: string,          // Vertex or edge label
        properties: list<string>, // Properties to index
        %type: index-type,
        unique: bool,
        /// Container/collection this index applies to
        container: option<string>,
    }

    /// Schema management resource
    resource schema-manager {
        /// Define or update vertex label schema
        define-vertex-label: func(schema: vertex-label-schema) -> result<_, graph-error>;
        
        /// Define or update edge label schema
        define-edge-label: func(schema: edge-label-schema) -> result<_, graph-error>;
        
        /// Get vertex label schema
        get-vertex-label-schema: func(label: string) -> result<option<vertex-label-schema>, graph-error>;
        
        /// Get edge label schema
        get-edge-label-schema: func(label: string) -> result<option<edge-label-schema>, graph-error>;
        
        /// List all vertex labels
        list-vertex-labels: func() -> result<list<string>, graph-error>;
        
        /// List all edge labels
        list-edge-labels: func() -> result<list<string>, graph-error>;
        
        /// Create index
        create-index: func(index: index-definition) -> result<_, graph-error>;
        
        /// Drop index
        drop-index: func(name: string) -> result<_, graph-error>;
        
        /// List indexes
        list-indexes: func() -> result<list<index-definition>, graph-error>;
        
        /// Get index by name
        get-index: func(name: string) -> result<option<index-definition>, graph-error>;
        
        /// Define edge type for structural databases (ArangoDB-style)
        define-edge-type: func(definition: edge-type-definition) -> result<_, graph-error>;
        
        /// List edge type definitions
        list-edge-types: func() -> result<list<edge-type-definition>, graph-error>;
        
        /// Create container/collection for organizing data
        create-container: func(name: string, container-type: container-type) -> result<_, graph-error>;
        
        /// List containers/collections
        list-containers: func() -> result<list<container-info>, graph-error>;
    }

    /// Container/collection types
    enum container-type {
        vertex-container,
        edge-container,
    }

    /// Container information
    record container-info {
        name: string,
        %type: container-type,
        element-count: option<u64>,
    }

    /// Get schema manager for the graph
    get-schema-manager: func() -> result<schema-manager, graph-error>;
}

/// Generic query interface for database-specific query languages
interface query {
    use types.{vertex, edge, path, property-value};
    use errors.{graph-error};
    use transactions.{transaction};

    /// Query result that maintains symmetry with data insertion formats
    variant query-result {
        vertices(list<vertex>),
        edges(list<edge>),
        paths(list<path>),
        values(list<property-value>),
        maps(list<list<tuple<string, property-value>>>), // For tabular results
    }

    /// Query parameters for parameterized queries
    type query-parameters = list<tuple<string, property-value>>;

    /// Query execution options
    record query-options {
        timeout-seconds: option<u32>,
        max-results: option<u32>,
        explain: bool,     // Return execution plan instead of results
        profile: bool,     // Include performance metrics
    }

    /// Query execution result with metadata
    record query-execution-result {
        %result: query-result,
        execution-time-ms: option<u32>,
        rows-affected: option<u32>,
        explanation: option<string>,  // Execution plan if requested
        profile-data: option<string>, // Performance data if requested
    }

    /// Execute a database-specific query string
    execute-query: func(
        transaction: borrow<transaction>,
        query: string,
        parameters: option<query-parameters>,
        options: option<query-options>
    ) -> result<query-execution-result, graph-error>;
}

/// Graph traversal and pathfinding operations
interface traversal {
    use types.{vertex, edge, path, element-id, direction, filter-condition};
    use errors.{graph-error};
    use transactions.{transaction};

    /// Path finding options
    record path-options {
        max-depth: option<u32>,
        edge-types: option<list<string>>,
        vertex-types: option<list<string>>,
        vertex-filters: option<list<filter-condition>>,
        edge-filters: option<list<filter-condition>>,
    }

    /// Neighborhood exploration options
    record neighborhood-options {
        depth: u32,
        direction: direction,
        edge-types: option<list<string>>,
        max-vertices: option<u32>,
    }

    /// Subgraph containing related vertices and edges
    record subgraph {
        vertices: list<vertex>,
        edges: list<edge>,
    }

    /// Find shortest path between two vertices
    find-shortest-path: func(
        transaction: borrow<transaction>,
        %from: element-id,
        to: element-id,
        options: option<path-options>
    ) -> result<option<path>, graph-error>;

    /// Find all paths between two vertices (up to limit)
    find-all-paths: func(
        transaction: borrow<transaction>,
        %from: element-id,
        to: element-id,
        options: option<path-options>,
        limit: option<u32>
    ) -> result<list<path>, graph-error>;

    /// Get k-hop neighborhood around a vertex
    get-neighborhood: func(
        transaction: borrow<transaction>,
        center: element-id,
        options: neighborhood-options
    ) -> result<subgraph, graph-error>;

    /// Check if path exists between vertices
    path-exists: func(
        transaction: borrow<transaction>,
        %from: element-id,
        to: element-id,
        options: option<path-options>
    ) -> result<bool, graph-error>;

    /// Get vertices at specific distance from source
    get-vertices-at-distance: func(
        transaction: borrow<transaction>,
        source: element-id,
        distance: u32,
        direction: direction,
        edge-types: option<list<string>>
    ) -> result<list<vertex>, graph-error>;
}
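
To show how the structured parts of this interface might be lowered to a provider's query language, here is a sketch that turns the `find-vertices` arguments into a parameterized Cypher query for Neo4j. The Rust types mirror `comparison-operator` and `filter-condition`, while the function names, the `n` variable, and the `$p0`, `$p1` parameter naming are assumptions of this example. Sorting is omitted, and filter values are expected to be sent separately as query parameters rather than spliced into the text.

```rust
/// Mirrors `comparison-operator` from the WIT.
#[derive(Debug, Clone, Copy)]
enum ComparisonOperator {
    Equal,
    NotEqual,
    LessThan,
    LessThanOrEqual,
    GreaterThan,
    GreaterThanOrEqual,
    Contains,
    StartsWith,
    EndsWith,
    RegexMatch,
    InList,
    NotInList,
}

/// Mirrors `filter-condition`; the value travels separately as a query parameter.
struct FilterCondition {
    property: String,
    operator: ComparisonOperator,
}

/// Quotes a label or property name with backticks, doubling embedded backticks.
fn quote(name: &str) -> String {
    format!("`{}`", name.replace('`', "``"))
}

/// Renders one filter as a Cypher predicate on `n`, referring to parameter `$p<index>`.
fn condition_to_cypher(index: usize, c: &FilterCondition) -> String {
    use ComparisonOperator::*;
    let lhs = format!("n.{}", quote(&c.property));
    let param = format!("$p{index}");
    match c.operator {
        Equal => format!("{lhs} = {param}"),
        NotEqual => format!("{lhs} <> {param}"),
        LessThan => format!("{lhs} < {param}"),
        LessThanOrEqual => format!("{lhs} <= {param}"),
        GreaterThan => format!("{lhs} > {param}"),
        GreaterThanOrEqual => format!("{lhs} >= {param}"),
        Contains => format!("{lhs} CONTAINS {param}"),
        StartsWith => format!("{lhs} STARTS WITH {param}"),
        EndsWith => format!("{lhs} ENDS WITH {param}"),
        RegexMatch => format!("{lhs} =~ {param}"),
        InList => format!("{lhs} IN {param}"),
        NotInList => format!("NOT ({lhs} IN {param})"),
    }
}

/// Builds the query text for a `find-vertices` call (without sorting).
fn find_vertices_cypher(
    vertex_type: Option<&str>,
    filters: &[FilterCondition],
    limit: Option<u32>,
    offset: Option<u32>,
) -> String {
    let mut query = match vertex_type {
        Some(t) => format!("MATCH (n:{})", quote(t)),
        None => "MATCH (n)".to_string(),
    };
    if !filters.is_empty() {
        let clauses: Vec<String> = filters
            .iter()
            .enumerate()
            .map(|(i, c)| condition_to_cypher(i, c))
            .collect();
        query.push_str(" WHERE ");
        query.push_str(&clauses.join(" AND "));
    }
    query.push_str(" RETURN n");
    if let Some(o) = offset {
        query.push_str(&format!(" SKIP {o}"));
    }
    if let Some(l) = limit {
        query.push_str(&format!(" LIMIT {l}"));
    }
    query
}

fn main() {
    let filters = vec![
        FilterCondition { property: "age".to_string(), operator: ComparisonOperator::GreaterThanOrEqual },
        FilterCondition { property: "name".to_string(), operator: ComparisonOperator::StartsWith },
    ];
    println!("{}", find_vertices_cypher(Some("Person"), &filters, Some(10), Some(5)));
}
```

Passing values as parameters keeps user data out of the query text. Labels and property names cannot be parameterized in Cypher, which is why they are quoted instead.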
