Implement Durable Search Provider Components for golem:search WIT Interface #32

@jdegoes

Description

This ticket involves implementing the golem:search interface for several major document and vector search providers. This WIT interface defines a unified abstraction over full-text and metadata-based search functionality, enabling developers to interact with a consistent and well-typed API regardless of provider differences.

The interface supports indexed document storage, structured schema definition, full-text and filtered search, result highlighting, faceting, and pagination. It is designed to degrade gracefully when providers do not support a particular capability, using optional fields and structured error variants.

This task is to implement the interface as a series of WASM components (WASI 0.2.3) in Rust, following Golem conventions for component development, durability integration, and structured error handling.

Note: The golem:search interface was designed after analyzing the APIs of leading search systems. If you find improvements or simplifications that could be made, you’re encouraged to propose them with justification.


Target Providers

The following providers are prioritized for implementation:

  • Elasticsearch
    Popular distributed search engine with powerful full-text capabilities; supports scroll-based pagination and flexible schemas.

  • OpenSearch
    AWS-backed fork of Elasticsearch with extended features including index lifecycle management, snapshots, and cluster APIs.

  • Algolia
    Developer-friendly hosted search API optimized for instant search and relevance tuning, with support for filters, pagination, and ranking.

  • Typesense
    Lightweight open-source search engine focused on simplicity and speed; supports schema enforcement, vector fields, and filters.

  • Meilisearch
    Modern, fast, open-source search engine with support for faceting, typo tolerance, and ranked search; now also includes vector support.


Deliverables

Each provider must be implemented as a standalone WASM component with full test coverage and integration with Golem’s durability APIs.

Component Artifacts

  • search-elastic.wasm
  • search-opensearch.wasm
  • search-algolia.wasm
  • search-typesense.wasm
  • search-meilisearch.wasm

Each component must:

  • Fully implement the golem:search interface per the WIT spec
  • Compile cleanly with cargo component targeting WASI 0.2.3
  • Use environment variables for configuration and authentication
  • Integrate Golem durability for consistent and resumable execution
  • Handle unsupported features using search-error.unsupported or option<T> fields
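
One way to picture the last requirement is a provider trait whose optional operations default to the unsupported error. The sketch below is illustrative only: SearchProvider, HostedProvider, and the simplified Schema are hypothetical names, and a real component would implement the trait generated from the WIT bindings rather than this hand-written one.

```rust
/// Mirrors the `search-error` variant from the WIT definition.
/// In a real component this type would come from generated bindings.
#[derive(Debug, Clone, PartialEq)]
pub enum SearchError {
    IndexNotFound,
    InvalidQuery(String),
    Unsupported,
    Internal(String),
    Timeout,
    RateLimited,
}

/// Simplified stand-in for the `schema` record (field list omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub primary_key: Option<String>,
}

/// Hypothetical provider trait: optional capabilities default to `Unsupported`,
/// so a provider only overrides what its backend can actually do.
pub trait SearchProvider {
    fn create_index(&self, _name: &str, _schema: Option<&Schema>) -> Result<(), SearchError> {
        Err(SearchError::Unsupported)
    }

    fn get_schema(&self, _name: &str) -> Result<Schema, SearchError> {
        Err(SearchError::Unsupported)
    }

    /// Required operation with no default.
    fn list_indexes(&self) -> Result<Vec<String>, SearchError>;
}

/// Hypothetical provider whose backend cannot create indexes or report schemas.
pub struct HostedProvider {
    pub indexes: Vec<String>,
}

impl SearchProvider for HostedProvider {
    fn list_indexes(&self) -> Result<Vec<String>, SearchError> {
        Ok(self.indexes.clone())
    }
}
```

With this shape, callers can match on SearchError::Unsupported and degrade, instead of treating a missing capability as a hard failure.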

Testing Requirements

All components must be tested for:

  • Index creation and deletion (if supported)
  • Document insert, update, delete, and retrieval
  • Full-text search with filters, sorting, and pagination
  • Highlighted results and facet metadata (where supported)
  • Schema inspection and validation
  • Search streaming behavior and pagination correctness
  • Graceful fallback for unsupported operations
  • Error mappings: invalid input, rate limiting, timeouts, network failures
  • Integration with Golem durability and config handling
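
For the error-mapping tests, it may help to have a single function that translates provider responses into search-error cases. The following is a minimal sketch, assuming providers report failures through HTTP status codes. The specific status-to-variant choices and the function names map_http_status and map_network_failure are assumptions for illustration, not part of the spec, and each provider will likely need its own adjustments (for example, a 404 may refer to a missing document rather than a missing index).

```rust
/// Mirrors the `search-error` variant from the WIT definition.
/// In a real component this type would come from generated bindings.
#[derive(Debug, Clone, PartialEq)]
pub enum SearchError {
    IndexNotFound,
    InvalidQuery(String),
    Unsupported,
    Internal(String),
    Timeout,
    RateLimited,
}

/// Hypothetical mapping from an HTTP status code and response body
/// to a `SearchError`. The chosen codes are assumptions.
pub fn map_http_status(status: u16, body: &str) -> SearchError {
    match status {
        400 | 422 => SearchError::InvalidQuery(body.to_string()),
        404 => SearchError::IndexNotFound,
        408 | 504 => SearchError::Timeout,
        429 => SearchError::RateLimited,
        501 => SearchError::Unsupported,
        other => SearchError::Internal(format!("unexpected HTTP status {}: {}", other, body)),
    }
}

/// Network-level failures (DNS errors, connection resets) carry no HTTP status
/// and have no dedicated variant, so this sketch folds them into `internal`.
pub fn map_network_failure(detail: &str) -> SearchError {
    SearchError::Internal(format!("network failure: {}", detail))
}
```

Keeping the mapping in one pure function makes each case in the list above testable without a live provider.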

Configuration via Environment Variables

Common

  • SEARCH_PROVIDER_ENDPOINT
  • SEARCH_PROVIDER_TIMEOUT (default: 30)
  • SEARCH_PROVIDER_MAX_RETRIES (default: 3)
  • SEARCH_PROVIDER_LOG_LEVEL
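
As a rough illustration of how the common variables could be read, here is a sketch that applies the defaults listed above. It assumes the timeout is expressed in seconds (the ticket does not state the unit) and leaves the endpoint optional. CommonConfig, load_common_config, and load_from_env are hypothetical names. The lookup is passed in as a closure so the parsing logic can be tested without touching the process environment.

```rust
use std::env;
use std::str::FromStr;
use std::time::Duration;

/// Hypothetical holder for the common settings.
#[derive(Debug, Clone, PartialEq)]
pub struct CommonConfig {
    pub endpoint: Option<String>,
    pub timeout: Duration, // assumption: SEARCH_PROVIDER_TIMEOUT is in seconds
    pub max_retries: u32,
    pub log_level: Option<String>,
}

/// Reads `key` through `lookup`, falling back to `default` when it is unset.
fn parse_or_default<F, T>(lookup: &F, key: &str, default: T) -> Result<T, String>
where
    F: Fn(&str) -> Option<String>,
    T: FromStr,
{
    match lookup(key) {
        None => Ok(default),
        Some(raw) => raw
            .trim()
            .parse::<T>()
            .map_err(|_| format!("{} has an invalid value: {:?}", key, raw)),
    }
}

/// Builds the configuration from any key/value source.
pub fn load_common_config<F>(lookup: F) -> Result<CommonConfig, String>
where
    F: Fn(&str) -> Option<String>,
{
    let timeout_secs = parse_or_default(&lookup, "SEARCH_PROVIDER_TIMEOUT", 30u64)?;
    let max_retries = parse_or_default(&lookup, "SEARCH_PROVIDER_MAX_RETRIES", 3u32)?;
    Ok(CommonConfig {
        endpoint: lookup("SEARCH_PROVIDER_ENDPOINT"),
        timeout: Duration::from_secs(timeout_secs),
        max_retries,
        log_level: lookup("SEARCH_PROVIDER_LOG_LEVEL"),
    })
}

/// Convenience wrapper that reads from the process environment.
pub fn load_from_env() -> Result<CommonConfig, String> {
    load_common_config(|key| env::var(key).ok())
}
```

An invalid value is reported as an error rather than silently replaced by the default, which keeps misconfiguration visible.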

Provider-Specific Examples

  • ALGOLIA_APP_ID, ALGOLIA_API_KEY
  • MEILISEARCH_API_KEY
  • ELASTIC_PASSWORD, ELASTIC_CLOUD_ID

Graceful Degradation Strategy

The interface leverages option<T> and search-error.unsupported to enable partial implementations:

  • Providers that don’t support index creation can return unsupported
  • Schema-inspecting APIs may return empty or inferred schema info
  • Facets, highlights, or document scores may be omitted if not available
  • Streaming search can fall back to paginated batches internally
  • Provider-specific features can be safely ignored unless explicitly declared in provider-params
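
The streaming fallback mentioned above could look roughly like the following: an iterator that requests one page at a time and yields hits lazily. This is a sketch under stated assumptions. PagedHits and the fetch_page callback are hypothetical, pages are assumed to be zero-based, and a page shorter than per_page is taken to mean the result set is exhausted. SearchHit is reduced to its id field for brevity.

```rust
use std::collections::VecDeque;

/// Minimal stand-in for the `search-hit` record; only `id` is kept here.
#[derive(Debug, Clone, PartialEq)]
pub struct SearchHit {
    pub id: String,
}

/// Lazily yields hits by calling `fetch_page(page, per_page)` on demand.
pub struct PagedHits<F> {
    fetch_page: F,
    per_page: u32,
    next_page: u32, // assumption: zero-based page numbering
    buffer: VecDeque<SearchHit>,
    done: bool,
}

impl<F> PagedHits<F>
where
    F: FnMut(u32, u32) -> Result<Vec<SearchHit>, String>,
{
    pub fn new(fetch_page: F, per_page: u32) -> Self {
        PagedHits {
            fetch_page,
            per_page: per_page.max(1), // guard against a zero page size
            next_page: 0,
            buffer: VecDeque::new(),
            done: false,
        }
    }
}

impl<F> Iterator for PagedHits<F>
where
    F: FnMut(u32, u32) -> Result<Vec<SearchHit>, String>,
{
    type Item = Result<SearchHit, String>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.buffer.is_empty() && !self.done {
            match (self.fetch_page)(self.next_page, self.per_page) {
                Ok(hits) => {
                    // A short (or empty) page means there is nothing left to fetch.
                    if (hits.len() as u32) < self.per_page {
                        self.done = true;
                    }
                    self.next_page += 1;
                    self.buffer.extend(hits);
                }
                Err(e) => {
                    // Surface the error once, then end the stream.
                    self.done = true;
                    return Some(Err(e));
                }
            }
        }
        self.buffer.pop_front().map(Ok)
    }
}
```

Because pages are only fetched when the buffer runs dry, a consumer that stops early never triggers the remaining requests.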

This work enables robust and interoperable search functionality across multiple ecosystems, paving the way for plug-and-play search capabilities in the Golem platform.

package golem:search@1.0.0;

/// Core types and error handling for universal search interfaces
interface types {
  /// Common structured errors for search operations
  variant search-error {
    index-not-found,
    invalid-query(string),
    unsupported,
    internal(string),
    timeout,
    rate-limited,
  }

  /// Identifier types
  type index-name = string;
  type document-id = string;
  type json = string;

  /// Document payload
  record doc {
    id: document-id,
    content: json,
  }

  /// Highlight configuration
  record highlight-config {
    fields: list<string>,
    pre-tag: option<string>,
    post-tag: option<string>,
    max-length: option<u32>,
  }

  /// Advanced search tuning
  record search-config {
    timeout-ms: option<u32>,
    boost-fields: list<tuple<string, f32>>,
    attributes-to-retrieve: list<string>,
    language: option<string>,
    typo-tolerance: option<bool>,
    exact-match-boost: option<f32>,
    provider-params: option<json>,
  }

  /// Search request
  record search-query {
    q: option<string>,
    filters: list<string>,
    sort: list<string>,
    facets: list<string>,
    page: option<u32>,
    per-page: option<u32>,
    offset: option<u32>,
    highlight: option<highlight-config>,
    config: option<search-config>,
  }

  /// Search hit
  record search-hit {
    id: document-id,
    score: option<f64>,
    content: option<json>,
    highlights: option<json>,
  }

  /// Search result set
  record search-results {
    total: option<u32>,
    page: option<u32>,
    per-page: option<u32>,
    hits: list<search-hit>,
    facets: option<json>,
    took-ms: option<u32>,
  }

  /// Field schema types
  enum field-type {
    text,
    keyword,
    integer,
    float,
    boolean,
    date,
    geo-point,
  }

  /// Field definition
  record schema-field {
    name: string,
    %type: field-type,
    required: bool,
    facet: bool,
    sort: bool,
    index: bool,
  }

  /// Index schema
  record schema {
    fields: list<schema-field>,
    primary-key: option<string>,
  }
}

/// Unified search interface
interface core {
  use types.{
    index-name, document-id, doc, search-query, search-results,
    search-hit, schema, search-error
  };

  // Index lifecycle
  create-index: func(name: index-name, schema: option<schema>) -> result<_, search-error>;
  delete-index: func(name: index-name) -> result<_, search-error>;
  list-indexes: func() -> result<list<index-name>, search-error>;

  // Document operations
  upsert: func(index: index-name, doc: doc) -> result<_, search-error>;
  upsert-many: func(index: index-name, docs: list<doc>) -> result<_, search-error>;
  delete: func(index: index-name, id: document-id) -> result<_, search-error>;
  delete-many: func(index: index-name, ids: list<document-id>) -> result<_, search-error>;
  get: func(index: index-name, id: document-id) -> result<option<doc>, search-error>;

  // Query
  search: func(index: index-name, query: search-query) -> result<search-results, search-error>;
  stream-search: func(index: index-name, query: search-query) -> result<stream<search-hit>, search-error>;

  // Schema inspection
  get-schema: func(index: index-name) -> result<schema, search-error>;
  update-schema: func(index: index-name, schema: schema) -> result<_, search-error>;
}
