Create Rust WASM Components that Provide API to Common Embeddings Providers #16

Open
@jdegoes

I have attached to this ticket a WIT file that describes a generic interface for text embedding models. This interface can be implemented by various providers, either by emulating features not present in a given provider, utilizing the provider's native support for a feature, or indicating an error if a particular combination is not natively supported by a provider.
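
As a rough illustration of the three strategies above (native support, emulation, or an error), here is a minimal sketch for the `truncation` option. Everything in it is an assumption made for illustration: the `Support` enum, the `prepare_input` helper, and the character budget `max_chars` (real providers count tokens, not characters). The `ErrorCode` and `Error` types are local stand-ins for bindings that would be generated from the WIT.

```rust
// Local stand-ins for the WIT `error-code` and `error` types. Real bindings
// would be generated from the WIT file; these names are assumptions.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ErrorCode {
    Unsupported,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Error {
    code: ErrorCode,
    message: String,
}

/// Hypothetical classification of how a provider handles one feature.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Support {
    Native,
    Emulated,
    Unavailable,
}

/// Hypothetical helper: prepare one text input when `truncation` is requested.
/// `max_chars` is a crude character budget standing in for a token limit.
fn prepare_input(
    support: Support,
    input: &str,
    truncation: bool,
    max_chars: usize,
) -> Result<String, Error> {
    // Nothing to do when truncation was not requested or the input fits.
    if !truncation || input.chars().count() <= max_chars {
        return Ok(input.to_string());
    }
    match support {
        // The provider truncates on its side, so pass the input through.
        Support::Native => Ok(input.to_string()),
        // Emulate the feature locally by cutting the input ourselves.
        Support::Emulated => Ok(input.chars().take(max_chars).collect()),
        // Neither option is possible: report it through the error type.
        Support::Unavailable => Err(Error {
            code: ErrorCode::Unsupported,
            message: format!(
                "truncation is not supported (input exceeds {max_chars} characters)"
            ),
        }),
    }
}
```

Calling `prepare_input(Support::Emulated, "hello world", true, 5)` cuts the input locally, while the same call with `Support::Unavailable` yields an `unsupported` error instead of silently sending the oversized input.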

The intent of this WIT specification is to allow developers of WASM components (on wasmCloud, Spin, or Golem) to leverage text embedding capabilities to build agents and services in a portable and provider-agnostic fashion.

This ticket involves constructing implementations of this WIT interface for the following providers:

  • OpenAI: Offers the text-embedding-3-large model, widely adopted for generating high-quality text embeddings.
  • Cohere: Provides embedding models optimized for semantic search and text classification tasks.
  • Hugging Face: Hosts a vast repository of open-source embedding models, facilitating flexibility and community support.
  • Voyage AI: Delivers cutting-edge embedding models and rerankers, enhancing search and retrieval capabilities.

These implementations must be written in Rust and compilable to WASM Components (WASI 0.2 only, since Golem does not yet support WASI 0.3). The standard Rust toolchain for WASM component development can be employed (see cargo component and the Rust examples of components in this and other Golem repositories).

Additionally, these implementations should incorporate custom durability semantics using the Golem durability API and the Golem host API. This ensures that durability is managed at the level of individual embedding requests, which yields a higher-level, clearer operation log and aids debugging and monitoring. See #1507 and its associated pull request for details on implementing a library with custom durability.
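
To make the idea of request-level durability concrete, the sketch below records one log entry per embedding request and, during recovery, replays the recorded response instead of calling the provider again. It is only a conceptual model: `LogEntry`, `DurableEmbedder`, and the in-memory log are hypothetical and do not use or mirror the Golem durability API, whose actual shape is described in #1507.

```rust
use std::collections::VecDeque;

/// One recorded embedding request and its outcome (hypothetical log entry).
#[derive(Debug, Clone, PartialEq)]
struct LogEntry {
    request: Vec<String>,
    response: Vec<Vec<f32>>,
}

/// Hypothetical in-memory stand-in for a durable operation log.
/// This is NOT the Golem durability API; it only models the idea.
struct DurableEmbedder {
    /// Everything recorded so far, one entry per embedding request.
    log: Vec<LogEntry>,
    /// Entries still waiting to be replayed after a restart.
    replay: VecDeque<LogEntry>,
}

impl DurableEmbedder {
    /// Start fresh: nothing recorded, nothing to replay.
    fn new() -> Self {
        Self { log: Vec::new(), replay: VecDeque::new() }
    }

    /// Start in recovery: replay the given log before going live again.
    fn recovering(log: Vec<LogEntry>) -> Self {
        Self { replay: log.clone().into(), log }
    }

    /// Run one embedding request with request-level durability.
    fn generate<F>(&mut self, inputs: &[String], call_provider: F) -> Vec<Vec<f32>>
    where
        F: FnOnce(&[String]) -> Vec<Vec<f32>>,
    {
        // Replay mode: return the recorded result, skip the provider call.
        if let Some(entry) = self.replay.pop_front() {
            debug_assert_eq!(entry.request.as_slice(), inputs);
            return entry.response;
        }
        // Live mode: call the provider once and record a single log entry.
        let response = call_provider(inputs);
        self.log.push(LogEntry {
            request: inputs.to_vec(),
            response: response.clone(),
        });
        response
    }
}
```

The log then contains one readable entry per embedding request rather than a trail of low-level HTTP operations, which is the debugging and monitoring benefit described above.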

The final deliverables associated with this ticket are:

  • OpenAI implementation: A WASM Component (WASI 0.2), named embed-openai.wasm, with a full test suite and custom durability implementation at the level of embedding requests.
  • Cohere implementation: A WASM Component (WASI 0.2), named embed-cohere.wasm, with a full test suite and custom durability implementation at the level of embedding requests.
  • Hugging Face implementation: A WASM Component (WASI 0.2), named embed-huggingface.wasm, with a full test suite and custom durability implementation at the level of embedding requests.
  • Voyage AI implementation: A WASM Component (WASI 0.2), named embed-voyageai.wasm, with a full test suite and custom durability implementation at the level of embedding requests.

These components will require runtime configuration, notably API keys. For now, the components can read this configuration from environment variables, which Golem supports well. In the future they will use wasi-runtime-config, which Golem does not yet support.
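
A minimal sketch of reading an API key from the environment might look like the following. The helper names, the choice of `internal-error` for a missing key, and variable names such as `OPENAI_API_KEY` are assumptions for illustration; this ticket does not prescribe them. The `ErrorCode` and `Error` types are again local stand-ins for the generated WIT bindings.

```rust
use std::env;

// Local stand-ins for the WIT `error-code` and `error` types (assumed names).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ErrorCode {
    InternalError,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Error {
    code: ErrorCode,
    message: String,
}

/// Validate a raw environment value. Kept separate from the lookup so it can
/// be tested without touching the process environment.
fn validate_api_key(var_name: &str, value: Option<String>) -> Result<String, Error> {
    match value {
        // Accept any non-blank value, trimming surrounding whitespace.
        Some(key) if !key.trim().is_empty() => Ok(key.trim().to_string()),
        // Treat a missing or blank variable as a configuration error.
        _ => Err(Error {
            code: ErrorCode::InternalError,
            message: format!("missing or empty environment variable {var_name}"),
        }),
    }
}

/// Hypothetical helper: look up the key under `var_name` and validate it.
fn load_api_key(var_name: &str) -> Result<String, Error> {
    validate_api_key(var_name, env::var(var_name).ok())
}
```

A provider component could call `load_api_key("OPENAI_API_KEY")` at the start of each request and return the resulting error through the WIT `error` record when the key is absent.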

Moreover, the Rust components need to be tested within Golem to ensure compatibility with Golem 1.2.x.

This WIT was designed by examining and comparing the APIs of OpenAI, Cohere, Hugging Face, and Voyage AI. Because no implementations exist yet, the provided WIT may not be the optimal abstraction across all these providers, so deviations from the proposed design are permitted. To be accepted, any deviation must be fully justified and deemed by Golem core contributors to be an improvement over the original specification.

package golem:embed@1.0.0;

interface embed {
  // --- Enums ---

  enum task-type {
    retrieval-query,
    retrieval-document,
    semantic-similarity,
    classification,
    clustering,
    question-answering,
    fact-verification,
    code-retrieval,
  }

  enum output-format {
    float-array,
    binary,
    base64,
  }

  enum output-dtype {
    float32,
    int8,
    uint8,
    binary,
    ubinary,
  }

  enum error-code {
    invalid-request,
    model-not-found,
    unsupported,
    provider-error,
    rate-limit-exceeded,
    internal-error,
    unknown,
  }

  // --- Content ---

  record image-url {
    url: string,
  }

  variant content-part {
    text(string),
    image(image-url),
  }

  // --- Configuration ---

  record kv {
    key: string,
    value: string,
  }

  record config {
    model: option<string>,
    task-type: option<task-type>,
    dimensions: option<u32>,
    truncation: option<bool>,
    output-format: option<output-format>,
    output-dtype: option<output-dtype>,
    user: option<string>,
    provider-options: list<kv>,
  }

  // --- Embedding Response ---

  record usage {
    input-tokens: option<u32>,
    total-tokens: option<u32>,
  }

  record embedding {
    index: u32,
    vector: list<f32>,
  }

  record embedding-response {
    embeddings: list<embedding>,
    usage: option<usage>,
    model: string,
    provider-metadata-json: option<string>,
  }

  // --- Rerank Response ---

  record rerank-result {
    index: u32,
    relevance-score: f32,
    document: option<string>,
  }

  record rerank-response {
    results: list<rerank-result>,
    usage: option<usage>,
    model: string,
    provider-metadata-json: option<string>,
  }

  // --- Error Handling ---

  record error {
    code: error-code,
    message: string,
    provider-error-json: option<string>,
  }

  // --- Core Functions ---

  generate: func(
    inputs: list<content-part>,
    config: config
  ) -> result<embedding-response, error>;

  rerank: func(
    query: string,
    documents: list<string>,
    config: config
  ) -> result<rerank-response, error>;
}
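
One way an implementation might populate the `error-code` enum is by translating the HTTP status of a failed provider call. The mapping below is only a sketch under assumptions: the function name is hypothetical, and each provider's documented error semantics should take precedence over these generic choices.

```rust
// Local mirror of the WIT `error-code` enum; real bindings would be generated.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorCode {
    InvalidRequest,
    ModelNotFound,
    Unsupported,
    ProviderError,
    RateLimitExceeded,
    InternalError,
    Unknown,
}

/// Hypothetical mapping from an HTTP status code to a WIT error code.
fn error_code_for_status(status: u16) -> ErrorCode {
    match status {
        // Malformed or semantically invalid request bodies.
        400 | 422 => ErrorCode::InvalidRequest,
        // Assumption: a 404 from an embeddings endpoint means an unknown model.
        404 => ErrorCode::ModelNotFound,
        // Too many requests.
        429 => ErrorCode::RateLimitExceeded,
        // Any server-side failure reported by the provider.
        500..=599 => ErrorCode::ProviderError,
        // Anything else is left unclassified.
        _ => ErrorCode::Unknown,
    }
}
```

The raw response body can still be passed through unchanged in `provider-error-json`, so callers do not lose provider-specific detail when the code is coarse.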
