Support for external Model Registries #2402
-
Today, the models endpoint is a list of models available for inference or embedding usage within Llama Stack. If a client could also see some representation of models on disk or in some remote location, is there an example of what the Llama Stack client would do with that information? In other words, where would they use these models in "colder storage" for subsequent requests to the Llama Stack APIs?
-
Consider ollama: you have 3 levels of models available to you. As a Llama Stack user you don't need to concern yourself with the first two, but if you want a model from the third level (i.e., something on https://ollama.com/models that hasn't been pulled yet) you need to pull it yourself first. Note: embedding models are an exception and will be pulled automatically, see https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/ollama/ollama.py#L352. Also consider that models from https://ollama.com/models won't necessarily work if used from the vllm provider or nvidia provider. How would the registry work with ollama?
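To make the ollama case concrete, here is a rough sketch of what "pull from the catalog, then register" could look like today, assuming the `ollama` Python package and the `llama-stack-client` SDK; the model name, port, and metadata are placeholders, not a recommendation:

```python
# Sketch only: pull a model from ollama.com, then register it with Llama Stack.
# Assumes the `ollama` Python package and `llama-stack-client` are installed
# and a Llama Stack server with an ollama provider is running locally.
import ollama
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# The model only exists on ollama.com until we pull it ourselves.
ollama.pull("llama3.2:1b")

# Now it is local to the ollama server, so it can be registered with Llama Stack.
client.models.register(
    model_id="llama3.2:1b",
    provider_id="ollama",
    provider_model_id="llama3.2:1b",
    model_type="llm",
)
```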
-
@bbrowning An example workflow:

```python
client.registry.register_model(dir=..., name=..., version=..., metadata=...)
client.registry.register_artifact(...)
client.registry.list()  # -> list of colder-stored models/artifacts ->
[
    Model(identifier='llama3.2:1b-financial', metadata={"financial_literacy": 0.92}, ...),
    Model(identifier='llama3.2:1b-law', metadata={"law": 0.87}),
]
```

Usage (some out-of-tree code):

```python
financial_llm = next(
    (model for model in client.registry.list() if model.metadata.get("financial_literacy", 0) > 0.9),
    None,
)
await out_of_tree_deployment(financial_llm, ...)  # some function
```

Eventually, we can build on top with some sort of [...]. That can also lead us to potentially integrating with inference providers, so we can directly see which checkpoints and versions are deployed:

```python
[
    Model(identifier='llama3.2:1b-financial', metadata={"financial_literacy": 0.92}, deployed=False, ...),
    Model(identifier='llama3.2:1b-law', metadata={"law": 0.87}, deployed=True),
]
```
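If it helps pin the idea down, here is a minimal sketch of what such a registry surface could look like as a Python protocol. `RegistryModel`, `Registry`, and every field name below are hypothetical, not an existing Llama Stack API:

```python
# Hypothetical sketch only; none of these types exist in Llama Stack today.
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Protocol


@dataclass
class RegistryModel:
    identifier: str                          # e.g. "llama3.2:1b-financial"
    version: str = "latest"
    uri: str = ""                            # e.g. "s3://...", "hf://...", or a local path
    metadata: Dict[str, Any] = field(default_factory=dict)
    deployed: bool = False                   # whether an inference provider currently serves it


class Registry(Protocol):
    """Catalog of models in colder storage, independent of any inference provider."""

    async def register_model(self, model: RegistryModel) -> RegistryModel: ...

    async def list_models(self) -> List[RegistryModel]: ...

    async def get_model(self, identifier: str, version: Optional[str] = None) -> RegistryModel: ...
```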
-
Llama Stack could benefit from external model registries.

As far as I know, the closest existing concept is the v1/models endpoint, which lives under each inference provider. Those models are essentially a view of what is "available at runtime". What about the models available in colder storage?

I realize the value add might be a little limited for now, but in combination with future efforts around modelsio and the discussion on Kubeflow as a provider, it could be a great feature.
Essentially, something along the lines of:
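As a rough, purely illustrative sketch (every field name below is a placeholder, not an agreed-on schema), a hypothetical GET /v1/registry/models response could look like:

```python
# Illustrative only: one possible response shape for a hypothetical
# GET /v1/registry/models endpoint. Nothing here is an existing Llama Stack schema.
example_registry_listing = {
    "data": [
        {
            "identifier": "llama3.2:1b-financial",
            "source": "kubeflow-model-registry",        # where the entry came from
            "uri": "s3://models/llama3.2-1b-financial",  # colder-storage location
            "metadata": {"financial_literacy": 0.92},
            "deployed": False,                           # not attached to any inference provider
        },
        {
            "identifier": "llama3.2:1b-law",
            "source": "kubeflow-model-registry",
            "uri": "s3://models/llama3.2-1b-law",
            "metadata": {"law": 0.87},
            "deployed": True,
        },
    ]
}
```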
This could either be added under the existing v1/models endpoint or under a new one, say v1/registry/models, since v1/models is AFAIK reserved for models attached to a provider (like ollama, vllm, etc.).

You can already define manual, static models via this example:
llama-stack/llama_stack/templates/ollama/run.yaml
Lines 124 to 129 in 3251b44
What are your thoughts on this? For starters, a very common registry we could support would be the KF Model Registry.
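To make the KF Model Registry idea slightly more concrete, here is a hedged sketch of an adapter that maps external registry records into the kind of listing described above. It deliberately takes plain dict records rather than calling the real Kubeflow Model Registry client, whose exact API is not assumed here; the function and all field names are hypothetical:

```python
# Hypothetical adapter sketch: normalize records fetched from an external
# registry (e.g. the Kubeflow Model Registry) into a provider-agnostic listing
# that a /v1/registry/models endpoint could serve. Field names are assumptions.
from typing import Any, Dict, List


def registry_records_to_listing(records: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Convert external registry records into a single listing payload."""
    data = []
    for record in records:
        data.append(
            {
                "identifier": record.get("name", ""),
                "version": record.get("version", "latest"),
                "uri": record.get("artifact_uri", ""),
                "metadata": record.get("custom_properties", {}),
                "source": "kubeflow-model-registry",
                "deployed": False,  # to be reconciled against live inference providers
            }
        )
    return {"data": data}
```

An actual provider would fetch the records from the registry's API and keep the `deployed` flag in sync with the inference providers.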