Support for external Model Registries #2402
-
Today, the models endpoint is a list of models available for inference or embedding usage within Llama Stack. If a client could also see some representation of models on disk or in some remote location, is there an example of what the Llama Stack client would do with that information? In other words, where would they use these models in "colder storage" for subsequent requests to the Llama Stack APIs?
-
Consider ollama: you have 3 levels of models available to you. As a Llama Stack user you don't need to concern yourself with the first two, but if you want a model from the third level (i.e., something on https://ollama.com/models that hasn't been pulled yet) you need to pull it yourself first. Note: embedding models are an exception and will be pulled automatically, see https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/ollama/ollama.py#L352. Also consider that models from https://ollama.com/models won't necessarily work if used from the vllm provider or nvidia provider. How would the registry work with ollama?
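To make the ollama case concrete, here is a rough sketch of what "pull from the catalog, then register" could look like today, assuming the `ollama` Python package and the `llama-stack-client` SDK; the model name, port, and metadata are placeholders, not a recommendation:

```python
# Sketch only: pull a model from ollama.com, then register it with Llama Stack.
# Assumes the `ollama` Python package and `llama-stack-client` are installed
# and a Llama Stack server with an ollama provider is running locally.
import ollama
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# The model only exists on ollama.com until we pull it ourselves.
ollama.pull("llama3.2:1b")

# Now it is local to the ollama server, so it can be registered with Llama Stack.
client.models.register(
    model_id="llama3.2:1b",
    provider_id="ollama",
    provider_model_id="llama3.2:1b",
    model_type="llm",
)
```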
-
@bbrowning An example workflow:

```python
client.registry.register_model(dir=..., name=..., version=..., metadata=...)
client.registry.register_artifact(...)
client.registry.list()  # -> list of colder-stored models/artifacts ->
[
    Model(identifier='llama3.2:1b-financial', metadata={"financial_literacy": 0.92}, ...),
    Model(identifier='llama3.2:1b-law', metadata={"law": 0.87}),
]
```

Usage (some out-of-tree code):

```python
financial_llm = next(
    (model for model in client.registry.list() if model.metadata.get("financial_literacy", 0) > 0.9),
    None,
)
await out_of_tree_deployment(financial_llm, ...)  # some function
```

Eventually, we can build on top with some sort of [...]. That can also lead us to potentially integrating with inference providers, so we can directly see which checkpoints and versions are deployed:

```python
[
    Model(identifier='llama3.2:1b-financial', metadata={"financial_literacy": 0.92}, deployed=False, ...),
    Model(identifier='llama3.2:1b-law', metadata={"law": 0.87}, deployed=True),
]
```
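If it helps pin the idea down, here is a minimal sketch of what such a registry surface could look like as a Python protocol. `RegistryModel`, `Registry`, and every field name below are hypothetical, not an existing Llama Stack API:

```python
# Hypothetical sketch only; none of these types exist in Llama Stack today.
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Protocol


@dataclass
class RegistryModel:
    identifier: str                          # e.g. "llama3.2:1b-financial"
    version: str = "latest"
    uri: str = ""                            # e.g. "s3://...", "hf://...", or a local path
    metadata: Dict[str, Any] = field(default_factory=dict)
    deployed: bool = False                   # whether an inference provider currently serves it


class Registry(Protocol):
    """Catalog of models in colder storage, independent of any inference provider."""

    async def register_model(self, model: RegistryModel) -> RegistryModel: ...

    async def list_models(self) -> List[RegistryModel]: ...

    async def get_model(self, identifier: str, version: Optional[str] = None) -> RegistryModel: ...
```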
-
Llama Stack could benefit from external model registries.

As far as I know, the closest existing concept is the v1/models endpoint, which lives under each inference provider. Those models are essentially a view of what is "available at runtime". What about the models available in colder storage?

I realize the value add might be a little limited for now, but in combination with future efforts around modelsio and the discussion on Kubeflow as a provider, it could be a great feature.
Essentially, something along the lines of:
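As a rough, purely illustrative sketch (every field name below is a placeholder, not an agreed-on schema), a hypothetical GET /v1/registry/models response could look like:

```python
# Illustrative only: one possible response shape for a hypothetical
# GET /v1/registry/models endpoint. Nothing here is an existing Llama Stack schema.
example_registry_listing = {
    "data": [
        {
            "identifier": "llama3.2:1b-financial",
            "source": "kubeflow-model-registry",        # where the entry came from
            "uri": "s3://models/llama3.2-1b-financial",  # colder-storage location
            "metadata": {"financial_literacy": 0.92},
            "deployed": False,                           # not attached to any inference provider
        },
        {
            "identifier": "llama3.2:1b-law",
            "source": "kubeflow-model-registry",
            "uri": "s3://models/llama3.2-1b-law",
            "metadata": {"law": 0.87},
            "deployed": True,
        },
    ]
}
```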
This could either be added under the existing v1/models endpoint or under a new one, say v1/registry/models, since v1/models is AFAIK reserved for models attached to a provider (like ollama, vllm, etc.).

You can already define manual, static models via this example:
llama-stack/llama_stack/templates/ollama/run.yaml
Lines 124 to 129 in 3251b44
What are your thoughts on this? For starters, a very common registry we could support would be the KF Model Registry.
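To make the KF Model Registry idea slightly more concrete, here is a hedged sketch of an adapter that maps external registry records into the kind of listing described above. It deliberately takes plain dict records rather than calling the real Kubeflow Model Registry client, whose exact API is not assumed here; the function and all field names are hypothetical:

```python
# Hypothetical adapter sketch: normalize records fetched from an external
# registry (e.g. the Kubeflow Model Registry) into a provider-agnostic listing
# that a /v1/registry/models endpoint could serve. Field names are assumptions.
from typing import Any, Dict, List


def registry_records_to_listing(records: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Convert external registry records into a single listing payload."""
    data = []
    for record in records:
        data.append(
            {
                "identifier": record.get("name", ""),
                "version": record.get("version", "latest"),
                "uri": record.get("artifact_uri", ""),
                "metadata": record.get("custom_properties", {}),
                "source": "kubeflow-model-registry",
                "deployed": False,  # to be reconciled against live inference providers
            }
        )
    return {"data": data}
```

An actual provider would fetch the records from the registry's API and keep the `deployed` flag in sync with the inference providers.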