SPEZILLM - Using Gemini with the OPENAI or writing own module #90

JJjulle · 2025-04-08T13:09:04Z


What Stanford Spezi module is your challenge related to?

Other module

Description

I am looking into integrating the Gemini API, mostly because of its cost and multimodal capabilities.
I can see that Gemini provides some OpenAI compatibility. However, since I may want to input video, images, audio, and text, I am not sure the compatibility layer will be sufficient. It might therefore be better to create a new module based on the Google AI Swift SDK.
Does anyone have experience with this, or does the Stanford team have any guidance?

Reproduction

discussion

Expected behavior

discussion

Additional context

No response

Code of Conduct

I agree to follow this project's Code of Conduct and Contributing Guidelines

Answered by philippzagar

Apr 17, 2025


PSchmiedmayer (Maintainer) · 2025-04-10T00:31:43Z

@philippzagar would be able to provide some more context on this.


philippzagar (Collaborator) · 2025-04-17T15:31:42Z

Hi @JJjulle,

Thank you for reaching out and for exploring SpeziLLM! We’re excited to hear it’s being considered for your application 🚀

There are two main approaches to integrating the Gemini API with SpeziLLM, depending on the level of integration you’re aiming for:

1. Leverage Gemini’s OpenAI Compatibility Layer (Minimal Changes)

As you mentioned, Gemini offers compatibility with the OpenAI API specification. This allows for a relatively straightforward integration:

You can slightly modify the SpeziLLMOpenAI target to support a configurable API endpoint when initializing either LLMOpenAIPlatform or LLMOpenAISchema. By pointing to Gemini’s OpenAI-compatible endpoint: https://generativelanguage.googleapis.com/v1beta/openai/ (as documented here), you can use Gemini with SpeziLLM without major changes to the SpeziLLM package. This is the quickest and most lightweight integration path.
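To make the idea of a configurable endpoint concrete, here is a minimal sketch in plain Foundation. It does not use SpeziLLM's actual API: `OpenAICompatibleEndpoint`, `ChatMessage`, `ChatRequest`, and `makeChatCompletionRequest` are hypothetical names introduced only for illustration, and the model identifier in the usage note is an assumption. The sketch only builds the request so that the effect of swapping the base URL is visible.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives in this module on Linux
#endif

/// Hypothetical configuration type: an OpenAI-style API reachable at any base URL.
struct OpenAICompatibleEndpoint {
    var baseURL: URL
    var apiKey: String

    /// Base URL of Gemini's OpenAI-compatible endpoint, as quoted in the answer above.
    static func gemini(apiKey: String) -> OpenAICompatibleEndpoint {
        OpenAICompatibleEndpoint(
            baseURL: URL(string: "https://generativelanguage.googleapis.com/v1beta/openai/")!,
            apiKey: apiKey
        )
    }
}

/// Minimal OpenAI-style chat payload (text only).
struct ChatMessage: Codable {
    let role: String
    let content: String
}

struct ChatRequest: Codable {
    let model: String
    let messages: [ChatMessage]
}

/// Builds (but does not send) a chat completion request against the given endpoint.
func makeChatCompletionRequest(
    endpoint: OpenAICompatibleEndpoint,
    model: String,
    messages: [ChatMessage]
) throws -> URLRequest {
    let url = endpoint.baseURL
        .appendingPathComponent("chat")
        .appendingPathComponent("completions")
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("Bearer \(endpoint.apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(ChatRequest(model: model, messages: messages))
    return request
}
```

Calling `makeChatCompletionRequest(endpoint: .gemini(apiKey: key), model: "gemini-2.0-flash", messages: [...])` would target Gemini instead of OpenAI with no other change to the payload, which is the property the modified SpeziLLMOpenAI target would rely on. The model name here is an assumption; check Google's documentation for current identifiers.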

2. Build a Native Gemini Integration (Larger Changes)

Alternatively, you can develop a native Gemini integration within SpeziLLM by creating a new target, for example SpeziLLMGoogleGemini. This would involve:

Implementing the core SpeziLLM protocols (LLMPlatform, LLMSchema, LLMSession, etc.) for the Gemini API, using either:
- A custom OpenAPI spec (Google does not currently provide an official one, see "Missing OpenAPI spec", google-gemini/cookbook#261) that is used to generate a Swift client via the swift-openapi-generator, or
- The official Gemini protobuf definitions and grpc-swift to generate a gRPC client.

You can use the existing SpeziLLMOpenAI implementation as a reference for structuring your implementation of the SpeziLLM protocols.

Keep in mind that this path involves more significant development effort and will be more time-consuming.
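The division of responsibilities in a native target could be sketched roughly as follows. This is an illustration under strong assumptions: the `Sketch…` protocols are simplified stand-ins, not the real requirements of SpeziLLM's LLMSchema, LLMPlatform, and LLMSession, and the `transport` closure is a hypothetical placeholder for whichever generated client (OpenAPI or gRPC) is chosen.

```swift
// Simplified stand-ins for illustration only. The real SpeziLLM protocols
// (LLMSchema, LLMPlatform, LLMSession) have different, richer requirements.
protocol SketchSchema {
    var modelName: String { get }
}

protocol SketchSession {
    func generate(prompt: String) -> AsyncThrowingStream<String, Error>
}

protocol SketchPlatform {
    associatedtype Schema: SketchSchema
    associatedtype Session: SketchSession
    func makeSession(for schema: Schema) -> Session
}

/// Hypothetical transport: returns the response chunks for a model and prompt.
/// A generated OpenAPI or gRPC client would sit behind this closure.
typealias GeminiTransport = @Sendable (_ model: String, _ prompt: String) async throws -> [String]

/// Schema: static configuration describing which model to use.
struct GeminiSketchSchema: SketchSchema {
    let modelName: String
}

/// Session: one conversation with the model, exposing output as a stream.
struct GeminiSketchSession: SketchSession {
    let schema: GeminiSketchSchema
    let transport: GeminiTransport

    func generate(prompt: String) -> AsyncThrowingStream<String, Error> {
        // Copy what the task needs so it does not capture `self`.
        let model = schema.modelName
        let transport = self.transport
        return AsyncThrowingStream { continuation in
            Task {
                do {
                    for chunk in try await transport(model, prompt) {
                        continuation.yield(chunk)
                    }
                    continuation.finish()
                } catch {
                    continuation.finish(throwing: error)
                }
            }
        }
    }
}

/// Platform: owns shared resources (here, the transport) and creates sessions.
struct GeminiSketchPlatform: SketchPlatform {
    let transport: GeminiTransport

    func makeSession(for schema: GeminiSketchSchema) -> GeminiSketchSession {
        GeminiSketchSession(schema: schema, transport: transport)
    }
}
```

Keeping the transport behind a closure means the same session logic could be exercised with a stub in tests and later wired to a generated client, whichever of the two options above is picked.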

Multimodal Input Support (Image, Video, Audio)

Currently, SpeziLLM is primarily built around text-based interactions using chat-style models, as defined in SpeziChat’s ChatEntity. Support for multimodal input (images, video, audio) is not yet available but is part of our future roadmap. However, this will require substantial architectural changes and will take time to fully support.

Feel free to let us know which direction you’d like to pursue or if you need any further guidance on using or extending SpeziLLM. We’re happy to assist.

1 reply

JJjulle Apr 20, 2025
Author

Thank you very much Philipp!
Great answer, I'll work on it and see what the best fit will be based on the input from the team and what our investigation of the needs will lead to.
