
🎅 I WISH genai gateway HAD... #16


Open
mirodrr2 opened this issue Jan 2, 2025 · 16 comments

Comments

@mirodrr2
Contributor

mirodrr2 commented Jan 2, 2025

This is a ticket to track a wishlist of items you wish genai gateway had.

COMMENT BELOW 👇
With your request 🔥 - if we have any questions, we'll follow up in comments / via DMs
Respond with ❤️ to any request you would also like to see

@mirodrr2 mirodrr2 pinned this issue Jan 2, 2025
@FireballDWF

framework to compare outputs/latency from different LLMs in an "experiment" format (feedback from trusek@)

@athewsey

Since the solution is CDK-based, it'd be great to package and publish the construct(s) in a construct library (on PyPI, NPM, etc) so we can just use it as a component within our own deployments!
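For illustration, a minimal sketch of what consuming such a published construct could look like from an existing CDK app. The package and construct names below are hypothetical; nothing like this is published today.

```python
# Hypothetical sketch only: the package and construct names below are made up
# for illustration; the gateway is not currently published as a construct library.
from aws_cdk import App, Stack
from constructs import Construct

# from genai_gateway_cdk import GenAIGateway  # hypothetical package/import


class PlatformStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # The gateway would then be a drop-in component of an existing stack:
        # GenAIGateway(self, "GenAIGateway", vpc=my_existing_vpc)


app = App()
PlatformStack(app, "PlatformStack")
app.synth()
```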

@lsawaniewski

Hi,
First of all, thanks for the great work; I've been looking for a solution like this for a while.

Now, a question/request.
If I understand correctly, for Bedrock there are two endpoints here:

  • /bedrock/model/{model_id}/converse
  • /bedrock/model/{model_id}/converse-stream

My specific use case needs support for these endpoints:

  • /bedrock/model/{model_id}/invoke
  • /bedrock/model/{model_id}/invoke-with-response-stream

Unfortunately, simply replacing the path with converse won't do the trick, so maybe with your current knowledge you could add support for them, or indicate what would be involved if I wanted to implement them on my side? Or maybe I missed something?
I would be grateful for a hint.
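For context, a minimal illustration (using boto3 directly, outside the gateway) of why swapping the path alone doesn't work: Converse uses one request schema for every model, while InvokeModel takes the provider's native body, e.g. the Anthropic Messages format for Claude.

```python
# Minimal illustration (boto3 directly, outside the gateway) of why swapping the
# path alone is not enough: Converse uses one schema for every model, while
# InvokeModel takes the model provider's native body.
import json
import boto3

client = boto3.client("bedrock-runtime")
model_id = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # example model id

# Converse: consistent request shape across models.
client.converse(
    modelId=model_id,
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
)

# InvokeModel: provider-native body (Anthropic Messages format shown; other
# providers such as Meta or Amazon Titan use entirely different fields).
client.invoke_model(
    modelId=model_id,
    contentType="application/json",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello"}],
    }),
)
```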

@mirodrr2
Contributor Author

mirodrr2 commented Mar 10, 2025


Right now, we only support the Converse API. I'm wondering: why do you need the older InvokeModel APIs? I'd like to better understand your needs there.

I'm open to adding them in the future if they would bring value. I would have to add logic to translate from the InvokeModel request format to the OpenAI format before passing the request to LiteLLM, just as I have done for the Converse APIs.

If you're looking into doing this yourself in your own project (or opening a PR in this one), look at the middleware/app.py file: https://github.com/aws-samples/genai-gateway/blob/main/middleware/app.py

That's where the translation logic for the Converse endpoints currently lives.
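For readers following along, here is a rough sketch of the kind of request translation being described. This is not the actual middleware/app.py code, just the general shape of mapping a Converse body onto the OpenAI chat format that LiteLLM accepts.

```python
# Rough sketch only; not the actual middleware/app.py code. The general shape of
# mapping a Bedrock Converse request body onto the OpenAI chat format that
# LiteLLM accepts.
def converse_to_openai(model_id: str, converse_body: dict) -> dict:
    messages = []
    for msg in converse_body.get("messages", []):
        # Converse content is a list of blocks; keep only the text blocks here.
        text = "".join(block.get("text", "") for block in msg.get("content", []))
        messages.append({"role": msg["role"], "content": text})

    openai_request = {"model": model_id, "messages": messages}

    inference = converse_body.get("inferenceConfig", {})
    if "maxTokens" in inference:
        openai_request["max_tokens"] = inference["maxTokens"]
    if "temperature" in inference:
        openai_request["temperature"] = inference["temperature"]
    return openai_request
```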

@lsawaniewski

I'm wondering: why do you need the older invoke APIs?

Here the answer is quite trivial: I'm trying to create a gateway for an application whose code I can't change, and it uses these old endpoints. The only thing I can influence is the URL.

At this point I can add that I tried simply changing the path in the gateway, and the conversion from Bedrock -> OpenAI looked good at first glance. Things started to break down when converting back to the Bedrock format (especially for streaming).

@mirodrr2
Contributor Author

Okay, I will look into how difficult it is to support InvokeModel. My main concern is that, unlike Converse, which has a consistent format across all models, I think the InvokeModel format differs a lot per model. So it may not be simple to implement.

@mirodrr2
Contributor Author

If you tell me which model you're using, perhaps I can initially support just that one to get you unblocked.

The way I see it, I will basically need to support all Bedrock InvokeModel formats, detect which format is being used, and then do the conversion based on that. Wondering if you have any thoughts/suggestions here, @lsawaniewski.
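One possible shape for that detection step, as a hedged sketch only; the model-id prefixes and body keys below are heuristics assumed for illustration, not the project's code.

```python
# Hedged sketch of one possible detection step; the model-id prefixes and body
# keys below are heuristics assumed for illustration, not the project's code.
import json


def detect_invoke_format(model_id: str, raw_body: bytes) -> str:
    body = json.loads(raw_body)
    if model_id.startswith("anthropic.") or "anthropic_version" in body:
        return "anthropic_messages"
    if model_id.startswith("meta.") or ("prompt" in body and "max_gen_len" in body):
        return "llama"
    if model_id.startswith("amazon.titan") or "inputText" in body:
        return "titan_text"
    raise ValueError(f"Unsupported InvokeModel body format for model {model_id}")
```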

@lsawaniewski

@mirodrr2 thanks for the instant response!
TBH, I focused more on the Response Syntax in the docs, because that was what was crashing for me. It's hard to say what it looks like for requests and for different models.

What I care about most is the integration with LiteLLM; I don't have (yet) too many requirements regarding models: mainly gpt-4o and Claude (3.5, 3.7).

@mirodrr2
Contributor Author

Okay, as a 1.0 version of this feature, I can focus on supporting non-streaming InvokeModel requests in the Claude format and converting them for LiteLLM (which would allow you to call gpt-4o and any other model you want as well).

Can't give you a timeline, but would that unblock you? Or do you also need streaming?
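As a sketch of that 1.0 scope, assuming the incoming body is in the Anthropic Messages format and the gateway forwards via litellm.completion; the response-side translation back to the Anthropic schema is omitted.

```python
# Hedged sketch of that 1.0 scope, assuming the incoming InvokeModel body is in
# the Anthropic Messages format and the gateway forwards via litellm.completion.
# Translating the response back into the Anthropic schema is omitted here.
import litellm


def claude_invoke_to_litellm(target_model: str, body: dict):
    messages = list(body["messages"])
    if "system" in body:
        # Anthropic carries the system prompt as a top-level field.
        messages.insert(0, {"role": "system", "content": body["system"]})
    # Note: Anthropic content may be a list of blocks; text-only blocks pass
    # through, but richer content would need flattening for some providers.
    return litellm.completion(
        model=target_model,  # e.g. "gpt-4o" or a Claude model
        messages=messages,
        max_tokens=body.get("max_tokens", 1024),
        temperature=body.get("temperature"),
    )
```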

@lsawaniewski

The mentioned app hits both endpoints, so it's only a partial success, but I appreciate any help.

@mirodrr2
Contributor Author

mirodrr2 commented Mar 10, 2025

Also, are you not able to change the code at all?

Because you will still need to make some adjustments to your client instantiation code to inject the API key, as detailed in the README.

There might be some way to disable LiteLLM auth, though. If you're in an isolated network, that could be a possible solution.
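For reference, one generic way such client-side adjustments can look with boto3. The gateway URL, header name, and auth scheme below are placeholders/assumptions; the README defines the actual mechanism.

```python
# Generic sketch only: the gateway URL, header name, and auth scheme below are
# placeholders/assumptions; the README defines the actual mechanism.
import boto3

GATEWAY_URL = "https://my-genai-gateway.example.com/bedrock"  # placeholder
GATEWAY_API_KEY = "..."  # placeholder

client = boto3.client("bedrock-runtime", endpoint_url=GATEWAY_URL)


def _add_api_key(request, **kwargs):
    # Attach the gateway key to every outgoing request (header name assumed).
    request.headers["Authorization"] = f"Bearer {GATEWAY_API_KEY}"


# botocore event hook that runs just before each HTTP request is sent.
client.meta.events.register("before-send.bedrock-runtime.*", _add_api_key)
```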

@lsawaniewski

Yes, I am aware of that, but it is a separate issue that I will have to somehow get around.

For now, for initial testing purposes, I simply hardcoded it on the gateway side, but you are absolutely right. Isolating the network seems like a sensible approach in this case.

@mirodrr2
Contributor Author

@lsawaniewski, can you send me a code sample of the exact format you're using to call invoke_model? It would help me with testing.

@lsawaniewski

@mirodrr2 I'll check what I can do, but I have limited access to it myself and only rely on logs/requests on the gateway side. 🙄
Btw, I wouldn't like to hijack the whole thread just for my request; would you like to create a separate issue?

@mirodrr2
Contributor Author

Please create an issue if you can. Good idea.

@mirodrr2
Contributor Author

Made an issue:
#109
