API Cost Discussion #1527
Replies: 7 comments 6 replies
-
You could also try to use Cloudflare's AI Gateway here.
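Roughly, the idea is to point the OpenAI client at the gateway URL instead of api.openai.com. A minimal sketch, assuming the documented Cloudflare URL shape (the account ID and gateway name below are placeholders, and the `CF_*` env var names are my own invention):

```python
import os

def gateway_base_url(account_id: str, gateway_name: str) -> str:
    """Build the OpenAI-compatible base URL for a Cloudflare AI Gateway."""
    return f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_name}/openai"

# Placeholder values; in a real deployment these would come from your Cloudflare dashboard.
base_url = gateway_base_url(
    os.environ.get("CF_ACCOUNT_ID", "YOUR_ACCOUNT_ID"),
    os.environ.get("CF_AI_GATEWAY", "my-gateway"),
)
# The OpenAI SDK would then be pointed at the gateway instead of api.openai.com,
# e.g.: client = openai.OpenAI(base_url=base_url)
print(base_url)
```

Requests then flow through the gateway, which gives you per-request logging and analytics, so you can see exactly where the token spend is going.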
-
Thank you, it seems to be working very well!
Ideally this should be an environment variable, but hey, just testing. Thank you again for your great idea.
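For reference, moving it out of the code and into environment variables could look something like this sketch (the `OPENAI_BASE_URL` name is just an assumption here, not necessarily what this project uses):

```python
import os

def load_config(env=os.environ) -> dict:
    """Read API settings from the environment instead of hard-coding them."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # Fail fast rather than shipping a hard-coded secret as a fallback.
        raise RuntimeError("Set OPENAI_API_KEY before starting the app")
    return {
        "api_key": api_key,
        # Hypothetical variable name; falls back to the normal OpenAI endpoint
        # when no gateway override is set.
        "base_url": env.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
    }
```

Passing `env` as a parameter also makes the config easy to test without touching the real environment.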
-
Hi, I re-opened this for further ideas and discussions.
-
Update: by using the API gateway I was able to identify and, hopefully, address the root cause.
-
@spammenotinoz Have you looked into hosted versions of Llama or Mistral? They are open source, so it's likely to be way cheaper than GPT. Also, Groq hosts them and they are insanely fast too. I see @mckaywrigley just added Groq to the models as well!
-
The code change appears to have had a big impact on the cost/token count.
-
Anyone interested in GPT-4/GPT-4 Turbo cost reduction and better outputs when attachments are used, please see this pull request from
-
I enjoy this interface and the GPT-4 Turbo model over ChatGPT; however, even with low GPT-4 Turbo usage, my API costs are high ($3-6 per day), so it is a lot more expensive for me than ChatGPT. That's where this project is great, as it offers easy access to cheaper models.
Curious though: even for low usage, would there be any benefit in deploying an API cache, something like https://github.com/zilliztech/GPTCache?
Or are there any services that offer lower-cost access to GPT-4 via caching or other means?
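For context, the basic idea I mean is something like this exact-match sketch; GPTCache itself goes further with embedding-based similarity matching, but even the simple version shows where the savings come from (the `fake_api` function below is a stand-in for a real, billed completion call):

```python
import hashlib

class PromptCache:
    """Minimal exact-match cache: returns a stored completion when the same
    prompt is seen again, so the paid API is only hit on cache misses."""

    def __init__(self):
        self._store = {}

    def _key(self, prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get_or_call(self, prompt: str, call_api):
        key = self._key(prompt)
        if key not in self._store:
            self._store[key] = call_api(prompt)  # only pay on a miss
        return self._store[key]

# Stand-in for a real (billed) completion call; counts how often it is invoked.
calls = []
def fake_api(prompt):
    calls.append(prompt)
    return f"answer to: {prompt}"

cache = PromptCache()
cache.get_or_call("What is 2+2?", fake_api)
cache.get_or_call("What is 2+2?", fake_api)  # served from cache, no second charge
print(len(calls))  # the API was billed once, not twice
```

Repeated prompts are common enough in day-to-day chat use that even an exact-match layer can trim the bill; similarity matching just widens what counts as a repeat.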