Batches V2 Implementation / Managed Files #9632
Replies: 11 comments 3 replies
-
-
Gemini has a nice implementation of something like this at the SDK level. Noting down for reference:

```python
from google import genai

client = genai.Client()

myfile = client.files.upload(file='media/sample.mp3')

response = client.models.generate_content(
    model='gemini-2.0-flash',
    contents=['Describe this audio clip', myfile]
)

print(response.text)
```
-
Thanks for brainstorming on this @krrishdholakia. I have 2 questions:
-
Thanks for clarifying Krrish!
Follow-up question on the first part: does that mean you would enforce a requirement that users provide an S3/GCS/Azure Blob location in order to use batch processing with LiteLLM?
This would be very helpful!
-
Updating based on feedback from another user.

Updated file creation flow:

```python
file = client.files.create(
    file=wav_data,
    purpose="user_data",
    extra_body={"target_model_names": ["gemini-2.0-flash", "gpt-3.5-turbo"]}  # copy files in this moment, store 'file_id' reference in db
)
```

I think this will remove the need for an S3/GCS bucket for LiteLLM @jaivep2323
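To make the "store 'file_id' reference in db" step concrete, here's a minimal sketch of the bookkeeping the proxy could do when `target_model_names` is provided: upload the file to each target provider, then store one unified id mapped to the per-provider file ids. The function name, id format, and file ids below are all illustrative, not LiteLLM's actual implementation.

```python
import uuid

def register_managed_file(db: dict, provider_file_ids: dict) -> str:
    """Store one unified file id -> {model_name: provider_file_id} mapping.

    Hypothetical sketch: 'db' stands in for the Proxy DB table.
    """
    unified_id = f"litellm_managed_file:{uuid.uuid4()}"
    db[unified_id] = provider_file_ids
    return unified_id

db = {}
unified = register_managed_file(db, {
    "gemini-2.0-flash": "files/abc123",  # hypothetical id from the Gemini upload
    "gpt-3.5-turbo": "file-xyz789",      # hypothetical id from the OpenAI upload
})

# The user only ever sees 'unified'; the proxy resolves it per deployment.
print(db[unified]["gpt-3.5-turbo"])
```

The upside of this shape is that routing stays late-bound: the provider-specific file id is only looked up when a request actually lands on that deployment.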
-
Noticed:
-
Initial PR here
-
DAY 2 Work: Files API
-
More improvements:
-
v1.69.0 will be live tomorrow. This release brings LiteLLM Managed File support to Batches, making it easier to call + control access to Azure batch endpoints. LiteLLM Managed Files lets you:
Here's a guide on how to use LiteLLM with LiteLLM Managed Files - https://docs.litellm.ai/docs/proxy/managed_batches
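For context on what a batch submission carries: the input file for an OpenAI-compatible `/v1/batches` endpoint is a `.jsonl` where each line is one self-contained request. A minimal sketch of building such a line (the `custom_id` and model name are illustrative):

```python
import json

def build_batch_line(custom_id: str, model: str, prompt: str) -> str:
    # One line of a /v1/batches input .jsonl: an OpenAI-format
    # chat-completion request with a caller-chosen custom_id.
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    })

line = build_batch_line("req-1", "gemini-2.0-flash", "Describe this audio clip")
print(line)
```

Because each line names its own `model`, a proxy holding the raw `.jsonl` can rewrite or route lines per deployment before submitting upstream.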
-
Closing as this is now live.
-
Writing down an idea for an improvement on the current batches implementation. Key challenges:
tldr; if we store the `.jsonl` in the Proxy DB, we could easily go across providers by doing the create/batch operation when required on that deployment.
1. Setup config.yaml
2. User sends file to proxy
3. User can create batch request
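A hedged sketch of what step 1's config.yaml could look like, listing the deployments the proxy can route batch work to. Model names and env var references are illustrative; check the LiteLLM proxy docs for the exact schema:

```yaml
model_list:
  - model_name: gemini-2.0-flash
    litellm_params:
      model: gemini/gemini-2.0-flash
      api_key: os.environ/GEMINI_API_KEY
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY
```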