Controlled requests to external inference provider. #1661

Closed
@swtb3-ryder

Description

Requested feature

When calling an external Ollama server, we don't want to overwhelm it. Could we use a semaphore around the httpx requests to limit the number of active requests in a given batch?

Metadata

Labels

enhancement: New feature or request
