Enhanced Batching and Progress Reporting in SearchIndexingBufferedSender

**Is your feature request related to a problem? Please describe.**
When using `SearchIndexingBufferedSender` to upload a large number of documents (such as 1 billion vectors with text), the current implementation doesn't automatically handle the 64 MB payload size limit or the 32,000-document count limit per batch. Users have to intervene manually to make sure documents are indexed successfully, which is particularly frustrating for long indexing jobs that may run overnight.

**Describe the solution you'd like**
I propose the following enhancements to the `SearchIndexingBufferedSender` to make the experience of indexing large datasets more seamless and robust:

- Automatic Handling of Batch Size: Implement an internal mechanism that calculates the cumulative size of the document payload and the count, and automatically flushes the batch when either limit is approached. This should prevent users from encountering issues related to payload size or document count limits.
- Customizable Batch Size: Expose a parameter that allows users to specify their desired batch size or payload size limit. This would give users more control over the batching process and could be used to adjust performance based on their specific use case.
- Progress Reporting: Introduce built-in progress reporting for the upload operation, similar to the [tqdm](https://pypi.org/project/tqdm/) library in Python, which provides a visual progress bar. This feature would be extremely helpful for long-running indexing jobs, giving users a clear indication of the operation's progress.
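
To make the first two bullets concrete, here is a minimal sketch of size- and count-aware batching. It is not the SDK's implementation: `iter_batches`, `MAX_BATCH_BYTES`, and `MAX_BATCH_DOCS` are hypothetical names, the defaults simply reuse the limits quoted in this issue, and the payload size is only approximated by the UTF-8 length of each document's JSON encoding.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# Defaults reuse the limits quoted in this issue (assumption: verify against the service docs).
MAX_BATCH_BYTES = 64 * 1024 * 1024
MAX_BATCH_DOCS = 32_000


def iter_batches(
    documents: Iterable[Dict[str, Any]],
    max_bytes: int = MAX_BATCH_BYTES,
    max_docs: int = MAX_BATCH_DOCS,
) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of documents that stay within both limits (hypothetical helper)."""
    batch: List[Dict[str, Any]] = []
    batch_bytes = 0
    for doc in documents:
        # Approximate the wire size by the UTF-8 length of the JSON encoding.
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        if doc_bytes > max_bytes:
            raise ValueError("a single document exceeds the payload limit")
        # Flush before adding a document that would cross either limit.
        if batch and (batch_bytes + doc_bytes > max_bytes or len(batch) >= max_docs):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    # Emit whatever is left once the input is exhausted.
    if batch:
        yield batch
```

Each yielded batch could then be handed to an upload call such as `SearchClient.upload_documents(documents=batch)`. Exposing `max_bytes` and `max_docs` as parameters is one way to offer the customizable batch size requested above.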

**Describe alternatives you've considered**
An alternative approach would be to manually chunk the data and implement custom logic for progress reporting and error handling. However, this adds complexity and requires additional development effort, which could be avoided if the SDK provided these capabilities out of the box.
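
For reference, the manual workaround looks roughly like the sketch below. `upload_with_progress` and its `upload` callable are hypothetical stand-ins for whatever client call is actually used, and the broad `except Exception` is only there to keep the example short.

```python
import sys
from itertools import islice
from typing import Any, Callable, Dict, Iterable, List


def upload_with_progress(
    documents: Iterable[Dict[str, Any]],
    upload: Callable[[List[Dict[str, Any]]], None],
    total: int,
    chunk_size: int = 1000,
) -> List[Dict[str, Any]]:
    """Upload fixed-size chunks, print progress, and return documents from failed chunks."""
    iterator = iter(documents)
    failed: List[Dict[str, Any]] = []
    done = 0
    while True:
        # Pull the next chunk lazily so the full dataset never sits in memory.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        try:
            upload(chunk)
        except Exception:
            # Keep going and collect the failed documents for a later retry.
            failed.extend(chunk)
        done += len(chunk)
        percent = 100 * done // total if total else 100
        print(f"\rprocessed {done}/{total} ({percent}%)", end="", file=sys.stderr)
    print(file=sys.stderr)
    return failed
```

Every caller who indexes a large dataset ends up writing some variant of this loop, which is the duplication this request aims to remove.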

**Additional context**
The goal is to enable users to confidently run a single function to index a massive number of documents without needing to worry about the internal limitations of the service. Enhancing the `SearchIndexingBufferedSender` with the above features would significantly improve the developer experience and the reliability of the indexing process for large datasets.


Enhanced Batching and Progress Reporting in SearchIndexingBufferedSender #33469
