Is your feature request related to a problem? Please describe.
Enable response caching for the two generator classes so that if an exception is raised (e.g. `RateLimitError`) partway through generation, the responses that were already generated do not need to be regenerated.
Describe the solution you'd like
Ideally, this would involve a `batch_size` or similar parameter for the `generate_responses` methods. The prompts would be partitioned and generation would occur in batches (e.g. in a loop). If an exception is raised in batch k, the responses from batches 1 through (k-1) would still be available to the user. We are thinking of using the following approach: cache the successfully generated responses from batches 1 through (k-1) and, if a failure occurs, start at batch k in a subsequent run of the `generate_responses` method. Ideally, the cache would live somewhere temporary on the filesystem rather than in memory (e.g. as an instance attribute). A rough sketch of the idea is below.
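A minimal sketch of what this could look like. The function name `generate_responses` and the `batch_size` parameter come from the description above; everything else (`generate_fn`, `cache_path`, the JSONL cache format) is an illustrative placeholder rather than a proposal for the actual API:

```python
import json
from pathlib import Path


def generate_responses(prompts, generate_fn, batch_size=20,
                       cache_path="responses_cache.jsonl"):
    """Generate responses in batches, caching each completed batch on disk.

    If an exception (e.g. RateLimitError) is raised partway through, the
    responses from batches that already finished remain in the cache file,
    and the next call resumes with the first prompt that has no cached response.
    """
    cache = Path(cache_path)
    responses = []
    if cache.exists():
        # Reload responses written by an earlier, partially failed run.
        with cache.open() as f:
            responses = [json.loads(line) for line in f]

    with cache.open("a") as f:
        # Resume at the first prompt without a cached response; only whole
        # batches are ever written, so this lands on a batch boundary.
        for i in range(len(responses), len(prompts), batch_size):
            batch = prompts[i:i + batch_size]
            # generate_fn stands in for the real API call; if it raises here,
            # everything generated in earlier batches is already persisted.
            batch_responses = [generate_fn(p) for p in batch]
            for r in batch_responses:
                f.write(json.dumps(r) + "\n")
            f.flush()
            responses.extend(batch_responses)

    cache.unlink()  # all prompts succeeded; discard the temporary cache
    return responses
```

The key property is that the cache file is only appended to after a batch completes, so a crash in batch k leaves batches 1 through (k-1) intact on disk and the next run skips straight to batch k.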
Describe alternatives you've considered
Status quo
Additional context
It may be useful to add a time dimension to help avoid `RateLimitError`. Specifically, this could involve pausing before starting batch k if batch (k-1) completed in fewer than n seconds. This could be accomplished with a `min_time_per_batch` parameter, along the lines of the sketch below.
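A rough sketch of the pacing idea, assuming a `min_time_per_batch` parameter in seconds; the batch loop and `generate_batch` callable are the same hypothetical constructs used in the sketch above, not existing library APIs:

```python
import time


def generate_with_pacing(batches, generate_batch, min_time_per_batch=10.0):
    """Run generate_batch over each batch, ensuring consecutive batches start
    at least min_time_per_batch seconds apart to stay under rate limits."""
    results = []
    for k, batch in enumerate(batches):
        start = time.monotonic()
        results.extend(generate_batch(batch))
        elapsed = time.monotonic() - start
        # If batch k finished quickly, pause before starting batch k+1.
        if k < len(batches) - 1 and elapsed < min_time_per_batch:
            time.sleep(min_time_per_batch - elapsed)
    return results
```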