
Commit cdeb7af

Add API rate limit handler (#371)
* Add rate limit handler
* Update LLM interfaces
* Update Changelog and docs
* Add unit tests for rate limit handler
* Improve rate limit handler
* Remove decorators from abstract methods and add them to methods of LLM provider classes
* Improve documentation
* Improve wait strategy for concurrent mode
* Fix tenacity dependency
* Ruff
* Simplify is rate limit error
* Update doc related to VertexAILLM
* Update custom_llm.py
* Fix linter issues
* Fix more linter issues
1 parent e6c5c9a commit cdeb7af

File tree

17 files changed, +677 -17 lines changed


CHANGELOG.md

Lines changed: 1 addition & 0 deletions

@@ -13,6 +13,7 @@
 - Support for Python 3.13
 - Added support for automatic schema extraction from text using LLMs. In the `SimpleKGPipeline`, when the user provides no schema, the automatic schema extraction is enabled by default.
 - Added ability to return a user-defined message if context is empty in GraphRAG (which skips the LLM call).
+- Added automatic rate limiting with retry logic and exponential backoff for all LLM providers using tenacity. The `RateLimitHandler` interface allows for custom rate limiting strategies, including the ability to disable rate limiting entirely.

 ### Fixed

docs/source/api.rst

Lines changed: 31 additions & 0 deletions

@@ -347,6 +347,28 @@ MistralAILLM
     :members:


+Rate Limiting
+=============
+
+RateLimitHandler
+----------------
+
+.. autoclass:: neo4j_graphrag.llm.rate_limit.RateLimitHandler
+    :members:
+
+RetryRateLimitHandler
+---------------------
+
+.. autoclass:: neo4j_graphrag.llm.rate_limit.RetryRateLimitHandler
+    :members:
+
+NoOpRateLimitHandler
+--------------------
+
+.. autoclass:: neo4j_graphrag.llm.rate_limit.NoOpRateLimitHandler
+    :members:
+
 PromptTemplate
 ==============

@@ -473,6 +495,8 @@ Errors

 * :class:`neo4j_graphrag.exceptions.LLMGenerationError`

+* :class:`neo4j_graphrag.exceptions.RateLimitError`
+
 * :class:`neo4j_graphrag.exceptions.SchemaValidationError`

 * :class:`neo4j_graphrag.exceptions.PdfLoaderError`

@@ -597,6 +621,13 @@ LLMGenerationError
     :show-inheritance:


+RateLimitError
+==============
+
+.. autoclass:: neo4j_graphrag.exceptions.RateLimitError
+    :show-inheritance:
+
+
 SchemaValidationError
 =====================
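For orientation, the shape of the `RateLimitHandler` interface documented above can be inferred from the `CustomRateLimitHandler` example added in this commit (see examples/customize/llms/custom_llm.py further down). The sketch below is illustrative only; the actual base-class internals may differ.

from abc import ABC, abstractmethod
from typing import Any, Awaitable, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])
AF = TypeVar("AF", bound=Callable[..., Awaitable[Any]])


class RateLimitHandlerSketch(ABC):
    """Sketch of the handler contract: wrap LLM calls with rate-limit handling."""

    @abstractmethod
    def handle_sync(self, func: F) -> F:
        """Return `func` wrapped with synchronous rate-limit handling."""

    @abstractmethod
    def handle_async(self, func: AF) -> AF:
        """Return `func` wrapped with asynchronous rate-limit handling."""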

docs/source/user_guide_rag.rst

Lines changed: 87 additions & 2 deletions

@@ -125,15 +125,15 @@ To use VertexAI, instantiate the `VertexAILLM` class:

     generation_config = GenerationConfig(temperature=0.0)
     llm = VertexAILLM(
-        model_name="gemini-1.5-flash-001", generation_config=generation_config
+        model_name="gemini-2.5-flash", generation_config=generation_config
     )
     llm.invoke("say something")


 .. note::

     In order to run this code, the `google-cloud-aiplatform` Python package needs to be installed:
-    `pip install "neo4j_grpahrag[vertexai]"`
+    `pip install "neo4j_graphrag[google]"`


 See :ref:`vertexaillm`.

@@ -294,6 +294,91 @@ Here's an example using the Python Ollama client:
 See :ref:`llminterface`.


+Rate Limit Handling
+===================
+
+All LLM implementations include automatic rate limiting that uses retry logic with exponential backoff by default. This feature helps handle API rate limits from LLM providers gracefully by automatically retrying failed requests with increasing wait times between attempts.
+
+Default Rate Limit Handler
+--------------------------
+
+Rate limiting is enabled by default for all LLM instances with the following configuration:
+
+- **Max attempts**: 3
+- **Min wait**: 1.0 seconds
+- **Max wait**: 60.0 seconds
+- **Multiplier**: 2.0 (exponential backoff)
+
+.. code:: python
+
+    from neo4j_graphrag.llm import OpenAILLM
+
+    # Rate limiting is automatically enabled
+    llm = OpenAILLM(model_name="gpt-4o")
+
+    # The LLM will automatically retry on rate limit errors
+    response = llm.invoke("Hello, world!")
+
+.. note::
+
+    To change the default configuration of `RetryRateLimitHandler`:
+
+    .. code:: python
+
+        from neo4j_graphrag.llm import OpenAILLM
+        from neo4j_graphrag.llm.rate_limit import RetryRateLimitHandler
+
+        # Customize rate limiting parameters
+        llm = OpenAILLM(
+            model_name="gpt-4o",
+            rate_limit_handler=RetryRateLimitHandler(
+                max_attempts=10,  # Increase max retry attempts
+                min_wait=2.0,     # Increase minimum wait time
+                max_wait=120.0,   # Increase maximum wait time
+                multiplier=3.0,   # More aggressive backoff
+            ),
+        )
+
+Custom Rate Limiting
+--------------------
+
+You can customize the rate limiting behavior by creating your own rate limit handler:
+
+.. code:: python
+
+    from neo4j_graphrag.llm import AnthropicLLM
+    from neo4j_graphrag.llm.rate_limit import RateLimitHandler
+
+    class CustomRateLimitHandler(RateLimitHandler):
+        """Implement your custom rate limiting strategy."""
+        # Implement required methods: handle_sync, handle_async
+        pass
+
+    # Create custom rate limit handler and pass it to the LLM interface
+    custom_handler = CustomRateLimitHandler()
+
+    llm = AnthropicLLM(
+        model_name="claude-3-sonnet-20240229",
+        rate_limit_handler=custom_handler,
+    )
+
+Disabling Rate Limiting
+-----------------------
+
+For high-throughput applications or when you handle rate limiting externally, you can disable it:
+
+.. code:: python
+
+    from neo4j_graphrag.llm import CohereLLM, NoOpRateLimitHandler
+
+    # Disable rate limiting completely
+    llm = CohereLLM(
+        model_name="command-r-plus",
+        rate_limit_handler=NoOpRateLimitHandler(),
+    )
+    llm.invoke("Hello, world!")
+
 Configuring the Prompt
 ========================
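To make the default numbers documented above concrete, here is a small illustration of the wait times produced by a generic exponential backoff with min_wait=1.0, multiplier=2.0 and max_wait=60.0. This is only the textbook backoff formula, not necessarily tenacity's exact wait strategy.

min_wait, multiplier, max_wait, max_attempts = 1.0, 2.0, 60.0, 3

for retry_number in range(1, max_attempts):
    # grow the wait by `multiplier` on each retry, capped at max_wait
    wait = min(max_wait, min_wait * multiplier ** (retry_number - 1))
    print(f"wait before attempt {retry_number + 1}: {wait:.1f}s")
# -> wait before attempt 2: 1.0s
# -> wait before attempt 3: 2.0s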

examples/customize/llms/custom_llm.py

Lines changed: 38 additions & 2 deletions

@@ -1,8 +1,13 @@
 import random
 import string
-from typing import Any, List, Optional, Union
+from typing import Any, Awaitable, Callable, List, Optional, TypeVar, Union

 from neo4j_graphrag.llm import LLMInterface, LLMResponse
+from neo4j_graphrag.llm.rate_limit import (
+    RateLimitHandler,
+    # rate_limit_handler,
+    # async_rate_limit_handler,
+)
 from neo4j_graphrag.message_history import MessageHistory
 from neo4j_graphrag.types import LLMMessage

@@ -13,6 +18,8 @@ def __init__(
     ):
         super().__init__(model_name, **kwargs)

+    # Optional: Apply rate limit handling to synchronous invoke method
+    # @rate_limit_handler
     def invoke(
         self,
         input: str,

@@ -24,6 +31,8 @@ def invoke(
         )
         return LLMResponse(content=content)

+    # Optional: Apply rate limit handling to asynchronous ainvoke method
+    # @async_rate_limit_handler
     async def ainvoke(
         self,
         input: str,

@@ -33,6 +42,33 @@ async def ainvoke(
         raise NotImplementedError()


-llm = CustomLLM("")
+llm = CustomLLM(
+    ""
+)  # if rate_limit_handler and async_rate_limit_handler decorators are used, the default rate limit handler will be applied automatically (retry with exponential backoff)
 res: LLMResponse = llm.invoke("text")
 print(res.content)
+
+# If rate_limit_handler and async_rate_limit_handler decorators are used and you want to use a custom rate limit handler
+# Type variables for function signatures used in rate limit handlers
+F = TypeVar("F", bound=Callable[..., Any])
+AF = TypeVar("AF", bound=Callable[..., Awaitable[Any]])
+
+
+class CustomRateLimitHandler(RateLimitHandler):
+    def __init__(self) -> None:
+        super().__init__()
+
+    def handle_sync(self, func: F) -> F:
+        # error handling here
+        return func
+
+    def handle_async(self, func: AF) -> AF:
+        # error handling here
+        return func
+
+
+llm_with_custom_rate_limit_handler = CustomLLM(
+    "", rate_limit_handler=CustomRateLimitHandler()
+)
+result: LLMResponse = llm_with_custom_rate_limit_handler.invoke("text")
+print(result.content)
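The example handler above returns `func` unchanged. One possible (hypothetical, not shipped) way to give such a handler real retry behaviour is to back `handle_sync`/`handle_async` with tenacity; the class name and parameter values below are illustrative assumptions.

from typing import Any, Awaitable, Callable, TypeVar

from tenacity import retry, stop_after_attempt, wait_fixed

from neo4j_graphrag.llm.rate_limit import RateLimitHandler

F = TypeVar("F", bound=Callable[..., Any])
AF = TypeVar("AF", bound=Callable[..., Awaitable[Any]])


class FixedWaitRateLimitHandler(RateLimitHandler):
    """Hypothetical handler: retry up to 5 times, waiting 2 seconds between attempts."""

    def handle_sync(self, func: F) -> F:
        return retry(stop=stop_after_attempt(5), wait=wait_fixed(2.0))(func)  # type: ignore[return-value]

    def handle_async(self, func: AF) -> AF:
        # tenacity's `retry` decorator also wraps coroutine functions
        return retry(stop=stop_after_attempt(5), wait=wait_fixed(2.0))(func)  # type: ignore[return-value]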

poetry.lock

Lines changed: 5 additions & 5 deletions
Some generated files are not rendered by default.

pyproject.toml

Lines changed: 1 addition & 0 deletions

@@ -60,6 +60,7 @@ scipy = [
     { version = "^1.13.0", python = ">=3.9,<3.13" },
     { version = "^1.15.0", python = ">=3.13,<3.14" }
 ]
+tenacity = "^9.1.2"

 [tool.poetry.group.dev.dependencies]
 urllib3 = "<2"
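tenacity is the retry library the new handler builds on. The standalone sketch below shows the primitives it provides; the exception class and parameter values are illustrative, not the package's own wait strategy.

from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential


class TransientRateLimit(Exception):
    """Stand-in for a provider's HTTP 429 / rate-limit error."""


@retry(
    retry=retry_if_exception_type(TransientRateLimit),
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=2.0, min=1.0, max=60.0),
)
def call_api() -> str:
    # a real caller would raise TransientRateLimit when the provider returns 429
    return "ok"


print(call_api())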

src/neo4j_graphrag/exceptions.py

Lines changed: 4 additions & 0 deletions

@@ -138,3 +138,7 @@ class InvalidHybridSearchRankerError(Neo4jGraphRagError):

 class SearchQueryParseError(Neo4jGraphRagError):
     """Exception raised when there is a query parse error in the text search string."""
+
+
+class RateLimitError(LLMGenerationError):
+    """Exception raised when API rate limit is exceeded."""

src/neo4j_graphrag/llm/__init__.py

Lines changed: 13 additions & 0 deletions

@@ -18,6 +18,13 @@
 from .mistralai_llm import MistralAILLM
 from .ollama_llm import OllamaLLM
 from .openai_llm import AzureOpenAILLM, OpenAILLM
+from .rate_limit import (
+    RateLimitHandler,
+    NoOpRateLimitHandler,
+    RetryRateLimitHandler,
+    rate_limit_handler,
+    async_rate_limit_handler,
+)
 from .types import LLMResponse
 from .vertexai_llm import VertexAILLM

@@ -31,4 +38,10 @@
     "VertexAILLM",
     "AzureOpenAILLM",
     "MistralAILLM",
+    # Rate limiting components
+    "RateLimitHandler",
+    "NoOpRateLimitHandler",
+    "RetryRateLimitHandler",
+    "rate_limit_handler",
+    "async_rate_limit_handler",
 ]
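With these re-exports, the handlers can be imported directly from `neo4j_graphrag.llm` rather than the `rate_limit` submodule, for example (model name is illustrative):

from neo4j_graphrag.llm import NoOpRateLimitHandler, OpenAILLM, RetryRateLimitHandler

retrying_llm = OpenAILLM(
    model_name="gpt-4o",  # example model
    rate_limit_handler=RetryRateLimitHandler(max_attempts=5),
)
unlimited_llm = OpenAILLM(
    model_name="gpt-4o",
    rate_limit_handler=NoOpRateLimitHandler(),
)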

src/neo4j_graphrag/llm/anthropic_llm.py

Lines changed: 9 additions & 1 deletion

@@ -19,6 +19,11 @@

 from neo4j_graphrag.exceptions import LLMGenerationError
 from neo4j_graphrag.llm.base import LLMInterface
+from neo4j_graphrag.llm.rate_limit import (
+    RateLimitHandler,
+    rate_limit_handler,
+    async_rate_limit_handler,
+)
 from neo4j_graphrag.llm.types import (
     BaseMessage,
     LLMResponse,

@@ -62,6 +67,7 @@ def __init__(
         self,
         model_name: str,
         model_params: Optional[dict[str, Any]] = None,
+        rate_limit_handler: Optional[RateLimitHandler] = None,
         **kwargs: Any,
     ):
         try:

@@ -71,7 +77,7 @@ def __init__(
                 """Could not import Anthropic Python client.
                 Please install it with `pip install "neo4j-graphrag[anthropic]"`."""
             )
-        super().__init__(model_name, model_params)
+        super().__init__(model_name, model_params, rate_limit_handler)
         self.anthropic = anthropic
         self.client = anthropic.Anthropic(**kwargs)
         self.async_client = anthropic.AsyncAnthropic(**kwargs)

@@ -93,6 +99,7 @@ def get_messages(
         messages.append(UserMessage(content=input).model_dump())
         return messages  # type: ignore

+    @rate_limit_handler
     def invoke(
         self,
         input: str,

@@ -129,6 +136,7 @@ def invoke(
         except self.anthropic.APIError as e:
             raise LLMGenerationError(e)

+    @async_rate_limit_handler
     async def ainvoke(
         self,
         input: str,
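For context on how a decorator like `@rate_limit_handler` can delegate to the handler configured on the LLM instance, here is a minimal sketch. The attribute name `_rate_limit_handler` and the wrapping approach are assumptions for illustration, not the library's actual implementation.

import functools
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def rate_limit_handler_sketch(func: F) -> F:
    """Wrap a bound LLM method so the instance's handler controls retries."""

    @functools.wraps(func)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        handler = getattr(self, "_rate_limit_handler", None)  # assumed attribute name
        if handler is None:
            return func(self, *args, **kwargs)
        # handle_sync returns a callable with retry behaviour applied
        return handler.handle_sync(lambda: func(self, *args, **kwargs))()

    return wrapper  # type: ignore[return-value]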
