Add API rate limit handler #371

Open · wants to merge 10 commits into base: main
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -7,6 +7,7 @@
- Support for Python 3.13
- Added support for automatic schema extraction from text using LLMs. In the `SimpleKGPipeline`, when the user provides no schema, the automatic schema extraction is enabled by default.
- Added ability to return a user-defined message if context is empty in GraphRAG (which skips the LLM call).
- Added automatic rate limiting with retry logic and exponential backoff for all LLM providers using tenacity. The `RateLimitHandler` interface allows for custom rate limiting strategies, including the ability to disable rate limiting entirely.

### Fixed

31 changes: 31 additions & 0 deletions docs/source/api.rst
@@ -347,6 +347,28 @@ MistralAILLM
:members:


Rate Limiting
=============

RateLimitHandler
----------------

.. autoclass:: neo4j_graphrag.llm.rate_limit.RateLimitHandler
:members:

RetryRateLimitHandler
---------------------

.. autoclass:: neo4j_graphrag.llm.rate_limit.RetryRateLimitHandler
:members:

NoOpRateLimitHandler
--------------------

.. autoclass:: neo4j_graphrag.llm.rate_limit.NoOpRateLimitHandler
:members:


PromptTemplate
==============

@@ -473,6 +495,8 @@ Errors

* :class:`neo4j_graphrag.exceptions.LLMGenerationError`

* :class:`neo4j_graphrag.exceptions.RateLimitError`

* :class:`neo4j_graphrag.exceptions.SchemaValidationError`

* :class:`neo4j_graphrag.exceptions.PdfLoaderError`
@@ -597,6 +621,13 @@ LLMGenerationError
:show-inheritance:


RateLimitError
==============

.. autoclass:: neo4j_graphrag.exceptions.RateLimitError
:show-inheritance:


SchemaValidationError
=====================

85 changes: 85 additions & 0 deletions docs/source/user_guide_rag.rst
@@ -294,6 +294,91 @@ Here's an example using the Python Ollama client:
See :ref:`llminterface`.


Rate Limit Handling
===================

All LLM implementations include automatic rate limiting that uses retry logic with exponential backoff by default. This feature helps handle API rate limits from LLM providers gracefully by automatically retrying failed requests with increasing wait times between attempts.

Default Rate Limit Handler
--------------------------

Rate limiting is enabled by default for all LLM instances with the following configuration:

- **Max attempts**: 3
- **Min wait**: 1.0 seconds
- **Max wait**: 60.0 seconds
- **Multiplier**: 2.0 (exponential backoff)

.. code:: python

from neo4j_graphrag.llm import OpenAILLM

# Rate limiting is automatically enabled
llm = OpenAILLM(model_name="gpt-4o")

# The LLM will automatically retry on rate limit errors
response = llm.invoke("Hello, world!")
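
These defaults correspond roughly to the tenacity policy sketched below. This is an illustration of the behaviour, not the library's internal code, and it assumes rate limit failures are surfaced as `RateLimitError` before being retried:

.. code:: python

    from tenacity import (
        retry,
        retry_if_exception_type,
        stop_after_attempt,
        wait_exponential,
    )

    from neo4j_graphrag.exceptions import RateLimitError

    @retry(
        retry=retry_if_exception_type(RateLimitError),
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=2.0, min=1.0, max=60.0),
        reraise=True,
    )
    def call_provider() -> str:
        # Placeholder for the underlying provider request
        ...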

.. note::

To change the default configuration of `RetryRateLimitHandler`:

.. code:: python

from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.llm.rate_limit import RetryRateLimitHandler

# Customize rate limiting parameters
llm = OpenAILLM(
model_name="gpt-4o",
rate_limit_handler=RetryRateLimitHandler(
max_attempts=10, # Increase max retry attempts
min_wait=2.0, # Increase minimum wait time
max_wait=120.0, # Increase maximum wait time
multiplier=3.0 # More aggressive backoff
)
)

Custom Rate Limiting
--------------------

You can customize the rate limiting behavior by subclassing `RateLimitHandler` and implementing its two hooks, `handle_sync` and `handle_async`; each receives the underlying call and returns a wrapped version of it:

.. code:: python

    from neo4j_graphrag.llm import AnthropicLLM
    from neo4j_graphrag.llm.rate_limit import RateLimitHandler

    class CustomRateLimitHandler(RateLimitHandler):
        """Implement your custom rate limiting strategy."""

        def handle_sync(self, func):
            # Return a wrapped version of the synchronous call, adding
            # your own throttling or retry logic around it.
            def wrapper(*args, **kwargs):
                return func(*args, **kwargs)

            return wrapper

        def handle_async(self, func):
            # Same contract for asynchronous calls.
            async def wrapper(*args, **kwargs):
                return await func(*args, **kwargs)

            return wrapper

    # Create the custom rate limit handler and pass it to the LLM interface
    custom_handler = CustomRateLimitHandler()

    llm = AnthropicLLM(
        model_name="claude-3-sonnet-20240229",
        rate_limit_handler=custom_handler,
    )
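
Because the hooks wrap the call itself, strategies such as token buckets, semaphores, or client-side request pacing can be implemented without touching the LLM classes. The handler above is a pass-through skeleton intended as a starting point.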

Disabling Rate Limiting
-----------------------

For high-throughput applications or when you handle rate limiting externally, you can disable it:

.. code:: python

from neo4j_graphrag.llm import CohereLLM, NoOpRateLimitHandler

# Disable rate limiting completely
llm = CohereLLM(
model_name="command-r-plus",
rate_limit_handler=NoOpRateLimitHandler(),
)
llm.invoke("Hello, world!")


Configuring the Prompt
========================

10 changes: 5 additions & 5 deletions poetry.lock

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions pyproject.toml
@@ -60,6 +60,7 @@ scipy = [
{ version = "^1.13.0", python = ">=3.9,<3.13" },
{ version = "^1.15.0", python = ">=3.13,<3.14" }
]
tenacity = "^9.1.2"

[tool.poetry.group.dev.dependencies]
urllib3 = "<2"
4 changes: 4 additions & 0 deletions src/neo4j_graphrag/exceptions.py
@@ -138,3 +138,7 @@ class InvalidHybridSearchRankerError(Neo4jGraphRagError):

class SearchQueryParseError(Neo4jGraphRagError):
"""Exception raised when there is a query parse error in the text search string."""


class RateLimitError(LLMGenerationError):
"""Exception raised when API rate limit is exceeded."""
13 changes: 13 additions & 0 deletions src/neo4j_graphrag/llm/__init__.py
@@ -18,6 +18,13 @@
from .mistralai_llm import MistralAILLM
from .ollama_llm import OllamaLLM
from .openai_llm import AzureOpenAILLM, OpenAILLM
from .rate_limit import (
RateLimitHandler,
NoOpRateLimitHandler,
RetryRateLimitHandler,
rate_limit_handler,
async_rate_limit_handler,
)
from .types import LLMResponse
from .vertexai_llm import VertexAILLM

@@ -31,4 +38,10 @@
"VertexAILLM",
"AzureOpenAILLM",
"MistralAILLM",
# Rate limiting components
"RateLimitHandler",
"NoOpRateLimitHandler",
"RetryRateLimitHandler",
"rate_limit_handler",
"async_rate_limit_handler",
]
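
For reference, the rate limiting components re-exported above are importable directly from `neo4j_graphrag.llm`, alongside the LLM classes:

.. code:: python

    from neo4j_graphrag.llm import (
        NoOpRateLimitHandler,
        RateLimitHandler,
        RetryRateLimitHandler,
    )
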
10 changes: 9 additions & 1 deletion src/neo4j_graphrag/llm/anthropic_llm.py
@@ -19,6 +19,11 @@

from neo4j_graphrag.exceptions import LLMGenerationError
from neo4j_graphrag.llm.base import LLMInterface
from neo4j_graphrag.llm.rate_limit import (
RateLimitHandler,
rate_limit_handler,
async_rate_limit_handler,
)
from neo4j_graphrag.llm.types import (
BaseMessage,
LLMResponse,
@@ -62,6 +67,7 @@ def __init__(
self,
model_name: str,
model_params: Optional[dict[str, Any]] = None,
rate_limit_handler: Optional[RateLimitHandler] = None,
**kwargs: Any,
):
try:
@@ -71,7 +77,7 @@
"""Could not import Anthropic Python client.
Please install it with `pip install "neo4j-graphrag[anthropic]"`."""
)
super().__init__(model_name, model_params)
super().__init__(model_name, model_params, rate_limit_handler)
self.anthropic = anthropic
self.client = anthropic.Anthropic(**kwargs)
self.async_client = anthropic.AsyncAnthropic(**kwargs)
@@ -93,6 +99,7 @@ def get_messages(
messages.append(UserMessage(content=input).model_dump())
return messages # type: ignore

@rate_limit_handler
def invoke(
self,
input: str,
@@ -129,6 +136,7 @@ def invoke(
except self.anthropic.APIError as e:
raise LLMGenerationError(e)

@async_rate_limit_handler
async def ainvoke(
self,
input: str,
12 changes: 12 additions & 0 deletions src/neo4j_graphrag/llm/base.py
@@ -21,28 +21,40 @@
from neo4j_graphrag.types import LLMMessage

from .types import LLMResponse, ToolCallResponse
from .rate_limit import (
DEFAULT_RATE_LIMIT_HANDLER,
)

from neo4j_graphrag.tool import Tool

from .rate_limit import RateLimitHandler


class LLMInterface(ABC):
"""Interface for large language models.

Args:
model_name (str): The name of the language model.
model_params (Optional[dict]): Additional parameters passed to the model when text is sent to it. Defaults to None.
rate_limit_handler (Optional[RateLimitHandler]): Handler for rate limiting. Defaults to retry with exponential backoff.
**kwargs (Any): Arguments passed to the model when the class is initialised. Defaults to None.
"""

def __init__(
self,
model_name: str,
model_params: Optional[dict[str, Any]] = None,
rate_limit_handler: Optional[RateLimitHandler] = None,
**kwargs: Any,
):
self.model_name = model_name
self.model_params = model_params or {}

if rate_limit_handler is not None:
self._rate_limit_handler = rate_limit_handler
else:
self._rate_limit_handler = DEFAULT_RATE_LIMIT_HANDLER

@abstractmethod
def invoke(
self,
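
To make the wiring concrete, here is a minimal sketch of how a decorator like `rate_limit_handler` could resolve the per-instance handler at call time. This is an assumed implementation for illustration only; the actual decorator lives in `neo4j_graphrag.llm.rate_limit`:

.. code:: python

    import functools
    from typing import Any, Callable

    def rate_limit_handler(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
            # Resolve the handler stored on the instance by LLMInterface.__init__,
            # then let it wrap the bound method call.
            bound = functools.partial(func, self)
            return self._rate_limit_handler.handle_sync(bound)(*args, **kwargs)

        return wrapper
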
10 changes: 9 additions & 1 deletion src/neo4j_graphrag/llm/cohere_llm.py
@@ -20,6 +20,11 @@

from neo4j_graphrag.exceptions import LLMGenerationError
from neo4j_graphrag.llm.base import LLMInterface
from neo4j_graphrag.llm.rate_limit import (
RateLimitHandler,
rate_limit_handler,
async_rate_limit_handler,
)
from neo4j_graphrag.llm.types import (
BaseMessage,
LLMResponse,
@@ -60,6 +65,7 @@ def __init__(
self,
model_name: str = "",
model_params: Optional[dict[str, Any]] = None,
rate_limit_handler: Optional[RateLimitHandler] = None,
**kwargs: Any,
) -> None:
try:
Expand All @@ -69,7 +75,7 @@ def __init__(
"""Could not import cohere python client.
Please install it with `pip install "neo4j-graphrag[cohere]"`."""
)
super().__init__(model_name, model_params)
super().__init__(model_name, model_params, rate_limit_handler)
self.cohere = cohere
self.cohere_api_error = cohere.core.api_error.ApiError

@@ -96,6 +102,7 @@ def get_messages(
messages.append(UserMessage(content=input).model_dump())
return messages # type: ignore

@rate_limit_handler
def invoke(
self,
input: str,
@@ -127,6 +134,7 @@ def invoke(
content=res.message.content[0].text if res.message.content else "",
)

@async_rate_limit_handler
async def ainvoke(
self,
input: str,