CloudWatch logs get_log_events infinite loop with nextForwardToken for expired log groups #4591

@dxdc

Describe the bug

Description

I'm experiencing the same infinite loop issue with nextForwardToken described in #4472, but I've identified a specific case that triggers this behavior: querying log groups with expired logs (logs older than the retention period).

Root Cause Identified

This infinite loop occurs when:

  1. The log group has a retention period set (e.g., 1 year)
  2. The query targets logs older than the retention period (expired logs)
  3. get_log_events returns a new nextForwardToken value on every call, indefinitely
  4. Every response contains no events, yet the API keeps providing new tokens
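
One way to detect this case before paginating is to compare the stream's last event time with the group's retention window. The sketch below assumes the `retentionInDays` field from `describe_log_groups` and the `lastEventTimestamp` / `lastIngestionTime` fields from `describe_log_streams`. The helper names `is_past_retention` and `stream_looks_expired` are hypothetical and not part of boto3.

```python
import time

MS_PER_DAY = 24 * 60 * 60 * 1000


def is_past_retention(last_event_ms, retention_days, now_ms=None):
    """Return True if the newest event is older than the retention window."""
    if retention_days is None:   # no retention policy set: events never expire
        return False
    if last_event_ms is None:    # timestamp unknown: do not guess
        return False
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return last_event_ms < now_ms - retention_days * MS_PER_DAY


def stream_looks_expired(client, log_group_name, log_stream_name):
    """Look up retention and last event time, then apply is_past_retention."""
    groups = client.describe_log_groups(logGroupNamePrefix=log_group_name)["logGroups"]
    streams = client.describe_log_streams(
        logGroupName=log_group_name, logStreamNamePrefix=log_stream_name
    )["logStreams"]
    # Prefix searches can return neighbours; keep only exact name matches.
    group = next((g for g in groups if g["logGroupName"] == log_group_name), None)
    stream = next((s for s in streams if s["logStreamName"] == log_stream_name), None)
    if group is None or stream is None:
        raise LookupError("log group or stream not found on the first result page")
    last_ms = stream.get("lastEventTimestamp", stream.get("lastIngestionTime"))
    return is_past_retention(last_ms, group.get("retentionInDays"))
```

A caller could skip pagination entirely when `stream_looks_expired(client, '/your/log/group', 'expired-log-stream')` is true. The stream timestamps are metadata and may lag behind ingestion, so this is a heuristic rather than a guarantee.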

Comparison with AWS CLI

The AWS CLI handles this scenario correctly:

  • aws logs get-log-events for expired logs returns no events and terminates properly
  • Does not get stuck in infinite pagination loops
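
A possible explanation for the difference, offered as an assumption rather than a confirmed diagnosis: a single `aws logs get-log-events` invocation sends one request and prints the response, without following nextForwardToken. Under that assumption, the closest boto3 equivalent is one call with no loop. The helper name `get_log_events_once` is hypothetical.

```python
def get_log_events_once(client, log_group_name, log_stream_name, limit=1000):
    """Issue exactly one GetLogEvents request and return its events and token."""
    response = client.get_log_events(
        logGroupName=log_group_name,
        logStreamName=log_stream_name,
        startFromHead=True,
        limit=limit,
    )
    # nextForwardToken is still in the response; it is simply not followed here.
    return response["events"], response.get("nextForwardToken")
```

If this is what the CLI does, the loop in the report is user-side code rather than an SDK paginator. `client.can_paginate("get_log_events")` reports whether botocore ships a built-in paginator for the operation.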

This suggests the issue may be in how boto3/botocore implements the pagination logic versus the AWS CLI implementation.

Impact

  • Applications get stuck in infinite loops when processing old log streams
  • Unnecessary API calls and costs
  • Resource consumption (CPU, memory) from endless pagination
  • Difficult to implement robust log processing without workarounds

Workaround

The current workaround is to cap the number of consecutive empty responses inside the pagination loop:

consecutive_empty_responses = 0
max_empty_responses = 3  # Adjust based on your needs

while next_token != response.get("nextForwardToken", ""):
    # ... existing code ...
    
    if not response["events"]:
        consecutive_empty_responses += 1
        if consecutive_empty_responses >= max_empty_responses:
            print("Breaking due to consecutive empty responses (likely expired logs)")
            break
    else:
        consecutive_empty_responses = 0  # Reset counter

This issue affects production systems that process historical logs and need reliable pagination behavior. A fix or clear documentation would be greatly appreciated.
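
For reference, a self-contained version of the workaround might look like the sketch below. It combines the token comparison with the empty-response cap. The function name `fetch_log_events` and its default values are illustrative assumptions, not an official boto3 helper.

```python
def fetch_log_events(client, log_group_name, log_stream_name,
                     max_empty_responses=3, limit=1000):
    """Collect events from one stream, giving up after repeated empty pages."""
    params = {
        "logGroupName": log_group_name,
        "logStreamName": log_stream_name,
        "startFromHead": True,
        "limit": limit,
    }
    events = []
    consecutive_empty = 0
    while True:
        response = client.get_log_events(**params)
        page = response.get("events", [])
        events.extend(page)
        # Reset the counter whenever a page actually contains events.
        consecutive_empty = 0 if page else consecutive_empty + 1

        new_token = response.get("nextForwardToken")
        # Normal end of stream: the token is missing or stops changing.
        if new_token is None or new_token == params.get("nextToken"):
            break
        # Case from this issue: tokens keep changing but no events arrive.
        if consecutive_empty >= max_empty_responses:
            break
        params["nextToken"] = new_token
    return events
```

GetLogEvents may return an empty page while more events are still available through the token, so a cap that is too low could end a valid stream early. Pick `max_empty_responses` with that trade-off in mind.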

Expected Behavior

According to AWS CloudWatch Logs documentation, when no more events are available, the API should either:

  • Stop returning nextForwardToken, OR
  • Return the same nextForwardToken value to indicate pagination is complete

Expected (similar to AWS CLI):

Iteration: 1 - Next token: None
No events found in iteration 1
Total iterations: 1
Total events: 0
Duration: 0.15 seconds

Current Behavior

When querying expired logs (older than retention period):

  • get_log_events returns {'events': [], 'nextForwardToken': 'new_token_value'}
  • Each subsequent call returns a different nextForwardToken
  • This creates an infinite loop because the condition next_token != response.get("nextForwardToken", "") never becomes false
  • No events are ever returned, but pagination continues indefinitely

Iteration: 1 - Next token: None
No events found in iteration 1
Iteration: 2 - Next token: f/38843677690954359307332779238523611073553355858145312768/s
No events found in iteration 2
Iteration: 3 - Next token: f/38843871704337782941158364500062913595036750815702024192/s
No events found in iteration 3
... (continues indefinitely)

Reproduction Steps

  1. Create a CloudWatch log group with 1-year retention
  2. Wait for logs to expire (or use existing expired logs)
  3. Attempt to query expired logs using get_log_events with pagination:

import boto3
import time

client = boto3.client('logs', region_name='us-east-1')

params = {
    'logGroupName': '/your/log/group',
    'logStreamName': 'expired-log-stream',
    'startFromHead': True,
    'limit': 1000
}

events = []
response = {}
next_token = None
count = 0
start_time = time.time()

# This loop will run indefinitely for expired logs
while next_token != response.get("nextForwardToken", ""):
    count += 1
    next_token = response.get("nextForwardToken")
    
    print(f"Iteration: {count} - Next token: {next_token}")
    
    if next_token:
        params["nextToken"] = next_token
    
    response = client.get_log_events(**params)
    
    if response["events"]:
        print(f"Events found: {len(response['events'])}")
        events.extend(response["events"])
    else:
        print(f"No events found in iteration {count}")
    
    # Safety break for demonstration (remove to see infinite loop)
    if count > 20:
        print("Breaking to prevent infinite loop...")
        break

print(f"Total iterations: {count}")
print(f"Total events: {len(events)}")
print(f"Duration: {time.time() - start_time:.2f} seconds")
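
To exercise the loop without AWS credentials or an expired stream, it can be run against a stub that mimics the responses reported above: no events and a different token on every call. This is only a sketch. `FakeExpiredLogsClient` and `count_iterations` are hypothetical names, and the stub's token format is made up.

```python
class FakeExpiredLogsClient:
    """Hypothetical stand-in for a CloudWatch Logs client; not part of boto3."""

    def __init__(self):
        self.calls = 0

    def get_log_events(self, **params):
        self.calls += 1
        # Same shape as the reported responses: no events, fresh token each call.
        return {"events": [], "nextForwardToken": f"f/{self.calls}/s"}


def count_iterations(client, max_iterations=20):
    """Run the loop from the reproduction script, capped at max_iterations."""
    params = {
        "logGroupName": "/your/log/group",
        "logStreamName": "expired-log-stream",
        "startFromHead": True,
        "limit": 1000,
    }
    response, next_token, count = {}, None, 0
    while next_token != response.get("nextForwardToken", ""):
        count += 1
        next_token = response.get("nextForwardToken")
        if next_token:
            params["nextToken"] = next_token
        response = client.get_log_events(**params)
        if count >= max_iterations:  # safety cap; the loop never exits by itself
            break
    return count


print(count_iterations(FakeExpiredLogsClient()))  # prints 20
```

The loop stops only because of the cap, and raising `max_iterations` raises the iteration count by the same amount. That matches the behavior described in this issue.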

Possible Solution

  1. Fix in boto3/botocore: Modify pagination logic to detect when expired logs are being queried and handle accordingly
  2. AWS API fix: Address this at the CloudWatch Logs API level to match AWS CLI behavior
  3. Documentation update: Clearly document this edge case and provide recommended workarounds

Additional Information/Context

Related Issues

Questions

  1. Is this considered a bug in the AWS CloudWatch Logs API itself?
  2. Should boto3 implement special handling for this scenario?
  3. Are there plans to align boto3 behavior with AWS CLI for this case?

SDK version used

boto3==1.40.4

Environment details (OS name and version, etc.)

macOS 15.6

Metadata

Labels

  • bug: This issue is a confirmed bug.
  • cloudwatchlogs
  • p3: This is a minor priority issue
  • service-api: This issue is caused by the service API, not the SDK implementation.
