
feat: add safeguard for large files in readFileTool when maxReadFileLine is -1 #6174

Open · wants to merge 3 commits into base: main

Conversation

roomote (bot) commented Jul 24, 2025

This PR implements a safeguard to prevent consuming the entire context window when reading large files with maxReadFileLine set to -1.

Problem

When maxReadFileLine is set to -1, the readFileTool reads the full contents of a file. This can be problematic if the file is huge since it could consume the entire context window.

Solution

Added a basic safeguard that (sketched in the code example after this list):

  1. Checks if a file has more than 1000 lines when maxReadFileLine is -1
  2. Uses tiktoken to count tokens in the file content
  3. If token count exceeds 50,000 (approximately 50% of a typical 100k context window), automatically switches to partial read mode
  4. Reads only the first 2000 lines with an informative notice explaining why
  5. Includes a fallback for very large files (>5000 lines) when token counting fails
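A minimal sketch of this flow, under stated assumptions: the planFullRead and countTokens helper names and the constant values mirror the description above, not the actual code in src/core/tools/readFileTool.ts.

// Illustrative sketch only; names and thresholds follow the PR description,
// not the PR's actual implementation.
const LARGE_FILE_LINE_THRESHOLD = 1000 // only count tokens for files above this
const MAX_TOKEN_THRESHOLD = 50_000 // ~50% of an assumed 100k context window
const FALLBACK_MAX_LINES = 2000 // partial read size when the safeguard triggers
const VERY_LARGE_FILE_LINES = 5000 // fallback trigger when token counting fails

interface ReadPlan {
	linesToRead: number // -1 means read the whole file
	notice?: string
}

function planFullRead(content: string, totalLines: number, countTokens: (text: string) => number): ReadPlan {
	if (totalLines <= LARGE_FILE_LINE_THRESHOLD) {
		return { linesToRead: -1 } // small file: existing behavior is unchanged
	}
	try {
		const tokenCount = countTokens(content)
		if (tokenCount > MAX_TOKEN_THRESHOLD) {
			return {
				linesToRead: FALLBACK_MAX_LINES,
				notice: `~${tokenCount.toLocaleString()} tokens; showing only the first ${FALLBACK_MAX_LINES} lines`,
			}
		}
	} catch {
		// Token counting failed; fall back to a pure line-count heuristic.
		if (totalLines > VERY_LARGE_FILE_LINES) {
			return {
				linesToRead: FALLBACK_MAX_LINES,
				notice: `${totalLines} lines; showing only the first ${FALLBACK_MAX_LINES} lines`,
			}
		}
	}
	return { linesToRead: -1 }
}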

Changes

  • Modified src/core/tools/readFileTool.ts to add the safeguard logic
  • Added comprehensive tests in src/core/tools/__tests__/readFileTool.spec.ts

Testing

  • All existing tests pass
  • Added new test cases covering:
    • Files with high token count that trigger the safeguard
    • Files with low token count that do not trigger the safeguard
    • Files under the line threshold that skip token counting entirely
    • Very large files when token counting fails
    • Safeguard not applying when maxReadFileLine is not -1
    • Line ranges bypassing the safeguard
    • Boundary conditions for thresholds

Performance Considerations

  • Token counting is only performed for files over 1000 lines to minimize overhead
  • The safeguard preserves the existing behavior for smaller files
  • The thresholds are configurable constants that can be adjusted if needed

Important

Adds a safeguard in readFileTool to limit reading large files to 2000 lines when maxReadFileLine is -1, using token count and file size checks.

  • Behavior:
    • Adds safeguard in readFileTool to limit reading to 2000 lines if maxReadFileLine is -1 and file is large.
    • Uses tiktoken to count tokens; if over 50,000 tokens, applies safeguard.
    • For files >1MB, applies safeguard if token counting fails.
    • Provides notice in output when safeguard is applied.
  • Testing:
    • Adds tests in readFileTool.spec.ts for large files, token count thresholds, and safeguard application.
    • Tests include scenarios for small files, large files, and token counting failures.
  • Performance:
    • Token counting only for files >1000 lines to reduce overhead.
    • Configurable thresholds for file size and token count.

This description was created by Ellipsis for 89790bc.

feat: add safeguard for large files in readFileTool when maxReadFileLine is -1

- Add token counting check using tiktoken for files over 1000 lines
- Automatically switch to partial read (first 2000 lines) when token count exceeds 50k
- Add fallback safeguard for very large files (>5000 lines) when token counting fails
- Include informative notice explaining why partial read is being used
- Add comprehensive test coverage for all safeguard scenarios

This prevents consuming the entire context window when reading very large files.
roomote (bot) requested review from mrubens, cte and jr as code owners on July 24, 2025 15:36
dosubot (bot) added the size:L (This PR changes 100-499 lines, ignoring generated files) and enhancement (New feature or request) labels on Jul 24, 2025
cte (Collaborator) commented Jul 24, 2025

@roomote-agent This looks promising, but can we get access to the actual context window size of the current model via the reference to the Task object and use that instead of assuming a 100K window size? Additionally, 1,000 lines is a bit low; let's bump that up to 10K lines.

roomote (bot, author) commented Jul 24, 2025

Hi @cte! I see your feedback. I'll work on:

  1. Getting the actual context window size from the Task object instead of assuming 100K
  2. Increasing the line threshold from 1,000 to 10,000 lines

Let me check out the branch and make these changes.

const lineRangeAttr = ` lines="1-${totalLines}"`
// Handle normal file read with safeguard for large files
// Define thresholds for the safeguard
const LARGE_FILE_LINE_THRESHOLD = 1000 // Consider files with more than 1000 lines as "large"
Consider moving the safeguard threshold constants (LARGE_FILE_LINE_THRESHOLD, MAX_TOKEN_THRESHOLD, FALLBACK_MAX_LINES) to a shared configuration or constants module to avoid magic numbers and improve maintainability.

This comment was generated because it violated a code review rule: irule_tTqpIuNs8DV0QFGj.
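For illustration, such an extraction might look like the sketch below; the module path and the exact constant values are assumptions, not part of this PR.

// src/core/tools/constants.ts (hypothetical path)
// Centralizing the safeguard thresholds documents them in one place and avoids
// repeating magic numbers inside readFileTool.ts.
export const LARGE_FILE_LINE_THRESHOLD = 1000
export const MAX_TOKEN_THRESHOLD = 50_000
export const FALLBACK_MAX_LINES = 2000

// readFileTool.ts would then import them:
// import { LARGE_FILE_LINE_THRESHOLD, MAX_TOKEN_THRESHOLD, FALLBACK_MAX_LINES } from "./constants"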

if (tokenCount > MAX_TOKEN_THRESHOLD) {
shouldApplySafeguard = true
linesToRead = FALLBACK_MAX_LINES
safeguardNotice = `<notice>This file contains ${totalLines} lines and approximately ${tokenCount.toLocaleString()} tokens, which could consume a significant portion of the context window. Showing only the first ${FALLBACK_MAX_LINES} lines to preserve context space. Use line_range if you need to read specific sections.</notice>\n`
The safeguard notice message is hardcoded in English. For internationalization, consider using the i18n function (e.g., t()) so that the message can be translated for all supported languages.

This comment was generated because it violated a code review rule: irule_C0ez7Rji6ANcGkkX.
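A sketch of what that could look like, assuming an i18next-style setup; the import, key name, and interpolation parameters are guesses rather than the project's actual i18n wiring.

// Hypothetical i18n usage; the key name and parameters are assumptions.
import i18next from "i18next"

function buildSafeguardNotice(totalLines: number, tokenCount: number, maxLines: number): string {
	// i18next interpolates {{totalLines}}, {{tokenCount}} and {{maxLines}} from
	// the options object into the translated string for the active locale.
	return `<notice>${i18next.t("readFile.largeFileSafeguard", {
		totalLines,
		tokenCount: tokenCount.toLocaleString(),
		maxLines,
	})}</notice>\n`
}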

daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap on Jul 24, 2025
cte (Collaborator) commented Jul 24, 2025

@roomote-agent Instead of line count let's use file size to detect files that potentially contain a lot of tokens; this should handle the case where there's a file with a huge amount of content on a single line.

- Access actual context window size from Task object via cline.api.getModel().info.contextWindow
- Increase LARGE_FILE_LINE_THRESHOLD from 1,000 to 10,000 lines
- Calculate MAX_TOKEN_THRESHOLD as 50% of actual context window instead of hardcoded 50K
- Update tests to reflect new thresholds and mock api.getModel() properly
roomote (bot, author) commented Jul 24, 2025

Thanks for the feedback @cte! I understand - switching from line count to file size makes sense to better handle files with large amounts of content on single lines. I'll update the implementation to use file size as the primary detection method for potentially token-heavy files. Working on this now.

roomote (bot, author) commented Jul 24, 2025

Hi @cte! I've successfully implemented both requested changes:

Changes made:

  1. Dynamic context window sizing: Now accessing the actual context window size from cline.api.getModel().info.contextWindow instead of assuming 100K
  2. Increased line threshold: Bumped LARGE_FILE_LINE_THRESHOLD from 1,000 to 10,000 lines as requested

📝 Implementation details:

  • The MAX_TOKEN_THRESHOLD is now calculated as 50% of the actual model's context window
  • Added fallback to 100K if context window info is not available
  • Updated all related tests to mock the api.getModel() method properly
  • All tests are passing locally

🔍 CI Status:

  • Most checks have passed ✅
  • The Windows unit test failure appears to be a pre-existing issue (Ubuntu tests passed fine)

The changes ensure that the safeguard adapts to different model context windows while being less aggressive about triggering for moderately-sized files.
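A minimal sketch of that calculation, using simplified stand-in types for the Task/ApiHandler shapes; only the cline.api.getModel().info.contextWindow access path is taken from the discussion above.

// Simplified stand-ins; the real Task/ApiHandler types in the codebase differ.
interface ModelInfo {
	contextWindow?: number
}

interface ApiHandler {
	getModel(): { info: ModelInfo }
}

const DEFAULT_CONTEXT_WINDOW = 100_000 // fallback when model info is unavailable

function computeMaxTokenThreshold(api: ApiHandler): number {
	const contextWindow = api.getModel().info.contextWindow ?? DEFAULT_CONTEXT_WINDOW
	// Let a single file consume at most half of the model's context window.
	return Math.floor(contextWindow * 0.5)
}

// e.g. const maxTokens = computeMaxTokenThreshold(cline.api)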

- Replace line count threshold with file size threshold (100KB)
- Files larger than 100KB trigger token count check
- Files larger than 1MB automatically apply safeguard if token counting fails
- Update tests to reflect new file size-based approach
- This better handles files with large amounts of content on single lines

As requested by @cte in PR comment
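A minimal sketch of the file-size gate this commit describes; the constant and helper names are illustrative, and only the 100KB and 1MB thresholds come from the commit message.

import { stat } from "fs/promises"

// Illustrative helper; thresholds follow the commit message above.
const TOKEN_CHECK_SIZE_BYTES = 100 * 1024 // >100KB: worth running the token count
const FORCE_SAFEGUARD_SIZE_BYTES = 1024 * 1024 // >1MB: apply safeguard even if counting fails

async function classifyFileBySize(filePath: string): Promise<{ checkTokens: boolean; forceSafeguard: boolean; sizeKB: number }> {
	const { size } = await stat(filePath)
	return {
		checkTokens: size > TOKEN_CHECK_SIZE_BYTES,
		forceSafeguard: size > FORCE_SAFEGUARD_SIZE_BYTES,
		sizeKB: Math.round(size / 1024),
	}
}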
roomote (bot, author) commented Jul 24, 2025

Hi @cte! I've successfully implemented the requested change to use file size instead of line count for detecting large files.

Summary of changes:

  1. Replaced line count threshold with file size threshold: Files larger than 100KB now trigger the token count check (instead of files with more than 10,000 lines)

  2. Added file size-based safeguard: Files larger than 1MB automatically apply the safeguard if token counting fails

  3. Updated notice messages: The safeguard notices now show file size in KB instead of line count

  4. Updated all tests: Modified test suite to use file size mocking and updated test descriptions accordingly

This approach better handles edge cases where files might have a huge amount of content on single lines, as you mentioned. The implementation still respects the dynamic context window calculation that was added in the previous commit.

All tests are passing locally, and CI checks are currently running. The changes have been pushed to this PR branch.

if (tokenCount > MAX_TOKEN_THRESHOLD) {
shouldApplySafeguard = true
linesToRead = FALLBACK_MAX_LINES
safeguardNotice = `<notice>This file is ${fileSizeKB}KB and contains approximately ${tokenCount.toLocaleString()} tokens, which could consume a significant portion of the context window. Showing only the first ${FALLBACK_MAX_LINES} lines to preserve context space. Use line_range if you need to read specific sections.</notice>\n`
Consider using the i18n translation function for the safeguard notice message (e.g. on line 550) instead of hardcoded English text, to ensure consistency with other user‐facing messages.

This comment was generated because it violated a code review rule: irule_C0ez7Rji6ANcGkkX.

dosubot (bot) added the size:XL label (This PR changes 500-999 lines, ignoring generated files) and removed the size:L label on Jul 24, 2025
daniel-lxs moved this from PR [Needs Prelim Review] to PR [Draft / In Progress] in Roo Code Roadmap on Jul 24, 2025
Labels: enhancement (New feature or request), PR - Draft / In Progress, size:XL (This PR changes 500-999 lines, ignoring generated files)
Projects: Roo Code Roadmap · Status: PR [Draft / In Progress]
3 participants