Skip to content

File write read improvements #150

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 7, 2025
Merged

Conversation

wonderwhy-er
Copy link
Owner

@wonderwhy-er wonderwhy-er commented Jun 4, 2025

Summary by CodeRabbit

  • New Features

    • Improved file reading performance for large files and tail reads, providing faster and more efficient access when using line offsets.
    • Enhanced user messages when writing files, offering positive feedback and performance tips for large files.
    • Added support for reading files from the end using negative offsets, similar to Unix tail functionality.
  • Documentation

    • Clarified tool descriptions for reading and writing files, including detailed explanations and examples for using offsets and chunking.
    • Updated guidance to recommend chunking file writes for optimal performance.
    • Added FAQ entry and updated README and homepage to highlight negative offset support for file reading.
  • Bug Fixes

    • Addressed and tested the correct handling of negative offsets when reading files, ensuring expected behavior when retrieving lines from the end.
  • Tests

    • Added comprehensive tests to verify negative offset handling in file reading, covering standard and edge cases.
    • Refactored test runner to execute tests as subprocesses with enhanced reporting and error handling.
    • Improved test exit code handling to reflect success or failure accurately.

Copy link
Contributor

coderabbitai bot commented Jun 4, 2025

Walkthrough

The changes introduce a new, efficient mechanism for reading files with line-based offsets and lengths, especially optimizing tail reads and large files. Warning and success messages for file writing are revised for clarity and positivity. Tool documentation is updated for clearer guidance, and new tests are added to verify and analyze negative offset behavior in file reading.

Changes

File(s) Change Summary
src/tools/filesystem.ts Introduced readFileWithSmartPositioning for optimized line-based file reading; added readFileInternal for internal use; implemented helper functions for various reading strategies based on file size and offset; updated readFileFromDisk to use new logic; added necessary imports.
src/handlers/filesystem-handlers.ts Revised the informational message in handleWriteFile for line limit exceedance to a positive confirmation with a general performance tip, replacing the previous strict warning and detailed instructions.
src/server.ts Updated documentation strings for "read_file" and "write_file" tools to clarify offset semantics, chunking best practices, and provided usage examples; no code logic changed.
src/tools/edit.ts Changed file reading in performSearchReplace to use new readFileInternal for direct content access, removing destructuring and adjusting assignment accordingly.
test/test-negative-offset-analysis.js Added test file to document and analyze broken behavior of negative offsets in file reading, with detailed findings and recommendations; exports runTests() returning false.
test/test-negative-offset-readfile.js Added comprehensive test suite to validate negative offset handling in file reading, including edge cases, comparisons, and error handling; manages setup/teardown and logs results; exports runAllTests().

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Server
    participant Filesystem

    User->>Server: Request read_file(filePath, offset, length)
    Server->>Filesystem: readFileWithSmartPositioning(filePath, offset, length)
    alt offset < 0 (tail read)
        alt Large file & small N
            Filesystem->>Filesystem: readLastNLinesReverse()
        else Large file & large N
            Filesystem->>Filesystem: readFromEndWithReadline()
        else Small file
            Filesystem->>Filesystem: read entire, slice tail
        end
    else offset >= 0
        alt Small file or offset 0
            Filesystem->>Filesystem: readFromStartWithReadline()
        else Large file & deep offset
            Filesystem->>Filesystem: readFromEstimatedPosition()
        end
    end
    Filesystem-->>Server: FileResult (content, status)
    Server-->>User: File content / status
Loading

Possibly related PRs

  • Change read write #108: Introduced configuration-driven line limits and chunked write modes in handleWriteFile, directly related to the current PR’s changes in messaging for line limit exceedance.
  • Allow reading urls and some cleanup #63: Added URL reading support and restructured command handlers including handleWriteFile, related at the file/function level but not overlapping in message content.

Suggested labels

size:L

Suggested reviewers

  • dmitry-ottic-ai
  • serg33v

Poem

A rabbit nibbles code with cheer,
Tail reads now swift, the end is near!
Write-file tips are bright and new,
Chunk your lines for speed—who knew?
With tests for tails and docs that shine,
This patch makes file ops quite divine.
🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d5cabd and 4e06da3.

📒 Files selected for processing (4)
  • FAQ.md (1 hunks)
  • README.md (2 hunks)
  • docs/index.html (1 hunks)
  • src/server.ts (2 hunks)
✅ Files skipped from review due to trivial changes (3)
  • docs/index.html
  • FAQ.md
  • README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/server.ts
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
test/test-negative-offset-analysis.js (1)

34-36: Consider renaming the function to clarify its purpose.

The function name runTests might be misleading since this doesn't actually run tests but rather documents broken behavior.

Consider renaming to better reflect its purpose:

-export default async function runTests() {
+export default async function documentBrokenBehavior() {
  return false; // Test documents that negative offsets are broken
}
src/tools/filesystem.ts (3)

344-380: Consider adding error handling for stream cleanup.

While the circular buffer implementation is excellent, consider wrapping the reading logic in a try-finally block to ensure the readline interface is closed even if an error occurs.

 async function readFromEndWithReadline(filePath: string, requestedLines: number, mimeType: string, includeStatusMessage: boolean = true): Promise<FileResult> {
     const rl = createInterface({
         input: createReadStream(filePath),
         crlfDelay: Infinity
     });
     
     const buffer: string[] = new Array(requestedLines);
     let bufferIndex = 0;
     let totalLines = 0;
     
-    for await (const line of rl) {
-        buffer[bufferIndex] = line;
-        bufferIndex = (bufferIndex + 1) % requestedLines;
-        totalLines++;
-    }
-    
-    rl.close();
+    try {
+        for await (const line of rl) {
+            buffer[bufferIndex] = line;
+            bufferIndex = (bufferIndex + 1) % requestedLines;
+            totalLines++;
+        }
+    } finally {
+        rl.close();
+    }
     
     // Extract lines in correct order
     let result: string[];

382-414: Consider adding error handling for stream cleanup.

Similar to the previous function, consider wrapping the reading logic in a try-finally block.

 async function readFromStartWithReadline(filePath: string, offset: number, length: number, mimeType: string, includeStatusMessage: boolean = true): Promise<FileResult> {
     const rl = createInterface({
         input: createReadStream(filePath),
         crlfDelay: Infinity
     });
     
     const result: string[] = [];
     let lineNumber = 0;
     
-    for await (const line of rl) {
-        if (lineNumber >= offset && result.length < length) {
-            result.push(line);
-        }
-        if (result.length >= length) break; // Early exit optimization
-        lineNumber++;
-    }
-    
-    rl.close();
+    try {
+        for await (const line of rl) {
+            if (lineNumber >= offset && result.length < length) {
+                result.push(line);
+            }
+            if (result.length >= length) break; // Early exit optimization
+            lineNumber++;
+        }
+    } finally {
+        rl.close();
+    }

416-487: Consider adding error handling for stream cleanup in both readline interfaces.

The byte position estimation strategy is excellent for large files. However, both readline interfaces should be wrapped in try-finally blocks for proper cleanup.

The function has two readline interfaces that need proper cleanup. Would you like me to provide a complete refactored version with proper error handling for both?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b0ee04e and 9f187ba.

📒 Files selected for processing (6)
  • src/handlers/filesystem-handlers.ts (1 hunks)
  • src/server.ts (2 hunks)
  • src/tools/edit.ts (2 hunks)
  • src/tools/filesystem.ts (4 hunks)
  • test/test-negative-offset-analysis.js (1 hunks)
  • test/test-negative-offset-readfile.js (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
src/tools/edit.ts (1)
src/tools/filesystem.ts (1)
  • readFileInternal (598-620)
test/test-negative-offset-readfile.js (4)
setup-claude-server.js (1)
  • setup (512-751)
src/config-manager.ts (1)
  • configManager (212-212)
test/test.js (1)
  • fs (41-41)
src/handlers/filesystem-handlers.ts (1)
  • handleReadFile (47-105)
🔇 Additional comments (15)
src/tools/edit.ts (2)

1-1: Good addition of internal file reading capability.

Adding readFileInternal to the imports provides access to the optimized internal file reading function.


122-123: Excellent optimization for internal file operations.

The switch from readFile to readFileInternal is a smart improvement for the edit functionality. This change eliminates unnecessary status messages and uses the optimized reading path that's better suited for internal operations within the edit tool.

test/test-negative-offset-analysis.js (1)

1-33: Excellent documentation of the negative offset issue.

This comprehensive analysis clearly documents the broken behavior with negative offsets, provides specific examples, and offers actionable recommendations. The detailed findings will be valuable for understanding the need for the improvements introduced elsewhere in this PR.

src/handlers/filesystem-handlers.ts (1)

170-172: Great improvement to user messaging.

The updated message strikes a better balance by confirming successful completion while providing helpful performance guidance. This positive approach aligns well with the updated tool documentation that presents chunking as a standard best practice.

src/server.ts (2)

129-144: Excellent clarification of offset semantics.

The enhanced documentation clearly explains the behavior differences between positive and negative offsets with concrete examples. This will help users understand when to use each approach and sets proper expectations for the tail-like behavior of negative offsets.


176-199: Outstanding reframing of chunking as best practice.

This documentation update is a significant improvement that proactively guides users toward optimal file writing patterns. By presenting chunking as the standard recommended approach rather than an emergency fallback, users are more likely to adopt these performance best practices from the start.

test/test-negative-offset-readfile.js (5)

28-48: LGTM! Well-structured test setup.

The setup function properly saves the original configuration for restoration and creates a predictable test file with numbered lines for easy verification.


53-66: LGTM! Proper cleanup implementation.

The teardown function correctly restores the original configuration and handles file cleanup gracefully with appropriate error handling.


71-135: LGTM! Comprehensive negative offset tests.

The test cases cover various scenarios including standard negative offsets, edge cases like -1, and beyond-file-size offsets. The expected line verification logic is correct.


197-248: LGTM! Well-designed edge case tests.

The edge case tests properly handle scenarios like zero offset, very large negative offsets, and zero-length reads with appropriate success/failure validation.


253-298: LGTM! Excellent test orchestration and module pattern.

The test runner properly executes all test suites with comprehensive error handling, cleanup in the finally block, and a clean module pattern that supports both direct execution and importing.

src/tools/filesystem.ts (4)

5-6: LGTM! Necessary imports for streaming functionality.

The imports for createReadStream and createInterface are required for the new efficient file reading implementation.


252-297: LGTM! Excellent implementation of smart file reading strategies.

The function intelligently selects the most efficient reading strategy based on file size, offset type, and requested lines. The threshold-based approach (10MB for large files, 100 lines for small reads) provides good performance optimization.


299-342: LGTM! Efficient reverse file reading implementation.

The function efficiently reads files backwards in 8KB chunks, properly handles partial lines at chunk boundaries, and ensures resource cleanup with the finally block.


591-620: LGTM! Clean implementation for internal file reading.

The function properly validates inputs, prevents reading image files for internal text operations, and reuses the smart positioning logic without status messages.

const negativeLines = negativeContent.split('\n').filter(line => line.startsWith('Line '));
const positiveLines = positiveContent.split('\n').filter(line => line.startsWith('Line '));

const isMatching = negativeLines.join('\\n') === positiveLines.join('\\n');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix the string join to use actual newline character.

The join operation uses escaped newline '\\n' which creates a literal backslash-n string instead of joining with actual newlines.

-    const isMatching = negativeLines.join('\\n') === positiveLines.join('\\n');
+    const isMatching = negativeLines.join('\n') === positiveLines.join('\n');
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const isMatching = negativeLines.join('\\n') === positiveLines.join('\\n');
const isMatching = negativeLines.join('\n') === positiveLines.join('\n');
🤖 Prompt for AI Agents
In test/test-negative-offset-readfile.js at line 176, the join operation uses
the escaped string '\\n' which results in a literal backslash followed by n
instead of an actual newline character. Replace the string '\\n' with '\n' in
the join method to correctly join the array elements with real newline
characters.

@wonderwhy-er wonderwhy-er merged commit 4458a1c into main Jun 7, 2025
2 checks passed
@wonderwhy-er wonderwhy-er deleted the file-write-read-improvements branch June 7, 2025 17:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant