nsfs refactor file_reader and file_writer #9153


Open
wants to merge 1 commit into base: master
Conversation

guymguym
Member

guymguym commented Jul 23, 2025

Describe the Problem

TBD ...

Explain the Changes

Issues: Fixed #xxx / Gap #xxx

Testing Instructions:

  • Doc added/updated
  • Tests added

Summary by CodeRabbit

  • New Features

    • Introduced a new FileReader utility for efficient and flexible file streaming.
    • Added new configuration for download stream memory thresholds.
    • Added utility functions for file lifecycle management and sparse file handling.
  • Bug Fixes

    • Corrected configuration constant names for memory thresholds.
  • Refactor

    • Simplified file read and write logic by leveraging new utilities.
    • Improved abort handling and error logging in file operations.
    • Modularized file writing and MD5 calculation methods.
  • Tests

    • Added comprehensive unit tests for the new FileReader functionality.

Signed-off-by: Guy Margalit <guymguym@gmail.com>

coderabbitai bot commented Jul 23, 2025

Walkthrough

This update introduces a new FileReader utility for efficient file streaming with buffer pooling and abort support, refactors NamespaceFS and related modules to use this abstraction, and modularizes file writing logic in FileWriter. It also adds new configuration constants, utility functions for file handling, and comprehensive unit tests for FileReader. Several code paths are simplified, and error handling and stream management are improved.

Changes

  • config.js: Fixed a typo in a config constant name; added the NSFS_DOWNLOAD_STREAM_MEM_THRESHOLD constant.
  • src/util/file_reader.js: Introduced a new FileReader class for efficient, abortable file streaming with buffer pooling and statistics; exported the class.
  • src/test/unit_tests/internal/test_file_reader.test.js: Added comprehensive unit tests for FileReader, covering stream reading, buffer pooling, byte range handling, and edge cases.
  • src/sdk/namespace_fs.js: Refactored NamespaceFS to use FileReader for object streaming, removed sparse file logic, improved abort signal handling, replaced manual buffer management, and updated stream handling logic.
  • src/util/file_writer.js: Refined the constructor, fixed a config constant, added write_entire_stream, write_buffers, and finalize methods; modularized and clarified the MD5 and buffer writing logic.
  • src/tools/file_writer_hashing.js: Replaced manual stream pipeline and finish logic with file_writer.write_entire_stream for stream writing.
  • src/util/native_fs_utils.js: Added use_file, is_sparse_file, and warmup_sparse_file utilities; refactored open/close logic, improved error handling, and adjusted loop formatting.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant NamespaceFS
    participant FileReader
    participant FileWriter
    participant NativeFS

    Client->>NamespaceFS: request object read (read_object_stream)
    NamespaceFS->>FileReader: create with file, range, buffer pool, abort signal
    FileReader->>NativeFS: open file, read data (with pooling/abort support)
    FileReader->>NamespaceFS: stream data to response
    NamespaceFS->>Client: send streamed data

    Client->>NamespaceFS: request object upload (upload_stream)
    NamespaceFS->>FileWriter: create with file, context, options
    FileWriter->>NativeFS: write_entire_stream(source_stream, abort signal)
    FileWriter->>NativeFS: write buffers, update MD5, finalize
    FileWriter->>NamespaceFS: signal completion
    NamespaceFS->>Client: acknowledge upload

Estimated code review effort

4 (~90 minutes)

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

npm error Exit handler never called!
npm error This is an error with npm itself. Please report this error at:
npm error https://github.com/npm/cli/issues
npm error A complete log of this run can be found in: /.npm/_logs/2025-07-23T01_18_27_439Z-debug-0.log



coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (6)
src/tools/file_writer_hashing.js (1)

74-74: Consider adding a proper interface for target_file

While the /**@type {any}*/ cast works, it bypasses type safety. Consider defining an interface that both nb.NativeFile and TargetHash can implement to maintain type safety.
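One hedged way to express that shared shape (a sketch with illustrative names only; the real nb.NativeFile and TargetHash interfaces live in the noobaa codebase):

```javascript
// Sketch: a duck-typed "writev target" that a test-only hasher could implement
// with the same shape as the native file's writev(fs_context, buffers, offset),
// avoiding the /**@type {any}*/ cast. Names here are illustrative.
class TargetSketch {
    constructor() {
        this.bytes = 0;
    }
    async writev(fs_context, buffers, offset) {
        // A real target would hash or persist the buffers; this one just counts.
        for (const buf of buffers) this.bytes += buf.length;
    }
}
```

Any object with the same async writev shape would then satisfy a single JSDoc typedef instead of requiring a cast.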

src/util/file_reader.js (1)

147-148: Consider supporting abort signal in buffer pool

The TODO comment correctly identifies that the buffer pool and semaphore should support abort signals to avoid blocking on aborted requests.

Would you like me to open an issue to track adding abort signal support to the buffer pool implementation?

src/util/native_fs_utils.js (2)

82-82: Minor style inconsistency

The for loop style was changed from for (;;) to for (; ;). While both are valid, this seems like an unnecessary cosmetic change that doesn't improve readability.

Consider keeping the original for (;;) style for infinite loops as it's more common and concise.

Also applies to: 169-169, 527-527


823-825: Consider buffer size flexibility

The warmup buffer is hardcoded to 4096 bytes. Consider making this configurable or at least defining it as a constant.

+const WARMUP_BUFFER_SIZE = 4096;
+
 async function warmup_sparse_file(fs_context, file, file_path, stat, pos) {
     dbg.log0('warmup_sparse_file', {
         file_path, pos, size: stat.size, blocks: stat.blocks,
     });
     if (!warmup_buffer) {
-        warmup_buffer = nb_native().fs.dio_buffer_alloc(4096);
+        warmup_buffer = nb_native().fs.dio_buffer_alloc(WARMUP_BUFFER_SIZE);
     }
     await file.read(fs_context, warmup_buffer, 0, 1, pos);
 }
src/sdk/namespace_fs.js (2)

2898-2899: Consider using optional chaining for cleaner code.

The current null check works correctly, but optional chaining would be more concise and idiomatic:

-        return (stat && stat.xattr[XATTR_VERSION_ID]) || 'null';
+        return stat?.xattr?.[XATTR_VERSION_ID] || 'null';

2626-2627: Valid performance optimization opportunity identified.

The TODO comment correctly identifies a potential optimization. The stat call in _load_bucket could potentially be avoided or cached in certain scenarios, which would reduce filesystem operations and improve performance.

Would you like me to analyze the usage patterns of _load_bucket and suggest specific optimization strategies?

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 24e11b7 and 21420b0.

📒 Files selected for processing (7)
  • config.js (1 hunks)
  • src/sdk/namespace_fs.js (32 hunks)
  • src/test/unit_tests/internal/test_file_reader.test.js (1 hunks)
  • src/tools/file_writer_hashing.js (2 hunks)
  • src/util/file_reader.js (2 hunks)
  • src/util/file_writer.js (8 hunks)
  • src/util/native_fs_utils.js (8 hunks)
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit Configuration File

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/unit_tests/internal/test_file_reader.test.js
🧬 Code Graph Analysis (2)
src/tools/file_writer_hashing.js (1)
src/endpoint/s3/ops/s3_put_object.js (1)
  • source_stream (40-40)
src/util/file_writer.js (4)
src/util/native_fs_utils.js (2)
  • dbg (4-4)
  • config (10-10)
config.js (2)
  • require (17-17)
  • config (7-7)
src/util/file_reader.js (2)
  • config (5-5)
  • stream (4-4)
src/tools/file_writer_hashing.js (2)
  • config (7-7)
  • stream (10-10)
🪛 Biome (1.9.4)
src/test/unit_tests/internal/test_file_reader.test.js

[error] 1-2: Redundant use strict directive.

The entire contents of JavaScript modules are automatically in strict mode, with no statement needed to initiate it.
Safe fix: Remove the redundant use strict directive.

(lint/suspicious/noRedundantUseStrict)

src/sdk/namespace_fs.js

[error] 2898-2898: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


🔇 Additional comments (28)
config.js (2)

988-988: Good typo fix!

Renaming NFSF_UPLOAD_STREAM_MEM_THRESHOLD to NSFS_UPLOAD_STREAM_MEM_THRESHOLD improves consistency with the NSFS naming convention used throughout the codebase.


989-989: LGTM!

Adding NSFS_DOWNLOAD_STREAM_MEM_THRESHOLD with the same 8 MB value as the upload threshold provides symmetry for streaming operations.

src/tools/file_writer_hashing.js (2)

78-78: Nice simplification!

Using write_entire_stream method eliminates the need for manual stream pipeline management and improves code readability.


115-115: Consistent refactoring

Good to see the same simplification pattern applied in both hash_target and file_target functions.

src/test/unit_tests/internal/test_file_reader.test.js (3)

1-2: Keep the 'use strict' directive

While modules are automatically in strict mode, keeping 'use strict' in test files is a common practice for consistency and clarity.


14-133: Excellent test coverage!

The test suite comprehensively covers:

  • Multiple files in the test directory
  • Various edge cases (start/end boundaries)
  • Both stream interfaces (Readable and read_into_stream)
  • Comparison with Node.js native fs.createReadStream for correctness verification
  • Proper resource cleanup with file closing

The test structure using describe_read_cases helper is clean and maintainable.


83-86: Good correctness verification

Comparing FileReader output with Node.js native fs.createReadStream ensures the implementation matches expected behavior for the same byte ranges.

src/util/file_reader.js (4)

12-25: Excellent documentation and design rationale!

The class documentation clearly explains the performance motivation for the dual interface approach (standard Readable vs optimized read_into_stream with buffer pooling).


132-134: Good defensive programming

Preventing Transform streams from being used with read_into_stream is important since they might retain buffer references after the write callback, which would break the buffer pool recycling.


217-222: Smart optimization for sparse files

The sparse file warming logic helps avoid holding large buffers while waiting for high-latency reads from archived or remote storage.


199-212: Comprehensive statistics collection

The histogram of buffer sizes using log2 scale is a clever way to track buffer usage patterns efficiently.
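The idea can be illustrated with a small sketch (my own, not the repo's code): bucket each size by ceil(log2(size)), so a handful of counters covers any buffer size.

```javascript
// Map a buffer size to a log2-scale bucket: 1 -> 0, 2 -> 1, 3-4 -> 2, 5-8 -> 3, ...
// Math.clz32 counts leading zero bits, giving ceil(log2(size)) without floats.
function log2_bucket(size) {
    return size > 0 ? 32 - Math.clz32(size - 1) : 0;
}

// Tallying reads this way keeps the histogram tiny regardless of size spread.
function tally(histogram, size) {
    const bucket = log2_bucket(size);
    histogram[bucket] = (histogram[bucket] || 0) + 1;
}
```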

src/util/native_fs_utils.js (3)

102-150: Excellent resource management pattern!

The use_file function provides a clean, safe pattern for file operations with:

  • Guaranteed file closure in the finally block
  • Detailed error logging at each stage
  • Generic return type support via template

This pattern helps prevent file descriptor leaks.


409-409: Good error standardization

Using error_utils.new_error_code to create standardized errors with proper error codes improves error handling consistency.


789-827: Well-documented sparse file utilities

The is_sparse_file and warmup_sparse_file functions are well-implemented with:

  • Clear documentation explaining various sparse file scenarios
  • Lazy allocation of the warmup buffer
  • Single-byte read optimization to trigger data recall

These utilities effectively support the FileReader's sparse file optimization.

src/util/file_writer.js (9)

7-7: LGTM: Import path correction.

The debug module import path has been correctly updated from a relative path to the current directory, which aligns with the typical project structure.


17-27: LGTM: Improved constructor with better type annotations.

The constructor improvements are well-designed:

  • Specific native types (nb.NativeFile, nb.NativeFSContext) provide better type safety
  • Optional parameters are properly marked
  • Removed unused large_buf_size parameter reduces complexity
  • Corrected config constant NSFS_UPLOAD_STREAM_MEM_THRESHOLD aligns with the renamed constant

The parameter reordering and type refinements enhance code maintainability.


41-48: LGTM: Clean stream handling abstraction.

The write_entire_stream method provides a clean abstraction for writing entire streams:

  • Uses stream.promises.pipeline for proper error handling and backpressure
  • stream.promises.finished ensures the stream is fully processed
  • Proper async/await pattern with optional abort signal support

This simplifies stream handling compared to manual pipeline management.


50-62: LGTM: Efficient concurrent buffer processing.

The write_buffers method effectively parallelizes MD5 calculation and file writing:

  • Uses Promise.all to run MD5 update and buffer writing concurrently
  • Conditional MD5 processing only when enabled
  • Proper separation of concerns with stats update

This design improves performance by avoiding sequential processing of independent operations.


64-72: LGTM: Clean MD5 finalization.

The finalize method properly handles MD5 digest calculation:

  • Conditional execution only when MD5 is enabled
  • Proper async/await pattern
  • Stores hex string format for consistent usage

The implementation is clean and follows expected patterns.


104-105: LGTM: Improved documentation clarity.

The comment updates better describe the IOV_MAX batching behavior, making the implementation intent clearer for future maintainers.


125-126: LGTM: Enhanced method documentation.

The updated documentation clearly describes the offset and total bytes tracking behavior, improving code readability.


138-149: LGTM: Improved code reuse in _writev.

The refactoring to use the new write_buffers method eliminates code duplication and centralizes the buffer writing logic. This improves maintainability and ensures consistent behavior across different entry points.


158-163: LGTM: Simplified _final implementation.

The refactoring to use the new finalize method reduces code duplication and provides a cleaner separation of concerns. The stream interface methods now delegate to the more focused public methods.

src/sdk/namespace_fs.js (5)

11-11: Good addition of stream module import for proper stream handling.

The addition of the stream module import is necessary for using stream.promises.finished in the refactored code.


1026-1144: Excellent refactoring to use FileReader for stream handling.

The refactoring of read_object_stream to use FileReader brings several improvements:

  • Cleaner separation of concerns with buffer management delegated to FileReader
  • Proper abort signal handling throughout the operation
  • Better stream completion detection using stream.promises.finished
  • Improved statistics tracking with detailed metrics

The implementation correctly handles edge cases like empty directory content and maintains backward compatibility.


965-965: Well-implemented abort signal handling throughout the codebase.

The addition of object_sdk.throw_if_aborted() checks at strategic points ensures operations can be cancelled promptly:

  • Before and after retry attempts
  • Before file operations
  • Before stream completion waits

The abort signal is also properly propagated to FileReader and file_writer.write_entire_stream, ensuring cancellation works through the entire call chain.

Also applies to: 991-992, 1045-1045, 1055-1056, 1084-1084, 1098-1098, 1137-1138


1607-1610: Correct propagation of abort signal to file writer.

Good addition of the abort signal to file_writer.write_entire_stream. This ensures upload operations can be properly cancelled when the abort controller is triggered.


1780-1791: Clean refactoring of multipart listing logic.

The refactored code is more readable with clearer separation of concerns:

  • Filtering logic is separated from transformation logic
  • Part number extraction is more explicit
  • Consistent use of async/await pattern

The functionality remains unchanged while improving code clarity.

Comment on lines +1019 to 1020
{ bucket_path: this.bucket_path, object_name: params.key }, err);
dbg.log0('NamespaceFS: read_object_stream couldnt find dir content xattr', err);


⚠️ Potential issue

Fixed undefined variable reference in error events.

Good catch fixing the object_name parameter from key to params.key in NoobaaEvent creation. This ensures error events contain the correct object name instead of potentially undefined values.

Also applies to: 1149-1149, 1201-1201

🤖 Prompt for AI Agents
In src/sdk/namespace_fs.js at lines 1019-1020, 1149, and 1201, the NoobaaEvent
creation incorrectly referenced the object_name parameter as key instead of
params.key, causing undefined variable errors. Update all these instances to use
params.key for the object_name to ensure error events correctly include the
object name.


coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (7)
src/test/unit_tests/internal/test_file_reader.test.js (1)

1-2: Remove redundant 'use strict' directive.

JavaScript modules are automatically in strict mode. Apply this fix:

/* Copyright (C) 2020 NooBaa */
-'use strict';
src/util/file_writer.js (1)

41-48: Consider removing the redundant finished call.

The stream.promises.pipeline already waits for the pipeline to complete before resolving. The additional stream.promises.finished call appears redundant unless there's a specific reason to wait for the 'finish' event separately.

If the extra wait is not needed, simplify to:

 async write_entire_stream(source_stream, options) {
     await stream.promises.pipeline(source_stream, this, options);
-    await stream.promises.finished(this, options);
 }
src/sdk/namespace_fs.js (5)

867-867: Inconsistent loop syntax standardization.

While standardizing loop syntax from for(;;) to for(; ;) improves consistency, not all infinite loops in the file have been updated. This creates inconsistency within the same file.

Consider updating all infinite loops for consistency:

#!/bin/bash
# Search for remaining instances of for(;;) pattern
rg "for\s*\(\s*;\s*;\s*\)" --type js

1027-1032: JSDoc parameter documentation is incomplete.

The JSDoc comment is missing the @returns documentation and parameter descriptions are empty.

Complete the JSDoc documentation:

-    /**
-     * 
-     * @param {*} params
-     * @param {nb.ObjectSDK} object_sdk
-     * @param {nb.S3Response|stream.Writable} res
-     * @returns 
-     */
+    /**
+     * Reads object data and streams it to the response
+     * @param {Object} params - Object read parameters including bucket, key, range
+     * @param {nb.ObjectSDK} object_sdk - SDK instance with abort controller
+     * @param {nb.S3Response|stream.Writable} res - Response stream to write data to
+     * @returns {Promise<null>} Returns null to indicate response was handled
+     */

2627-2627: Address the TODO comment.

The TODO comment indicates a potential unnecessary stat call that could impact performance.

Would you like me to analyze whether this stat call is necessary and potentially optimize it? I can examine the usage patterns and determine if we can avoid this filesystem operation.


2898-2898: Consider using optional chaining for better null safety.

The static analysis tool correctly identifies that optional chaining would be safer here.

Apply optional chaining for safer property access:

-        return (stat && stat.xattr[XATTR_VERSION_ID]) || 'null';
+        return stat?.xattr?.[XATTR_VERSION_ID] || 'null';

3409-3409: Use BigInt literal for better readability.

Instead of using BigInt(0), use the BigInt literal syntax for consistency with modern JavaScript.

-                { mtimeNsBigint: BigInt(0), name: undefined });
+                { mtimeNsBigint: 0n, name: undefined });
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 24e11b7 and 21420b0.

📒 Files selected for processing (7)
  • config.js (1 hunks)
  • src/sdk/namespace_fs.js (32 hunks)
  • src/test/unit_tests/internal/test_file_reader.test.js (1 hunks)
  • src/tools/file_writer_hashing.js (2 hunks)
  • src/util/file_reader.js (2 hunks)
  • src/util/file_writer.js (8 hunks)
  • src/util/native_fs_utils.js (8 hunks)
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit Configuration File

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/unit_tests/internal/test_file_reader.test.js
🧬 Code Graph Analysis (1)
src/tools/file_writer_hashing.js (1)
src/endpoint/s3/ops/s3_put_object.js (1)
  • source_stream (40-40)
🪛 Biome (1.9.4)
src/test/unit_tests/internal/test_file_reader.test.js

[error] 1-2: Redundant use strict directive.

The entire contents of JavaScript modules are automatically in strict mode, with no statement needed to initiate it.
Safe fix: Remove the redundant use strict directive.

(lint/suspicious/noRedundantUseStrict)

src/sdk/namespace_fs.js

[error] 2898-2898: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (27)
config.js (1)

988-989: LGTM! Good typo fix and symmetric configuration.

The correction from NFSF_ to NSFS_ fixes an important typo, and adding the corresponding download threshold creates symmetry with the upload configuration. The 8MB threshold is reasonable for stream memory management.

src/tools/file_writer_hashing.js (2)

74-78: Good refactoring to use FileWriter's centralized stream handling.

The change from manual pipeline management to write_entire_stream simplifies the code and aligns with the broader refactoring effort. The type cast is acceptable here since TargetHash implements the necessary writev method for testing purposes.


115-115: Consistent refactoring applied.

Good to see the same simplification pattern applied here as well.

src/util/file_reader.js (3)

4-224: Well-designed FileReader implementation with efficient buffer management.

The dual-mode approach (standard Readable stream and optimized read_into_stream) provides flexibility. The buffer pooling, abort signal handling, and sparse file support are all well implemented. Good documentation explaining the design rationale.


132-134: Good defensive check against Transform streams.

This prevents a subtle bug where Transform streams could hold onto buffers after the write callback, breaking the buffer pool's assumptions.


156-160: Proper buffer cleanup on EOF.

Good attention to detail in setting buffer_pool_cleanup = null before calling the callback to prevent double-cleanup.

src/test/unit_tests/internal/test_file_reader.test.js (1)

14-133: Excellent test coverage for FileReader.

The test suite thoroughly validates both reading modes against Node.js native streams, covering edge cases and various byte ranges. Good use of parameterized tests to avoid duplication.

src/util/native_fs_utils.js (4)

102-150: Excellent resource management pattern implementation.

The use_file function properly implements the RAII pattern with comprehensive error handling and logging at each stage. The finally block ensures files are always closed, preventing resource leaks.


789-827: Smart sparse file handling implementation.

Good documentation explaining the rationale. The warmup approach of reading just 1 byte is efficient for triggering recalls without holding large buffers. The lazy allocation of the warmup buffer is a nice optimization.


409-409: Good use of centralized error handling.

Using error_utils.new_error_code ensures consistent error object structure across the codebase.


82-82: Minor formatting consistency improvement.

The loop syntax changes from for(;;) to for(; ;) improve formatting consistency without affecting functionality.

Also applies to: 169-169, 527-527

src/util/file_writer.js (8)

17-27: LGTM! Type refinements improve code clarity.

The parameter type annotations using specific native types (nb.NativeFile, nb.NativeFSContext) and making parameters optional improves the API documentation and flexibility.


50-62: Excellent parallel processing implementation!

The method efficiently handles MD5 calculation and file writing in parallel using Promise.all, which improves performance. The conditional MD5 update and stats tracking are properly implemented.


64-72: LGTM! Clean finalization logic.

The method properly encapsulates the MD5 finalization logic with appropriate null checking and hex conversion.


137-155: Good refactoring to reduce code duplication!

The delegation to write_buffers eliminates duplicate logic and improves maintainability.


157-169: LGTM! Consistent refactoring pattern.

The delegation to finalize follows the same pattern as _writev, improving code organization and maintainability.


7-7: debug_module Import Path Verified

The import path update is correct—src/util/debug_module.js exists alongside file_writer.js, so require('./debug_module') will resolve successfully. No further action is needed.


26-26: All FileWriter calls have been updated—large_buf_size removed

No instantiations pass large_buf_size anymore:

  • Checked src/tools/file_writer_hashing.js (two calls) – only target_file, fs_context, namespace_resource_id
  • Checked src/sdk/namespace_fs.js – only target_file, fs_context, offset, md5_enabled, stats
  • No occurrences of large_buf_size in the codebase

The removal is safe and callers have been updated.


27-27: NSFS_UPLOAD_STREAM_MEM_THRESHOLD rename verified across codebase

The new constant is defined in config.js (config.NSFS_UPLOAD_STREAM_MEM_THRESHOLD = 8 * 1024 * 1024;) and no occurrences of NFSF_UPLOAD_STREAM_MEM_THRESHOLD remain.

src/sdk/namespace_fs.js (8)

11-11: Good addition of stream module import.

The stream module import is properly added to support the new stream.promises.finished usage throughout the file.


31-31: Proper integration of FileReader module.

The FileReader import follows the existing pattern and is correctly positioned among other utility imports.


1035-1056: Excellent abort signal handling implementation.

The addition of abort signal checks at critical points (lines 1045, 1055) and passing the signal to stream operations improves the cancellation behavior and prevents unnecessary work when requests are aborted.


1104-1140: Well-structured FileReader integration.

The refactoring to use FileReader improves code modularity and maintainability. The implementation properly:

  • Passes all necessary context including abort signal
  • Handles statistics collection
  • Provides detailed logging with performance metrics
  • Ensures proper cleanup with file closure

1136-1138: Proper use of stream.promises.finished with abort signal.

Using stream.promises.finished with the abort signal is the correct approach for waiting for stream completion while respecting cancellation. This replaces the custom utility and uses Node.js native functionality.


1607-1609: Clean abstraction using file_writer.write_entire_stream.

The refactoring to use file_writer.write_entire_stream with abort signal support improves code clarity and properly handles stream cancellation.


1780-1791: Improved readability in multipart listing.

The refactored code using filter and map chaining is more readable and functional. The logic is clearer and easier to maintain.


2552-2552: Centralized restore status calculation.

Good refactoring to use Glacier.get_restore_status for consistent restore status calculation across the codebase.

Contributor

liranmauda left a comment


LGTM

@@ -2701,6 +2623,7 @@ class NamespaceFS {

async _load_bucket(params, fs_context) {
try {
// TODO(guymguym): can we find a way to avoid this stat?
Contributor


Should we keep this TODO?

* @param {import('events').Abortable} [options]
*/
async write_entire_stream(source_stream, options) {
await stream.promises.pipeline(source_stream, this, options);
Contributor


Do we miss any special handling when moving from the stream_utils.js pipeline to stream.promises.pipeline?

@@ -1672,7 +1593,6 @@ class NamespaceFS {
md5_enabled,
stats: this.stats,
bucket: params.bucket,
large_buf_size: multi_buffer_pool.get_buffers_pool(undefined).buf_size,
Member Author


Check if this really was unused.

multi_buffer_pool,
highWaterMark: 1024, // bytes
});
const data = await buffer_utils.read_stream_join(file_reader);
Member Author


Also add a test for read_into_stream().


3 participants