fix: hdfs read into buffer fully #2031


Open: parthchandra wants to merge 3 commits into main

Conversation

parthchandra (Contributor)

The get and read_range methods in the hdfs object store implementation do not always read all of the data requested, because the underlying hdfs call may return fewer bytes than requested.
#1992 (comment)
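
At a high level, the fix loops until the requested range is fully read, along these lines (a minimal sketch reconstructed from the diff excerpts below; the HdfsFile API and the to_error helper come from the existing fs-hdfs-based implementation, and the EOF handling shown here is refined during review):

    fn read_range(range: &Range<u64>, file: &HdfsFile) -> Result<Bytes> {
        let to_read = (range.end - range.start) as usize;
        let mut buf = vec![0u8; to_read];
        let mut total_read = 0u64;
        // A single positioned read may return fewer bytes than requested,
        // so keep reading until the whole range is in the buffer.
        while (total_read as usize) < to_read {
            let read = file
                .read_with_pos((range.start + total_read) as i64, &mut buf[total_read as usize..])
                .map_err(to_error)?;
            if read <= 0 {
                break; // EOF; the review thread below turns this into an error
            }
            total_read += read as u64;
        }
        Ok(buf.into())
    }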

parthchandra requested review from andygrove and comphead and removed the review request for andygrove on July 16, 2025.
parthchandra (Contributor, Author)

@Kontinuation, fyi

codecov-commenter commented Jul 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 58.67%. Comparing base (f09f8af) to head (e630c1e).
Report is 328 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2031      +/-   ##
============================================
+ Coverage     56.12%   58.67%   +2.55%     
- Complexity      976     1253     +277     
============================================
  Files           119      134      +15     
  Lines         11743    13102    +1359     
  Branches       2251     2396     +145     
============================================
+ Hits           6591     7688    +1097     
- Misses         4012     4179     +167     
- Partials       1140     1235      +95     


    let read = file
        .read_with_pos((range.start + total_read) as i64, buf[total_read as usize..].as_mut())
        .map_err(to_error)?;
    if read == -1 {
        break;
    }
Member
Shouldn't this return an unexpected eof error?

Contributor
Agreed; otherwise the function result is probably not predictable.
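
A sketch of the suggested change (the exact error construction is an assumption; the real code maps hdfs failures through to_error):

    // Instead of breaking out and returning a short buffer,
    // surface the premature end of stream to the caller:
    if read == -1 {
        return Err(object_store::Error::Generic {
            store: "HadoopFileSystem",
            source: "unexpected EOF before the requested range was fully read".into(),
        });
    }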

Kontinuation (Member)
Actually this implementation is incorrect. read_with_pos never returns -1. Instead, it returns 0 when hitting EOF.

libhdfs behaves strangely, and so does fs-hdfs.

References:

parthchandra (Contributor, Author)

Thanks for catching this @Kontinuation! I based this on the return value of FSDataInputStream, but it looks like both fs-hdfs and the official hadoop libhdfs behave differently.
Let me address this. In fs-hdfs:
- a return value < 0 is an error;
- a return value of 0 is OK even if the buffer is not fully read into. However, other object store implementations treat a short read as an error, so I will treat both cases as an error (and maybe reintroduce the original assertion).
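
In other words, the intended semantics look roughly like this (a sketch; pos is the absolute read position and unexpected_eof() is a hypothetical stand-in for whatever error the final code constructs):

    match file.read_with_pos(pos, &mut buf[total_read..]) {
        // fs-hdfs surfaces a negative libhdfs return value as an Err
        Err(e) => return Err(to_error(e)),
        // 0 means EOF; treat the short read as an error, matching other
        // object store implementations
        Ok(0) => return Err(unexpected_eof()),
        // a positive count may still be short of the request: keep looping
        Ok(n) => total_read += n as usize,
    }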

Comment on lines 146 to 149
    let read = file.read(buf.as_mut_slice()).map_err(to_error)?;
    if read == -1 {
        break;
    }
Member

This is also problematic. read also returns 0 when hitting EOF and never returns a negative value.

comphead (Contributor) left a comment

@Kontinuation
Member
Sorry @Kontinuation, but if I check your references, like here: https://github.com/datafusion-contrib/fs-hdfs/blob/8c03c5ef0942b75abc79ed673931355fa9552131/c_src/libhdfs/hdfs.c#L1564C15-L1564C19, it actually returns -1?

Kontinuation (Member)

When EOF is hit, FSDataInputStream#read returns -1. This follows the requirement of the parent class DataInputStream#read:

return the total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached.

The jVal.i < 0 branch handles EOF in libhdfs; it returns 0:

    if (jVal.i < 0) {
        // EOF
        destroyLocalReference(env, jbRarray);
        return 0;
    }

parthchandra (Contributor, Author)

@Kontinuation @andygrove @comphead, updated based on review comments

@@ -88,19 +88,33 @@ impl HadoopFileSystem {

    fn read_range(range: &Range<u64>, file: &HdfsFile) -> Result<Bytes> {
        let to_read = (range.end - range.start) as usize;
        let mut total_read = 0u64;
Contributor

Wondering, would it be better to have total_read as usize?
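
For illustration, with total_read: usize the slice index needs no cast and only the position argument is converted (a sketch under the same assumptions as above):

    let mut total_read: usize = 0;
    while total_read < to_read {
        let read = file
            .read_with_pos((range.start + total_read as u64) as i64, &mut buf[total_read..])
            .map_err(to_error)?;
        if read == 0 {
            break; // or the unexpected-EOF error discussed above
        }
        total_read += read as usize;
    }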

Kontinuation (Member) left a comment

This patch looks good to me, and it reminds me of another problem with fs-hdfs.

The HdfsErr returned by the fs-hdfs read functions does not contain JVM stack traces. If a read failure is caused by an exception in the JVM, all we get is an Err(HdfsErr::Generic) with a vague error message. The stack trace is only printed to stderr by libhdfs and is not propagated to the Java exception thrown by Comet, so the user has to browse the executor logs to find out what happened when a query failure was caused by an fs-hdfs read failure.

libhdfs has getLastTLSExceptionRootCause and getLastTLSExceptionStackTrace for retrieving detailed information about the exception, but fs-hdfs does not make use of them.

An ideal approach is to rethrow the original Java exception using env.throw, just as Comet handles other JNI call exceptions, but this requires revising how Java exception objects are managed in libhdfs.
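
For instance, with the jni crate the rethrow could look roughly like this (purely a sketch: it assumes a JNIEnv is reachable from the read path and that fs-hdfs exposed the root-cause string from getLastTLSExceptionRootCause, which it currently does not):

    use jni::JNIEnv;

    // Hypothetical helper: rethrow an hdfs read failure as a Java exception
    // so the root cause reaches the query error instead of only stderr.
    fn rethrow_read_failure(env: &mut JNIEnv, root_cause: &str) -> jni::errors::Result<()> {
        env.throw_new("java/io/IOException", root_cause)
    }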
