-
Notifications
You must be signed in to change notification settings - Fork 219
Do not append an empty buffer to S3 during file handle RELEASE or FLUSH #1591
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Mansi Pandey <mansipnd@amazon.com>
… and Flush Signed-off-by: Mansi Pandey <mansipnd@amazon.com>
previously_flushed, | ||
.. | ||
} => { | ||
if (*written_bytes == 0 && !*previously_flushed) || !are_from_same_process(open_pid, pid) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I need to fix this condition to move if (*written_bytes == 0 && *previously_flushed && are_from_same_process(open_pid, pid)) return Ok(())
condition inside so we don't go to the code on line 280
previously_flushed, | ||
.. | ||
} => { | ||
if (*written_bytes == 0 && !*previously_flushed) || !are_from_same_process(open_pid, pid) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks that addition of !*previously_flushed
results in a change in behaviour, I am not sure if we want this:
- now on flush we keep the file handle open only if no data was written
*written_bytes == 0
and append have not been "commited" yet!*previously_flushed
- append is "commited" on \1 flush (close) and \2 fsync
- previously if close or fsync happened before current close we still may have kept the file handle open
what's the reasoning, why is this desired? also may it break behaviour of tools like touch
or dd
if they fsync
before the first close?
written_bytes, | ||
previously_flushed, | ||
} => { | ||
// No data upload should be pending completion as part of the release - if we have not written anything locally AND we already committed on a preceding Flush |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would clarify here, that with this if we're skipping an S3 request, since we know that it happened before (and an empty object was already created)
// No data upload should be pending completion as part of the release - if we have not written anything locally AND we already committed on a preceding Flush | ||
if written_bytes == 0 && previously_flushed { | ||
Self::finish(fs, ino, initial_etag).await; | ||
return Ok(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the reasoning for returning false
here?
With this change, Mountpoint will not commit an in-progress Append during RELEASE or FLUSH fs call, if the buffer is empty AND there has been a previous Flush on the file handle. This is because RELEASE call usually only happens after the last FLUSH operation on a file handle, and in that case an empty buffer append (to create the object in S3) must already have been committed as part of that preceding Flush. Hence, the Release (or a FLUSH on a duplicate FD) does not need to do another empty commit to S3, and just marks the inode as "Remote" (from "LocalOpen"). However, in cases when a FLUSH is not called because of some unexpected reason (say, application terminates), the kernel will still cause RELEASE if that application had the last reference of the file descriptor; and hence RELEASE will have to take care of the final commits and cleanup.
This is done to minimise the risk of an async RELEASE from the kernel competing with a new OPEN for the file, and causing the file handle to be in a "LocalOpen" state for longer than required thus failing any OPENs being called immediately after a CLOSE (Flush); however it does not solve the problem completely. It also avoids doing a no-op S3 PuObjectSingle call during RELEASE (and any Flushes on duplicate open FDs) if nothing has been written to the handle.
Does this change impact existing behavior?
Yes, it changes the behaviour of the RELEASE and FLUSH fs calls in a a specific situation and state of the file handle, only during Append path. Since RELEASE is async, the change only impact customer workloads where duplicate FDs were being open and closed without writing anything (causing multiple empty PuObjectSingle calls to S3 during flushes) for which we will skip doing any append calls if we have previously flushed the empty write.
Does this change need a changelog entry? Does it require a version change?
No, since it's not a functional change in behaviour/data seen by customer and only a logic optimisation, it does not need a changelog entry or version change.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).