Write support for blob handles with pending payload #24458

Merged
ChumpChief merged 18 commits into microsoft:main from PendingPayloadWrite on May 8, 2025

Contributor

@ChumpChief ChumpChief commented Apr 25, 2025

Ports the remainder of the prototype in #24158, which is write support for blob handles with pending payload.

Mostly unchanged as compared to the prototype, other than updating for the renames taken in #24320.

AB#36251

@Copilot Copilot AI review requested due to automatic review settings April 25, 2025 00:58
@ChumpChief ChumpChief requested a review from a team as a code owner April 25, 2025 00:58
@github-actions github-actions bot added base: main PRs targeted against main branch area: runtime Runtime related issues area: tests Tests to add, test infrastructure improvements, etc public api change Changes to a public API labels Apr 25, 2025
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR ports write support for blob handles with pending payload by adding a new flag (createBlobPayloadPending) and updating tests, API signatures, and internal handling of blob payload states. Key changes include:

  • Updating test files to run with and without pending payload support.
  • Adding new type guards and API exports (e.g. isFluidHandlePayloadPending) to support pending payload states.
  • Updating blob manager logic to handle payload state transitions (local, shared, failed).

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| packages/test/test-end-to-end-tests/src/test/blobsisAttached.spec.ts | Refactored imports and wrapped tests in a loop for createBlobPayloadPending variations. |
| packages/test/test-end-to-end-tests/src/test/blobs.spec.ts | Updated test container config and test descriptions to pass the new flag. |
| packages/runtime/runtime-utils/* | Export and add new type guard for handling pending blob payloads. |
| packages/runtime/container-runtime/src/* | Updated blob manager methods and tests to support blob handles with pending payloads. |
| packages/common/core-interfaces/* | Added new interfaces and types for the pending payload feature. |

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* @legacy
* @alpha
*/
export type PayloadState = "local" | "shared" | "pending" | "failed";
Contributor

@anthony-murphy anthony-murphy Apr 25, 2025

i'm trying to think through the usage here. what are the supported state transitions? how do users reason about them? What is the experience for local client vs remote clients? are they symmetrical? should they be, or do clients need custom code for local and remote?

Contributor

@anthony-murphy anthony-murphy Apr 25, 2025

i think this would be easier to review if the implementation were split from api, as the api will likely have significantly more comments than the internals. getting the internals in would also unblock re-enabling testing with staging mode.

Contributor

@anthony-murphy anthony-murphy Apr 25, 2025

i think we should remove 'local' at this layer, and only have "persisted" | "pending" | "failed", anything related to local should be moved to a blob handle type such that local information is only available on the strongly typed local object returned from the call to upload.

Contributor Author

For visibility - had a chat with Tony on the side, we have a bit better understanding of each other now.

Re: the question above, the states are not symmetrical for local/remote nor should they be, since the states are wholly different between the two (both in what can be detected and what reaction should be taken), with the exception of "shared" state which is where they reach parity. I've updated the documentation a bit to clarify as well.

In our conversation Tony suggested splitting the interface instead, such that local/remote have different types. I'm thinking more on this option or if there's any preferable alternative here.

Contributor Author

Also just realized that you can use mermaid diagrams in github comments, so including the current state diagram here too:

---
title: Handles with pending payload lifecycle
---

flowchart LR
subgraph local["local only"]
  direction LR
  local_state["local"]
  failed_state["failed
    (emit 'failed')"]
  local_state -- "fail publish" --> failed_state
end
subgraph remote["remote only"]
  direction LR
  pending_state["pending"]
end
shared_state["shared
    (emit 'shared')"]

local ~~~ shared_state
shared_state ~~~ remote
local_state -- "publish" --> shared_state
pending_state -- "observe publish" --> shared_state
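
The lifecycle in the diagram can also be written down as a transition table. This is only a sketch: the state names come from the PayloadState type in this PR, while `allowedTransitions` and `canTransition` are hypothetical names used for illustration and are not part of the change.

```typescript
// Sketch only: state names are from the PR; the table and helper are hypothetical.
type PayloadState = "local" | "shared" | "pending" | "failed";

// Allowed next states for each state, as drawn in the diagram.
const allowedTransitions: Record<PayloadState, readonly PayloadState[]> = {
	local: ["shared", "failed"], // local client: the publish succeeds or fails
	pending: ["shared"], // remote client: observes the publish
	shared: [], // no outgoing edges in the diagram
	failed: [], // no outgoing edges in the diagram (local only)
};

// Returns true if the diagram has an edge from `from` to `to`.
function canTransition(from: PayloadState, to: PayloadState): boolean {
	return allowedTransitions[from].indexOf(to) !== -1;
}
```

Read this way, "shared" is the only state reachable from both the local and the remote side, which matches the point that the two sides only reach parity there.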

Contributor

i understand the remote case a bit better now, but i'm still not sure the api as is is ergonomic for the remote case

Contributor Author

Re: splitting the interface, I'm struggling to find a case where it improves the customer experience.

When creating a BlobHandle via uploadBlob(), the customer immediately gets the handle back in local state (and it will not exit this state until they elect to attach the handle, giving them plenty of time to register listeners). They need the full interface except "pending" state, since they likely want to observe all of the possible transitions and states through the upload process. It's true we have type knowledge here that the handle will never enter "pending" state, but at the same time the customer doesn't need to do anything special to avoid the "pending" state; it doesn't get in their way.

const myBlobHandle = containerRuntime.uploadBlob(blob);
const onFailed = (error) => {
    /* Some remedy here, e.g. delete the handle */
    myBlobHandle.events.off("failed", onFailed);
    myBlobHandle.events.off("shared", onShared);
};
const onShared = () => {
    /* Here maybe clear a "might lose data" flag to warn users before closing the tab */
    myBlobHandle.events.off("failed", onFailed);
    myBlobHandle.events.off("shared", onShared);
};
myBlobHandle.events.on("failed", onFailed);
myBlobHandle.events.on("shared", onShared);
/* Here maybe set a "might lose data" flag to warn users before closing the tab */
myMap.set("someKey", myBlobHandle);

To acquire a RemoteFluidObjectHandle, the customer is likely pulling it from a location where it might be mixed in with BlobHandles in normal app operation. For example, if I get an IFluidHandle out of a SharedMap it is probably dangerous to assume whether that IFluidHandle is actually a BlobHandle or RemoteFluidObjectHandle. The customer can't ignore that both are possibilities with whatever logic they're applying. However, I don't think this is a burden for them. If we assume they are doing their local failure watching at the point of calling uploadBlob as suggested above, then this portion is probably just looking to run the remote cleanup heuristic.

// Here in the code we don't know whether the handle is sourced locally or remote
const someMaybeRemoteHandle = myMap.get<IFluidHandle>("someOtherKey")!;
// ...So we just check for "pending", e.g. rather than !"shared" - these are the ones we want to run a heuristic on only
if (someMaybeRemoteHandle.payloadState === "pending") {
    const shouldGiveUpP = new Promise<boolean>((resolve) => {
        const onShared = () => {
            resolve(false);
            someMaybeRemoteHandle.events.off("shared", onShared);
        };
        someMaybeRemoteHandle.events.on("shared", onShared);
        setTimeout(() => {
            resolve(true);
            someMaybeRemoteHandle.events.off("shared", onShared);
        }, someDelayMs);
    });

    shouldGiveUpP
        .then((shouldCleanUp) => { /* if true, do some cleanup */ })
        .catch((error) => { /* ... */ });
}

These are obviously just examples, but I don't see where having a narrower interface necessarily simplifies things. If there are better examples, I'd be glad to take a look.

Contributor Author

Sorry I just realized I had missed your tsplayground - responses inline here

Contributor

based on the responses i feel pretty strongly we should split the local and remote experiences. it's very confusing that some states and events only apply to local or remote. I can see customers writing code where they wait for failure on remote handles, which is impossible, but nothing in the interface or code will prevent it. I'm of the opinion that when possible, it should be clear from the code itself how to use an api, and it seems quite achievable here with little change.

for the local api, i would still push for a combined event which covers both shared and failed states to avoid the need to write code that crosses event handles, as that type of code gets very complex, especially when you take in further requirements like tracking close/dispose to give up.

I'm also curious what @yann-achard-MS and @znewton think about this, as they will be helping partners to onboard these apis and their extensions.
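
For readers following along, one way to picture the split being argued for is a pair of types discriminated by origin. This is a hedged sketch, not a proposal from the thread: `LocalHandleSketch`, `RemoteHandleSketch`, the `origin` field, and `canStillFail` are all hypothetical names.

```typescript
// Hypothetical types: the PR itself keeps a single PayloadState union.
interface LocalHandleSketch {
	readonly origin: "local";
	readonly payloadState: "local" | "shared" | "failed";
}

interface RemoteHandleSketch {
	readonly origin: "remote";
	readonly payloadState: "pending" | "shared";
}

type AnyHandleSketch = LocalHandleSketch | RemoteHandleSketch;

// Only a locally created handle that has not finished publishing can still fail.
function canStillFail(handle: AnyHandleSketch): boolean {
	return handle.origin === "local" && handle.payloadState === "local";
}
```

With types shaped like this, comparing a `RemoteHandleSketch`'s `payloadState` to "failed" is rejected by the compiler because the two types have no overlap, which is the kind of guardrail the comment above is asking for.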

Contributor

I would also consider punting on the remote half of the api, since it's not hooked up yet, so no need to commit to that api yet anyway.

Comment on lines 141 to 145
(event: "shared", listener: () => void);
/**
* Emitted for locally created handles when the payload fails sharing to remote collaborators.
*/
(event: "failed", listener: (error: unknown) => void);
Contributor

@anthony-murphy anthony-murphy Apr 25, 2025

i think we should remove both of these, and just have a single event for progress. Maybe something like:
(event: "progress", listener: ({type:"update"} | {type:"persisted"} | {type:"failed", error?:Error})=>void)

this ensures the event is applicable to all clients, and it lets the handle implementation short-circuit the event if someone registers while the state is already terminal (persisted or failed). we do this for a number of events like connected, as we saw users frequently building dangling event handlers that never resolved as the connection state was already connected.

something like progress is probably optional for now, but someday i could see us using signals in the blob manager to broadcast progress updates.
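
A minimal sketch of the short-circuit behavior described in this comment, assuming a hand-rolled listener list rather than the runtime's real emitter. `ProgressSourceSketch` and its methods are hypothetical; only the event shape is taken from the suggestion above.

```typescript
// Event shape from the suggestion; everything else here is a hypothetical illustration.
type ProgressEvent =
	| { type: "update" }
	| { type: "persisted" }
	| { type: "failed"; error?: Error };
type ProgressListener = (event: ProgressEvent) => void;

class ProgressSourceSketch {
	private terminal: ProgressEvent | undefined;
	private listeners: ProgressListener[] = [];

	public on(listener: ProgressListener): void {
		if (this.terminal !== undefined) {
			// Already terminal: replay the final event so the listener does not dangle.
			listener(this.terminal);
			return;
		}
		this.listeners.push(listener);
	}

	public emit(event: ProgressEvent): void {
		if (this.terminal !== undefined) {
			return; // Nothing is emitted after a terminal event.
		}
		if (event.type !== "update") {
			this.terminal = event;
		}
		for (const listener of this.listeners.slice()) {
			listener(event);
		}
		if (this.terminal !== undefined) {
			this.listeners = []; // No further events will arrive.
		}
	}
}
```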

Contributor Author

I hesitate to merge since the customer reaction to a failure likely has no overlap with their reaction to successful completion. In general it's more ergonomic for a customer to register specifically for the event they're interested in observing (rather than forcing them to conditionalize within their handler).

The short-circuiting behavior you mention is an antipattern; we removed many of those over time (e.g. #14371, not sure if we got them all though). See #14272 (comment) and other comments in that PR for more context.

Contributor

i find it more ergonomic to only need a single event when both events must always be considered, as i don't think there is ever a case where shared or failed are useful separately, so the current design forces a complicated multi event pattern. if there is a single event pattern, combining also doesn't make that more difficult.
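
To make the cross-event bookkeeping concrete, here is a hedged sketch of a helper that folds the two events into one promise while still using separate "shared" and "failed" registrations. `HandleLike` is an assumed, simplified shape rather than the interface in this PR, and `whenShared` is a hypothetical name.

```typescript
type PayloadState = "local" | "shared" | "pending" | "failed";
type Listener = (error?: unknown) => void;

// Assumed minimal shape; the real handle and emitter types may differ.
interface HandleLike {
	readonly payloadState: PayloadState;
	readonly events: {
		on(event: "shared" | "failed", listener: Listener): void;
		off(event: "shared" | "failed", listener: Listener): void;
	};
}

// Resolves once the payload is shared and rejects if sharing fails.
function whenShared(handle: HandleLike): Promise<void> {
	return new Promise<void>((resolve, reject) => {
		// Check the current state first so a late caller is not left waiting.
		if (handle.payloadState === "shared") {
			resolve();
			return;
		}
		if (handle.payloadState === "failed") {
			reject(new Error("payload sharing already failed"));
			return;
		}
		function cleanup(): void {
			handle.events.off("shared", onShared);
			handle.events.off("failed", onFailed);
		}
		function onShared(): void {
			cleanup();
			resolve();
		}
		function onFailed(error?: unknown): void {
			cleanup();
			reject(error);
		}
		handle.events.on("shared", onShared);
		handle.events.on("failed", onFailed);
	});
}
```

Whether this belongs in the API or in customer code is exactly the question being debated; the sketch only shows that the paired on/off bookkeeping can live in one place.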

return (this._events ??= createEmitter<IFluidHandlePayloadPendingEvents>());
}

private _state: PayloadState = "local";
Contributor

what will this do on a remote client right now?

Contributor Author

Nothing since remote clients don't instantiate BlobHandles (they will only have RemoteFluidObjectHandles). When we add support in RemoteFluidObjectHandles they'll have their own corresponding state/events for "pending" -> "shared".

Contributor

so isFluidHandlePayloadPending will always return false for a remote blob handle?

Contributor Author

Not in the future but yes for this PR (since we haven't implemented observability for remote handles yet). It will always return true when the payload status is inspectable/observable.

Once observability is moved to IFluidHandle then the typeguard isn't needed anymore and can just be removed.
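
As a rough illustration of what such a type guard does, here is a sketch under assumed types. It is not the implementation of isFluidHandlePayloadPending; `HandleSketch`, `HandleWithPayloadStateSketch`, and `hasPayloadStateSketch` are hypothetical names.

```typescript
type PayloadState = "local" | "shared" | "pending" | "failed";

// Assumed stand-ins for the real handle interfaces.
interface HandleSketch {
	get(): Promise<unknown>;
}
interface HandleWithPayloadStateSketch extends HandleSketch {
	readonly payloadState: PayloadState;
}

// Narrows to the richer type only when an inspectable payload state is present.
function hasPayloadStateSketch(
	handle: HandleSketch,
): handle is HandleWithPayloadStateSketch {
	const state = (handle as Partial<HandleWithPayloadStateSketch>).payloadState;
	return (
		state === "local" || state === "shared" || state === "pending" || state === "failed"
	);
}
```

A guard of this shape returns false for any handle that does not expose a payload state, which is consistent with the answer above for remote handles in this PR.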

Contributor

github-actions bot commented May 8, 2025

🔗 No broken links found! ✅

Your attention to detail is admirable.

linkcheck output


> fluid-framework-docs-site@0.0.0 ci:check-links /home/runner/work/FluidFramework/FluidFramework/docs
> start-server-and-test "npm run serve -- --no-open" 3000 check-links

1: starting server using command "npm run serve -- --no-open"
and when url "[ 'http://127.0.0.1:3000' ]" is responding with HTTP status code 200
running tests using command "npm run check-links"


> fluid-framework-docs-site@0.0.0 serve
> docusaurus serve --no-open

[SUCCESS] Serving "build" directory at: http://localhost:3000/

> fluid-framework-docs-site@0.0.0 check-links
> linkcheck http://localhost:3000 --skip-file skipped-urls.txt

Crawling...

Stats:
  195689 links
    1565 destination URLs
    1797 URLs ignored
       0 warnings
       0 errors


@ChumpChief ChumpChief merged commit 9416ca9 into microsoft:main May 8, 2025
37 checks passed
@ChumpChief ChumpChief deleted the PendingPayloadWrite branch May 8, 2025 22:12