[indexer] Use full_objects_history in reads #19290
Conversation
This would be more easily done on the GraphQL side -- I've recently exposed something that enables pruning and waits for pruning effects.
Doing an initial pass -- I'm a little concerned about how we're achieving consistency here: we rely first on the tables that we would use to fetch the live object set, and then move on to the full objects history for point look-ups. This leaves us open to races that break consistency for brand-new objects:

- Imagine someone runs a query for an object they just created. At the time the request starts, it's not in `objects_snapshot` or `objects_history` (so it wouldn't show up in any consistent query for the live object set).
- By the time we issue the query to `full_objects_history`, it does get written out.

Now, because of the cascading logic introduced in this PR, that object will appear in the result set, when we did not intend it to.
If we are settling on keeping `objects_version` around with a very generous retention, then I think the best course of action is to rely on it as the sole arbiter for consistency:

- Introduce a DataLoader that just does the point lookups to `full_objects_history` -- this will one day be replaced by a call to the KV store.
- Both the historical and parent lookups work by first querying `objects_version` to get a consistent set of `(id, version)` keys, and following up with a query to get object contents.

This architecture also meshes nicely with another Object-related workstream, which is to make sure it's always possible (and efficient) to get the `ObjectRef` for an object, e.g. from the effects of transactions that have just run, or for very old historical transactions where the object itself has been pruned. We can add the `digest` to `objects_version`, and then most object queries will just be a query to `objects_version` to get those initial key details about an object. The subsequent data-loader fetch to get the object's contents will be hidden in the implementation of `Object`'s fields, something like so:
```rust
struct Object {
    address: SuiAddress,
    version: u64,
    digest: Digest,
}

struct ObjectContents {
    native: NativeObject,
    stored: StoredKVObject,
}

#[Object]
impl Object {
    async fn address(&self) -> Address {
        self.address
    }

    async fn version(&self) -> UInt53 {
        self.version.into()
    }

    async fn digest(&self) -> Base64 {
        Base64::from(self.digest)
    }

    #[graphql(flatten)]
    async fn contents(&self, ctx: &Context<'_>) -> Result<Option<ObjectContents>> {
        let DataLoader(loader) = ctx.data_unchecked();
        loader
            .load_one(HistoricalKey {
                id: self.address,
                version: self.version,
            })
            .await
            .extend()
    }
}

#[Object]
impl ObjectContents {
    async fn bcs(&self) -> Base64 { /* ... */ }
    // ...
}
```
This is also where my thought came from to have `objects_snapshot` and the consistent set store only the same fields as `objects_version`, and otherwise rely entirely on the KV store for object content.
```rust
            None,
        )
    } else {
        Object::new_wrapped_or_deleted(key.id, key.version, key.checkpoint_viewed_at)
```
IIUC, this is a bit strange: it means that if I request some totally made-up ObjectID and/or version, I will get a response saying that the object is wrapped or deleted, which doesn't seem right.
I'm on board with the idea that we shouldn't distinguish between an object that is wrapped/deleted/pruned/never existed, but I think in practice that has to mean they all return no response, rather than this response.
In this case, we know for sure it is a wrapped/deleted object, because we had an entry in the `full_objects_history` table, but the `serialized` column is null.
Introduced a new key called `PointLookupKey` and a loader for it. This looks up the `full_objects_history` table for a given ID and version.
The implementation looks broadly correct to me and I only had nits looking at that, but then the tests surprised me, so let's ignore the nits for now and focus on that.

Here's how I understand the test:

- Publish.
- Create `O(0)` and send to `A` (aka `object(2,0)`).
- Create checkpoint + advance epoch.
- Create `O(1)` and send to `A` (aka `object(5,0)`).
- Ask GraphQL for a version of `O(0)`, which exists; ask for `A`'s objects, which should include `O(0)` but not `O(1)` because we haven't advanced the checkpoint yet -- so far so good.
- Create the checkpoint, advance the epoch -- this triggers pruning in `objects_history`.

Now, because the test set-up is kind of strange, with only two checkpoints per epoch, we're in a weird place: `objects_history` is designed to trail `objects` by some number of checkpoints, but its pruning policy is set in terms of epochs. This means that although we should always be able to query live objects, we can't, because these live objects are being pulled from `objects_history` and not `objects_snapshot`.

That is the strange part that I understand; then there's the strange part I don't understand: where did `O(1)` go? It was created only one epoch ago, and we should have a two-epoch lag.

TL;DR: the expected test output at the end should be that we see both `O(0)` and `O(1)` owned by `A`, but for two different reasons, we see neither.
```rust
        .map(|o| {
            (
                PointLookupKey {
                    id: addr(&o.object_id).unwrap(),
```
Ditto on the question of `unwrap` or internal error.
crates/sui-graphql-e2e-tests/tests/objects/full_objects_history.move
I think this PR is good to go now, and we can address the broader questions around `objects_history` pruning in a follow-up -- thanks @lxfind.
```rust
    pub(crate) fn new_serialized(
        object_id: SuiAddress,
        version: u64,
        serialized: SerializedObject,
```
nit (optional): I would keep this unaliased, because it's useful to know that it's an `Option<_>`, and from context it's pretty clear that it's the bytes of the serialized object.
A couple of notes before this lands though:

The next release that's going out will already have the writer-side changes, which I will backfill to make sure it has all the data. So when this change gets into prod, we should already be complete on data.
Description

This PR queries the `full_objects_history` table for the objects query. It does so in a backward-compatible way: we always return data from the `objects_history` table if it's found, and otherwise we return from the `full_objects_history` table.
Test plan
Added e2e test
Release notes