Skip to content

Conversation

tzakian
Copy link
Contributor

@tzakian tzakian commented Aug 29, 2024

Description

This is the base rewrite for the package loading and caching mechanism. Before progressing much further than this, we will need to rip out the current loading and module/type/function resolution in the interpreter and move over to passing vtables.

Test plan

This is not tested across the whole test suite. However I have added a number of unit tests for the loader in package_cache_tests.rs.

@tzakian tzakian requested review from cgswords and dariorussi August 29, 2024 23:16
Copy link

vercel bot commented Aug 29, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Sep 14, 2024 0:07am
3 Skipped Deployments
Name Status Preview Comments Updated (UTC)
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Sep 14, 2024 0:07am
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Sep 14, 2024 0:07am
sui-typescript-docs ⬜️ Ignored (Inspect) Visit Preview Sep 14, 2024 0:07am

}

/// Read the package bytes stored on-disk at `addr`
fn get_package_bytes(&self, address: &AccountAddress) -> Result<Option<Vec<Vec<u8>>>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This return type seems janky -- shouldn't we give an error if the address path is invalid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in keeping with the other APIs in here/in general (e.g., get_module_bytes) with storage-related APIs, so I think this type makes sense to me at least?

If it's useful -- this is used for the ModuleResolver trait below in this module, and this trait is also over DBs where you do want to signify "error fetching" vs "no error but not found"

Comment on lines 10 to 11
};
use std::fmt::Debug;
use std::sync::Arc;
use std::{collections::BTreeSet, fmt::Debug};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combine imports


/// Return the transitive closure of all package dependencies of the current linkage context.
fn all_package_dependencies(&self) -> Result<BTreeSet<AccountAddress>, Self::Error> {
Ok(BTreeSet::new())
Copy link
Contributor

@cgswords cgswords Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be empty, to demand others implement it?

///
/// The resuling map of vtables _must_ be closed under the static dependency graph of the root
/// package w.r.t, to the current linkage context in `data_store`.
pub fn new(data_store: &impl DataStore, package_runtime: &'a PackageLoader) -> VMResult<Self> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
pub fn new(data_store: &impl DataStore, package_runtime: &'a PackageLoader) -> VMResult<Self> {
pub fn new(data_store: &impl DataStore, package_loader: &'a PackageLoader) -> VMResult<Self> {

};

struct Context<'a> {
link_context: AccountAddress,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this field now, right?


struct Context<'a> {
link_context: AccountAddress,
cache: &'a LoadedPackage,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: rename this to package.

// Also compute the packages dependency order. This is because we need to count on the fact that
// all dependencies are loaded and their types cached before we cache a package.
let package_deps = if let Some(pkg) = self.package_cache.read().loaded_package_at(dep) {
let package_deps = Self::compute_immediate_package_dependencies(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have a guarantee that the dependencies are in the cache if we find this package in the cache?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily since different link contexts will induce different dependencies possibly and a package is not cached based on link context (and is in fact shared across multiple link contexts).

Base automatically changed from cgswords/arena_loader_0 to vm_2024_rewrite September 3, 2024 23:09
@tzakian tzakian force-pushed the tzakian/wip-package-loader branch from e8f8c00 to 52e9385 Compare September 4, 2024 19:57
Comment on lines +59 to +61
// NB: this is needed for the bytecode verifier. If we update the bytecode verifier we should
// be able to remove this.
pub compiled_modules: BinaryCache<Identifier, CompiledModule>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what this. comment means?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what to think of this.
In a way this is not a massive deal as those modules would have to live somewhere.
I was envisioning a global cache for CompiledModule which would return verified modules and so we would put things there only when verified (module verification, not cross modules). In a way that global cache could have any policy we wanted (dropped any time). Sort of a front end cache to module for the data store.
But we are there yet and not clear when and if we would go there, and in that case this may very well be just fine for now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we go to check linking we need to call into the bytecode verifier for checking compatibility across dependencies. For this it needs/uses the CompiledModule so we need to keep the CompiledModules for the package around for this purpose until/if we change the cyclic check and linking checker in the bytecode verifier to work over loaded modules.

Comment on lines 24 to 25
/// before the beginning of execution, and based on the static call graph of the package that
/// contains the root package id.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// before the beginning of execution, and based on the static call graph of the package that
/// contains the root package id.
/// before the beginning of execution, based on the static call graph of the root package (that
/// is, the package that contains the root package id).

@tzakian tzakian force-pushed the tzakian/wip-package-loader branch from 5497eec to a89f593 Compare September 12, 2024 23:15
@tzakian tzakian marked this pull request as ready for review September 12, 2024 23:15
@tzakian tzakian changed the title WIP: package loader [move][vm_rewrite] package loader Sep 12, 2024
Bytecode::StLoc(a) => write!(f, "StLoc({})", a),
Bytecode::Call(a) => write!(f, "Call({})", a.to_ref().name),
Bytecode::KnownCall(a) => write!(f, "Call({})", a.to_ref().name),
Bytecode::VirtualCall(a) => write!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't CallGeneric also be updated to reflect this distinction?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, also let's consider if we want an instruction for calling the framework. That would be equivalent to a static call but it may be good to distinguish those even if the implementation would be identical.
Also we should do the same for native calls irrespective of the implementation.
I think all of that info is available already so it should not be a problem to do so

use std::collections::BTreeMap;

#[derive(Debug, Clone)]
pub(crate) struct DeserializedPackage {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider BinaryFormatPackage or something, since it holds things from the move_binary_format?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also add the type table and the linkage table here, not sure why we are omitting them.
At the end it does not seem a big deal to have them and it may help for future work.
Though I am not very clear yet on how all of this is coming together

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type table may be fine here, but linkage table would definitely not be correct here (IMO). This package is being loaded "irrespective" of the context. The context that the package is then linked in/interpreted in is a dynamic construct and IMO shouldn't reside on this most likely. But lets chat about it :)

Comment on lines 47 to 65
pub fn new(data_store: &impl DataStore, package_runtime: &'a VMCache) -> VMResult<Self> {
let mut loaded_packages = HashMap::new();

// Make sure the root package and all of its dependencies (under the current linkage
// context) are loaded.
let cached_packages = package_runtime.load_and_cache_link_context(data_store)?;

// Verify that the linkage and cyclic checks pass for all packages under the current
// linkage context.
linkage_checker::verify_linkage_and_cyclic_checks(&cached_packages)?;
cached_packages.into_iter().for_each(|(_, p)| {
loaded_packages.insert(p.runtime_id, p);
});

Ok(Self {
loaded_packages,
cached_types: &package_runtime.type_cache,
})
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small structural nit: I would expect the VMCache to take the DataStore and return one of these, as package_runtime.generate_vtables(data_store) or similar, so that, e.g., we can push the parallel execution handling into the cache logic. As it is right now, this code will need to be lock-aware to let that work.

@@ -0,0 +1,120 @@
// Copyright (c) The Move Contributors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering adding this to move-vm-test-utils/storage.rs if it can go there?

.clone())
}

fn all_package_dependencies(&self) -> Result<BTreeSet<AccountAddress>, Self::Error> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of confusion: it appears we also define relinking_store.rs below, which could b emoved to move-vm-test-utils/src/storage.rs If we did that, could we also reuse it here?

}
}

fn load_package(&self, package_id: &AccountAddress) -> VMResult<Vec<Vec<u8>>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have hoped version cuts could have avoided needing this sort of stuff, but I see we are sort of stuck where we are. Any solution I have isn't particularly better. Maybe on a longer timeline the version cut should become the whole crates/ folder...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, versioning the whole thing is the only way around this :'(

That being said though I realized I can just stub these out -- they should never be accessed from old versions.

Copy link
Contributor

@dariorussi dariorussi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to move some conversation offline and stop reviewing at the moment until we talk.
Thanks so much for this work, it's looking good and a lot to do

))
})
.map(|f| f.as_ref())
.ok_or_else(|| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe that is what we used to do and to make sure I understand.
This is giving MISSING_DEPENDENCY with a missing function message both in case the lookup of the package or the lookup of the function fail.
Would it be better to have different errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think MISSING_DEPENDENCY in both cases is fine/correct. Maybe we just add a different message?

Comment on lines +35 to +36
pub type PackageStorageId = AccountAddress;
pub type RuntimePackageId = AccountAddress;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a big fan of those names which mean very little to me. Admittedly I am not sure I have better names.
But at the end is something along the lines of OriginId and PackageId.
Or maybe PackageId and PackageVersionId.
Not sure, but storage and runtime make no sense to me and I'd love for us to think of better names.
Basically PackageRuntimeId (I think, because I never remember which goes to prove how non descriptive the names are) is the package identity as a "name". The logical view of a package. Package at version 0.
Whereas PackageStorageId is, well... the package itself.
In other words if I wanted to know all version of a package I would do a "search" over PackageRuntimeId and have all the different version that have been loaded for that package.
Whether that operation is ever needed I don't think so, but having the package identity (as again in the name of the package) in there does not seem to hurt. Also it will be probably needed in some scenario.

edit: I wrote this comment mixing up storage id and runtime id and had to edit this - lol

Comment on lines +55 to +56
// NB: this is under the package's context so we don't need to further resolve by
// address in this table.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this comment as it is not adding much in my opinion

Comment on lines +59 to +61
// NB: this is needed for the bytecode verifier. If we update the bytecode verifier we should
// be able to remove this.
pub compiled_modules: BinaryCache<Identifier, CompiledModule>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what to think of this.
In a way this is not a massive deal as those modules would have to live somewhere.
I was envisioning a global cache for CompiledModule which would return verified modules and so we would put things there only when verified (module verification, not cross modules). In a way that global cache could have any policy we wanted (dropped any time). Sort of a front end cache to module for the data store.
But we are there yet and not clear when and if we would go there, and in that case this may very well be just fine for now

/// ```..., arg(1), arg(2), ..., arg(n) -> ..., return_value(1), return_value(2), ...,
/// return_value(k)```
Call(ArenaPointer<Function>),
KnownCall(ArenaPointer<Function>),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we please call this StaticCall? I cannot possibly digest KnownCall

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! forgot to make this change, but doing it :)

// - Virtual: the function is unknown and the index is the index in the global table of vtables
// that will be filled in at a later time before execution.
pub enum CallType {
Known(ArenaPointer<Function>),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as in the bytecode comment can we call this Static?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! Making that name change right now

Bytecode::StLoc(a) => write!(f, "StLoc({})", a),
Bytecode::Call(a) => write!(f, "Call({})", a.to_ref().name),
Bytecode::KnownCall(a) => write!(f, "Call({})", a.to_ref().name),
Bytecode::VirtualCall(a) => write!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, also let's consider if we want an instruction for calling the framework. That would be equivalent to a static call but it may be good to distinguish those even if the implementation would be identical.
Also we should do the same for native calls irrespective of the implementation.
I think all of that info is available already so it should not be a problem to do so

@@ -0,0 +1,29 @@
// Copyright (c) The Move Contributors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file name is funny, it took me a while to understand what it means, and I guess chain refers to blockchain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could also maybe call it binary_ast or binary_package?

use std::collections::BTreeMap;

#[derive(Debug, Clone)]
pub(crate) struct DeserializedPackage {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also add the type table and the linkage table here, not sure why we are omitting them.
At the end it does not seem a big deal to have them and it may help for future work.
Though I am not very clear yet on how all of this is coming together


pub struct TypeCache {
pub cached_types: DatatypeCache,
pub cached_instantiations: HashMap<CachedTypeIndex, HashMap<Vec<Type>, DatatypeInfo>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when is this used and how?

@tzakian tzakian force-pushed the tzakian/wip-package-loader branch from cee6909 to 04a1ba5 Compare September 14, 2024 00:06
@tzakian tzakian merged commit e0d3927 into vm_2024_rewrite Sep 16, 2024
32 of 46 checks passed
@tzakian tzakian deleted the tzakian/wip-package-loader branch September 16, 2024 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants