Results of Multi-world research and next steps #20238
Replies: 4 comments 14 replies
-
This helped me a lot to understand how communication between worlds would work. Particularly if entities share the same ID across worlds, it becomes fairly simple, as you've pointed out. Obviously the copying would have to happen safely and soundly. I guess as long as there's only one executor per app, Bevy's existing safety guarantees should ensure worlds only copy data when it's safe to do so, right? That type of "sync point" between worlds will be important. Particularly for rendering, you don't want the main world to get too far ahead of the render world in situations where your main world completes a frame much faster than your render world. The main world should start blocking if it gets more than 2 or 3 "frames" (update cycles, whatever you want to call them) ahead of the render world.
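As a rough illustration of that kind of sync point, here is a minimal sketch using only the standard library: a bounded `std::sync::mpsc::sync_channel` between a main-world loop and a render thread. `ExtractedFrame` and `run_pipelined` are hypothetical names, and this is not Bevy's pipelined rendering implementation; it only shows how a bounded queue makes the main world block once it is a fixed number of frames ahead.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical snapshot the main world hands to the render world each frame.
struct ExtractedFrame {
    frame: u64,
}

/// Runs `frames` main-world updates against a render thread. The main world may
/// queue at most `max_frames_ahead` extracted frames; the next `send` blocks
/// until the render world catches up.
fn run_pipelined(frames: u64, max_frames_ahead: usize) -> Vec<u64> {
    let (tx, rx) = sync_channel::<ExtractedFrame>(max_frames_ahead);

    let render = thread::spawn(move || {
        let mut rendered = Vec::new();
        // Iterating the receiver ends once the sender is dropped.
        for extracted in rx {
            // ... run render-world systems on `extracted` ...
            rendered.push(extracted.frame);
        }
        rendered
    });

    for frame in 0..frames {
        // ... run main-world systems for `frame` ...
        // Sync point: blocks while `max_frames_ahead` frames are still waiting.
        tx.send(ExtractedFrame { frame }).expect("render world hung up");
    }
    drop(tx); // close the channel so the render loop finishes
    render.join().expect("render thread panicked")
}
```

With `max_frames_ahead` set to 2, the main world can have two extracted frames queued (plus the one the render thread is currently processing) before `send` blocks.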
-
Overall I like this, although I'll quibble with details and think that this post would have been stronger with clearer concrete use cases to motivate it :)
Why use this design, rather than a
I think your implementation plan is too complex: observer ordering etc. is not needed. Simply swapping to a system-set-first approach and removing the end-of-schedule flushes would fix things. Access checking could also be accelerated with a two-stage check: first by world, then by data within that world. This pairs especially nicely with "maintain multiple copies of the storage infrastructure". With an entity-partitioning flavored solution, you could also allow systems to access data from multiple "worlds"/"partitions" at once, making extraction and other forms of communication much easier.
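A minimal sketch of that two-stage check, assuming hypothetical integer `WorldId`/`ComponentId` types and plain `Vec`s where a real executor would use bitsets: a single `u64` mask rules out systems that touch disjoint worlds before any per-component comparison happens. None of these names are Bevy APIs.

```rust
use std::collections::HashMap;

/// Hypothetical ids for illustration; these are not Bevy types.
type WorldId = u32;
type ComponentId = u32;

/// Component access a system has within a single world.
#[derive(Default)]
struct WorldAccess {
    reads: Vec<ComponentId>,
    writes: Vec<ComponentId>,
}

/// A system's access, grouped by world, plus a one-word summary of which worlds it touches.
#[derive(Default)]
struct SystemAccess {
    /// Bit `w` is set if the system touches world `w` (this sketch assumes fewer than 64 worlds).
    world_mask: u64,
    per_world: HashMap<WorldId, WorldAccess>,
}

impl SystemAccess {
    fn read(mut self, world: WorldId, component: ComponentId) -> Self {
        self.world_mask |= 1u64 << world;
        self.per_world.entry(world).or_default().reads.push(component);
        self
    }

    fn write(mut self, world: WorldId, component: ComponentId) -> Self {
        self.world_mask |= 1u64 << world;
        self.per_world.entry(world).or_default().writes.push(component);
        self
    }
}

/// Two-stage conflict check: worlds first, then components within shared worlds.
fn conflicts(a: &SystemAccess, b: &SystemAccess) -> bool {
    // Stage 1: systems touching disjoint sets of worlds can never conflict.
    if (a.world_mask & b.world_mask) == 0 {
        return false;
    }
    // Stage 2: compare component access only inside worlds both systems touch.
    a.per_world.iter().any(|(world, wa)| {
        b.per_world.get(world).map_or(false, |wb| {
            let a_writes_b_uses = wa
                .writes
                .iter()
                .any(|c| wb.reads.contains(c) || wb.writes.contains(c));
            let b_writes_a_reads = wb.writes.iter().any(|c| wa.reads.contains(c));
            a_writes_b_uses || b_writes_a_reads
        })
    })
}
```

Writing the same component ID in two different worlds never conflicts here, which is the case the first stage makes cheap.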
Fully agree here. You need to have central orchestration. This is a big part of why I don't like our existing "one executor per schedule" model, and why I don't think that
-
Brain dump about "parallelized structural changes". This is the point that concerns me most. Most structural ops are clearly world-local; this includes adding and removing components, or any sort of archetype move. Spawns are different if all worlds share an ID space, but assuming you can claim an entity ID in parallel, moving it into an archetype is world-local and therefore also parallelizable. But what about despawns? When you despawn an entity, you have to do an archetype move in every world before you release the entity ID. Many designs also rely on entities continuing to exist during the runtime of a system, or even between systems. I would expect despawns to work like this in a multiworld scenario:
This would allow us to set up the render world to be "one frame behind" the simulation world, but still share the same entity ID space. Despawning an entity in the sim world would keep it (and its data) around until the render world was through with it. I suspect there may be a few other operations that are not world-local, for which we will need to defer work to a once-per-frame cross-world sync. Does this seem reasonable?
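One way to model the deferred despawn described here is a small bookkeeping type that holds an entity ID until every world has acknowledged the despawn. This is a sketch under assumptions: `Entity` and `WorldId` are plain integers, `DeferredDespawns` is a hypothetical name, and the archetype moves themselves are left out; it only models the rule that an ID is released after the last world lets go.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical ids for illustration.
type Entity = u32;
type WorldId = u32;

/// Despawns that have been requested but not yet applied in every world.
/// An entity id is only handed back for reuse once every world has let go of it.
struct DeferredDespawns {
    worlds: HashSet<WorldId>,
    /// For each pending entity, the worlds that still hold it.
    pending: HashMap<Entity, HashSet<WorldId>>,
    /// Ids that are now safe to reallocate.
    free_ids: Vec<Entity>,
}

impl DeferredDespawns {
    fn new(worlds: impl IntoIterator<Item = WorldId>) -> Self {
        Self {
            worlds: worlds.into_iter().collect(),
            pending: HashMap::new(),
            free_ids: Vec::new(),
        }
    }

    /// Called by the world that despawns the entity (e.g. the sim world).
    fn request(&mut self, entity: Entity) {
        let all_worlds = self.worlds.clone();
        self.pending.entry(entity).or_insert(all_worlds);
    }

    /// Called by each world once it has moved the entity out of its archetypes.
    /// For a render world running one frame behind, that is after the frame that still drew it.
    fn acknowledge(&mut self, world: WorldId, entity: Entity) {
        if let Some(remaining) = self.pending.get_mut(&entity) {
            remaining.remove(&world);
            if remaining.is_empty() {
                self.pending.remove(&entity);
                self.free_ids.push(entity);
            }
        }
    }
}
```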
-
Points of interest from discussion with @alice-i-cecile
I agree that it might make sense to frame this as "shards" or "partitions" of a world, rather than changing the semantics of worlds.
-
History of Multiworld?
A `World` represents an entire ECS ecosystem. But what if you want multiple ecosystems? This has been a long-requested feature in Bevy. It is possible today to use multiple worlds together. That is in essence what `SubApp`s are. But just about everybody agrees that `SubApp`s and `App`s are pretty clunky. They have inflexible points of communication, can't really be created once an app starts, require mapping entity IDs between worlds, etc.

As a rough timeline:
A lot has happened, and while just about everybody thinks multiworld should be a reality, no consensus has been reached. This is largely because of Bevy's (good) ideal of keeping PRs relatively small. Multiworld is a HUGE feature. It would take many small PRs to add multiworld, and nobody wants to review (or merge) a PR like this without understanding the endgame. And nobody can understand the endgame without seeing a prototype in code. And round and round we go lol.
I set out to build my own ECS engine to prototype many of my wish-list Bevy features that weren't coming any time soon (multiworld, archetype invariants, systems as entities, trait queries, schedule-less, etc.). Ultimately, I don't think I want to publish the engine on its own when it would be so much nicer to just get it into Bevy, but doing that is hard. I may change my mind if my vision becomes fundamentally different from Bevy's, and that's ok. But for now, I want to share what I've learned about multiworld in the hopes of Bevy being able to add it soon(er)/eventually.
Why multiworld
This has been discussed many times, so I'll try to keep this brief. Advantages of multiworld as opposed to a single world include:
`SubApp`s.

`SubApp`'s extraction happens in sequence. That's fine internally because `bevy` itself only has one `SubApp`, the render world. But for real applications that may have a physics world (at least) as well as others, this leaves room for improvement.

`First`, `PreUpdate`, `Update`, etc. But the schedules for the render world are different. Both need to run in parallel. If we were only using one world, `PreUpdate` for game logic could only run once `First` for render logic was finished. Lots of bottlenecking potential.

There are also some things that multiworld would help with but wouldn't fix. These are related non-advantages:
`Resource`. Aside from `Time` and other data, there are very few things that are logically per-world. For example, resources are sometimes used to represent data about the player. That's fine for single player, but breaks down for multiplayer. End users can get around that, but it's much harder for plugins. I would guess that's why some plugins, like input plugins, are available through both resource and component APIs. Multiworld isn't a good solution for this, though. It would help, but it's impractical to have a separate world per UI menu, per player, etc. A better solution is a more generalized `Res`, but that's a topic for a different discussion.

Defining Multiworld
There have been debates over what multiworld really means. It doesn't have to be implemented by literally getting multiple `World`s to communicate. Ultimately, a multiworld solution is an ECS where an entity can have components in multiple disjoint storage systems, such that structural changes to one storage can be parallelized with all other storages. I know that's a bit of a mouthful. Put another way, any solution that fully satisfies the 5 advantages of multiworld listed above is a multiworld solution.

Multiworld is... awkward
Multiworld solutions have a few fundamental issues. First, it no longer makes sense to think of systems as "per-world". A system can access data from multiple worlds. This is not a problem; it is the solution to advantages 3 and 4. But it means that systems don't really have a world. They can have a base world, and name other worlds by those worlds' relation to the base world. But fundamental structures like schedules, executors, etc. can no longer rely on the very convenient premise that there is only one world.
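To make the "base world" idea concrete, here is a sketch in which a system names worlds relative to its base world and the executor resolves those names to concrete IDs. `WorldRef`, `WorldLinks`, and role strings like `"render"` are hypothetical; the point is that the same system added to two different main worlds resolves `"render"` to two different render worlds.

```rust
use std::collections::HashMap;

/// Hypothetical concrete world id.
type WorldId = u32;

/// How a system names a world: relative to the world it was added to.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum WorldRef {
    /// The system's own base world.
    Base,
    /// A world linked from the base world under a role name, e.g. "render".
    Linked(&'static str),
}

/// Hypothetical per-world table of links to related worlds.
struct WorldLinks {
    links: HashMap<WorldId, HashMap<&'static str, WorldId>>,
}

impl WorldLinks {
    /// Resolves a base-relative reference to a concrete world id, if the link exists.
    fn resolve(&self, base: WorldId, world: WorldRef) -> Option<WorldId> {
        match world {
            WorldRef::Base => Some(base),
            WorldRef::Linked(role) => self.links.get(&base)?.get(&role).copied(),
        }
    }
}
```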
Access checking is also an issue. It already takes so much work to coordinate systems in parallel that the benefits of parallelism are almost lost. The more we try to parallelize, the more that parallelism costs in overhead. Bevy has recently moved in the direction of "parallelize less, faster" with #19143. Adding something like `MultiworldComponentId` would "parallelize more, slower". This is easily the biggest blocker on multiworld today.

Solutions
This is hard. Really hard. But here's what I've come up with. I just defined "multiworld solution" as satisfying the 5 advantages above, so let's go through them one at a time.
Let's start with advantage 4: parallel schedules. This is the easiest one. The solution: get rid of schedules 😱. Let's go schedule-less. The simplest way to do this is to:

1) Buff system sets to allow more ordering configuration there.
2) Allow ordering in observers, etc.
3) Allow a system to queue another system onto the executor while it is running.
4) Let those queued systems also be ordered.
5) Have one, singular, per-app executor.
6) Make schedules just be the act of queueing multiple ordered systems onto the executor.

Tada! That's the easy one. Bonus: combine this with async systems, and the whole update cycle could be a loop with some "await sleep" and "queue `Update` systems" logic (oversimplification).

Next, let's look at advantage 1: parallelized structural changes. That means having multiple copies of the `Storages`, `Archetypes`, `Entities`, etc. fields of `World`. A structural change (ex: spawning an entity) only needs to access one of these storage groups/sub-worlds/worlds/data domains (the name isn't important right now, and will probably depend on how this is implemented). I'll call it a world for now. If there are multiple worlds, structural changes to one world can be parallelized with structural changes to other worlds. The point is, we need multiple copies of the "data storage" part of the ECS.

Advantage 2: splitting logic between worlds (ex: render world, logic/main world, physics world, etc.). The implementation challenge here is that the worlds described in the previous paragraph need to communicate. Not only that, but if we want the worlds to be able to store any data, worlds need to have components, not the other way around. (In other words, the same component type may be registered in multiple worlds at the same time under different IDs.) That means 2 things. First, the "component" data also needs to be part of these worlds/sub-worlds/storage groups/whatever-it's-called. That's easy enough. Second, there must be a way to map entities in one world to entities in another. That's easy with a hashmap, but we can do better. More on that later.
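A minimal sketch of advantages 1 and 2 together, assuming a drastically simplified `DataWorld` that stands in for one copy of the `Storages`/`Archetypes`/`Entities` state, plus a per-world table from `TypeId` to a world-local component ID. All names here are hypothetical, and `Position`/`Mesh` are placeholder types, not Bevy's. Because each world is borrowed mutably by a different scoped thread, the two batches of spawns run concurrently, and the same component type can end up with different IDs in different worlds.

```rust
use std::any::TypeId;
use std::collections::HashMap;
use std::thread;

/// Hypothetical "data domain": one copy of the storage side of a World.
/// Components are registered per world, so one Rust type can have a
/// different local component id in each world.
#[derive(Default)]
struct DataWorld {
    component_ids: HashMap<TypeId, usize>,
    /// Stand-in for Entities/Archetypes/Storages: each entity's component ids.
    entities: Vec<Vec<usize>>,
}

impl DataWorld {
    /// Registers `T` in this world only, returning its world-local id.
    fn register<T: 'static>(&mut self) -> usize {
        let next = self.component_ids.len();
        *self.component_ids.entry(TypeId::of::<T>()).or_insert(next)
    }

    /// A structural change that needs exclusive access to this world and nothing else.
    fn spawn_with<T: 'static>(&mut self) -> usize {
        let component = self.register::<T>();
        self.entities.push(vec![component]);
        self.entities.len() - 1
    }
}

/// Placeholder component types for the example.
struct Position;
struct Mesh;

/// Spawns into two worlds at once: each thread borrows a different world mutably,
/// so neither batch of structural changes waits on the other.
fn spawn_in_parallel(main: &mut DataWorld, render: &mut DataWorld, n: usize) {
    thread::scope(|s| {
        s.spawn(|| {
            for _ in 0..n {
                main.spawn_with::<Position>();
            }
        });
        s.spawn(|| {
            for _ in 0..n {
                render.spawn_with::<Mesh>();
            }
        });
    });
}
```

In this sketch `Mesh` ends up as component 0 in the render world but component 1 in the main world if the main world registers it after `Position`.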
Advantage 5: adding/removing worlds at runtime. This is also easy on paper: just make worlds entities. Each world can have child worlds that serve some function for their parent world. Ex: the main world could have a render world and a physics world. But this has a big problem. As an example, imagine an asset world that stores all the assets for the main world. Well, the main world needs to access that world, but so does the render world. Ideally, the render world shouldn't need to impose a read on the main world (which would block structural changes) just to read from the asset world. This means we need a centralized world storage type. Maybe `App`? It would probably end up having something like `RwLock<HashMap<WorldId, Arc<UnsafeCell<World>>>>`. Then, we store the world ID and/or a `Weak<UnsafeCell<World>>` as a component on a world. The tricky part here is managing the removal of worlds. When the main world ends, so should its render world. I have yet to find an uncompromising solution to this. We would also need to at least warn when tampering with these `WorldLink` components. Designing this is hard, but this is the general idea.

Advantage 3: parallel extraction phase. This is not easy on its own, but with the other 4 issues resolved, we can just make extraction a schedule. Tada! One of the details here is that we would need query joins between worlds. Ex: "query for component A from world 1 and component A from world 2; then, copy the data from 1 to 2." This is not too hard to do, but it is worth mentioning. We need to make inter-world communication low friction.
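The registry and link from advantage 5, plus the cross-world join from advantage 3, can be sketched as follows. This assumes a toy `MiniWorld` with a single `positions` column and swaps the post's `Arc<UnsafeCell<World>>` for `Arc<RwLock<_>>` so the example stays safe; `WorldRegistry`, the fields of `WorldLink`, and `extract_positions` are hypothetical. Because the link holds a `Weak`, removing a world from the registry invalidates the link instead of keeping the world alive. The sketch does not attempt the harder question of tearing down dependent worlds, and the join assumes both worlds share one entity ID space.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock, Weak};

/// Hypothetical ids for illustration.
type WorldId = u32;
type Entity = u32;

/// Minimal stand-in for a World: one component column keyed by entity.
#[derive(Default)]
struct MiniWorld {
    positions: HashMap<Entity, [f32; 2]>,
}

/// Hypothetical app-level world storage (RwLock instead of UnsafeCell in this sketch).
#[derive(Default)]
struct WorldRegistry {
    worlds: RwLock<HashMap<WorldId, Arc<RwLock<MiniWorld>>>>,
}

/// What a world would store to point at a related world; the Weak does not keep it alive.
struct WorldLink {
    id: WorldId,
    world: Weak<RwLock<MiniWorld>>,
}

impl WorldRegistry {
    fn add(&self, id: WorldId) -> WorldLink {
        let world = Arc::new(RwLock::new(MiniWorld::default()));
        let link = WorldLink { id, world: Arc::downgrade(&world) };
        self.worlds.write().unwrap().insert(id, world);
        link
    }

    fn remove(&self, id: WorldId) {
        self.worlds.write().unwrap().remove(&id);
    }

    fn get(&self, id: WorldId) -> Option<Arc<RwLock<MiniWorld>>> {
        self.worlds.read().unwrap().get(&id).cloned()
    }
}

/// Extraction as a cross-world join: for every entity present in both worlds,
/// copy the source position into the destination.
fn extract_positions(src: &MiniWorld, dst: &mut MiniWorld) {
    for (entity, pos) in dst.positions.iter_mut() {
        if let Some(src_pos) = src.positions.get(entity) {
            *pos = *src_pos;
        }
    }
}
```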
Bonus 1: Remember how I said we could do better than a hashmap for mapping entities between worlds? Consider the case where a world exists purely to mirror another world in order to defer some work (ex: the render world just has entities that "mirror" what to render for their corresponding main world entity). Of course, the serving world will also have its own entities. One thing we can do here is entity paging (see #19430). Basically, this lets us efficiently start the entity allocator for a world at any `EntityRow` we like. For example, the main world could allocate entities between IDs 0 and 1 million, and the render world could allocate entities between IDs 1 million and 2 million. Of course, the numbers could change (we have 4 billion to work with), but the general idea is that, when a world directly mirrors the entities of exactly one other world (very common), its entity allocator starts where its source world's allocator (the one it's mirroring) ends. This also means worlds would need to declare their max entity count, but we can have a reasonable default. Then, for example, the render world could spawn an entity at the same ID as any entity in the main world. Very convenient! There are other advantages of entity paging, but this is the big advantage for multiworld. Practically speaking, this means main and render world entities could share the same ID space!

Bonus 2: Looking at my solution to advantage 4 again, it means there would be one per-app system executor. That means we need to include the `WorldId` in the access checks somehow. We could use a `WorldComponentId` and parallelize over that. We could also just make communicating between worlds really easy and parallelize per world instead of per component, which may be even faster. Ultimately, the answer here is benchmarking, and lots of it. We could even make this configurable. There are lots of options here, and we really can't afford a regression in access checking.

My solution
For my prototype ECS, what I settled on is communicating between multiple `World`s, where each `World` is `Arc`ed on an `App`, and each `App` has its own executor. My solution is very much evolving and is not in a sharable state (scarce documentation, rapidly changing details). However, I've done enough work to know that this design should work and will satisfy most, if not every, use case for multiworld.

Roadmap
It's going to take a lot of work to get here. A lot of work. And I'm not even sure if "here" is where Bevy wants to go. Lots of discussion needed! That said, the next real steps to making multiworld a reality are schedule-less systems and a per-app executor as explained above. Most of the tricky parts are centered in that implementation, and others have done much more research there than I have. We can also look into entity paging at the same time.
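For the entity-paging step, here is a minimal sketch of a per-world allocator restricted to one page of a shared ID space, loosely following Bonus 1. `PagedAllocator` and `mirror_page` are hypothetical and do not reflect the actual design in #19430; rows are plain `u32`s with no generations, and the page size plays the role of a world's declared max entity count.

```rust
use std::ops::Range;

/// Hypothetical paged allocator: a world hands out entity rows only from its own page.
struct PagedAllocator {
    page: Range<u32>,
    next: u32,
    recycled: Vec<u32>,
}

impl PagedAllocator {
    fn new(page: Range<u32>) -> Self {
        Self { next: page.start, page, recycled: Vec::new() }
    }

    /// Allocates a row, reusing freed rows first. Returns None once the page is full.
    fn alloc(&mut self) -> Option<u32> {
        if let Some(row) = self.recycled.pop() {
            return Some(row);
        }
        if self.next < self.page.end {
            self.next += 1;
            Some(self.next - 1)
        } else {
            None
        }
    }

    fn free(&mut self, row: u32) {
        debug_assert!(self.page.contains(&row), "row belongs to another world's page");
        self.recycled.push(row);
    }
}

/// A world that mirrors `source` starts its own page where the source's page ends.
fn mirror_page(source: &Range<u32>, len: u32) -> Range<u32> {
    source.end..source.end + len
}
```

With the main world on `0..1_000_000` and the render world on `1_000_000..2_000_000`, rows the render world allocates for its own entities never collide with main-world rows, so a render entity that mirrors a main entity can simply reuse the main entity's ID.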
I'm sharing this because the broad phase design work for multiworld in my ECS is pretty much complete. I'm hoping that sharing these ideas will let Bevy start heading in the direction of multiworld or help Bevy articulate a case against pursuing multiworld further if this is not the direction we want to go. At the end of the day, all this is still very much just an idea, and more discussion will reveal how to proceed from here.