Notes from offline meeting with @alice-i-cecile and Nisan:
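I'm trying to figure out a way to achieve maximum performance when running many (thousands of) parallel Bevy `App` instances to train AIs in https://github.com/entity-neural-network/entity-gym-rs. On a high level, three different systems need to run in every iteration:

- "Obs": construct observations of the game state
- "Act": obtain an action for those observations and apply it to the game state
- "Physics": advance the game simulation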
During deployment, this is very straightforward. "Obs" and "Act" can just be a single system that constructs the observations, calls a function with the observations that returns an action, and applies the action to the game state.
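For reference, the deployment-mode iteration described above can be modeled without Bevy as a fused "Obs"+"Act" step followed by "Physics". This is a std-only sketch: `Game`, `Observation`, `Action`, and the toy `policy` are hypothetical names, not part of Bevy or entity-gym-rs. The point is that the policy is an ordinary in-process function call, so nothing blocks.

```rust
/// Hypothetical toy game state: one agent on a 1D line.
#[derive(Debug, Default)]
struct Game {
    position: i64,
    velocity: i64,
}

/// What the agent sees each tick.
#[derive(Debug, Clone, Copy)]
struct Observation {
    position: i64,
}

/// The agent's choice for this tick.
#[derive(Debug, Clone, Copy)]
enum Action {
    Left,
    Right,
}

/// Deployment policy: a plain function called inline, no external process.
fn policy(obs: Observation) -> Action {
    if obs.position < 0 {
        Action::Right
    } else {
        Action::Left
    }
}

impl Game {
    /// Fused "Obs" + "Act": build the observation, query the policy, apply the action.
    fn obs_and_act(&mut self, policy: impl Fn(Observation) -> Action) {
        let obs = Observation { position: self.position };
        self.velocity = match policy(obs) {
            Action::Left => -1,
            Action::Right => 1,
        };
    }

    /// "Physics": advance the simulation by one tick.
    fn physics(&mut self) {
        self.position += self.velocity;
    }

    /// One full iteration.
    fn step(&mut self, policy: impl Fn(Observation) -> Action) {
        self.obs_and_act(policy);
        self.physics();
    }
}

fn main() {
    let mut game = Game::default();
    for _ in 0..4 {
        game.step(policy);
    }
    println!("{:?}", game);
}
```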
The issue is that when training an AI, the "Act" system performs a blocking call to await actions produced by an external Python process. The current solution I have ends up requiring one thread per `App` instance, which blocks on a channel once per iteration to communicate with the Python process. Unfortunately, awaiting a channel and context switching between thousands of threads on each tick introduces massive overhead that drastically curtails the maximum achievable throughput.

In the ideal architecture, there would be a much smaller number of threads, each owning multiple `App`s. Each thread would call just the "Obs" system on every `App`, perform all synchronization with the Python process in a single batch, and then run the "Act" and "Physics" systems (testing this approach with a non-Bevy game yields > 20x throughput).

I haven't managed to find a good way to set something like this up yet. One approximation would be to reorder the systems to "Act" -> "Physics" -> "Obs", which moves the synchronization barrier in between iterations and allows multiple `App`s to be single-stepped by one worker thread. This still has two issues:

1. "Obs" would not be able to observe entities created on that tick. So really we'd like the "Obs" system and the "Act"+"Physics" systems to take turns. This seems doable by skipping every other system execution, but I'm not sure how to make that ergonomic.
2. I haven't actually found a way to single-step an `App`. There is `ScheduleRunnerSettings::run_once()`, which seems to do basically what I want, but it only runs the systems on the first `App::run` call, and every subsequent call does nothing.

Another orthogonal issue is that when creating 1024 `App`s, I get a panic in `bevy_tasks-0.7.0/src/task_pool.rs:152:22` because the Linux thread limit is exceeded. (EDIT: this doesn't seem to happen anymore after upgrading to Bevy 0.8, but throughput is now also half of what it was in 0.7 🤔) Raising the thread limit is possible, but it would prevent the crate from working out of the box. Ideally there would be a way to prevent Bevy from creating any additional threads.
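A std-only sketch of the batched architecture described above: a fixed number of worker threads each own many environments, and each worker blocks on the trainer once per tick for its whole batch rather than once per environment. It assumes each Bevy `App` can be modeled as a plain struct with separate "Obs" and "Act"+"Physics" steps; `Env`, `worker`, `run`, and the toy policy are hypothetical, the trainer loop on the calling thread stands in for the Python process, and no Bevy API is used.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical stand-in for one Bevy `App`: a single integer of game state.
struct Env {
    state: i64,
}

impl Env {
    /// "Obs": read the state the policy will see.
    fn observe(&self) -> i64 {
        self.state
    }

    /// "Act" + "Physics": apply the action and advance the state.
    fn act_and_step(&mut self, action: i64) {
        self.state += action;
    }
}

/// A worker's id plus one observation per env it owns.
type ObsBatch = (usize, Vec<i64>);

/// Steps every env owned by this worker, blocking on the trainer once per
/// tick for the whole batch instead of once per env.
fn worker(
    id: usize,
    mut envs: Vec<Env>,
    iterations: usize,
    to_trainer: Sender<ObsBatch>,
    from_trainer: Receiver<Vec<i64>>,
) -> Vec<Env> {
    for _ in 0..iterations {
        let obs: Vec<i64> = envs.iter().map(Env::observe).collect();
        to_trainer.send((id, obs)).expect("trainer hung up");
        let actions = from_trainer.recv().expect("trainer hung up");
        for (env, action) in envs.iter_mut().zip(actions) {
            env.act_and_step(action);
        }
    }
    envs
}

/// Spawns `num_workers` threads (independent of the env count) and plays the
/// trainer role on the calling thread. Returns the final state of every env.
fn run(num_workers: usize, envs_per_worker: usize, iterations: usize) -> Vec<i64> {
    let (obs_tx, obs_rx) = channel::<ObsBatch>();
    let mut action_txs = Vec::new();
    let mut handles = Vec::new();
    for id in 0..num_workers {
        let (act_tx, act_rx) = channel::<Vec<i64>>();
        action_txs.push(act_tx);
        let envs: Vec<Env> = (0..envs_per_worker).map(|_| Env { state: 0 }).collect();
        let tx = obs_tx.clone();
        handles.push(thread::spawn(move || worker(id, envs, iterations, tx, act_rx)));
    }
    // Toy trainer policy: push each env's state up to 3, then hold it there.
    for _ in 0..num_workers * iterations {
        let (id, obs) = obs_rx.recv().expect("worker hung up");
        let actions: Vec<i64> = obs.iter().map(|&o| if o < 3 { 1 } else { 0 }).collect();
        action_txs[id].send(actions).expect("worker hung up");
    }
    handles
        .into_iter()
        .flat_map(|h| h.join().expect("worker panicked"))
        .map(|env| env.state)
        .collect()
}

fn main() {
    let states = run(4, 256, 5);
    println!("{} envs, all at state 3: {}", states.len(), states.iter().all(|&s| s == 3));
}
```

Here 1024 environments run on four worker threads; adding environments does not add threads, and the number of channel round trips per tick scales with `num_workers` rather than with the number of environments.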
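The two issues with the reordering approach can be modeled with a phase flag: each `update` call runs either the "Obs" half or the "Act"+"Physics" half of an iteration and then flips the flag, so a worker calls `update` twice per tick with one batched policy call in between, and "Obs" sees entities spawned on the previous half. This is a std-only sketch; `SteppedEnv`, `Phase`, and `step_batch` are hypothetical names, not Bevy API. In Bevy itself, gating systems on such a flag would need something like a run criterion or an early return inside each system, and `App::update` (which runs the app's schedule once per call) may be worth checking as the single-step primitive; both should be verified against the Bevy version in use.

```rust
/// Which half of an iteration the next `update` call runs.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    Observe,
    ActAndSimulate,
}

/// Hypothetical stand-in for one `App` whose systems are gated on `phase`.
struct SteppedEnv {
    phase: Phase,
    /// Positions of all spawned entities.
    entities: Vec<u32>,
    /// Output of "Obs": how many entities existed when it last ran.
    last_obs: Option<usize>,
    /// Input to "Act": how many entities to spawn, set between the two halves.
    pending_action: Option<u32>,
}

impl SteppedEnv {
    fn new() -> Self {
        SteppedEnv { phase: Phase::Observe, entities: Vec::new(), last_obs: None, pending_action: None }
    }

    /// Runs only the systems for the current phase, then flips the phase.
    fn update(&mut self) {
        match self.phase {
            Phase::Observe => {
                // "Obs" sees entities spawned by the previous ActAndSimulate half.
                self.last_obs = Some(self.entities.len());
                self.phase = Phase::ActAndSimulate;
            }
            Phase::ActAndSimulate => {
                // "Act": spawn the requested number of entities at position 0.
                let spawn = self.pending_action.take().unwrap_or(0);
                self.entities.extend(std::iter::repeat(0).take(spawn as usize));
                // "Physics": move every entity forward by one.
                for pos in &mut self.entities {
                    *pos += 1;
                }
                self.phase = Phase::Observe;
            }
        }
    }
}

/// One tick for a batch of envs: "Obs" everywhere, one batched policy call,
/// then "Act" + "Physics" everywhere.
fn step_batch(envs: &mut [SteppedEnv], policy: impl Fn(&[usize]) -> Vec<u32>) {
    for env in envs.iter_mut() {
        debug_assert_eq!(env.phase, Phase::Observe);
        env.update();
    }
    let obs: Vec<usize> = envs.iter().map(|env| env.last_obs.unwrap_or(0)).collect();
    let actions = policy(&obs);
    for (env, action) in envs.iter_mut().zip(actions) {
        env.pending_action = Some(action);
        env.update();
    }
}

fn main() {
    let mut envs: Vec<SteppedEnv> = (0..3).map(|_| SteppedEnv::new()).collect();
    for _ in 0..2 {
        // Toy policy: spawn one entity in every env that has none yet.
        step_batch(&mut envs, |obs| obs.iter().map(|&n| if n == 0 { 1 } else { 0 }).collect());
    }
    for env in &envs {
        println!("{:?} {:?}", env.last_obs, env.entities);
    }
}
```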