[Data] Implement forceful releasing of actors upon shutdown of StreamingExecutor (#51769)
## Why are these changes needed?
We recently ran into an issue where:
1. Large pipeline executions were triggered in *tight* succession (one immediately after another).
2. N GPUs were available, and all N were used by the ActorPool.
3. The GPUs were not released in time before the next execution began.
4. The subsequent dataset execution timed out after 10 minutes because it could not acquire the required GPUs.
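
The failure mode above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Ray's scheduler: `GpuPool` is a hypothetical counting pool standing in for the cluster's GPUs, the next execution is assumed to need all N GPUs at once, and the 10-minute timeout is scaled down to 50 ms.

```python
import threading


class GpuPool:
    """Hypothetical toy model of a fixed pool of GPUs (not a Ray API)."""

    def __init__(self, total: int) -> None:
        self._free = total
        self._cond = threading.Condition()

    def acquire(self, n: int, timeout: float) -> bool:
        # Block until n GPUs are free or the timeout elapses.
        with self._cond:
            ok = self._cond.wait_for(lambda: self._free >= n, timeout=timeout)
            if ok:
                self._free -= n
            return ok

    def release(self, n: int) -> None:
        with self._cond:
            self._free += n
            self._cond.notify_all()


N = 4
pool = GpuPool(N)

# Execution 1: the actor pool takes all N GPUs.
first_ok = pool.acquire(N, timeout=0.1)

# Execution 1 "finishes", but its actors have not given the GPUs back yet,
# so execution 2 waits and then times out.
second_ok = pool.acquire(N, timeout=0.05)

# If shutdown actually frees the GPUs before returning, the next run proceeds.
pool.release(N)
third_ok = pool.acquire(N, timeout=0.05)

print(first_ok, second_ok, third_ok)
```

The point of the model is ordering: the second acquisition fails not because resources are insufficient, but because the release happens after the next execution has already started waiting.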
## Changes
1. Added a `force` parameter to the `PhysicalOperator.shutdown` method.
2. Revised the release sequence for pending/running actors so that they are killed on a forced shutdown.
3. Made sure the shutdown sequence waits for the `on_exit` callback to return.
4. Cleaned up dead code.
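
The two shutdown paths described in the list above can be sketched as follows. This is a hedged illustration, not the PR's implementation: `ToyActor` and `ToyActorPool` are hypothetical stand-ins (not Ray's actor pool or `PhysicalOperator` classes), and we assume that a graceful shutdown drops pending actors, runs each running actor's `on_exit` hook, and blocks until every hook returns, while `force=True` kills pending and running actors immediately so their GPUs are freed before `shutdown` returns. In real Ray code, forceful actor termination is typically done with `ray.kill`; whether the PR uses exactly that call is not stated here.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, List, Optional


class ToyActor:
    """Hypothetical stand-in for a Ray actor holding a GPU."""

    def __init__(self, name: str, on_exit: Optional[Callable[[], None]] = None):
        self.name = name
        self.on_exit = on_exit
        self.state = "alive"

    def kill(self) -> None:
        # Forceful termination: the on_exit hook is NOT run (assumption).
        self.state = "killed"

    def graceful_exit(self) -> None:
        # Graceful termination: run the user's on_exit hook first.
        if self.on_exit is not None:
            self.on_exit()
        self.state = "exited"


class ToyActorPool:
    """Hypothetical pool tracking pending (starting) and running actors."""

    def __init__(self, pending: List[ToyActor], running: List[ToyActor]):
        self.pending = list(pending)
        self.running = list(running)
        self._executor = ThreadPoolExecutor(max_workers=4)

    def shutdown(self, force: bool = False) -> None:
        if force:
            # Kill every pending and running actor right away so their
            # resources are released before shutdown() returns.
            for actor in self.pending + self.running:
                actor.kill()
        else:
            # Pending actors are only dropped (reclaimed later, asynchronously).
            for actor in self.pending:
                actor.state = "released"
            # Running actors run on_exit; block until every call has returned.
            futures = [self._executor.submit(a.graceful_exit) for a in self.running]
            wait(futures)
            for f in futures:
                f.result()  # re-raise any exception from on_exit
        self._executor.shutdown(wait=True)
        self.pending.clear()
        self.running.clear()


calls: List[str] = []

pending = [ToyActor("p0")]
running = [
    ToyActor("r0", on_exit=lambda: calls.append("r0")),
    ToyActor("r1", on_exit=lambda: calls.append("r1")),
]
ToyActorPool(pending, running).shutdown(force=False)
print([a.state for a in pending + running], sorted(calls))

pending2 = [ToyActor("p1")]
running2 = [ToyActor("r2", on_exit=lambda: calls.append("r2"))]
ToyActorPool(pending2, running2).shutdown(force=True)
print([a.state for a in pending2 + running2], sorted(calls))
```

The trade-off the sketch highlights: the graceful path lets user cleanup run but may hold resources longer, which is what back-to-back executions competing for the same GPUs cannot afford; the forced path gives up the `on_exit` hooks in exchange for freeing resources promptly.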
## Related issue number
## Checks
- [ ] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(
---------
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>