
Commit a3d66da

[docs/data] Fix shuffle section wording (#51289)
## Why are these changes needed?

## Related issue number

## Checks

- [ ] I've signed off every commit (by using the `-s` flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
1 parent 3d30db2 commit a3d66da

File tree

1 file changed (+3, -9 lines)


doc/source/data/shuffling-data.rst

Lines changed: 3 additions & 9 deletions
```diff
@@ -91,8 +91,8 @@ To perform block order shuffling, use :meth:`randomize_block_order <ray.data.Dat
     # Randomize the block order of this dataset.
     ds = ds.randomize_block_order()
 
-Shuffle all rows
-~~~~~~~~~~~~~~~~
+Shuffle all rows (Global shuffle)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 To randomly shuffle all rows globally, call :meth:`~ray.data.Dataset.random_shuffle`.
 This is the slowest option for shuffle, and requires transferring data across
@@ -128,13 +128,7 @@ to data transfer costs. This cost can be prohibitive when using very large datas
 
 The best route for determining the best tradeoff between preprocessing time and cost and
 per-epoch shuffle quality is to measure the precision gain per training step for your
-particular model under different shuffling policies:
-
-* no shuffling,
-* local (per-shard) limited-memory shuffle buffer,
-* local (per-shard) shuffling,
-* windowed (pseudo-global) shuffling, and
-* fully global shuffling.
+particular model under different shuffling policies such as no shuffling, local shuffling, or global shuffling.
 
 As long as your data loading and shuffling throughput is higher than your training throughput, your GPU should
 be saturated. If you have shuffle-sensitive models, push the
```
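For context on the shuffling policies the reworded paragraph compares, here is a minimal plain-Python sketch (not Ray code; the `local_shuffle` helper is hypothetical, for illustration only) contrasting a full global shuffle with a limited-memory "local" shuffle buffer, where each output row is drawn from a small in-memory window:

```python
import random

def local_shuffle(rows, buffer_size, seed=None):
    # Limited-memory ("local") shuffle: each output row is drawn at
    # random from a small fixed-size buffer, so rows can only move a
    # bounded distance from their original position.
    rng = random.Random(seed)
    buffer, out = [], []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= buffer_size:
            out.append(buffer.pop(rng.randrange(len(buffer))))
    rng.shuffle(buffer)  # Drain whatever is left in the buffer.
    out.extend(buffer)
    return out

rows = list(range(10))

# Global shuffle: every permutation of the rows is equally likely.
global_shuffled = random.Random(0).sample(rows, k=len(rows))

# Local shuffle: cheaper, but randomization is only approximate.
local_shuffled = local_shuffle(rows, buffer_size=4, seed=0)

# Both policies preserve the multiset of rows; they differ only in
# how thoroughly the order is randomized.
assert sorted(global_shuffled) == rows
assert sorted(local_shuffled) == rows
```

This is the tradeoff the reworded paragraph points at: the local policy bounds memory and data movement at the cost of shuffle quality, while the global policy maximizes quality at the cost of transferring all data.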
