 """
 This example shows how to use Ray Data for data parallel batch inference.
 
-Ray Data is a data processing framework that can handle large datasets
-and integrates tightly with vLLM for data-parallel inference.
-
-As of Ray 2.44, Ray Data has a native integration with
-vLLM (under ray.data.llm).
+Ray Data is a data processing framework that can process very large datasets
+with first-class support for vLLM.
 
 Ray Data provides functionality for:
-* Reading and writing to cloud storage (S3, GCS, etc.)
-* Automatic sharding and load-balancing across a cluster
-* Optimized configuration of vLLM using continuous batching
-* Compatible with tensor/pipeline parallel inference as well.
+* Reading and writing to most popular file formats and cloud object storage.
+* Streaming execution, so you can run inference on datasets that far exceed
+  the aggregate RAM of the cluster.
+* Scaling up the workload without code changes.
+* Automatic sharding, load-balancing, and autoscaling across a Ray cluster,
+  with built-in fault tolerance and retry semantics.
+* Continuous batching that keeps vLLM replicas saturated and maximizes GPU
+  utilization.
+* Compatible with tensor/pipeline parallel inference.
 
 Learn more about Ray Data's LLM integration:
 https://docs.ray.io/en/latest/data/working-with-llms.html
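The `ray.data.llm` integration this docstring points to can be sketched roughly as below. This is a minimal, hedged sketch based on the Ray 2.44+ documentation, not this PR's actual example code: the model name, prompts, sampling parameters, and output path are illustrative, and it needs `ray[data]`, `vllm`, and a GPU to run.

```python
# Sketch of Ray Data's native vLLM integration (ray.data.llm, Ray >= 2.44).
# Assumes `ray[data]` and `vllm` are installed and a GPU is available;
# model, prompts, and sampling parameters here are illustrative only.
import ray
from ray.data.llm import build_llm_processor, vLLMEngineProcessorConfig

# Configure vLLM replicas; Ray Data handles sharding, load-balancing,
# and continuous batching across them.
config = vLLMEngineProcessorConfig(
    model_source="meta-llama/Llama-3.1-8B-Instruct",
    engine_kwargs={"max_model_len": 8192},
    concurrency=1,  # number of vLLM replicas
    batch_size=64,  # rows per batch sent to each replica
)

processor = build_llm_processor(
    config,
    # Map each input row to a chat request for vLLM.
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(temperature=0.3, max_tokens=128),
    ),
    # Keep only the generated text in the output rows.
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

ds = ray.data.from_items([{"prompt": "What is Ray Data?"}])
ds = processor(ds)                # streaming, data-parallel inference
ds.write_parquet("/tmp/outputs")  # or s3://..., gcs://..., etc.
```

Because execution is streaming, the dataset is never materialized in full: rows flow from the reader through the vLLM replicas to the writer, which is what lets a workload exceed the cluster's aggregate RAM.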