feat: Add gRPC v1alpha1 streaming support, client SDK, and benchmarks #5377

Open · wants to merge 2 commits into main
153 changes: 153 additions & 0 deletions docs/source/guides/grpc_streaming.md
@@ -0,0 +1,153 @@
# gRPC Streaming with BentoML (v1alpha1)

BentoML supports gRPC streaming, allowing for efficient, long-lived communication channels between clients and servers. This guide demonstrates how to define, implement, and use gRPC streaming services with BentoML's `v1alpha1` gRPC protocol.

This `v1alpha1` protocol is an initial version focused on server-side streaming, where the client sends a single message and the server responds with a stream of messages.

## 1. Defining the Service (.proto)

First, define your service and messages using Protocol Buffers. For the `v1alpha1` streaming interface, BentoML provides a specific service definition. If you were building custom services beyond the default `BentoService`, you'd create your own `.proto` similar to this.

The core `v1alpha1` service used internally by BentoML is defined in `src/bentoml/grpc/v1alpha1/bentoml_service_v1alpha1.proto`:

```protobuf
syntax = "proto3";

package bentoml.grpc.v1alpha1;

// The BentoService service definition.
service BentoService {
  // A streaming RPC method that accepts a Request message
  // and returns a stream of Response messages.
  rpc CallStream(Request) returns (stream Response) {}
}

// The request message containing the input data.
message Request {
  string data = 1;
}

// The response message containing the output data.
message Response {
  string data = 1;
}
```

Key aspects:
- `service BentoService`: Defines the service name.
- `rpc CallStream(Request) returns (stream Response) {}`: This declares a server-streaming RPC method. The client sends a single `Request`, and the server replies with a stream of `Response` messages.
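Conceptually, this one-request, many-responses contract is just an async generator. The following framework-free sketch (names are illustrative, not part of BentoML) mimics the shape of `CallStream` in plain Python:

```python
import asyncio
from typing import AsyncIterator


async def call_stream(request_data: str) -> AsyncIterator[str]:
    """Mimic a server-streaming RPC: one request in, many responses out."""
    for i in range(3):
        await asyncio.sleep(0)  # stand-in for real work between messages
        yield f"Response {i + 1} for '{request_data}'"


async def main():
    # The caller consumes the stream incrementally, just like a gRPC client.
    return [msg async for msg in call_stream("ping")]


print(asyncio.run(main()))
```

The real RPC behaves the same way from the caller's perspective: each `Response` arrives as soon as the server yields it, rather than after the whole stream completes.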

After defining your `.proto` file, you need to generate the Python gRPC stubs:
```bash
pip install grpcio-tools
python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. your_service.proto
```
For BentoML's internal `v1alpha1` service, these stubs (`bentoml_service_v1alpha1_pb2.py` and `bentoml_service_v1alpha1_pb2_grpc.py`) are already generated and included.

## 2. Implementing the Server-Side Streaming Logic

You implement the server-side logic by creating a class that inherits from the generated `YourServiceServicer` (e.g., `BentoServiceServicer` for the internal service) and overriding the streaming methods.

Here's how the internal `BentoServiceImpl` for `v1alpha1` is structured (simplified from `src/bentoml/grpc/v1alpha1/server.py`):

```python
import asyncio
from typing import AsyncIterator

import grpc

# Stubs are generated and shipped with BentoML for the internal v1alpha1 service.
from bentoml.grpc.v1alpha1 import bentoml_service_v1alpha1_pb2 as pb
from bentoml.grpc.v1alpha1 import bentoml_service_v1alpha1_pb2_grpc as services


class BentoServiceImpl(services.BentoServiceServicer):
    async def CallStream(
        self, request: pb.Request, context: grpc.aio.ServicerContext
    ) -> AsyncIterator[pb.Response]:
        """Receive a single Request and yield a stream of Response messages."""
        print(f"CallStream received: {request.data}")
        for i in range(5):  # Example: send 5 messages
            response_data = f"Response {i + 1} for '{request.data}'"
            print(f"Sending: {response_data}")
            await asyncio.sleep(0.5)  # Simulate work between messages
            yield pb.Response(data=response_data)
        print("CallStream finished.")


# To run this servicer in a standalone gRPC server:
async def run_server(port: int = 50051) -> None:
    server = grpc.aio.server()
    services.add_BentoServiceServicer_to_server(BentoServiceImpl(), server)
    server.add_insecure_port(f"[::]:{port}")
    await server.start()
    print(f"gRPC server started on port {port}")
    await server.wait_for_termination()


if __name__ == "__main__":
    asyncio.run(run_server())
```

When you serve with `bentoml serve-grpc`, BentoML runs the gRPC server for you. For the `v1alpha1` protocol, BentoML's `Service` class is already configured to use this `BentoServiceImpl`. If you are customizing the main `BentoService`, ensure your implementation is picked up by overriding `Service.get_grpc_servicer`; for additional custom services, mount your own servicer.

## 3. Using the BentoMlGrpcClient (v1alpha1)

BentoML provides a client SDK to interact with the `v1alpha1` gRPC streaming service.

Example usage (from `src/bentoml/grpc/v1alpha1/client.py`):
```python
import asyncio

from bentoml.grpc.v1alpha1.client import BentoMlGrpcClient


async def main():
    client = BentoMlGrpcClient(host="localhost", port=50051)

    input_data = "Hello Streaming World"
    print(f"Calling CallStream with data: '{input_data}'")

    try:
        idx = 0
        async for response in client.call_stream(data=input_data):
            print(f"Received from stream (message {idx}): {response.data}")
            idx += 1
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        await client.close()
        print("Client connection closed.")


if __name__ == "__main__":
    asyncio.run(main())
```
The `client.call_stream(data=...)` method returns an asynchronous iterator that yields `Response` messages from the server.
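The shape of such a client can be sketched without a running server by swapping the gRPC stub for a stand-in. Everything below except the `call_stream`/`close` method names is illustrative, not BentoML's actual implementation:

```python
import asyncio
from typing import AsyncIterator


class FakeStub:
    """Stand-in for a generated gRPC stub; streams canned replies."""

    async def CallStream(self, data: str) -> AsyncIterator[str]:
        for i in range(2):
            yield f"echo {i}: {data}"


class StreamingClient:
    """Sketch of the wrapper pattern a streaming client follows."""

    def __init__(self, stub) -> None:
        self._stub = stub
        self._closed = False

    def call_stream(self, data: str) -> AsyncIterator[str]:
        # Return the async iterator directly; the caller drives it.
        return self._stub.CallStream(data)

    async def close(self) -> None:
        self._closed = True  # a real client would close the grpc.aio channel here


async def main():
    client = StreamingClient(FakeStub())
    try:
        return [r async for r in client.call_stream("hi")]
    finally:
        await client.close()


print(asyncio.run(main()))
```

The key design point is that `call_stream` does not buffer: it hands the caller the async iterator, so responses are processed as they arrive.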

## 4. Using the `call-grpc-stream` CLI Command

BentoML provides a CLI command to easily test and interact with `v1alpha1` gRPC streaming services.

**Command Syntax:**
```bash
bentoml call-grpc-stream --host <hostname> --port <port_number> --data "<your_request_data>"
```

**Example:**
Assuming your BentoML gRPC server (with `v1alpha1` protocol) is running on `localhost:50051`:
```bash
bentoml call-grpc-stream --host localhost --port 50051 --data "Test Message from CLI"
```

Output will be similar to:
```
Connecting to gRPC server at localhost:50051...
Sending data: 'Test Message from CLI' to CallStream...
--- Streamed Responses ---
Response 1 for 'Test Message from CLI'
Response 2 for 'Test Message from CLI'
Response 3 for 'Test Message from CLI'
... (based on server implementation) ...
------------------------
Connection closed.
```

This CLI command uses the `BentoMlGrpcClient` internally.
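As a rough sketch of how such a command might parse its flags (hypothetical code, not BentoML's actual CLI implementation), using `argparse`:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of an argument parser matching the call-grpc-stream flags."""
    parser = argparse.ArgumentParser(prog="call-grpc-stream")
    parser.add_argument("--host", default="localhost", help="gRPC server host")
    parser.add_argument("--port", type=int, default=50051, help="gRPC server port")
    parser.add_argument("--data", required=True, help="payload sent to CallStream")
    return parser


args = build_parser().parse_args(
    ["--host", "localhost", "--port", "50051", "--data", "Test Message from CLI"]
)
print(args.host, args.port, args.data)
```

After parsing, the command would construct a `BentoMlGrpcClient` from `args.host` and `args.port` and iterate the stream exactly as in the SDK example above.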

## Summary

The `v1alpha1` gRPC streaming support in BentoML provides a foundation for building services that require persistent, streamed communication. By defining services in `.proto` files, implementing the server-side logic, and using the provided client SDK or CLI, you can leverage gRPC streaming in your BentoML applications. Remember that this `v1alpha1` version is specific to a client-sends-one, server-streams-many interaction pattern for the main `BentoService`. For more complex gRPC patterns (client-streaming, bidirectional-streaming for custom services), you would define those in your own `.proto` files and implement corresponding servicers and clients.
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -176,6 +176,7 @@ For release notes and detailed changelogs, see the `Releases <https://github.com
build-with-bentoml/lifecycle-hooks
build-with-bentoml/asgi
build-with-bentoml/streaming
guides/grpc_streaming
build-with-bentoml/websocket
build-with-bentoml/gradio
build-with-bentoml/observability/index
80 changes: 80 additions & 0 deletions examples/grpc_streaming/README.md
@@ -0,0 +1,80 @@
# BentoML Custom gRPC Streaming Example

This example demonstrates how to define, implement, and serve a custom gRPC streaming service with BentoML. The service implements a simple "chat" style interaction where the client sends a message and the server streams back a series of responses.

## Files

- `protos/example_service.proto`: Protocol buffer definition for the `SimpleStreamingService`.
- `service.py`: BentoML service implementation that includes the gRPC servicer for `SimpleStreamingService`.
- `bentofile.yaml`: BentoML build configuration file.
- `client_example.py`: A Python script demonstrating how to call the gRPC streaming service.
- `generated/`: Directory containing Python stubs generated from `example_service.proto`.

## Prerequisites

- Python 3.8+
- BentoML installed (`pip install bentoml`)
- gRPC tools (`pip install grpcio grpcio-tools`)

## Setup

1. **Generate gRPC Stubs**:
Navigate to the `examples/grpc_streaming` directory and run:
```bash
mkdir generated
python -m grpc_tools.protoc -Iprotos --python_out=generated --grpc_python_out=generated protos/example_service.proto
# Create __init__.py files to make them importable
touch generated/__init__.py
```
This will generate `example_service_pb2.py` and `example_service_pb2_grpc.py` in the `generated` directory.

## Running the Example

1. **Serve the BentoML Service**:
From the `examples/grpc_streaming` directory:
```bash
bentoml serve service:svc --reload
```
This starts the BentoML server. Note that `bentoml serve` starts an HTTP server by default; for gRPC you typically use `bentoml serve-grpc`. Because this example registers its servicer with `mount_grpc_servicer`, the custom service is exposed through the gRPC server that `serve-grpc` manages. Check the console output for the port actually in use.

To ensure it uses a known gRPC port (e.g., 50051 if not default for `serve`), you might run:
```bash
bentoml serve service:svc --reload --grpc-port 50051
# Or more explicitly for gRPC focus:
# bentoml serve-grpc service:svc --reload --port 50051
```
Check the output from `bentoml serve` for the actual gRPC port if you don't specify one. For this example, `client_example.py` assumes `localhost:50051`.

2. **Run the Client**:
In a new terminal, from the `examples/grpc_streaming` directory:
```bash
python client_example.py
```

## Expected Output (Client)

```
Client sending: Hello, stream!
Server says: Response 1 to 'Hello, stream!'
Server says: Response 2 to 'Hello, stream!'
Server says: Response 3 to 'Hello, stream!'
Server says: Response 4 to 'Hello, stream!'
Server says: Response 5 to 'Hello, stream!'
Stream finished.
```

## How it Works

- **`example_service.proto`**: Defines a `SimpleStreamingService` with a server-streaming RPC method `Chat`.
- **`service.py`**:
- Implements `SimpleStreamingServicerImpl` which provides the logic for the `Chat` method.
- Creates a BentoML `Service` named `custom_grpc_stream_example`.
- Mounts the `SimpleStreamingServicerImpl` to the BentoML service instance. When this BentoML service is run with gRPC enabled, the custom gRPC service will be available.
- **`client_example.py`**:
- Uses `grpc.insecure_channel` to connect to the server.
- Creates a stub for `SimpleStreamingService`.
- Calls the `Chat` method and iterates over the streamed responses.

This example showcases how to integrate custom gRPC services with streaming capabilities within the BentoML framework.
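The server side of this exchange can be sketched with a stand-in message class. The real `ChatMessage` comes from the generated `example_service_pb2` module, and the actual servicer lives in `service.py`; this is only an illustration of the `Chat` logic:

```python
import asyncio
import time
import uuid
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class ChatMessage:
    """Stand-in for the generated example_service_pb2.ChatMessage."""
    message_id: str
    text: str
    timestamp: int


async def chat(request: ChatMessage) -> AsyncIterator[ChatMessage]:
    """Sketch of the Chat method: stream five responses per request."""
    for i in range(5):
        yield ChatMessage(
            message_id=str(uuid.uuid4()),
            text=f"Response {i + 1} to '{request.text}'",
            timestamp=int(time.time() * 1000),
        )


async def main():
    request = ChatMessage(str(uuid.uuid4()), "Hello, stream!", int(time.time() * 1000))
    return [r.text async for r in chat(request)]


print(asyncio.run(main()))
```

The streamed texts match the expected client output shown above, since the client simply prints each message's `text` field as it arrives.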
21 changes: 21 additions & 0 deletions examples/grpc_streaming/bentofile.yaml
@@ -0,0 +1,21 @@
service: "service:svc"
name: "custom_grpc_stream_example"
version: "0.1.0"

description: "A BentoML example showcasing custom gRPC streaming services."

# Ensure generated stubs are included if building a bento
# For local development (bentoml serve), Python's import system will find them
# if they are in the PYTHONPATH (e.g., in the same directory or an installed package).
# If you build this into a Bento, you'd want to ensure 'generated' is included.
include:
- "*.py"
- "generated/*.py"
- "protos/*.proto"

python:
packages:
- grpcio
- grpcio-tools # For local stub generation, not strictly needed by the Bento itself at runtime
- bentoml
50 changes: 50 additions & 0 deletions examples/grpc_streaming/client_example.py
@@ -0,0 +1,50 @@
import asyncio
import time
import uuid

import grpc

# Import generated gRPC stubs
from generated import example_service_pb2
from generated import example_service_pb2_grpc


async def run_client():
    # Default gRPC port assumed by this example; adjust if necessary.
    target_address = "localhost:50051"

    # Create a channel
    async with grpc.aio.insecure_channel(target_address) as channel:
        # Create a stub (client)
        stub = example_service_pb2_grpc.SimpleStreamingServiceStub(channel)

        # Prepare a request message
        client_message_text = "Hello, stream!"
        request_message = example_service_pb2.ChatMessage(
            message_id=str(uuid.uuid4()),  # Unique ID for the message
            text=client_message_text,
            timestamp=int(time.time() * 1000),
        )

        print(f"Client sending: {client_message_text}")

        try:
            # Call the Chat RPC method and iterate through the streamed responses
            async for response in stub.Chat(request_message):
                print(
                    f"Server says: {response.text} "
                    f"(ID: {response.message_id}, TS: {response.timestamp})"
                )

            print("Stream finished.")

        except grpc.aio.AioRpcError as e:
            print(f"gRPC call failed: {e.code()} - {e.details()}")
        except Exception as e:
            print(f"An error occurred: {e}")


if __name__ == "__main__":
    print("Starting gRPC client example...")
    asyncio.run(run_client())
35 changes: 35 additions & 0 deletions examples/grpc_streaming/generated/example_service_pb2.py
Generated protobuf stubs; not rendered in the diff by default.