feat: Add gRPC v1alpha1 streaming support, client SDK, and benchmarks #5377

Open · wants to merge 2 commits into main
153 changes: 153 additions & 0 deletions docs/source/guides/grpc_streaming.md
@@ -0,0 +1,153 @@
# gRPC Streaming with BentoML (v1alpha1)

BentoML supports gRPC streaming, allowing for efficient, long-lived communication channels between clients and servers. This guide demonstrates how to define, implement, and use gRPC streaming services with BentoML's `v1alpha1` gRPC protocol.

This `v1alpha1` protocol is an initial version focused on server-side streaming, where the client sends a single message and the server responds with a stream of messages.

## 1. Defining the Service (.proto)

First, define your service and messages using Protocol Buffers. For the `v1alpha1` streaming interface, BentoML provides a specific service definition. If you were building custom services beyond the default `BentoService`, you'd create your own `.proto` similar to this.

The core `v1alpha1` service used internally by BentoML is defined in `src/bentoml/grpc/v1alpha1/bentoml_service_v1alpha1.proto`:

```protobuf
syntax = "proto3";

package bentoml.grpc.v1alpha1;

// The BentoService service definition.
service BentoService {
  // A streaming RPC method that accepts a Request message
  // and returns a stream of Response messages.
  rpc CallStream(Request) returns (stream Response) {}
}

// The request message containing the input data.
message Request {
  string data = 1;
}

// The response message containing the output data.
message Response {
  string data = 1;
}
```

Key aspects:
- `service BentoService`: Defines the service name.
- `rpc CallStream(Request) returns (stream Response) {}`: This declares a server-streaming RPC method. The client sends a single `Request`, and the server replies with a stream of `Response` messages.
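Conceptually, this one-request, many-responses contract is just an async generator. The following framework-free sketch (names are illustrative, not part of BentoML) mimics the shape of `CallStream` in plain Python:

```python
import asyncio
from typing import AsyncIterator


async def call_stream(request_data: str) -> AsyncIterator[str]:
    """Mimic a server-streaming RPC: one request in, many responses out."""
    for i in range(3):
        await asyncio.sleep(0)  # stand-in for real work between messages
        yield f"Response {i + 1} for '{request_data}'"


async def main():
    # The caller consumes the stream incrementally, just like a gRPC client.
    return [msg async for msg in call_stream("ping")]


print(asyncio.run(main()))
```

The real RPC behaves the same way from the caller's perspective: each `Response` arrives as soon as the server yields it, rather than after the whole stream completes.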

After defining your `.proto` file, you need to generate the Python gRPC stubs:
```bash
pip install grpcio-tools
python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. your_service.proto
```
For BentoML's internal `v1alpha1` service, these stubs (`bentoml_service_v1alpha1_pb2.py` and `bentoml_service_v1alpha1_pb2_grpc.py`) are already generated and included.

## 2. Implementing the Server-Side Streaming Logic

You implement the server-side logic by creating a class that inherits from the generated `YourServiceServicer` (e.g., `BentoServiceServicer` for the internal service) and overriding the streaming methods.

Here's how the internal `BentoServiceImpl` for `v1alpha1` is structured (simplified from `src/bentoml/grpc/v1alpha1/server.py`):

```python
import asyncio
from typing import AsyncIterator

import grpc

# Stubs are generated and shipped with BentoML for the internal v1alpha1 service.
from bentoml.grpc.v1alpha1 import bentoml_service_v1alpha1_pb2 as pb
from bentoml.grpc.v1alpha1 import bentoml_service_v1alpha1_pb2_grpc as services


class BentoServiceImpl(services.BentoServiceServicer):
    async def CallStream(
        self, request: pb.Request, context: grpc.aio.ServicerContext
    ) -> AsyncIterator[pb.Response]:
        """Receive a single Request and yield a stream of Response messages."""
        print(f"CallStream received: {request.data}")
        for i in range(5):  # Example: send 5 messages
            response_data = f"Response {i + 1} for '{request.data}'"
            print(f"Sending: {response_data}")
            await asyncio.sleep(0.5)  # Simulate work between messages
            yield pb.Response(data=response_data)
        print("CallStream finished.")


# To run this servicer in a standalone gRPC server:
async def run_server(port: int = 50051) -> None:
    server = grpc.aio.server()
    services.add_BentoServiceServicer_to_server(BentoServiceImpl(), server)
    server.add_insecure_port(f"[::]:{port}")
    await server.start()
    print(f"gRPC server started on port {port}")
    await server.wait_for_termination()


if __name__ == "__main__":
    asyncio.run(run_server())
```

When you serve with `bentoml serve-grpc`, BentoML runs the gRPC server for you. For the `v1alpha1` protocol, BentoML's `Service` class is already configured to use this `BentoServiceImpl`. If you are customizing the main `BentoService`, ensure your implementation is picked up by overriding `Service.get_grpc_servicer`; for additional custom services, mount your own servicer.

## 3. Using the BentoMlGrpcClient (v1alpha1)

BentoML provides a client SDK to interact with the `v1alpha1` gRPC streaming service.

Example usage (from `src/bentoml/grpc/v1alpha1/client.py`):
```python
import asyncio

from bentoml.grpc.v1alpha1.client import BentoMlGrpcClient


async def main():
    client = BentoMlGrpcClient(host="localhost", port=50051)

    input_data = "Hello Streaming World"
    print(f"Calling CallStream with data: '{input_data}'")

    try:
        idx = 0
        async for response in client.call_stream(data=input_data):
            print(f"Received from stream (message {idx}): {response.data}")
            idx += 1
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        await client.close()
        print("Client connection closed.")


if __name__ == "__main__":
    asyncio.run(main())
```
The `client.call_stream(data=...)` method returns an asynchronous iterator that yields `Response` messages from the server.
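The shape of such a client can be sketched without a running server by swapping the gRPC stub for a stand-in. Everything below except the `call_stream`/`close` method names is illustrative, not BentoML's actual implementation:

```python
import asyncio
from typing import AsyncIterator


class FakeStub:
    """Stand-in for a generated gRPC stub; streams canned replies."""

    async def CallStream(self, data: str) -> AsyncIterator[str]:
        for i in range(2):
            yield f"echo {i}: {data}"


class StreamingClient:
    """Sketch of the wrapper pattern a streaming client follows."""

    def __init__(self, stub) -> None:
        self._stub = stub
        self._closed = False

    def call_stream(self, data: str) -> AsyncIterator[str]:
        # Return the async iterator directly; the caller drives it.
        return self._stub.CallStream(data)

    async def close(self) -> None:
        self._closed = True  # a real client would close the grpc.aio channel here


async def main():
    client = StreamingClient(FakeStub())
    try:
        return [r async for r in client.call_stream("hi")]
    finally:
        await client.close()


print(asyncio.run(main()))
```

The key design point is that `call_stream` does not buffer: it hands the caller the async iterator, so responses are processed as they arrive.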

## 4. Using the `call-grpc-stream` CLI Command

BentoML provides a CLI command to easily test and interact with `v1alpha1` gRPC streaming services.

**Command Syntax:**
```bash
bentoml call-grpc-stream --host <hostname> --port <port_number> --data "<your_request_data>"
```

**Example:**
Assuming your BentoML gRPC server (with `v1alpha1` protocol) is running on `localhost:50051`:
```bash
bentoml call-grpc-stream --host localhost --port 50051 --data "Test Message from CLI"
```

Output will be similar to:
```
Connecting to gRPC server at localhost:50051...
Sending data: 'Test Message from CLI' to CallStream...
--- Streamed Responses ---
Response 1 for 'Test Message from CLI'
Response 2 for 'Test Message from CLI'
Response 3 for 'Test Message from CLI'
... (based on server implementation) ...
------------------------
Connection closed.
```

This CLI command uses the `BentoMlGrpcClient` internally.
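As a rough sketch of how such a command might parse its flags (hypothetical code, not BentoML's actual CLI implementation), using `argparse`:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of an argument parser matching the call-grpc-stream flags."""
    parser = argparse.ArgumentParser(prog="call-grpc-stream")
    parser.add_argument("--host", default="localhost", help="gRPC server host")
    parser.add_argument("--port", type=int, default=50051, help="gRPC server port")
    parser.add_argument("--data", required=True, help="payload sent to CallStream")
    return parser


args = build_parser().parse_args(
    ["--host", "localhost", "--port", "50051", "--data", "Test Message from CLI"]
)
print(args.host, args.port, args.data)
```

After parsing, the command would construct a `BentoMlGrpcClient` from `args.host` and `args.port` and iterate the stream exactly as in the SDK example above.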

## Summary

The `v1alpha1` gRPC streaming support in BentoML provides a foundation for building services that require persistent, streamed communication. By defining services in `.proto` files, implementing the server-side logic, and using the provided client SDK or CLI, you can leverage gRPC streaming in your BentoML applications. Remember that this `v1alpha1` version is specific to a client-sends-one, server-streams-many interaction pattern for the main `BentoService`. For more complex gRPC patterns (client-streaming, bidirectional-streaming for custom services), you would define those in your own `.proto` files and implement corresponding servicers and clients.
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -176,6 +176,7 @@ For release notes and detailed changelogs, see the `Releases <https://github.com
build-with-bentoml/lifecycle-hooks
build-with-bentoml/asgi
build-with-bentoml/streaming
guides/grpc_streaming
build-with-bentoml/websocket
build-with-bentoml/gradio
build-with-bentoml/observability/index
80 changes: 80 additions & 0 deletions examples/grpc_streaming/README.md
@@ -0,0 +1,80 @@
# BentoML Custom gRPC Streaming Example

This example demonstrates how to define, implement, and serve a custom gRPC streaming service with BentoML. The service implements a simple "chat" style interaction where the client sends a message and the server streams back a series of responses.

## Files

- `protos/example_service.proto`: Protocol buffer definition for the `SimpleStreamingService`.
- `service.py`: BentoML service implementation that includes the gRPC servicer for `SimpleStreamingService`.
- `bentofile.yaml`: BentoML build configuration file.
- `client_example.py`: A Python script demonstrating how to call the gRPC streaming service.
- `generated/`: Directory containing Python stubs generated from `example_service.proto`.

## Prerequisites

- Python 3.8+
- BentoML installed (`pip install bentoml`)
- gRPC tools (`pip install grpcio grpcio-tools`)

## Setup

1. **Generate gRPC Stubs**:
Navigate to the `examples/grpc_streaming` directory and run:
```bash
mkdir generated
python -m grpc_tools.protoc -Iprotos --python_out=generated --grpc_python_out=generated protos/example_service.proto
# Create __init__.py files to make them importable
touch generated/__init__.py
```
This will generate `example_service_pb2.py` and `example_service_pb2_grpc.py` in the `generated` directory.

## Running the Example

1. **Serve the BentoML Service**:
From the `examples/grpc_streaming` directory:
```bash
bentoml serve service:svc --reload
```
This starts the BentoML server. Note that `bentoml serve` starts an HTTP server by default; for gRPC you typically use `bentoml serve-grpc`. Because this example registers its servicer with `mount_grpc_servicer`, the custom service is exposed through the gRPC server that `serve-grpc` manages. Check the console output for the port actually in use.

To ensure it uses a known gRPC port (e.g., 50051 if not default for `serve`), you might run:
```bash
bentoml serve service:svc --reload --grpc-port 50051
# Or more explicitly for gRPC focus:
# bentoml serve-grpc service:svc --reload --port 50051
```
Check the output from `bentoml serve` for the actual gRPC port if you don't specify one. For this example, `client_example.py` assumes `localhost:50051`.

2. **Run the Client**:
In a new terminal, from the `examples/grpc_streaming` directory:
```bash
python client_example.py
```

## Expected Output (Client)

```
Client sending: Hello, stream!
Server says: Response 1 to 'Hello, stream!'
Server says: Response 2 to 'Hello, stream!'
Server says: Response 3 to 'Hello, stream!'
Server says: Response 4 to 'Hello, stream!'
Server says: Response 5 to 'Hello, stream!'
Stream finished.
```

## How it Works

- **`example_service.proto`**: Defines a `SimpleStreamingService` with a server-streaming RPC method `Chat`.
- **`service.py`**:
- Implements `SimpleStreamingServicerImpl` which provides the logic for the `Chat` method.
- Creates a BentoML `Service` named `custom_grpc_stream_example`.
- Mounts the `SimpleStreamingServicerImpl` to the BentoML service instance. When this BentoML service is run with gRPC enabled, the custom gRPC service will be available.
- **`client_example.py`**:
- Uses `grpc.insecure_channel` to connect to the server.
- Creates a stub for `SimpleStreamingService`.
- Calls the `Chat` method and iterates over the streamed responses.

This example showcases how to integrate custom gRPC services with streaming capabilities within the BentoML framework.
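The server side of this exchange can be sketched with a stand-in message class. The real `ChatMessage` comes from the generated `example_service_pb2` module, and the actual servicer lives in `service.py`; this is only an illustration of the `Chat` logic:

```python
import asyncio
import time
import uuid
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class ChatMessage:
    """Stand-in for the generated example_service_pb2.ChatMessage."""
    message_id: str
    text: str
    timestamp: int


async def chat(request: ChatMessage) -> AsyncIterator[ChatMessage]:
    """Sketch of the Chat method: stream five responses per request."""
    for i in range(5):
        yield ChatMessage(
            message_id=str(uuid.uuid4()),
            text=f"Response {i + 1} to '{request.text}'",
            timestamp=int(time.time() * 1000),
        )


async def main():
    request = ChatMessage(str(uuid.uuid4()), "Hello, stream!", int(time.time() * 1000))
    return [r.text async for r in chat(request)]


print(asyncio.run(main()))
```

The streamed texts match the expected client output shown above, since the client simply prints each message's `text` field as it arrives.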
21 changes: 21 additions & 0 deletions examples/grpc_streaming/bentofile.yaml
@@ -0,0 +1,21 @@
service: "service:svc"
name: "custom_grpc_stream_example"
version: "0.1.0"

description: "A BentoML example showcasing custom gRPC streaming services."

# Ensure generated stubs are included if building a bento
# For local development (bentoml serve), Python's import system will find them
# if they are in the PYTHONPATH (e.g., in the same directory or an installed package).
# If you build this into a Bento, you'd want to ensure 'generated' is included.
include:
- "*.py"
- "generated/*.py"
- "protos/*.proto"

python:
packages:
- grpcio
- grpcio-tools # For local stub generation, not strictly needed by the Bento itself at runtime
- bentoml
50 changes: 50 additions & 0 deletions examples/grpc_streaming/client_example.py
@@ -0,0 +1,50 @@
import asyncio
import time
import uuid

import grpc

# Import generated gRPC stubs
from generated import example_service_pb2
from generated import example_service_pb2_grpc


async def run_client():
    # Default gRPC port assumed by this example; adjust if necessary.
    target_address = "localhost:50051"

    # Create a channel
    async with grpc.aio.insecure_channel(target_address) as channel:
        # Create a stub (client)
        stub = example_service_pb2_grpc.SimpleStreamingServiceStub(channel)

        # Prepare a request message
        client_message_text = "Hello, stream!"
        request_message = example_service_pb2.ChatMessage(
            message_id=str(uuid.uuid4()),  # Unique ID for the message
            text=client_message_text,
            timestamp=int(time.time() * 1000),
        )

        print(f"Client sending: {client_message_text}")

        try:
            # Call the Chat RPC method and iterate through the streamed responses
            async for response in stub.Chat(request_message):
                print(
                    f"Server says: {response.text} "
                    f"(ID: {response.message_id}, TS: {response.timestamp})"
                )

            print("Stream finished.")

        except grpc.aio.AioRpcError as e:
            print(f"gRPC call failed: {e.code()} - {e.details()}")
        except Exception as e:
            print(f"An error occurred: {e}")


if __name__ == "__main__":
    print("Starting gRPC client example...")
    asyncio.run(run_client())
35 changes: 35 additions & 0 deletions examples/grpc_streaming/generated/example_service_pb2.py
Generated protobuf stubs; not rendered in the diff by default.