This project serves as a rough-and-ready guide for hackathon participants by demonstrating how to stream LLM responses from a BAML-powered backend to a Vite-based frontend UI, using Express for the server.
The API exposes a generic endpoint at `/baml/stream/:fn` that can be called with the name of your exported BAML function as well as an `args` param, whose value is a stringified and escaped object representing your arguments. This lets you add new functions with new arguments quickly, without refactoring the server. It is not, however, type-safe, so in production you would normally call a specific function explicitly, for example `b.ExtractResume`.
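A minimal sketch of how a client could build a request URL for the generic endpoint: the arguments object is passed through `JSON.stringify` and then `encodeURIComponent`, so it travels as a single query-string value. The base URL uses the documented default port; the `resume` parameter name is hypothetical and should match whatever your BAML function actually declares.

```typescript
// Arguments for a BAML function, keyed by parameter name.
type StreamArgs = Record<string, unknown>;

// Builds a URL for /baml/stream/:fn with `args` stringified and escaped.
// encodeURIComponent also escapes "+", so the value decodes back exactly.
function buildStreamUrl(baseUrl: string, fnName: string, args: StreamArgs): string {
  const fnPart = encodeURIComponent(fnName);
  const argsPart = encodeURIComponent(JSON.stringify(args));
  return `${baseUrl}/baml/stream/${fnPart}?args=${argsPart}`;
}

// Hypothetical example: `resume` is an illustrative parameter name.
const url = buildStreamUrl("http://localhost:3002", "ExtractResume", {
  resume: "Jane Doe\nSoftware Engineer",
});
console.log(url);
```

On the server, Express decodes the query string, so the handler only needs to `JSON.parse` the `args` value to get the object back.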
Before you begin, ensure you have the following installed:
- Node.js (v18 or higher recommended)
- npm, pnpm, or yarn (this guide will use pnpm as an example)
You will also need an OpenAI API key.
- Clone the repository:
git clone <repository-url>
cd <repository-name>
- Install dependencies: This project uses pnpm for package management.
pnpm install
If you prefer npm or yarn:
# Using npm
npm install
# Using yarn
yarn install
- Set up environment variables:
Set your API key in the terminal session where the server will run:
export OPENAI_API_KEY="your_openai_api_key_here"
The backend server also uses `BAML_SERVER_PORT`, which defaults to `3002` if not set. You can override it in the `.env` file if needed:
BAML_SERVER_PORT=3003
- Generate the BAML client: Before running the application, you may need to generate the BAML client from your BAML source files (`baml_src/`), for example if `baml_client/` is missing or out of date:
pnpm baml:generate
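The port fallback described in the environment-variable step can be sketched as a small function. This assumes the `.env` values have already been loaded into `process.env` (for example by a package such as dotenv); how this project loads `.env` is not shown here, and `resolvePort` is a hypothetical helper name.

```typescript
// Returns the port for the backend: BAML_SERVER_PORT when it holds a valid
// port number, otherwise the documented default of 3002.
function resolvePort(env: Record<string, string | undefined>): number {
  const raw = env.BAML_SERVER_PORT;
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  // Reject NaN, zero, negatives, and values outside the TCP port range.
  return Number.isInteger(parsed) && parsed > 0 && parsed < 65536 ? parsed : 3002;
}

console.log(resolvePort(process.env));
```

Falling back on invalid values (rather than throwing) mirrors the "defaults to 3002 if not set" behavior; a stricter server might prefer to fail fast on a malformed port.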
You'll need to run two separate processes: the backend server and the frontend development server.
- Start the Backend Server: Open a terminal and run:
pnpm dev:server
This starts the Express server (by default on http://localhost:3002). The server handles BAML function calls and streams the responses.
- Start the Frontend Development Server: In a separate terminal, run:
pnpm dev:frontend
This starts the Vite development server (usually on http://localhost:5173, or the next available port).
Once both servers are running, you can open your browser to the address provided by the Vite development server to see the application in action.
- BAML (`baml_src/`, `baml_client/`): Defines the AI functions and handles the logic for interacting with language models. The generated `baml_client` is used by the server to call these functions.
- Express (`src/server/index.ts`): Provides the backend API endpoints. The key endpoint is `/baml/stream/:fn`, which takes a BAML function name (`fn`) and arguments, and streams the partial responses back to the client using Server-Sent Events (SSE).
- Vite + React (`index.html` and the non-server parts of `src/`): Powers the frontend user interface. It will make requests to the Express server to invoke BAML functions and display the streaming data.
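One way a frontend could consume the stream is to read the SSE response body with `fetch`, which works for both GET and POST (the browser's `EventSource` only issues GET requests). This is a sketch, not the project's actual client code; it assumes each event carries one JSON value in its `data:` lines and that lines end in `\n`. The name `readSseJson` is hypothetical.

```typescript
// Reads a text/event-stream response and yields each event's `data` payload
// parsed as JSON. Assumes one JSON value per event and "\n" line endings.
async function* readSseJson(res: Response): AsyncGenerator<unknown> {
  if (!res.body) throw new Error("Response has no body");
  // Decode the byte stream into text chunks.
  const reader = res.body.pipeThrough(new TextDecoderStream()).getReader();
  let buffer = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;
    // A blank line ends an event; a chunk may contain several events or only
    // part of one, so any incomplete remainder stays in the buffer.
    let sep = buffer.indexOf("\n\n");
    while (sep !== -1) {
      const rawEvent = buffer.slice(0, sep);
      buffer = buffer.slice(sep + 2);
      // Keep only data lines (ignoring e.g. "event:" lines) and drop the
      // single optional space after "data:".
      const data = rawEvent
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, ""))
        .join("\n");
      if (data !== "") yield JSON.parse(data);
      sep = buffer.indexOf("\n\n");
    }
  }
}
```

In a React component you might `await fetch(url)` and then `for await (const partial of readSseJson(res))`, setting state on each partial so the UI re-renders as data arrives.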
The backend server (`src/server/index.ts`) exposes a generic endpoint `/baml/stream/:fn` (available via both GET and POST).
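Because the endpoint accepts both methods, the handler has to find the arguments in different places. The helper below is a hypothetical sketch: it assumes GET sends `args` as a JSON string in the query string (as described at the top of this README) and that POST sends an `args` field in a JSON body, which in Express would require the `express.json()` middleware. The real server may handle POST differently.

```typescript
type BamlArgs = Record<string, unknown>;

// Hypothetical helper: returns the BAML arguments object for a request.
// GET reads query.args; POST reads body.args. A string value is JSON-parsed,
// an already-parsed object is used as-is.
function extractArgs(method: string, query: Record<string, unknown>, body: unknown): BamlArgs {
  const source =
    method.toUpperCase() === "POST"
      ? (body as { args?: unknown } | null | undefined)?.args
      : query.args;
  if (source === undefined) return {};
  const value = typeof source === "string" ? JSON.parse(source) : source;
  // The args must be a plain object keyed by parameter name.
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    throw new Error("args must be a JSON object");
  }
  return value as BamlArgs;
}
```

In an Express handler this would be called as `extractArgs(req.method, req.query, req.body)`; how the resulting object is mapped onto the BAML function's parameters is not covered by this sketch.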
- When a request hits this endpoint, the server uses the `BamlStream` from the `@boundaryml/baml` library to call the specified BAML function (`:fn`) with the provided arguments.
- It then iterates over the stream of partial results, sending each one to the client as a Server-Sent Event (SSE).
- The frontend client listens for these SSE events and updates the UI in real time as data arrives.
- Once the stream is complete, a final event with the complete response can also be sent.
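The per-request loop in the list above could look roughly like the following sketch. To run without BAML, it is written against a minimal interface: anything async-iterable with a `getFinalResponse()` method. BAML's TypeScript `BamlStream` should fit that shape, but check the version you have installed. The event names `partial` and `final` and the helper names are assumptions, not necessarily what this project sends.

```typescript
// The subset of a BAML stream this sketch relies on.
interface PartialStream<P, F> extends AsyncIterable<P> {
  getFinalResponse(): Promise<F>;
}

// Formats one SSE frame. JSON.stringify escapes newlines, so the payload
// always fits on a single "data:" line; a blank line terminates the event.
function sseFrame(data: unknown, event?: string): string {
  const lines = event ? [`event: ${event}`] : [];
  lines.push(`data: ${JSON.stringify(data)}`);
  return lines.join("\n") + "\n\n";
}

// Sends every partial result as it arrives, then the final response.
async function pipeStreamAsSse<P, F>(
  stream: PartialStream<P, F>,
  write: (chunk: string) => void,
): Promise<void> {
  for await (const partial of stream) {
    write(sseFrame(partial, "partial"));
  }
  write(sseFrame(await stream.getFinalResponse(), "final"));
}
```

Inside the Express handler you would set the `Content-Type: text/event-stream` header, call something like `pipeStreamAsSse(b.stream.ExtractResume(resume), (chunk) => res.write(chunk))`, and then `res.end()`. Keeping the writer as a plain callback makes the loop easy to test with a fake stream.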
This setup allows for a responsive user experience, as users don't have to wait for the entire AI response to complete before seeing results.
I've found that it sometimes doesn't work in Brave, for reasons I haven't had time to debug. Chrome does appear to work, however.