GoXStream is an open-source, Flink-inspired stream processing engine written in Go, with a beautiful React dashboard for visual pipeline design, job submission, and job history. Build and run real-time data pipelines with file, database, or Kafka sources and sinks.
- Modular pipeline engine (in Go): Compose pipelines from map, filter, reduce, window, time-window, and more
- Dynamic REST API: Submit pipelines and configure sources, sinks, operators via JSON
- Pluggable sources/sinks: File, Postgres, Kafka (more coming)
- Windowing: Tumbling, sliding, time-based, with watermark and late event support
- Stateful operators and checkpointing (coming soon!)
- React dashboard: Visual DAG pipeline builder (drag/drop), job submission, job history, JSON preview
- Persistent job history (localStorage for now, backend storage coming soon)



Prerequisites:
- Go 1.19 or higher
Clone the repo and install dependencies:
git clone https://github.com/YOUR_GITHUB_USERNAME/goxstream.git
cd goxstream
go mod tidy
Place an example input.csv in the project root for the examples below:
id,name,city
1,Alice,London
2,Bob,Berlin
3,Charlie,Paris
4,David,Berlin
5,Eve,Paris
6,Frank,Paris
7,Grace,Berlin
Run the engine:
go run ./cmd/goxstream/main.go
Start the React dashboard:
cd goxstream-dashboard
npm install
npm start
Example pipeline specs:
Simple Map:
{
  "source": { "type": "file", "path": "input.csv" },
  "operators": [
    { "type": "map", "params": { "col": "processed", "val": "yes" } }
  ],
  "sink": { "type": "file", "path": "output.csv" }
}
Windowed Reduce:
{
  "source": { "type": "file", "path": "input.csv" },
  "operators": [
    {
      "type": "time_window",
      "params": {
        "duration": "10s",
        "inner": {
          "type": "reduce",
          "params": { "key": "city", "agg": "count" }
        }
      }
    }
  ],
  "sink": { "type": "file", "path": "output.csv" }
}
With Watermark/Late Event Support:
{
  "source": { "type": "file", "path": "input.csv" },
  "operators": [
    {
      "type": "time_window_watermark",
      "params": {
        "duration": "10s",
        "allowed_lateness": "5s",
        "inner": {
          "type": "reduce",
          "params": { "key": "city", "agg": "count" }
        }
      }
    }
  ],
  "sink": { "type": "file", "path": "output.csv" }
}
- GoXStream's UI lets you visually build pipelines (drag/drop), edit operator parameters, and export as JSON to run jobs.
- Submitted jobs and their configs are saved in a beautiful job history.
Submit a job with curl:
curl -X POST http://localhost:8080/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "source": { "type": "file", "path": "input.csv" },
    "operators": [{ "type": "map", "params": { "col": "processed", "val": "yes" } }],
    "sink": { "type": "file", "path": "output.csv" }
  }'
Use curl or Postman to submit a dynamic pipeline. Two windowed-reduce examples follow.
Use this input.csv for both the time_window and time_window_watermark examples:
id,name,city,score,timestamp
1,Alice,London,10,2024-07-04T15:00:00Z
2,Bob,Berlin,15,2024-07-04T15:00:05Z
3,Charlie,Paris,12,2024-07-04T15:00:08Z
4,David,Berlin,8,2024-07-04T15:00:12Z
5,Eve,Paris,14,2024-07-04T15:00:16Z
6,Frank,London,11,2024-07-04T15:00:22Z
7,Grace,Berlin,9,2024-07-04T15:00:29Z
8,Harry,Paris,13,2024-07-04T15:00:35Z
9,Ivy,London,17,2024-07-04T15:00:41Z
10,Jack,Berlin,16,2024-07-04T15:00:43Z
a. Regular time window with reduce:
curl -X POST http://localhost:8080/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "source": { "type": "file", "path": "input.csv" },
    "operators": [
      {
        "type": "time_window",
        "params": {
          "duration": "10s",
          "inner": {
            "type": "reduce",
            "params": { "key": "city", "agg": "count" }
          }
        }
      }
    ],
    "sink": { "type": "file", "path": "output.csv" }
  }'
b. Time window with watermark/late event support:
curl -X POST http://localhost:8080/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "source": { "type": "file", "path": "input.csv" },
    "operators": [
      {
        "type": "time_window_watermark",
        "params": {
          "duration": "10s",
          "allowed_lateness": "5s",
          "inner": {
            "type": "reduce",
            "params": { "key": "city", "agg": "count" }
          }
        }
      }
    ],
    "sink": { "type": "file", "path": "output.csv" }
  }'
Your results will be in output.csv with a window_id column. For example, the first 10-second window (15:00:00 to 15:00:10) contains rows 1-3, so London, Berlin, and Paris each get a count of 1:
city,count,window_end,window_id
London,1,2024-07-04T15:00:10Z,1
Berlin,1,2024-07-04T15:00:10Z,1
Paris,1,2024-07-04T15:00:10Z,1
...
[Source] --> [Map] --> [Filter] --> [Window/Reduce] --> [Sink]
   |           |           |              |               |
 [File] [Add/Transform] [Select] [Tumbling/Sliding]     [File]
Operators: Implemented as Go interfaces and composed dynamically at runtime.
REST API: Accepts JSON job specs and launches each pipeline as a background goroutine (see the sketch below).
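A minimal sketch of that wiring, assuming hypothetical Source, Operator, and Sink interfaces and a runPipeline helper (illustrative only; GoXStream's real internal API may differ):

package sketch

// Row is a single record flowing through the pipeline (illustrative only).
type Row map[string]string

// Hypothetical interfaces mirroring the source -> operators -> sink model.
type Source interface{ Read(out chan<- Row) error }
type Operator interface{ Apply(in <-chan Row) <-chan Row }
type Sink interface{ Write(in <-chan Row) error }

// runPipeline chains a source through the operators into the sink and runs
// the whole job in the background, roughly what a REST handler would do after
// parsing a JSON job spec.
func runPipeline(src Source, ops []Operator, sink Sink) {
	go func() {
		rows := make(chan Row)
		go func() {
			defer close(rows)
			_ = src.Read(rows) // error handling elided for brevity
		}()
		var stream <-chan Row = rows
		for _, op := range ops {
			stream = op.Apply(stream)
		}
		_ = sink.Write(stream)
	}()
}

Each operator consumes the previous stage's channel and returns a new one, so a JSON spec maps directly onto a chain of channel transformations.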
A pipeline is defined by a simple JSON spec:
{
  "source": { "type": "file", "path": "input.csv" },
  "operators": [
    { "type": "map", "params": { "col": "processed", "val": "yes" } },
    { "type": "filter", "params": { "field": "city", "eq": "Berlin" } },
    {
      "type": "sliding_window",
      "params": {
        "size": 3,
        "step": 1,
        "inner": {
          "type": "reduce",
          "params": { "key": "city", "agg": "count" }
        }
      }
    }
  ],
  "sink": { "type": "file", "path": "output.csv" }
}
| Type                    | Description                             | Example Params                          |
| ----------------------- | --------------------------------------- | --------------------------------------- |
| `map`                   | Add or transform columns                 | `col`, `val`                             |
| `filter`                | Filter rows by condition                 | `field`, `eq`                            |
| `reduce`                | Aggregate/group by field                 | `key`, `agg` (`count`, future: `sum`)    |
| `tumbling_window`       | Non-overlapping windows                  | `size`, `inner`                          |
| `sliding_window`        | Overlapping windows                      | `size`, `step`, `inner`                  |
| `time_window`           | Non-overlapping time-based windows       | `duration`, `inner`                      |
| `time_window_watermark` | Time windows with watermark/late events  | `duration`, `allowed_lateness`, `inner`  |
Add New Operators: Implement the Operator interface and add a factory to the operator registry (a sketch follows below).
Support New Sources/Sinks: Implement the Source or Sink interface in internal/source or internal/sink.
React UI Integration: Planned for interactive pipeline creation and monitoring.
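As a rough illustration of the custom-operator flow described above (the Row type, Operator interface, and operator.Register call here are assumptions, not GoXStream's actual API; check internal/operator for the real signatures):

package myoperators

import "strings"

// Illustrative shapes only; in practice, reuse GoXStream's own Row and
// Operator definitions instead of redeclaring them.
type Row map[string]string

type Operator interface{ Apply(in <-chan Row) <-chan Row }

// UppercaseCity is an example custom operator that rewrites the "city" column.
type UppercaseCity struct{}

func (UppercaseCity) Apply(in <-chan Row) <-chan Row {
	out := make(chan Row)
	go func() {
		defer close(out)
		for row := range in {
			row["city"] = strings.ToUpper(row["city"])
			out <- row
		}
	}()
	return out
}

// To make the operator usable from a JSON job spec, register a factory under a
// type name in the operator registry, e.g. (hypothetical registry API):
//
//	operator.Register("uppercase_city", func(params map[string]any) (Operator, error) {
//		return UppercaseCity{}, nil
//	})

A new source or sink follows the same pattern: implement the interface in internal/source or internal/sink and register a factory so it can be selected by "type" in the job spec.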
- Dynamic operator/source/sink registry
- File/DB/Kafka connectors
- REST API & React UI for pipeline design and monitoring
- Visual DAG editor with drag/drop (reactflow)
- Windowing and watermark support
- Persistent job history (localStorage)
- Checkpointing and fault tolerance (coming next)
- Backend job monitoring/status APIs
- Multi-job/cluster execution
- More analytics and ML operators
PRs, issues, and ideas are welcome! Fork and submit improvements or new features. Let's build a great Go stream engine together!