Struggles with implementing opensource pipelines. #953

@KitHaywood

Hi,

Apologies for troubleshooting on GitHub; the product is niche and community support isn't extensive. I'm trying to set up a very simple pipeline in Docker on my local(ish) machine. Eventually I want to link it to Kafka, but I can't get even a very simple one to work.

The smoke test is to create an impulse connection source and a black hole sink and just see if I can get a pipeline working, by watching a basic heartbeat of timestamps flow through.

  arroyo-controller:
    image: ghcr.io/arroyosystems/arroyo:0.14.1
    container_name: timon-arroyo-controller
    profiles: ["realtime"]
    environment:
      RUST_LOG: debug
    command: ["controller"]
    ports:
      - "5116:5116"   # gRPC for workers
      - "5114:5114"   # Admin HTTP (no UI; 404 on / is expected)
    depends_on: [kafka]
    networks: [realtime]
    
  arroyo-api:
    image: ghcr.io/arroyosystems/arroyo:0.14.1
    container_name: timon-arroyo-api
    profiles: ["realtime"]
    environment:
      RUST_LOG: debug
      ARROYO__API__BIND_ADDRESS: 0.0.0.0
      ARROYO__API__RUN_HTTP_PORT: "5115"
      ARROYO__CONTROLLER_ENDPOINT: http://arroyo-controller:5116
      ARROYO__COMPILER_ENDPOINT: http://arroyo-compiler:5117
    command: ["api"]
    ports:
      - "5115:5115"   # Web UI + REST API
    depends_on: [arroyo-controller, arroyo-compiler]
    networks: [realtime]
    volumes:
      - ../realtime/schemas/avro:/schemas/avro:ro

  arroyo-compiler:
    image: ghcr.io/arroyosystems/arroyo:0.14.1
    container_name: timon-arroyo-compiler
    profiles: ["realtime"]
    environment:
      RUST_LOG: debug
    command: ["compiler"]
    depends_on: [arroyo-controller]
    networks: [realtime]
    volumes:
      - ../realtime/schemas/avro:/schemas/avro:ro

  arroyo-worker:
    build:
      context: ..
      dockerfile: realtime/arroyo/Dockerfile.worker
    container_name: timon-arroyo-worker
    profiles: ["realtime"]
    environment:
      RUST_LOG: info
      ARROYO__CONTROLLER_ENDPOINT: http://arroyo-controller:5116
    env_file:
      - ../.env
    command: ["worker"]
    depends_on: [kafka, arroyo-controller]
    networks: [realtime]

# realtime/arroyo/Dockerfile.worker
# Custom Arroyo worker image with Python + project code
FROM ghcr.io/arroyosystems/arroyo:0.14.1

# Install Python 3 + pip (Debian/Ubuntu base assumed; the exact Python version depends on the base image). If the base changes, swap for equivalent package names.
USER root
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 python3-pip python3-venv ca-certificates \
 && rm -rf /var/lib/apt/lists/*

# Set up workspace and copy repo
WORKDIR /app
COPY . /app

# Install Python dependencies for strategy/signaller code
# requirements.txt is the authoritative pinned list (AGENTS.md §6)
RUN python3 -m pip install --no-cache-dir -r requirements.txt

ENV PYTHONPATH=/app

# Keep default entrypoint/cmd from upstream (controller/worker set via compose command)

Above is an excerpt from my docker-compose file, followed by the worker Dockerfile.
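To rule out plain networking problems between the services, the endpoints from the compose environment can be probed with a TCP connect. This is only a minimal sketch: the helper names (`host_port`, `can_connect`, `report`) are made up for illustration, and it assumes the script runs inside a container attached to the `realtime` network (for example the worker image, which has Python installed), since the service names only resolve there.

```python
import socket
from urllib.parse import urlsplit

# Endpoints copied from the compose environment above.
ENDPOINTS = {
    "controller": "http://arroyo-controller:5116",
    "compiler": "http://arroyo-compiler:5117",
}


def host_port(url: str) -> tuple[str, int]:
    """Split an http://host:port URL into (host, port)."""
    parts = urlsplit(url)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"expected http://host:port, got {url!r}")
    return parts.hostname, parts.port


def can_connect(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds."""
    try:
        with socket.create_connection(host_port(url), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def report(endpoints: dict[str, str] = ENDPOINTS) -> dict[str, bool]:
    """Map each endpoint name to whether it accepted a TCP connection."""
    return {name: can_connect(url) for name, url in endpoints.items()}
```

Calling `print(report())` from inside the network shows which endpoints accept connections. A `True` only proves the port is open, not that the service behind it is healthy.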

Symptoms:

  • Connections creation looks good.
  • Pipeline creation - hangs on launch. Spins forever.
timon-arroyo-controller  | 2025-10-23T13:23:02.394510Z DEBUG h2::codec::framed_write: send frame=Settings { flags: (0x0), initial_window_size: 1048576, max_frame_size: 16384, max_header_list_size: 16384 }
timon-arroyo-controller  | 2025-10-23T13:23:02.394573Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Settings { flags: (0x0), enable_push: 0, initial_window_size: 2097152, max_frame_size: 16384, max_header_list_size: 16384 }
timon-arroyo-controller  | 2025-10-23T13:23:02.394579Z DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=Settings { flags: (0x1: ACK) }
timon-arroyo-controller  | 2025-10-23T13:23:02.394583Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=WindowUpdate { stream_id: StreamId(0), size_increment: 5177345 }
timon-arroyo-controller  | 2025-10-23T13:23:02.394588Z DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=WindowUpdate { stream_id: StreamId(0), size_increment: 983041 }
timon-arroyo-controller  | 2025-10-23T13:23:02.394614Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Settings { flags: (0x1: ACK) }
timon-arroyo-controller  | 2025-10-23T13:23:02.394619Z DEBUG Connection{peer=Server}: h2::proto::settings: received settings ACK; applying Settings { flags: (0x0), initial_window_size: 1048576, max_frame_size: 16384, max_header_list_size: 16384 }
timon-arroyo-controller  | 2025-10-23T13:23:02.394642Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Headers { stream_id: StreamId(1), flags: (0x4: END_HEADERS) }
timon-arroyo-controller  | 2025-10-23T13:23:02.394674Z DEBUG request{method=POST uri=http://arroyo-controller:5116/arroyo_rpc.ControllerGrpc/JobMetrics version=HTTP/2.0}: tower_http::trace::on_request: started processing request
timon-arroyo-controller  | 2025-10-23T13:23:02.394697Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Data { stream_id: StreamId(1) }
timon-arroyo-controller  | 2025-10-23T13:23:02.394703Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Data { stream_id: StreamId(1), flags: (0x1: END_STREAM) }
timon-arroyo-controller  | 2025-10-23T13:23:02.394751Z DEBUG request{method=POST uri=http://arroyo-controller:5116/arroyo_rpc.ControllerGrpc/JobMetrics version=HTTP/2.0}: tower_http::trace::on_response: finished processing request latency=0 ms status=5
timon-arroyo-controller  | 2025-10-23T13:23:02.394771Z DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(1), flags: (0x5: END_HEADERS | END_STREAM) }
timon-arroyo-controller  | 2025-10-23T13:23:02.394892Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=GoAway { error_code: NO_ERROR, last_stream_id: StreamId(0) }
timon-arroyo-controller  | 2025-10-23T13:23:02.394898Z DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=GoAway { error_code: NO_ERROR, last_stream_id: StreamId(1) }
timon-arroyo-controller  | 2025-10-23T13:23:02.394902Z DEBUG Connection{peer=Server}: h2::proto::connection: Connection::poll; connection error error=GoAway(b"", NO_ERROR, Library)
timon-arroyo-controller  | 2025-10-23T13:23:02.394933Z DEBUG tonic::transport::server: failed serving connection: connection error

I get the above logs repeated every 3-5 seconds. Here is what the UI looks like:

Image Image

The query I'm trying to run is:

INSERT INTO bh_sink(value)
SELECT CAST(current_time AS TEXT) AS value
FROM impulse_src;

And I am getting:

Image

Checks pass. Could it be related to this line from the controller logs?

 tonic::transport::server: failed serving connection: connection error
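To see what the controller is actually answering, the `on_response` lines can be pulled out of the debug logs and summarised per RPC method. The sketch below makes one assumption that I have not confirmed: that the `status=` field logged by `tower_http` for these requests is the numeric gRPC status code. If so, `status=5` is `NOT_FOUND` in the gRPC status table, which would mean the `JobMetrics` call is being served and answered, not failing at the transport level. The function and variable names are illustrative only.

```python
import re
from collections import Counter

# Subset of the status codes defined by gRPC.
GRPC_STATUS = {
    0: "OK", 1: "CANCELLED", 2: "UNKNOWN", 3: "INVALID_ARGUMENT",
    4: "DEADLINE_EXCEEDED", 5: "NOT_FOUND", 13: "INTERNAL", 14: "UNAVAILABLE",
}

# Matches the tower_http "finished processing request" lines shown above.
RESPONSE = re.compile(
    r"uri=\S*/(?P<service>[\w.]+)/(?P<method>\w+) .*?"
    r"on_response: finished processing request .*?status=(?P<status>\d+)"
)


def summarize(lines):
    """Count (method, status name) pairs across controller log lines."""
    counts = Counter()
    for line in lines:
        m = RESPONSE.search(line)
        if m:
            code = int(m["status"])
            counts[(m["method"], GRPC_STATUS.get(code, str(code)))] += 1
    return counts


# One line copied from the logs above.
SAMPLE = (
    "timon-arroyo-controller  | 2025-10-23T13:23:02.394751Z DEBUG "
    "request{method=POST uri=http://arroyo-controller:5116/"
    "arroyo_rpc.ControllerGrpc/JobMetrics version=HTTP/2.0}: "
    "tower_http::trace::on_response: finished processing request "
    "latency=0 ms status=5"
)
print(summarize([SAMPLE]))
```

On the full output of `docker logs timon-arroyo-controller` this gives a quick view of which RPCs are being called and how they end. Separately, a `GoAway` frame with `NO_ERROR` is how HTTP/2 signals a graceful connection shutdown, so the `failed serving connection` line may just be the client closing its short-lived connection after each poll.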

Some additional info if relevant:

Linux version:

Linux Ubuntu-2204-jammy-amd64-base 5.15.0-144-generic #157-Ubuntu SMP Mon Jun 16 07:33:10 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Docker version:

Client: Docker Engine - Community
 Version:    28.2.2
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.23.0
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.36.2
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 15
  Running: 12
  Paused: 0
  Stopped: 3
 Images: 238
 Server Version: 28.2.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 CDI spec directories:
  /etc/cdi
  /var/run/cdi
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 05044ec0a9a75232cad458027ca83437aae3f4da
 runc version: v1.2.5-0-g59923ef
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.0-144-generic
 Operating System: Ubuntu 22.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 61.91GiB
 Name: Ubuntu-2204-jammy-amd64-base
 ID: b5962977-9f4f-4c15-a10a-0ec8ef5a394c
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  ::1/128
  127.0.0.0/8
 Live Restore Enabled: false

Container state:

(.venv) kit@Ubuntu-2204-jammy-amd64-base:~/TimonInvestmentModel$ docker ps | grep arroyo
084629c7f52d   ghcr.io/arroyosystems/arroyo:0.14.1   "/app/arroyo api"        32 minutes ago   Up 32 minutes           0.0.0.0:5115->5115/tcp, [::]:5115->5115/tcp                                                          timon-arroyo-api
9f45e386be7e   ghcr.io/arroyosystems/arroyo:0.14.1   "/app/arroyo compiler"   32 minutes ago   Up 32 minutes           5115/tcp                                                                                             timon-arroyo-compiler
35a93e05881a   ghcr.io/arroyosystems/arroyo:0.14.1   "/app/arroyo control…"   32 minutes ago   Up 32 minutes           0.0.0.0:5114->5114/tcp, [::]:5114->5114/tcp, 0.0.0.0:5116->5116/tcp, [::]:5116->5116/tcp, 5115/tcp   timon-arroyo-controller
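The listing above shows the api, compiler and controller containers but no `timon-arroyo-worker`, even though its container name would match the `grep`. Since `docker ps` only lists running containers, the worker has either exited or was never started. A small sketch for checking this automatically, with the expected names taken from the `container_name` entries in the compose excerpt (the helper names are hypothetical):

```python
import subprocess

# container_name values from the compose excerpt above.
EXPECTED = [
    "timon-arroyo-controller",
    "timon-arroyo-api",
    "timon-arroyo-compiler",
    "timon-arroyo-worker",
]


def missing_containers(running, expected=EXPECTED):
    """Return the expected container names that are not in `running`."""
    running = set(running)
    return [name for name in expected if name not in running]


def running_container_names():
    """Ask the local Docker daemon for the names of running containers."""
    out = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


# Names visible in the `docker ps` output above.
listed = ["timon-arroyo-api", "timon-arroyo-compiler", "timon-arroyo-controller"]
print(missing_containers(listed))
```

On the host, `missing_containers(running_container_names())` performs the same check against the live daemon. `docker ps -a` would show whether the worker container exists in an exited state, and `docker logs timon-arroyo-worker` would then show why it stopped.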

I've re-upped the stack and fiddled with env vars, but I just can't get to a RUN_ID (needed for the next step). Any assistance would be great. I know I'm close; it's going to be something dead simple, I just know it. If you need any further non-sensitive info, I'm happy to provide it. Thank you in advance.

K
