healthz endpoint returns "NOT OK" when --events or --policy is used (unless default events are enabled) #5000

@hsandhu2309

Description

The /healthz endpoint reports "NOT OK" when Tracee is started with custom events (e.g., --events anti_debugging) or any --policy, even though the server is running correctly. It only returns "OK" when the default event set is used — which includes the internal heartbeat_event.

This makes the health check unreliable for non-default configurations.


Steps to Reproduce

  1. Build Tracee in a clean environment:

    make -f builder/Makefile.tracee-make ubuntu-shell
    make all
  2. Start Tracee with healthz and a custom event:

    sudo ./dist/tracee --server healthz --events anti_debugging
  3. Query the health endpoint:

    curl localhost:3366/healthz

    Result: NOT OK

  4. Now start with default events:

    sudo ./dist/tracee --server healthz --events default
    curl localhost:3366/healthz

    Result: OK

  5. Same failure occurs with any --policy flag:

    sudo ./dist/tracee --server healthz --policy mypolicy.json
    curl localhost:3366/healthz

    Result: NOT OK


Expected Behavior

/healthz should return "OK" as long as the Tracee server is running and responsive — regardless of event selection or policy usage.


Root Cause Analysis

The health status depends on the internal heartbeat_event, which calls SendPulse() → setHealth(true) and sets isHealthy = true.

  • default events include heartbeat_event → health check passes
  • Custom events (like anti_debugging) or policies do not include heartbeat_event → isHealthy remains false → "NOT OK"

Thus, healthiness is incorrectly tied to the presence of an internal diagnostic event rather than actual server liveness.
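
To make the coupling concrete, here is a minimal, self-contained Go model of the behavior described above. It is only a sketch: the `health` type, `sendPulse`, `status`, and `run` are hypothetical names invented for illustration and are not Tracee's actual code. The only assumption carried over from the analysis is that the health flag is flipped solely by a heartbeat pulse.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// health is a hypothetical stand-in for the isHealthy flag described
// in this issue; it is not Tracee's real type.
type health struct {
	ok atomic.Bool
}

// sendPulse models the reported behavior of SendPulse(): it is the
// only code path that marks the server as healthy.
func (h *health) sendPulse() { h.ok.Store(true) }

// status mirrors the body returned by /healthz.
func (h *health) status() string {
	if h.ok.Load() {
		return "OK"
	}
	return "NOT OK"
}

// run simulates startup with a given event selection: a pulse is sent
// only when heartbeat_event is among the selected events.
func run(events []string) string {
	h := &health{}
	for _, e := range events {
		if e == "heartbeat_event" {
			h.sendPulse()
		}
	}
	return h.status()
}

func main() {
	// Stand-in for the default set, which includes heartbeat_event.
	fmt.Println(run([]string{"heartbeat_event"})) // prints "OK"
	// Custom selection without heartbeat_event.
	fmt.Println(run([]string{"anti_debugging"})) // prints "NOT OK"
}
```

In this model the server is equally "alive" in both runs; only the event selection differs, which is the mismatch reported here.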


Environment

  • Tracee version: main-0b8ac82ca
  • OS: Ubuntu (via tracee-shell)
  • Build: make all in ubuntu-shell

Suggested Fix

Decouple /healthz from heartbeat_event. The endpoint should reflect server readiness (e.g., listening, no critical init errors), not event subscription.

Possible approaches:

  • Include heartbeat_event automatically when healthz is enabled
  • OR: Manually add a health function that calls heartbeat.SendPulse() itself (I am not sure where this should be added)
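
As a rough illustration of the decoupling idea, the sketch below ties the endpoint to a readiness flag that is set once initialization completes, independent of any event. All names (`ready`, `healthzHandler`, `probe`) are hypothetical, and the 503 status code for the unhealthy case is an assumption; this is not Tracee's actual server code, only one possible shape for the fix.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// ready is a hypothetical readiness flag. It would be set once the
// server is listening and initialization finished without critical
// errors; it does not depend on which events are selected.
var ready atomic.Bool

// healthzHandler reports the readiness of the server itself.
// Returning 503 when not ready is an assumption of this sketch.
func healthzHandler(w http.ResponseWriter, _ *http.Request) {
	if !ready.Load() {
		w.WriteHeader(http.StatusServiceUnavailable)
		fmt.Fprint(w, "NOT OK")
		return
	}
	fmt.Fprint(w, "OK")
}

// probe calls the handler in-process and returns the status code and
// response body, so the example runs without opening a port.
func probe() (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/healthz", nil)
	healthzHandler(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(probe()) // before init completes: 503 NOT OK
	ready.Store(true)    // init done; no event selection involved
	fmt.Println(probe()) // after init: 200 OK
}
```

With this shape, `--events anti_debugging` or any `--policy` would not affect the result, because nothing in the handler depends on heartbeat_event.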

Additional Context

This breaks Kubernetes liveness probes and monitoring when using custom policies or focused event sets. Users expect healthz to indicate service availability.


Priority: High (affects observability in production)
