Description
The /healthz endpoint reports "NOT OK" when Tracee is started with custom events (e.g., --events anti_debugging) or any --policy, even though the server is running correctly. It returns "OK" only when the default event set is used, because that set includes the internal heartbeat_event.
This makes the health check unreliable for non-default configurations.
Steps to Reproduce
- Build Tracee in a clean environment:
    make -f builder/Makefile.tracee-make ubuntu-shell
    make all
- Start Tracee with healthz and a custom event:
    sudo ./dist/tracee --server healthz --events anti_debugging
- Query the health endpoint:
    curl localhost:3366/healthz
  Result: NOT OK
- Now start with default events:
    sudo ./dist/tracee --server healthz --events default
    curl localhost:3366/healthz
  Result: OK
- The same failure occurs with any --policy flag:
    sudo ./dist/tracee --server healthz --policy mypolicy.json
  → NOT OK
Expected Behavior
/healthz should return "OK" as long as the Tracee server is running and responsive — regardless of event selection or policy usage.
Root Cause Analysis
The health status depends on the internal heartbeat_event, which calls SendPulse() → setHealth(true) and sets isHealthy = true.
- default events include heartbeat_event → health check passes
- Custom events (like anti_debugging) or policies do not include heartbeat_event → isHealthy remains false → "NOT OK"
Thus, healthiness is incorrectly tied to the presence of an internal diagnostic event rather than actual server liveness.
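The coupling can be illustrated with a small standalone sketch. This is not Tracee's code: the `health` type, `sendPulse`, and `healthzBody` are hypothetical names that only mirror the SendPulse() → setHealth(true) → isHealthy chain described above, and status-code handling is left out.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// health mirrors the coupling described in the issue: the flag starts false
// and only a heartbeat pulse flips it. All names here are hypothetical.
type health struct {
	isHealthy atomic.Bool
}

// sendPulse stands in for the SendPulse() -> setHealth(true) path.
func (h *health) sendPulse() { h.isHealthy.Store(true) }

// ServeHTTP answers /healthz from the flag alone.
func (h *health) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	if h.isHealthy.Load() {
		fmt.Fprint(w, "OK")
		return
	}
	fmt.Fprint(w, "NOT OK")
}

// healthzBody simulates a start-up with the given event selection and
// returns the body a /healthz request would then receive.
func healthzBody(events []string) string {
	h := &health{}
	for _, e := range events {
		if e == "heartbeat_event" {
			h.sendPulse() // only reached when the heartbeat is selected
		}
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	return rec.Body.String()
}

func main() {
	// The default set is represented here by heartbeat_event alone.
	fmt.Println(healthzBody([]string{"heartbeat_event"}))
	fmt.Println(healthzBody([]string{"anti_debugging"}))
}
```

In this model the server answers in both cases, yet the second selection never reaches `sendPulse`, so the handler keeps reporting "NOT OK".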
Environment
- Tracee version: main-0b8ac82ca
- OS: Ubuntu (via tracee-shell)
- Build: make all in ubuntu-shell
Suggested Fix
Decouple /healthz from heartbeat_event. The endpoint should reflect server readiness (e.g., listening, no critical init errors), not event subscription.
Possible approaches:
- Include heartbeat_event automatically when healthz is enabled
- OR: Manually add a health function and then call heartbeat.SendPulse() (not sure where to add this)
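A rough sketch of both directions, under stated assumptions: `ensureHeartbeat` models the first approach on a plain string slice, and `readiness` models a handler whose state is set once at the end of initialization, independent of event selection. None of these names come from Tracee, and the 503 status for the not-ready case is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// ensureHeartbeat sketches the first approach: when healthz is enabled, add
// heartbeat_event to the selection unless it is already present.
func ensureHeartbeat(events []string, healthzEnabled bool) []string {
	if !healthzEnabled {
		return events
	}
	for _, e := range events {
		if e == "heartbeat_event" {
			return events
		}
	}
	out := make([]string, 0, len(events)+1)
	out = append(out, events...)
	return append(out, "heartbeat_event")
}

// readiness sketches the decoupled alternative: the flag is set when start-up
// completes, not when a particular event fires. Hypothetical type.
type readiness struct {
	ready atomic.Bool
}

// markReady would be called once initialization finished without errors.
func (r *readiness) markReady() { r.ready.Store(true) }

func (r *readiness) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	if !r.ready.Load() {
		w.WriteHeader(http.StatusServiceUnavailable) // assumed status code
		fmt.Fprint(w, "NOT OK")
		return
	}
	fmt.Fprint(w, "OK")
}

// probe issues an in-memory GET /healthz and returns status code and body.
func probe(h http.Handler) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(ensureHeartbeat([]string{"anti_debugging"}, true))

	r := &readiness{}
	r.markReady()
	code, body := probe(r)
	fmt.Println(code, body)
}
```

The first approach keeps the existing heartbeat mechanism and only widens the event selection; the second removes the dependency on event subscription altogether, which is closer to the stated goal of reflecting server readiness.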
Additional Context
This breaks Kubernetes liveness probes and monitoring when using custom policies or focused event sets. Users expect healthz to indicate service availability.
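To make the probe impact concrete, here is a minimal sketch of the success rule a Kubernetes HTTP probe applies: a status of 200 or above and below 400 passes, anything else fails. The `fakeHealthz` server is a hypothetical stand-in for Tracee, and the 503 used for the unhealthy case is an assumption, since the status code Tracee sends with "NOT OK" is not stated here.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// probeOK mimics the success rule of a Kubernetes HTTP probe: a response
// with status >= 200 and < 400 passes; any other status, or an error, fails.
func probeOK(url string) bool {
	client := &http.Client{
		Timeout: time.Second, // illustrative timeout
		// Report the first response as-is instead of following redirects.
		CheckRedirect: func(*http.Request, []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 400
}

// fakeHealthz is a hypothetical stand-in for Tracee's server; it answers
// every request with the given status and body.
func fakeHealthz(status int, body string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(status)
		fmt.Fprint(w, body)
	}))
}

func main() {
	up := fakeHealthz(http.StatusOK, "OK")
	defer up.Close()
	down := fakeHealthz(http.StatusServiceUnavailable, "NOT OK") // assumed status
	defer down.Close()

	fmt.Println("healthy probe passes:", probeOK(up.URL+"/healthz"))
	fmt.Println("unhealthy probe passes:", probeOK(down.URL+"/healthz"))
}
```

Under that assumption, a Tracee pod started with a custom policy would fail its liveness probe on every check and be restarted repeatedly, even though the process itself is working.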
Priority: High (affects observability in production)