Connection from satellite to agent with misconfigured zone does not fail #10405

Open
jschmidt-icinga opened this issue Apr 8, 2025 · 2 comments · May be fixed by #10415
Labels
area/distributed Distributed monitoring (master, satellites, clients) bug Something isn't working

Comments

@jschmidt-icinga
Contributor

Describe the bug

In the following situation, the cluster and cluster-zone checks fail to detect a misconfigured zone on the agent.

onboarding-5 is a satellite that wants to connect to an agent, onboarding-2, which has a misconfigured zone.

onboarding-5 tries to connect to onboarding-2, which rejects the attempt because it doesn't know onboarding-5 as an endpoint:

[2025-04-08 10:08:49 +0000] information/ApiListener: New client connection for identity 'onboarding-5' from [::ffff:10.27.2.199]:60250 (no Endpoint object found for identity)

However, onboarding-5 does not seem to realize this and continues to send commands, which onboarding-2 rejects due to an invalid endpoint origin:

[2025-04-08 10:08:49 +0000] notice/JsonRpcConnection: Received 'event::ExecuteCommand' message from identity 'onboarding-5'.
[2025-04-08 10:08:49 +0000] notice/ClusterEvents: Discarding 'execute command' message from 'onboarding-5': Invalid endpoint origin (client not allowed).

This leads to the checks run on onboarding-2 showing up as "Overdue":
[Screenshot: the affected services displayed as "Overdue" in Icinga Web]
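As a quick way to confirm what the log hints at, the missing Endpoint object can be checked on the agent itself with the object list CLI command (assuming shell access to onboarding-2); empty output confirms that the agent has no Endpoint object for the satellite:

# run on onboarding-2; no output means the satellite endpoint is unknown to the agent
icinga2 object list --type Endpoint --name onboarding-5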

To Reproduce

onboarding-5's zone configuration:

object Endpoint "onboarding-1" {
}

object Endpoint "onboarding-4" {
}

object Zone "master" {
      endpoints = [ "onboarding-1", "onboarding-4" ]
}

object Endpoint "onboarding-5" {
}

object Zone "onboarding-5" {
      endpoints = [ "onboarding-5" ]
      parent = "master"
}

object Zone "global-templates" {
      global = true
}

object Zone "director-global" {
      global = true
}

object Endpoint "onboarding-2" {
    host = "10.27.1.225"
    log_duration = 0s
}

object Zone "onboarding-2" {
    parent = "onboarding-5"
    endpoints = [ "onboarding-2" ]
}

On the agent onboarding-2, the following (misconfigured) setup is left over from when the agent was still connected directly to the masters:

object Endpoint "onboarding-1" {
}

object Endpoint "onboarding-4" {
}

object Zone "master" {
      endpoints = [ "onboarding-1", "onboarding-4" ]
}

object Endpoint "onboarding-2" {
}

object Zone "onboarding-2" {
      endpoints = [ "onboarding-2" ]
      parent = "master"
}

object Zone "global-templates" {
      global = true
}

object Zone "director-global" {
      global = true
}
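For comparison, a correctly re-parented configuration on the agent would presumably mirror the satellite's view of the hierarchy, i.e. declare the satellite's endpoint and zone and make zone onboarding-2 a child of it. A minimal sketch of the relevant part (host addresses omitted):

object Endpoint "onboarding-5" {
}

object Zone "onboarding-5" {
      endpoints = [ "onboarding-5" ]
      parent = "master"
}

object Zone "onboarding-2" {
      endpoints = [ "onboarding-2" ]
      parent = "onboarding-5"
}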

The following services then report OK even though the connection fails and no check results are returned from onboarding-5 (see the attached log):

template Service "Generic Service" {
    max_check_attempts = "5"
    check_interval = 1m
    retry_interval = 30s
}

template Service "Icinga Service" {
    import "Generic Service"

    command_endpoint = host_name
}

object Service "cluster-zone-onboarding-2" {
    host_name = "onboarding-5"
    import "Icinga Service"

    check_command = "cluster-zone"
    vars.cluster_zone = "onboarding-2"
}
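The cluster check on onboarding-5 referred to under "Expected behavior" below is not included in this issue; it is presumably defined along these lines (sketch only, object name hypothetical):

object Service "cluster-onboarding-5" {
    host_name = "onboarding-5"
    import "Icinga Service"

    check_command = "cluster"
}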

Expected behavior

I would expect the connection to fail, the cluster check on onboarding-5 and the above cluster-zone check to come up as CRITICAL, and the service checks to end up in state UNKNOWN instead of showing as "Overdue".

Your Environment

  • Version used (icinga2 --version): r2.14.5-1
  • Operating System and version: Debian 12
  • Enabled features (icinga2 feature list): api checker mainlog
  • Icinga Web 2 version and modules (System - About): 2.12.4

Additional context

Debug.log from the agent onboarding-2:
debug.log

@yhabteab yhabteab added bug Something isn't working area/distributed Distributed monitoring (master, satellites, clients) labels Apr 8, 2025
@jschmidt-icinga jschmidt-icinga linked a pull request Apr 17, 2025 that will close this issue
@Al2Klimov
Member

Technically speaking, the connection to the agent IS ok... from the satellite PoV.😅

For YOUR case I personally would set up a cluster check on the AGENT side.
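(For illustration only, such an agent-side check could presumably look roughly like this; the object name is hypothetical and it assumes a Host object "onboarding-2" exists:)

object Service "cluster-local" {
    host_name = "onboarding-2"
    import "Generic Service"

    check_command = "cluster"
    command_endpoint = "onboarding-2"
}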

@julianbrost
Contributor

For YOUR case I personally would set up a cluster check on the AGENT side.

I doubt that would help in that scenario. Even if that reports some kind of problem, the agent wouldn't send the result to the satellite and it would never show up in Icinga Web.
