
Clock conflicts and other errors when clustering #149

@xrodriguez-betterdoc


Hi there!

We are using Swarm in a 4-node cluster on EKS. The nodes discover each other dynamically using libcluster's Kubernetes strategy, as many people are doing right now.
I didn't expect to get so many warnings and errors when using Swarm. Are we doing something wrong?

Here are some examples of the warnings and errors we receive:

[swarm on {app}@x.x.x.x] [tracker:handle_replica_event] received track event for "{process}", mismatched pids, local clock conflicts with remote clock, event unhandled
** (exit) exited in: :gen_statem.call(Swarm.Tracker, {:track, "{process}", %{mfa: {Module, :start_link, ["{process}", {state}]}}}, 5000)
    ** (EXIT) time out
[swarm on {app}@x.x.x.x] [tracker:ensure_swarm_started_on_remote_node] nodeup for {app}@x.x.x.x was ignored because: {:badrpc, {:EXIT, {:timeout, {:gen_server, :call, [:application_controller, :which_applications]}}}}
[swarm on {app}@x.x.x.x] [tracker:handle_topology_change] handoff failed for "{process}": {:timeout, {GenServer, :call, [#PID<0.11273.0>, {:swarm, :begin_handoff}, 5000]}}

...and a few others.

Something else that worries me is how Swarm decides where to send handoff messages. When we do a rollout restart of a deployment, does it send them to the "new" nodes, or could it send them to nodes that are about to be shut down a second later?
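To make the question concrete: assuming Swarm's default distribution strategy is a consistent hash ring (`Swarm.Distribution.Ring`, built on libring), each registered name would be owned by whichever node its hash lands on among the nodes Swarm currently sees. The Python below is a generic consistent-hashing sketch, not Swarm's or libring's code; `HashRing`, `key_to_node`, the node names, and the replica count are all hypothetical and chosen for illustration.

```python
import bisect
import hashlib

# Generic consistent-hashing sketch. ASSUMPTION: Swarm's default Ring strategy
# behaves like this; nothing here is Swarm's or libring's actual implementation.


def stable_hash(key: str) -> int:
    # Deterministic 64-bit hash; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes, replicas=128):
        self.replicas = replicas
        self.points = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node gets `replicas` points so ownership is spread evenly.
        for i in range(self.replicas):
            bisect.insort(self.points, (stable_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self.points = [p for p in self.points if p[1] != node]

    def key_to_node(self, key):
        if not self.points:
            raise ValueError("ring has no nodes")
        # (h,) sorts before any (h, node), so this finds the first point >= h.
        idx = bisect.bisect_left(self.points, (stable_hash(key),))
        # Past the last point, wrap around to the start of the ring.
        return self.points[idx % len(self.points)][1]


names = [f"worker-{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
before = {n: ring.key_to_node(n) for n in names}

# Rolling restart: the old pod leaves, its replacement joins.
ring.remove_node("node-d")
ring.add_node("node-e")
after = {n: ring.key_to_node(n) for n in names}

moved = [n for n in names if before[n] != after[n]]
print(f"{len(moved)} of {len(names)} names change owner")
```

Under that assumption, only names owned by the departing node, or claimed by the new one, would need a handoff. Whether a pod that is about to terminate still receives handoffs would then depend on whether it is still part of the ring when the topology change is processed.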

Thanks in advance!
