buckyd and bucky don't agree on the carbon hashring #31

@zerosoul13

Hello,

I originally raised this on the wrong repository (jjneely/buckytools#38) and am moving it here. My original post is reproduced below so that everyone has the context.


I've found 2 issues which I would love to discuss:

BuckyD and bucky configuration

buckyd accepts the members of the hashring as non-option CLI arguments: buckyd <graphite1:port> <graphite2:port> ....
When bucky asks for the cluster configuration, it gets back graphite1:<hashring port> instead of graphite1:4242. Because of this mismatch, bucky cannot reach the buckyd members:

/usr/sbin/bucky servers -h go-carbon-0.go-carbon.graphite:4242
2021/11/10 01:01:57 Error retrieving URL: Get "http://go-carbon-0.go-carbon.graphite:2004/hashring": read tcp 172.16.14.79:45902->172.16.76.119:2004: read: connection reset by peer
2021/11/10 01:01:57 Cluster unhealthy: go-carbon-0.go-carbon.graphite:2004: Get "http://go-carbon-0.go-carbon.graphite:2004/hashring": read tcp 172.16.14.79:45902->172.16.76.119:2004: read: connection reset by peer

2021/11/10 01:01:57 Error retrieving URL: Get "http://go-carbon-1.go-carbon.graphite:2004/hashring": read tcp 172.16.14.79:47772->172.16.27.82:2004: read: connection reset by peer
2021/11/10 01:01:57 Cluster unhealthy: go-carbon-1.go-carbon.graphite:2004: Get "http://go-carbon-1.go-carbon.graphite:2004/hashring": read tcp 172.16.14.79:47772->172.16.27.82:2004: read: connection reset by peer

2021/11/10 01:01:57 Error retrieving URL: Get "http://go-carbon-2.go-carbon.graphite:2004/hashring": read tcp 172.16.14.79:40946->172.16.58.32:2004: read: connection reset by peer
2021/11/10 01:01:57 Cluster unhealthy: go-carbon-2.go-carbon.graphite:2004: Get "http://go-carbon-2.go-carbon.graphite:2004/hashring": read tcp 172.16.14.79:40946->172.16.58.32:2004: read: connection reset by peer

2021/11/10 01:01:57 Error retrieving URL: Get "http://go-carbon-3.go-carbon.graphite:2004/hashring": read tcp 172.16.14.79:43570->172.16.127.44:2004: read: connection reset by peer
2021/11/10 01:01:57 Cluster unhealthy: go-carbon-3.go-carbon.graphite:2004: Get "http://go-carbon-3.go-carbon.graphite:2004/hashring": read tcp 172.16.14.79:43570->172.16.127.44:2004: read: connection reset by peer

2021/11/10 01:01:57 Error retrieving URL: Get "http://go-carbon-4.go-carbon.graphite:2004/hashring": read tcp 172.16.14.79:51674->172.16.27.91:2004: read: connection reset by peer
2021/11/10 01:01:57 Cluster unhealthy: go-carbon-4.go-carbon.graphite:2004: Get "http://go-carbon-4.go-carbon.graphite:2004/hashring": read tcp 172.16.14.79:51674->172.16.27.91:2004: read: connection reset by peer

2021/11/10 01:01:57 Error retrieving URL: Get "http://go-carbon-5.go-carbon.graphite:2004/hashring": read tcp 172.16.14.79:56712->172.16.14.79:2004: read: connection reset by peer
2021/11/10 01:01:57 Cluster unhealthy: go-carbon-5.go-carbon.graphite:2004: Get "http://go-carbon-5.go-carbon.graphite:2004/hashring": read tcp 172.16.14.79:56712->172.16.14.79:2004: read: connection reset by peer

Buckd daemons are using port: 4242
Hashing algorithm: [carbon: 6 nodes, 100 replicas, 600 ring members go-carbon-0.go-carbon.graphite:2004=None go-carbon-1.go-carbon.graphite:2004=None go-carbon-2.go-carbon.graphite:2004=None go-carbon-3.go-carbon.graphite:2004=None go-carbon-4.go-carbon.graphite:2004=None go-carbon-5.go-carbon.graphite:2004=None]
Number of replicas: 100
Found these servers:
        go-carbon-0.go-carbon.graphite:2004
        go-carbon-1.go-carbon.graphite:2004
        go-carbon-2.go-carbon.graphite:2004
        go-carbon-3.go-carbon.graphite:2004
        go-carbon-4.go-carbon.graphite:2004
        go-carbon-5.go-carbon.graphite:2004

Is cluster healthy: false
2021/11/10 01:01:57 Cluster is inconsistent.

I've tracked the issue to https://github.com/go-graphite/buckytools/blob/master/cmd/bucky/cluster.go#L88. The port value for each cluster member is set to the hashring port instead of 4242 (or whichever port the user specifies).

To test this theory, I forked and patched the code to default the port to 4242. The cluster is then reported as healthy, with the correct hashring values, as shown below:

/ # /usr/sbin/bucky servers -h go-carbon-5.go-carbon.graphite:4242
Buckd daemons are using port: 4242
Hashing algorithm: [carbon: 6 nodes, 100 replicas, 600 ring members go-carbon-0.go-carbon.graphite:2004=None go-carbon-1.go-carbon.graphite:2004=None go-carbon-2.go-carbon.graphite:2004=None go-carbon-3.go-carbon.graphite:2004=None go-carbon-4.go-carbon.graphite:2004=None go-carbon-5.go-carbon.graphite:2004=None]
Number of replicas: 100
Found these servers:
        go-carbon-0.go-carbon.graphite:4242
        go-carbon-1.go-carbon.graphite:4242
        go-carbon-2.go-carbon.graphite:4242
        go-carbon-3.go-carbon.graphite:4242
        go-carbon-4.go-carbon.graphite:4242
        go-carbon-5.go-carbon.graphite:4242

Is cluster healthy: true
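
The port rewrite described above can be sketched roughly as follows. This is an illustration only, not the actual patch or buckytools code: `buckydAddr` is a hypothetical helper, and the port 4242 is simply the value from the output above. It assumes members are plain `host:port` strings.

```go
package main

import (
	"fmt"
	"net"
)

// buckydAddr is a hypothetical helper (not part of buckytools). It takes a
// hashring member such as "graphite1:2004" and returns the address where the
// buckyd daemon on that host is expected to listen.
func buckydAddr(member, buckydPort string) string {
	host, _, err := net.SplitHostPort(member)
	if err != nil {
		// No port present (or the member could not be parsed):
		// treat the whole string as a bare hostname.
		host = member
	}
	return net.JoinHostPort(host, buckydPort)
}

func main() {
	ring := []string{
		"go-carbon-0.go-carbon.graphite:2004",
		"go-carbon-1.go-carbon.graphite:2004",
	}
	for _, member := range ring {
		// Swap the carbon port (2004) for the buckyd port (4242).
		fmt.Println(buckydAddr(member, "4242"))
	}
}
```

With a rewrite like this, the hashring itself keeps its 2004 entries while the HTTP requests go to the buckyd port.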

Is this a real issue or just a misconfiguration on my side?

Inconsistent metric count almost matches active metric count

bucky reports metrics as inconsistent on our cluster, and the count is nearly the same as the active metric count, which is very odd. Taking a closer look, this line https://github.com/go-graphite/buckytools/blob/master/cmd/bucky/inconsistent.go#L69 checks the port values, and these don't match because one is 2004 and the other is 4242.

The original code does not take the ports into account, only the hostnames:
https://github.com/jjneely/buckytools/blob/master/cmd/bucky/inconsistent.go#L64
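
To make the difference concrete, here is a minimal sketch of a port-sensitive comparison next to a hostname-only one. It is not taken from either repository: `hostOnly` and `sameHost` are hypothetical names, and the addresses are the ones from the output above.

```go
package main

import (
	"fmt"
	"net"
)

// hostOnly strips an optional ":port" suffix from an address.
// Hypothetical helper, not part of buckytools.
func hostOnly(addr string) string {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		// No port to strip: the address is already a bare hostname.
		return addr
	}
	return host
}

// sameHost compares two addresses by hostname only, ignoring ports.
func sameHost(a, b string) bool {
	return hostOnly(a) == hostOnly(b)
}

func main() {
	ringNode := "go-carbon-0.go-carbon.graphite:2004" // carbon port from the hashring
	server := "go-carbon-0.go-carbon.graphite:4242"   // buckyd port

	fmt.Println(ringNode == server)         // false: the ports differ
	fmt.Println(sameHost(ringNode, server)) // true: same hostname
}
```

If the check behaves like the first comparison, every metric would be flagged as living on the wrong node, which would explain a count close to the number of active metrics.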

Is my assumption correct that the rings won't match because of this?
