@@ -83,30 +83,45 @@ $ kubectl apply -f zabbix-agent-daemonset.yaml
| Zabbix Item Name | Zabbix Item Key |
| ------------ | ----------- |
- | **etcd node: health** | healthz |
- | **etcd node: receive requests** | v2/stats/self:recvAppendRequestCnt |
- | **etcd node: send requests** | v2/stats/self:sendAppendRequestCnt |
- | **etcd node: state** | v2/stats/self:state |
- | **etcd node: expires** | v2/stats/store:expireCount |
- | **etcd node: gets fail** | v2/stats/store:getsFail |
- | **etcd node: gets success** | v2/stats/store:getsSuccess |
- | **etcd node: watchers** | v2/stats/store:watchers |
- | **etcd cluster: sets fail** | v2/stats/store:setsFail |
- | **etcd cluster: sets success** | v2/stats/store:setsSuccess |
- | **etcd cluster: update fail** | v2/stats/store:updateFail |
- | **etcd cluster: update success** | v2/stats/store:updateSuccess |
- | **etcd cluster: compare and delete fail** | v2/stats/store:compareAndDeleteFail |
- | **etcd cluster: compare and delete success** | v2/stats/store:compareAndDeleteSuccess |
- | **etcd cluster: compare and swap fail** | v2/stats/store:compareAndSwapFail |
- | **etcd cluster: compare and swap success** | v2/stats/store:compareAndSwapSuccess |
- | **etcd cluster: create fail** | v2/stats/store:createFail |
- | **etcd cluster: create success** | v2/stats/store:createSuccess |
- | **etcd cluster: delete fail** | v2/stats/store:deleteFail |
- | **etcd cluster: delete success** | v2/stats/store:deleteSuccess |
- | **ETCD MEMBERS** | v2/members |
- | **etcd follower: {#MEMBER NAME} failed raft requests** | v2/stats/leader:followers/{#MEMBER ID}/counts/fail |
- | **etcd follower: {#MEMBER NAME} successful raft requests** | v2/stats/leader:followers/{#MEMBER ID}/counts/success |
- | **etcd follower: {#MEMBER NAME} latency to leader** | v2/stats/leader:followers/{#MEMBER ID}/latency/current |
+ | **etcd node: health** | etcd.stats["health:health"] |
+ | **etcd node: receive requests** | etcd.stats["v2/stats/self:recvAppendRequestCnt"] |
+ | **etcd node: send requests** | etcd.stats["v2/stats/self:sendAppendRequestCnt"] |
+ | **etcd node: state** | etcd.stats["v2/stats/self:state"] |
+ | **etcd node: expires** | etcd.stats["v2/stats/store:expireCount"] |
+ | **etcd node: gets fail** | etcd.stats["v2/stats/store:getsFail"] |
+ | **etcd node: gets success** | etcd.stats["v2/stats/store:getsSuccess"] |
+ | **etcd node: watchers** | etcd.stats["v2/stats/store:watchers"] |
+ | **etcd cluster: sets fail** | etcd.stats["v2/stats/store:setsFail"] |
+ | **etcd cluster: sets success** | etcd.stats["v2/stats/store:setsSuccess"] |
+ | **etcd cluster: update fail** | etcd.stats["v2/stats/store:updateFail"] |
+ | **etcd cluster: update success** | etcd.stats["v2/stats/store:updateSuccess"] |
+ | **etcd cluster: compare and delete fail** | etcd.stats["v2/stats/store:compareAndDeleteFail"] |
+ | **etcd cluster: compare and delete success** | etcd.stats["v2/stats/store:compareAndDeleteSuccess"] |
+ | **etcd cluster: compare and swap fail** | etcd.stats["v2/stats/store:compareAndSwapFail"] |
+ | **etcd cluster: compare and swap success** | etcd.stats["v2/stats/store:compareAndSwapSuccess"] |
+ | **etcd cluster: create fail** | etcd.stats["v2/stats/store:createFail"] |
+ | **etcd cluster: create success** | etcd.stats["v2/stats/store:createSuccess"] |
+ | **etcd cluster: delete fail** | etcd.stats["v2/stats/store:deleteFail"] |
+ | **etcd cluster: delete success** | etcd.stats["v2/stats/store:deleteSuccess"] |
+ | **ETCD MEMBERS** | etcd.member.discovery |
+ | **etcd follower: {#MEMBER NAME} failed raft requests** | etcd.stats["v2/stats/leader:followers/{#ID}/counts/fail"] |
+ | **etcd follower: {#MEMBER NAME} successful raft requests** | etcd.stats["v2/stats/leader:followers/{#ID}/counts/success"] |
+ | **etcd follower: {#MEMBER NAME} latency to leader** | etcd.stats["v2/stats/leader:followers/{#ID}/latency/current"] |
+ | **The number of leader changes seen** | etcd.metrics[counter,etcd_server_leader_changes_seen_total] |
+ | **The total number of failed proposals seen** | etcd.metrics[counter,etcd_server_proposals_failed_total] |
+ | **Whether or not a leader exists (1: exists, 0: does not)** | etcd.metrics[gauge,etcd_server_has_leader] |
+ | **The total number of consensus proposals applied in last 5 minutes** | etcd.metrics[gauge,etcd_server_proposals_applied_total] |
+ | **The total number of consensus proposals committed in last 5 minutes** | etcd.metrics[gauge,etcd_server_proposals_committed_total] |
+ | **The current number of pending proposals to commit** | etcd.metrics[gauge,etcd_server_proposals_pending] |
+ | **Maximum number of open file descriptors** | etcd.metrics[gauge,process_max_fds] |
+ | **Number of open file descriptors** | etcd.metrics[gauge,process_open_fds] |
+ | **etcd_disk_backend_commit_duration_seconds_count in last 5 minutes** | etcd.metrics[histogram,etcd_disk_backend_commit_duration_seconds_count] |
+ | **etcd_disk_backend_commit_duration_seconds_sum in last 5 minutes** | etcd.metrics[histogram,etcd_disk_backend_commit_duration_seconds_sum] |
+ | **The latency distributions of commit called by backend in last 5 minutes** | last("etcd.metrics[histogram,etcd_disk_backend_commit_duration_seconds_sum]",0)/last("etcd.metrics[histogram,etcd_disk_backend_commit_duration_seconds_count]",0) |
+ | **etcd_disk_wal_fsync_duration_seconds_count in last 5 minutes** | etcd.metrics[histogram,etcd_disk_wal_fsync_duration_seconds_count] |
+ | **etcd_disk_wal_fsync_duration_seconds_sum in last 5 minutes** | etcd.metrics[histogram,etcd_disk_wal_fsync_duration_seconds_sum] |
+ | **The latency distributions of fsync called by wal in last 5 minutes** | last("etcd.metrics[histogram,etcd_disk_wal_fsync_duration_seconds_sum]",0)/last("etcd.metrics[histogram,etcd_disk_wal_fsync_duration_seconds_count]",0) |
+
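The `etcd.member.discovery` key in the table above is a low-level discovery rule: it is expected to return Zabbix LLD JSON with one entry per cluster member, whose `{#MEMBER NAME}` and `{#ID}` macros then fill in the per-follower `etcd.stats[...]` item prototypes. A minimal sketch of such a discovery script (the function name is illustrative, and the response shape is assumed from the etcd `/v2/members` API):

```python
import json

def member_discovery(members_response: dict) -> str:
    """Turn an etcd /v2/members response into Zabbix LLD JSON.

    Assumes the v2 API shape: {"members": [{"id": ..., "name": ...}, ...]}.
    """
    data = [
        {"{#MEMBER NAME}": m["name"], "{#ID}": m["id"]}
        for m in members_response.get("members", [])
    ]
    return json.dumps({"data": data})

# Example input: two members as /v2/members would report them.
resp = {"members": [
    {"id": "272e204152", "name": "infra1"},
    {"id": "2225373f43", "name": "infra2"},
]}
print(member_discovery(resp))
```

Zabbix substitutes each discovered `{#ID}` into prototypes such as `etcd.stats["v2/stats/leader:followers/{#ID}/latency/current"]`, creating one item per follower.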
### Kubernetes apiserver/controller/scheduler
@@ -119,34 +134,34 @@ $ kubectl apply -f zabbix-agent-daemonset.yaml
| **apiserver_request_count: error_rate (verb=PATCH)** | apiserver_request_error_rate[PATCH] |
| **apiserver_request_count: error_rate (verb=POST)** | apiserver_request_error_rate[POST] |
| **apiserver_request_count: error_rate (verb=PUT)** | apiserver_request_error_rate[PUT] |
- | **apiserver_request_count: verb=DELETE, metrics=error_count** | metrics_exporter[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,DELETE:error_count] |
- | **apiserver_request_count: verb=DELETE, metrics=total_count** | metrics_exporter[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,DELETE:total_count] |
- | **apiserver_request_count: verb=GET, metrics=error_count** | metrics_exporter[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,GET:error_count] |
- | **apiserver_request_count: verb=GET, metrics=total_count** | metrics_exporter[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,GET:total_count] |
- | **apiserver_request_count: verb=LIST, metrics=error_count** | metrics_exporter[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,LIST:error_count] |
- | **apiserver_request_count: verb=LIST, metrics=total_count** | metrics_exporter[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,LIST:total_count] |
- | **apiserver_request_count: verb=PATCH, metrics=error_count** | metrics_exporter[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,PATCH:error_count] |
- | **apiserver_request_count: verb=PATCH, metrics=total_count** | metrics_exporter[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,PATCH:total_count] |
- | **apiserver_request_count: verb=POST, metrics=error_count** | metrics_exporter[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,POST:error_count] |
- | **apiserver_request_count: verb=POST, metrics=total_count** | metrics_exporter[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,POST:total_count] |
- | **apiserver_request_count: verb=PUT, metrics=error_count** | metrics_exporter[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,PUT:error_count] |
- | **apiserver_request_count: verb=PUT, metrics=total_count** | metrics_exporter[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,PUT:total_count] |
- | **apiserver_request_latencies: DELETE** | metrics_exporter[https://{HOST.IP}:443/metrics,summary,apiserver_request_latencies_summary,DELETE] |
- | **apiserver_request_latencies: GET** | metrics_exporter[https://{HOST.IP}:443/metrics,summary,apiserver_request_latencies_summary,GET] |
- | **apiserver_request_latencies: LIST** | metrics_exporter[https://{HOST.IP}:443/metrics,summary,apiserver_request_latencies_summary,LIST] |
- | **apiserver_request_latencies: PATCH** | metrics_exporter[https://{HOST.IP}:443/metrics,summary,apiserver_request_latencies_summary,PATCH] |
- | **apiserver_request_latencies: POST** | metrics_exporter[https://{HOST.IP}:443/metrics,summary,apiserver_request_latencies_summary,POST] |
- | **apiserver_request_latencies: PUT** | metrics_exporter[https://{HOST.IP}:443/metrics,summary,apiserver_request_latencies_summary,PUT] |
- | **apiserver: healthz** | metrics_exporter[https://{HOST.IP}:443/healthz,healthz] |
- | **kube-scheduler: healthz** | metrics_exporter[http://{HOST.IP}:10251/healthz,healthz] |
- | **kube-scheduler: current leader** | metrics_exporter[https://{HOST.IP}:443,get_leader,kube-scheduler] |
- | **kube-controller-manager: healthz** | metrics_exporter[http://{HOST.IP}:10252/healthz,healthz] |
- | **kube-controller-manager: current leader** | metrics_exporter[https://{HOST.IP}:443,get_leader,kube-controller-manager] |
+ | **apiserver_request_count: verb=DELETE, metrics=error_count** | kube.metrics[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,DELETE:error_count] |
+ | **apiserver_request_count: verb=DELETE, metrics=total_count** | kube.metrics[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,DELETE:total_count] |
+ | **apiserver_request_count: verb=GET, metrics=error_count** | kube.metrics[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,GET:error_count] |
+ | **apiserver_request_count: verb=GET, metrics=total_count** | kube.metrics[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,GET:total_count] |
+ | **apiserver_request_count: verb=LIST, metrics=error_count** | kube.metrics[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,LIST:error_count] |
+ | **apiserver_request_count: verb=LIST, metrics=total_count** | kube.metrics[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,LIST:total_count] |
+ | **apiserver_request_count: verb=PATCH, metrics=error_count** | kube.metrics[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,PATCH:error_count] |
+ | **apiserver_request_count: verb=PATCH, metrics=total_count** | kube.metrics[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,PATCH:total_count] |
+ | **apiserver_request_count: verb=POST, metrics=error_count** | kube.metrics[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,POST:error_count] |
+ | **apiserver_request_count: verb=POST, metrics=total_count** | kube.metrics[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,POST:total_count] |
+ | **apiserver_request_count: verb=PUT, metrics=error_count** | kube.metrics[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,PUT:error_count] |
+ | **apiserver_request_count: verb=PUT, metrics=total_count** | kube.metrics[https://{HOST.IP}:443/metrics,counter,apiserver_request_count,PUT:total_count] |
+ | **apiserver_request_latencies: DELETE** | kube.metrics[https://{HOST.IP}:443/metrics,summary,apiserver_request_latencies_summary,DELETE] |
+ | **apiserver_request_latencies: GET** | kube.metrics[https://{HOST.IP}:443/metrics,summary,apiserver_request_latencies_summary,GET] |
+ | **apiserver_request_latencies: LIST** | kube.metrics[https://{HOST.IP}:443/metrics,summary,apiserver_request_latencies_summary,LIST] |
+ | **apiserver_request_latencies: PATCH** | kube.metrics[https://{HOST.IP}:443/metrics,summary,apiserver_request_latencies_summary,PATCH] |
+ | **apiserver_request_latencies: POST** | kube.metrics[https://{HOST.IP}:443/metrics,summary,apiserver_request_latencies_summary,POST] |
+ | **apiserver_request_latencies: PUT** | kube.metrics[https://{HOST.IP}:443/metrics,summary,apiserver_request_latencies_summary,PUT] |
+ | **apiserver: healthz** | kube.metrics[https://{HOST.IP}:443/healthz,healthz] |
+ | **kube-scheduler: healthz** | kube.metrics[http://{HOST.IP}:10251/healthz,healthz] |
+ | **kube-scheduler: current leader** | kube.metrics[https://{HOST.IP}:443,get_leader,kube-scheduler] |
+ | **kube-controller-manager: healthz** | kube.metrics[http://{HOST.IP}:10252/healthz,healthz] |
+ | **kube-controller-manager: current leader** | kube.metrics[https://{HOST.IP}:443,get_leader,kube-controller-manager] |
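The `apiserver_request_error_rate[<verb>]` items at the top of this table are derived from the per-verb `error_count` and `total_count` counters below them. A minimal sketch of that ratio, with a hypothetical parser for the Prometheus text exposition format (the `verb`/`code` label names match `apiserver_request_count`; counting HTTP 5xx samples as errors is an assumption of this sketch):

```python
import re

def error_rate(metrics_text: str, verb: str) -> float:
    """Error rate for one verb: 5xx samples / all samples of apiserver_request_count."""
    total = errors = 0.0
    # Sample lines look like: apiserver_request_count{code="500",verb="GET"} 3
    pattern = re.compile(r'^apiserver_request_count\{([^}]*)\}\s+(\S+)')
    for line in metrics_text.splitlines():
        m = pattern.match(line)
        if not m:
            continue
        labels, value = m.group(1), float(m.group(2))
        if f'verb="{verb}"' not in labels:
            continue
        total += value
        code = re.search(r'code="(\d+)"', labels)
        if code and code.group(1).startswith("5"):
            errors += value
    return errors / total if total else 0.0

# Illustrative scrape output, not real apiserver data.
sample = """\
apiserver_request_count{code="200",verb="GET"} 90
apiserver_request_count{code="500",verb="GET"} 10
apiserver_request_count{code="200",verb="POST"} 50
"""
print(error_rate(sample, "GET"))  # → 0.1
```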
### Kubelet
| Zabbix Item Name | Zabbix Item Key |
| ------------ | ----------- |
- | **kubelet: healthz** | metrics_exporter[https://{HOST.IP}:10250/healthz,healthz] |
- | **KUBELET_RUNNING_POD_COUNT** | metrics_exporter[https://{HOST.IP}:10250/metrics,gauge,kubelet_running_pod_count] |
+ | **kubelet: healthz** | kube.metrics[https://{HOST.IP}:10250/healthz,healthz] |
+ | **KUBELET_RUNNING_POD_COUNT** | kube.metrics[https://{HOST.IP}:10250/metrics,gauge,kubelet_running_pod_count] |
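A `gauge` item such as `kubelet_running_pod_count` is read straight out of the kubelet's `/metrics` exposition text. A minimal sketch of that lookup for an unlabeled gauge (function name and sample data are illustrative, not part of the template):

```python
def read_gauge(metrics_text: str, name: str) -> float:
    """Return the value of an unlabeled gauge from Prometheus text exposition."""
    for line in metrics_text.splitlines():
        if line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        parts = line.split()
        if len(parts) == 2 and parts[0] == name:
            return float(parts[1])
    raise KeyError(name)

# Illustrative /metrics excerpt.
sample = """\
# HELP kubelet_running_pod_count Number of pods currently running
# TYPE kubelet_running_pod_count gauge
kubelet_running_pod_count 17
"""
print(read_gauge(sample, "kubelet_running_pod_count"))  # → 17.0
```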