Commit b243ea6

Merge pull request #136 from ExpediaGroup/fix/tcp_keep_alive_in_eks
Added keepalive config for EKS
2 parents 7a78f06 + 5db127f commit b243ea6

4 files changed: 36 additions, 6 deletions

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -3,6 +3,11 @@ All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
 
+
+## [4.5.3] - 2024-07-01
+### Added
+- Added support for setting the TCP keepalive settings of Waggledance.
+
 ## [4.5.2] - 2024-06-04
 ### Updated
 - Changed Service account creation to make it work with eks 1.24 and later.

README.md

Lines changed: 4 additions & 3 deletions
@@ -58,9 +58,10 @@ For more information please refer to the main [Apiary](https://github.com/Expedi
 | root_vol_type | Waggle Dance EC2 root volume type. | string | `gp2` | no |
 | root_vol_size | Waggle Dance EC2 root volume size. | string | `10` | no |
 | enable_query_functions_across_all_metastores | This controls the thrift call for `get_all_functions`. It is generally used to initialize a client and get built-in functions and registered UDF's from a metastore. Setting this to `false` is more performant as WD then only gets the functions from the `primary` metastore. However, setting this to `true` will collate results by calling `get_all_functions` from all configured metastores. This could be potentially slow if some of the metastores are slow to respond. If all the metastores configured are of the same version and no additional UDF's are installed, then WD gets the same functions back so it's not very useful to call this across metastores. For backwards compatibility, this property can be set to `true`. Further read: https://github.com/ExpediaGroup/waggle-dance#server | bool | false | no |
-| tcp_keepalive_time | Sets net.ipv4.tcp_keepalive_time (seconds), currently only supported in ECS. | number | `200` | no |
-| tcp_keepalive_intvl | Sets net.ipv4.tcp_keepalive_intvl (seconds), currently only supported in ECS. | number | `30` | no |
-| tcp_keepalive_probes | Sets net.ipv4.tcp_keepalive_probes (seconds), currently only supported in ECS. | number | `2` | no |
+| enable_tcp_keepalive | Enables tcp_keepalive settings on the Waggledance pods. To use this you need to enable the ability to change sysctl settings on your Kubernetes cluster. For EKS you need to allow this on your cluster (https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/ check EKS version for details). If your EKS version is below 1.24 you need to create a PodSecurityPolicy allowing the following sysctls: "net.ipv4.tcp_keepalive_time", "net.ipv4.tcp_keepalive_intvl", "net.ipv4.tcp_keepalive_probes", and a ClusterRole + RoleBinding for the service account running the HMS pods or all service accounts in the namespace where Apiary is running so that Kubernetes can apply the tcp_keepalive configuration. For EKS 1.25 and above check this https://kubernetes.io/blog/2022/08/23/kubernetes-v1-25-release/#pod-security-changes. Also see tcp_keepalive_* variables. | bool | false | no |
+| tcp_keepalive_time | Sets net.ipv4.tcp_keepalive_time (seconds), currently only supported in ECS. | number | `200` | no |
+| tcp_keepalive_intvl | Sets net.ipv4.tcp_keepalive_intvl (seconds), currently only supported in ECS. | number | `30` | no |
+| tcp_keepalive_probes | Sets net.ipv4.tcp_keepalive_probes (seconds), currently only supported in ECS. | number | `2` | no |
 | datadog_key_secret_name | Name of the secret containing the DataDog API key. This needs to be created manually in AWS secrets manager. | string | | no |
 | datadog_agent_version | Version of the Datadog Agent running in the ECS cluster. | string | `7.46.0-jmx` | no |
 | include_datadog_agent | Whether to include the datadog-agent container alongside Waggledance. | string | bool | no |
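
To illustrate how the new flag combines with the existing `tcp_keepalive_*` inputs, here is a minimal sketch of a module call. The module label and `source` path are hypothetical placeholders, and the module's other required inputs are omitted. The three values shown are simply the defaults from the table above.

```hcl
module "waggle_dance" {
  # Hypothetical path: point this at wherever the module actually lives.
  source = "./modules/waggle-dance"

  # ... other required module inputs omitted ...

  # Opt in to the pod-level sysctls added by this commit (defaults to false).
  enable_tcp_keepalive = true

  # Existing inputs, shown here with their documented defaults.
  tcp_keepalive_time   = 200
  tcp_keepalive_intvl  = 30
  tcp_keepalive_probes = 2
}
```

The cluster still has to permit these sysctls (see the `enable_tcp_keepalive` description); otherwise the pods may be rejected at admission or fail to start.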

k8s.tf

Lines changed: 17 additions & 0 deletions
@@ -78,6 +78,23 @@ resource "kubernetes_deployment_v1" "waggle_dance" {
       spec {
         service_account_name = kubernetes_service_account_v1.waggle_dance[0].metadata.0.name
         automount_service_account_token = true
+        dynamic "security_context" {
+          for_each = var.enable_tcp_keepalive ? ["enabled"] : []
+          content {
+            sysctl {
+              name = "net.ipv4.tcp_keepalive_time"
+              value = var.tcp_keepalive_time
+            }
+            sysctl {
+              name = "net.ipv4.tcp_keepalive_intvl"
+              value = var.tcp_keepalive_intvl
+            }
+            sysctl {
+              name = "net.ipv4.tcp_keepalive_probes"
+              value = var.tcp_keepalive_probes
+            }
+          }
+        }
         container {
           image = "${var.docker_image}:${var.docker_version}"
           name = local.instance_alias

variables.tf

Lines changed: 10 additions & 3 deletions
@@ -394,24 +394,31 @@ variable "datadog_metrics_enabled" {
   default = false
 }
 
+variable "enable_tcp_keepalive" {
+  description = "Enable tcp keepalive settings on the waggledance pods."
+  type = bool
+  default = false
+}
+
 variable "tcp_keepalive_time" {
-  description = "Sets net.ipv4.tcp_keepalive_time (seconds), currently only supported in ECS."
+  description = "Sets net.ipv4.tcp_keepalive_time (seconds)."
   type = number
   default = 200
 }
 
 variable "tcp_keepalive_intvl" {
-  description = "Sets net.ipv4.tcp_keepalive_intvl (seconds), currently only supported in ECS."
+  description = "Sets net.ipv4.tcp_keepalive_intvl (seconds)."
   type = number
   default = 30
 }
 
 variable "tcp_keepalive_probes" {
-  description = "Sets net.ipv4.tcp_keepalive_probes (number), currently only supported in ECS."
+  description = "Sets net.ipv4.tcp_keepalive_probes (number)."
   type = number
   default = 2
 }
 
+
 variable "datadog_key_secret_name" {
   description = "Name of the secret containing the DataDog API key. This needs to be created manually in AWS secrets manager. This is only applicable to ECS deployments."
   type = string
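
As a rough worked example of what the default values imply, the sketch below adds up the three settings. It assumes the usual Linux behaviour: the first probe is sent after `tcp_keepalive_time` seconds of idle time, and the connection is dropped after `tcp_keepalive_probes` unanswered probes spaced `tcp_keepalive_intvl` seconds apart. The local and output names are hypothetical and not part of the module.

```hcl
# Illustrative only: these locals mirror the module defaults from variables.tf.
locals {
  tcp_keepalive_time   = 200 # seconds of idle time before the first probe
  tcp_keepalive_intvl  = 30  # seconds between unanswered probes
  tcp_keepalive_probes = 2   # unanswered probes before the connection is dropped

  # Approximate time until an idle connection to a dead peer is dropped:
  # 200 + 30 * 2 = 260 seconds.
  dead_peer_timeout_seconds = local.tcp_keepalive_time + local.tcp_keepalive_intvl * local.tcp_keepalive_probes
}

output "dead_peer_timeout_seconds" {
  value = local.dead_peer_timeout_seconds
}
```

If an intermediate load balancer or NAT gateway drops idle connections sooner than `tcp_keepalive_time`, lowering that value is the setting most likely to help, since it controls when the first probe is sent.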
