
Disk exhausted after upgrade 1.5.6 → 1.9.5 #24914


Description

@monwolf

Hi,
After upgrading our nodes from 1.5.6 to 1.9.5, we observed a difference in how storage resources are allocated.
We have two partitions on our hosts:

  • / for the OS
  • /var/ for the tasks

This is the free storage:

[screenshot: free storage on / and /var]

Our data_dir points to /var/nomad. Our config looks like this:

```hcl
region = "gine"
name = "ec2devusfarm02"
log_level = "DEBUG"
leave_on_interrupt = true
leave_on_terminate = true
data_dir = "/var/nomad/data"
bind_addr = "0.0.0.0"
disable_update_check = true
limits {
        https_handshake_timeout   = "10s"
        http_max_conns_per_client = 400
        rpc_handshake_timeout     = "10s"
        rpc_max_conns_per_client  = 400
}
advertise {
    http = "10.121.200.13:4646"
    rpc = "10.121.200.13:4647"
    serf = "10.121.200.13:4648"
}
tls {
  http = true
  rpc  = true
  cert_file = "/opt/nomad/ssl/server.pem"
  key_file = "/opt/nomad/ssl/server-key.pem"
  ca_file = "/opt/nomad/ssl/nomad-ca.pem"
  verify_server_hostname = true
  verify_https_client    = true

}
log_file = "/var/log/nomad/"
log_json = true
log_rotate_max_files = 7
consul {
    address = "127.0.0.1:8500"
    server_service_name = "nomad-server"
    client_service_name = "nomad-client"
    auto_advertise = true
    server_auto_join = true
    client_auto_join = true

    ssl = true
    ca_file = "/opt/consul/ssl/consul-ca.pem"
    cert_file = "/opt/consul/ssl/server.pem"
    key_file = "/opt/consul/ssl/server-key.pem"
    token = "xxxxx"


}
acl {
  enabled = true
}

vault {
    enabled = true
    address = "https://vault.legacy-dev.com:8200/"
    ca_file = "/opt/vault/ssl/vault-ca.pem"
    cert_file = "/opt/vault/ssl/client-vault.pem"
    key_file = "/opt/vault/ssl/client-vault-key.pem"
}

telemetry {
  publish_allocation_metrics = true
  publish_node_metrics       = true
  datadog_address = "localhost:8125"
  disable_hostname = true
  collection_interval = "10s"
}
datacenter = "farm"

client {
    enabled = true
    network_interface = "ens5"
    cni_path = "/opt/cni/bin"
    cni_config_dir = "/etc/cni/net.d/"
}

plugin "docker" {
  config {
    auth {
      config = "/etc/docker/config.json"
    }
    allow_privileged = true
    volumes {
      enabled = true
    }
  }
}
```

After the upgrade, we started to see "exhausted disk" errors when we tried to schedule a job:

[screenshot: scheduler error reporting exhausted disk on the node]

But the node has plenty of free storage. If we look at nomad node status:

[screenshot: nomad node status showing allocated disk resources]

As you can see, Nomad now uses / instead of /var to calculate allocatable space, whereas /var was used in 1.5.6. Yet the unique attributes show that it is fingerprinting the right filesystem:

[screenshot: unique storage attributes pointing at the /var filesystem]

How can I solve this? I didn't see anything related to it in the release notes.
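In the meantime, a workaround we are considering, assuming the documented client-stanza overrides disk_total_mb / disk_free_mb still take precedence over the fingerprinted values, is to pin the disk figures to the size of our /var partition by hand. The numbers below are placeholders, not our real partition sizes:

```hcl
client {
    enabled           = true
    network_interface = "ens5"
    cni_path          = "/opt/cni/bin"
    cni_config_dir    = "/etc/cni/net.d/"

    # Hypothetical override: make the scheduler use the capacity of /var
    # instead of whatever filesystem the fingerprinter picked.
    # Placeholder values; set these to the actual /var size.
    disk_total_mb = 102400
    disk_free_mb  = 81920
}
```

This only papers over the problem, though: alloc_dir already defaults to a directory under data_dir (here /var/nomad/data/alloc), so I'd still like to understand why the fingerprinter measures / instead.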
