
Increasing GPU memory across simulation resets for reinforcement learning in Kubernetes containers #6836


Description

@xinxing-max

I’m running multiple Webots simulations in parallel inside Kubernetes pods, each with Fluxbox as the window manager on an Xorg X server, and Webots in headless (“batch”) mode. All pods have access to the same NVIDIA GPU(s) via the Kubernetes NVIDIA device plugin, and I launch Webots with --batch --mode=fast. Despite this, GPU memory usage keeps climbing over time and is never released, even after the simulation resets, after calling gc.collect(), or after clearing the CUDA cache at the end of every learning episode.
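For context, the per-episode cleanup looks roughly like this. This is a minimal sketch, assuming a PyTorch-based controller and the Webots Supervisor API; `train_one_episode` is a hypothetical placeholder for the actual learning step:

```python
import gc

import torch                       # assumption: the CUDA cache is cleared via PyTorch
from controller import Supervisor  # Webots Python controller API

supervisor = Supervisor()
timestep = int(supervisor.getBasicTimeStep())

def train_one_episode():
    """Hypothetical placeholder for one reinforcement-learning episode."""
    for _ in range(1000):                    # episode length is arbitrary here
        if supervisor.step(timestep) == -1:  # advance the simulation
            break
        # ... observation, action, reward, learning update ...

for episode in range(10_000):
    train_one_episode()

    # Cleanup attempted after every learning episode:
    gc.collect()                   # force Python garbage collection
    torch.cuda.empty_cache()       # release CUDA memory cached by PyTorch
    supervisor.simulationReset()   # reset the Webots world
    supervisor.step(timestep)      # one extra step so the reset takes effect
```

Even with this cleanup in place, the memory held by the webots-bin process keeps growing.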

Environment:

Kubernetes pods

Fluxbox window manager on an Xorg X server

Webots R2025a in headless mode (Start command: webots --stdout --stderr --batch --minimize --mode=fast worlds/rl_world.wbt)
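For reference, each pod launches Webots with roughly the wrapper below. This is a minimal sketch; the DISPLAY value is an assumption and depends on how Xorg/Fluxbox is started inside the pod:

```python
import os
import subprocess

env = dict(os.environ)
env.setdefault("DISPLAY", ":0")  # assumption: Xorg with Fluxbox is already running on display :0

# Same start command as listed above, run from the project directory.
webots = subprocess.Popen(
    ["webots", "--stdout", "--stderr", "--batch", "--minimize",
     "--mode=fast", "worlds/rl_world.wbt"],
    env=env,
)
webots.wait()
```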

Symptoms:

GPU memory usage of every webots-bin process grows steadily from about 250 MB to 5 GB, and it would likely grow further if more GPU memory were available.

The controller (a Python reinforcement-learning script) uses only about 200 MB of GPU memory; see the monitoring sketch below.
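Per-process GPU memory can be watched like this; a minimal sketch, assuming nvidia-smi is available inside the pod, which makes the webots-bin growth easy to compare against the controller process:

```python
import subprocess
import time

# Poll per-process GPU memory so webots-bin can be compared with the controller.
QUERY = ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"]

while True:
    out = subprocess.check_output(QUERY, text=True)
    for line in out.strip().splitlines():
        pid, name, used_mib = [field.strip() for field in line.split(",")]
        print(f"pid={pid:>7} {name:<40} {used_mib:>8} MiB")
    print("-" * 60)
    time.sleep(60)
```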

Graphics Card: NVIDIA L4 24 GB
Operating System: Ubuntu 22.04, 5.15.0-136-generic
