oomkill report doesn't work as expected (nsenter: failed to execute dmesg: No such file or directory) #1730

Open
donch opened this issue Feb 13, 2025 · 4 comments

donch commented Feb 13, 2025

Describe the bug
When an OOMKill occurs on our cluster (based on Talos Linux), we are unable to get the full dmesg output. The enrichment fails with the following error:

nsenter: failed to execute dmesg: No such file or directory

Expected behavior
We need the full dmesg output on that event

Additional context

We tried to debug the behavior using the toolbox image robustadev/debug-toolbox.
It seems (at least on Talos) that the command used to get the dmesg output isn't working:

command = f'nsenter -t 1 -a "{cmd}"'

When using nsenter -t 1 "{cmd}" instead, it works as expected (at least from a debug-toolbox pod).
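
To make the workaround concrete, here is a minimal sketch of trying the `-a` form first and falling back to the form without `-a`. It is not Robusta's actual implementation: the names `DMESG_CANDIDATES`, `run_command`, and `first_working_output` are hypothetical and only illustrate the idea. One possible explanation for the failure, not confirmed in this thread, is that `-a` also enters the host's mount namespace, so `dmesg` is looked up on the Talos host filesystem, where the binary may not exist.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# Hypothetical candidates, tried in order. The first mirrors the command
# quoted in this issue; the second drops -a, as the reporter did.
DMESG_CANDIDATES: List[List[str]] = [
    ["nsenter", "-t", "1", "-a", "dmesg"],
    ["nsenter", "-t", "1", "dmesg"],
]


def run_command(argv: Sequence[str]) -> Optional[str]:
    """Run argv and return its stdout, or None if it fails or cannot start."""
    try:
        result = subprocess.run(
            list(argv), capture_output=True, text=True, timeout=30, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers a missing nsenter binary in the calling container.
        return None
    return result.stdout if result.returncode == 0 else None


def first_working_output(
    candidates: Sequence[Sequence[str]],
    runner: Callable[[Sequence[str]], Optional[str]] = run_command,
) -> Optional[str]:
    """Return the output of the first candidate command that succeeds."""
    for argv in candidates:
        output = runner(argv)
        if output is not None:
            return output
    return None
```

Calling `first_working_output(DMESG_CANDIDATES)` from a sufficiently privileged pod would return the dmesg text from whichever form works, or `None` if both fail. Passing the arguments as a list also avoids the shell quoting around `{cmd}` in the original f-string.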

aantn (Collaborator) commented Feb 16, 2025

Hi @donch, we're actually planning to disable the dmesg enrichment by default in the upcoming Robusta release, as there are a few edge cases we're not happy with right now.

To help us prioritize fixing it, can you share more details on what you're looking for in the dmesg output and why you care about it? It does have helpful information on OOMKills, but given the challenges in making it work properly everywhere, we're not sure if it is worth the effort.

donch (Author) commented Feb 17, 2025

Hi @aantn, the dmesg output is usually a debug trace that we pass on to developers to help them understand why their application is running out of memory.
I think it's nice-to-have information.

aantn (Collaborator) commented Feb 24, 2025

Got it, thanks. We're looking at ways to let HolmesGPT surface more of this data. If you're interested in discussing it, we'd love to chat and understand whether it can work for your use case.
