-
Notifications
You must be signed in to change notification settings - Fork 458
Do Not Merge: Memory Leak Investigation Summary #8167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
After some investigation, I think I know what is happening here. Indeed, @pkukielka is right about React reconciliation struggles with our code blocks, but I guess this is only a symptom of our mismemoization. On every message update from LLM backend, React creates a completely new DOM element for the whole code block in the message. You can imagine how many elements will be created and removed during one question/response session (it is a big number that browser can free up memory space fast enough). The more details
What I have tried already
|
This PR is not fixing a memory leak in the DOM highlighter code but significantly reduces amount of re-computation which is done. With mocked LLM response of length of 7528 characters I get: **Before my changes:** Execution time: 1 min 25 s ec Peak memory used: over 2GB Stable memory after final GC: 833MB **After my changes:** Execution time: **45 sec** Peak memory used: over 1GB Stable memory after final GC: 719MB Sop while memory usage drop was not significant there is very visible speedup in the execution speed. Also memory usage swings were subjectively less significant. ## Test plan I do not have good testing procedure on `main`, but I di testing using my testing PR, with some custom logging: #8167
Memory Leak Investigation Summary
Root Cause Discovery
The Problem
We discovered that the memory leak in Cody's webview is caused by React's inability to efficiently handle complex DOM structures generated by syntax highlighting, rather than the highlighting libraries themselves.
How We Reached This Conclusion
1. Simple testing framework for reproducible results
We slightly modified chat component to always stream the same mocked LLM in response to any input.
Mock data contains a 288-line TypeScript response with 6 complex code blocks, streamed character-by-character.
This implementation was essential for the memory leak investigation because it provided consistent test conditions for comparing memory usage between different highlighting implementations and retries. It is also a little bit faster than querying real LLM.
Without any changes to the highlighter running it once (one human question, one assistant response) leads to around 900-1000 MB usage (from initial ~100MB) clearly demonstrating a memory leak.
pnpm dev:standalone
command in theweb
directory and openhttp://localhost:5777/
in the browser.2. Initial Hypothesis Testing
CustomHTMLHighlighter
- Simple regex-based highlightingCustomHJSHighlighter
- Full highlight.js-powered highlighting3. Key Discovery
When examining the generated HTML output, we found that highlight.js produces highly complex nested DOM structures:
3. Elimination Process
❌ Ruled Out: highlight.js Library
useHighlightedCode
flag in theCustomHJSHighlighter.tsx
tofalse
runs all the highlight.js logic, but does not apply created code at the end. This does not cause a memory leak.❌ Ruled Out: Component Implementation
✅ Identified: React DOM Reconciliation Issue
Key Takeaway: The memory leak is not caused by the syntax highlighting library itself, but by React's inability to efficiently manage the complex DOM structures that advanced syntax highlighting generates. This finding shifts the focus from library replacement to DOM structure optimization.
Other Approaches Tested & Results
1. Caching Optimization Attempts
2. Conditional Highlighting During Streaming
3. Complete Architecture Change: markdown-it
**4. Moving Highlighting To WebWorker **
Interesting observation: When testing last 4 points on Linux I noticed sometimes memory usage was around ~470MB instead of ~900MB. Generally looking at the devtools Memory inspector when app is working is is clear that memory is allocated in batches, roughly doubling every time. That pattern looks more like heap allocation rather than DOM memory management issue, which is a bit confusing. Take form it your own conclusions ;)