Add jemalloc profiling via HTTP API #7746

Open
wants to merge 5 commits into
base: unstable
5 changes: 4 additions & 1 deletion .cargo/config.toml
@@ -1,3 +1,6 @@
[env]
# Set the number of arenas to 16 when using jemalloc.
-JEMALLOC_SYS_WITH_MALLOC_CONF = "abort_conf:true,narenas:16"
+#
+# Provide `prof:true` to allow profiling, but `prof_active:false` to require
+# profiling to be explicitly activated at runtime (possible via the BN HTTP API).
+JEMALLOC_SYS_WITH_MALLOC_CONF = "abort_conf:true,narenas:16,prof:true,prof_active:false"
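
The jemalloc configuration string is a comma-separated list of `option:value` pairs. As a rough illustration of what the new value sets, here is a minimal Rust sketch that splits such a string and looks up individual options. `parse_malloc_conf` and `conf_value` are hypothetical helpers written for this example; jemalloc parses the string itself and nothing in this PR does so in Rust.

```rust
/// Split a jemalloc-style configuration string into `(option, value)` pairs.
/// Hypothetical helper for illustration only; entries without a `:` are skipped.
fn parse_malloc_conf(conf: &str) -> Vec<(&str, &str)> {
    conf.split(',')
        .filter_map(|pair| pair.split_once(':'))
        .collect()
}

/// Look up a single option, returning `None` when it is not present.
fn conf_value<'a>(conf: &'a str, option: &str) -> Option<&'a str> {
    parse_malloc_conf(conf)
        .into_iter()
        .find(|(key, _)| *key == option)
        .map(|(_, value)| value)
}
```

With the value above, `conf_value(conf, "prof")` yields `Some("true")` and `conf_value(conf, "prof_active")` yields `Some("false")`: profiling support is switched on, but sampling stays off until it is activated at runtime.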
2 changes: 2 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions beacon_node/http_api/Cargo.toml
@@ -23,6 +23,7 @@ lighthouse_network = { workspace = true }
lighthouse_version = { workspace = true }
logging = { workspace = true }
lru = { workspace = true }
malloc_utils = { workspace = true }
metrics = { workspace = true }
network = { workspace = true }
operation_pool = { workspace = true }
49 changes: 49 additions & 0 deletions beacon_node/http_api/src/lib.rs
@@ -4631,6 +4631,53 @@ pub fn serve<T: BeaconChainTypes>(
},
);

    // POST lighthouse/malloc/prof_dump
    let post_lighthouse_malloc_prof_dump = warp::path("lighthouse")
        .and(warp::path("malloc"))
        .and(warp::path("prof_dump"))
        .and(warp::body::json())
        .and(warp::path::end())
        // Skip the `BeaconProcessor` for memory dumps so we can execute them as
        // quickly as possible. Memory dumps should be uncommon and very
        // deliberate.
        .then(|filename: String| {
            let dump = || {
                let path = PathBuf::from_str(&filename).map_err(|e| {
                    warp_utils::reject::custom_bad_request(format!(
                        "Unable to parse {filename} as path: {e:?}"
                    ))
                })?;
                if path.exists() {
                    Err(warp_utils::reject::custom_bad_request(format!(
                        "{filename} already exists"
                    )))
                } else {
                    malloc_utils::prof_dump(&filename)
                        .map(|()| warp::reply::json(&filename).into_response())
                        .map_err(warp_utils::reject::custom_bad_request)
                }
            };

            convert_rejection(dump())
        });

    // POST lighthouse/malloc/prof_active
    let post_lighthouse_malloc_prof_active = warp::path("lighthouse")
        .and(warp::path("malloc"))
        .and(warp::path("prof_active"))
        .and(warp::body::json())
        .and(warp::path::end())
        // Skip the `BeaconProcessor` when toggling profiling so we can execute it
        // as quickly as possible. Toggling profiling should be uncommon and very
        // deliberate.
        .then(|enable: bool| {
            let result = malloc_utils::prof_active(enable)
                .map(|()| warp::reply::json(&enable).into_response())
                .map_err(warp_utils::reject::custom_bad_request);

            convert_rejection(result)
        });

    let get_events = eth_v1
        .and(warp::path("events"))
        .and(warp::path::end())
@@ -4908,6 +4955,8 @@ pub fn serve<T: BeaconChainTypes>(
.uor(post_lighthouse_compaction)
.uor(post_lighthouse_add_peer)
.uor(post_lighthouse_remove_peer)
.uor(post_lighthouse_malloc_prof_dump)
.uor(post_lighthouse_malloc_prof_active)
.recover(warp_utils::reject::handle_rejection),
),
)
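
The `prof_dump` handler refuses to write over an existing file before it asks jemalloc for a dump. That guard can be sketched on its own using only the standard library. `validate_dump_path` is a hypothetical name chosen for this example and is not part of the PR; the check is also inherently racy, since a file could appear between the check and the dump.

```rust
use std::path::PathBuf;

/// Return the parsed path only if nothing exists there yet.
/// Hypothetical helper mirroring the handler's "already exists" check.
fn validate_dump_path(filename: &str) -> Result<PathBuf, String> {
    let path = PathBuf::from(filename);
    if path.exists() {
        // The real handler turns this into an HTTP bad-request rejection.
        Err(format!("{filename} already exists"))
    } else {
        Ok(path)
    }
}
```

A relative `filename` is resolved against the working directory of the process, so an absolute path is the less surprising choice when calling the endpoint.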
1 change: 1 addition & 0 deletions book/src/SUMMARY.md
@@ -63,6 +63,7 @@
* [FAQs](./faq.md)
* [Protocol Developers](./developers.md)
* [Lighthouse Architecture](./developers_architecture.md)
* [Memory Profiling](./developers_memory_profiling.md)
* [Security Researchers](./security.md)
* [Archived](./archived.md)
* [Merge Migration](./archived_merge_migration.md)
58 changes: 58 additions & 0 deletions book/src/developers_memory_profiling.md
@@ -0,0 +1,58 @@
# Memory Profiling in Lighthouse

Lighthouse ships with jemalloc enabled by default on Linux, with heap profiling (`prof:true,prof_active:false`) already configured. This guide explains how to capture and inspect heap profiles using `jeprof` to help diagnose memory issues. Use this profiling setup to catch leaks, regressions, or bloated allocation paths.

## 1. Build Lighthouse with Debug Symbols

To make the profiling data readable, build Lighthouse with debug symbols:

```bash
RUSTFLAGS="-C debuginfo=2" make
```

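This ensures the installed `lighthouse` binary includes symbol information. Passing `-C debuginfo=2` through `RUSTFLAGS` adds full debug info without changing the optimization level, so the binary is still an optimized build and its profile reflects real-world behavior. Setting `debug = true` in a Cargo profile requests the same level of debug info; using `RUSTFLAGS` simply avoids editing the profile.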

## 2. Run the Beacon Node

Run the node as usual:

```bash
lighthouse bn ...
```

Let it run for as long as you like before profiling. Note that jemalloc only records allocations made after profiling is activated, so anything allocated earlier will not appear in a dump. Keep this in mind when deciding when to start profiling.

> **Be consistent:** When analyzing a profile dump, `jeprof` must be given the exact path to the binary used to launch the process. In this setup, it's simply `$(which lighthouse)`.

## 3. Start Profiling and Dump Memory

Enable jemalloc profiling:

```bash
curl -X POST http://localhost:5052/lighthouse/malloc/prof_active -H "Content-Type: application/json" -d "true"
```

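Trigger a memory profile dump. The request body is the output path as a JSON string, and the request is rejected if a file already exists at that path: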

```bash
curl -X POST http://localhost:5052/lighthouse/malloc/prof_dump -H "Content-Type: application/json" -d '"/home/ubuntu/prof.dump"'
```
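
Both endpoints in this PR read the request body with `warp::body::json()`, so the dump path has to arrive as a JSON string (hence the nested quotes in the `curl` command), while the activation flag is a bare JSON boolean. The following is a minimal Rust sketch of building those two bodies by hand. `json_string`, `prof_dump_body`, and `prof_active_body` are hypothetical helpers for illustration; a real client would more likely use a JSON library.

```rust
/// Encode `s` as a JSON string literal, escaping quotes, backslashes,
/// and ASCII control characters. Hypothetical helper for illustration.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Body for `POST /lighthouse/malloc/prof_dump`: the path as a JSON string.
fn prof_dump_body(path: &str) -> String {
    json_string(path)
}

/// Body for `POST /lighthouse/malloc/prof_active`: a JSON boolean.
fn prof_active_body(enable: bool) -> String {
    enable.to_string()
}
```

Sending an unquoted path would not parse as JSON, so the request would fail before any dump is attempted.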

## 4. Analyze with `jeprof`

Install `jeprof` and its dependencies:

```bash
sudo apt update
sudo apt install libjemalloc-dev graphviz
```

Generate a visualization:

```bash
jeprof --svg $(which lighthouse) /home/ubuntu/prof.dump > profile.svg
```

Open `profile.svg` in a browser to inspect memory usage.

> **Important:** Symbol resolution will fail if the path to `lighthouse` doesn't exactly match how it was invoked. Stick to `$(which lighthouse)` if that's how the binary was executed.
7 changes: 4 additions & 3 deletions common/malloc_utils/Cargo.toml
@@ -6,21 +6,22 @@ edition = { workspace = true }

[features]
mallinfo2 = []
-jemalloc = ["tikv-jemallocator", "tikv-jemalloc-ctl"]
-jemalloc-profiling = ["tikv-jemallocator/profiling"]
+jemalloc = ["tikv-jemallocator", "tikv-jemalloc-ctl", "tikv-jemalloc-sys"]

[dependencies]
libc = "0.2.79"
metrics = { workspace = true }
parking_lot = { workspace = true }
tikv-jemalloc-ctl = { version = "0.6.0", optional = true, features = ["stats"] }
+tikv-jemalloc-sys = { version = "0.6.0", optional = true }

[target.'cfg(not(target_os = "linux"))'.dependencies]
-tikv-jemallocator = { version = "0.6.0", optional = true, features = ["stats"] }
+tikv-jemallocator = { version = "0.6.0", optional = true, features = ["stats", "profiling"] }

# Jemalloc's background_threads feature requires Linux (pthreads).
[target.'cfg(target_os = "linux")'.dependencies]
tikv-jemallocator = { version = "0.6.0", optional = true, features = [
    "stats",
    "background_threads",
+    "profiling"
] }
64 changes: 64 additions & 0 deletions common/malloc_utils/src/jemalloc.rs
@@ -7,10 +7,13 @@
//!
//! A) `JEMALLOC_SYS_WITH_MALLOC_CONF` at compile-time.
//! B) `_RJEM_MALLOC_CONF` at runtime.

use metrics::{
    set_gauge, set_gauge_vec, try_create_int_gauge, try_create_int_gauge_vec, IntGauge, IntGaugeVec,
};
use std::ffi::{c_char, c_int};
use std::sync::LazyLock;
use std::{mem, ptr};
use tikv_jemalloc_ctl::{arenas, epoch, raw, stats, Access, AsName, Error};

#[global_allocator]
@@ -124,6 +127,67 @@ pub fn page_size() -> Result<usize, Error> {
    "arenas.page\0".name().read()
}

/// A convenience wrapper around `mallctl` for writing `value` to `name`.
///
/// # Safety
///
/// - `name` must be a valid, null-terminated jemalloc control name.
/// - `value` must match the expected type for the specified control.
/// - The jemalloc allocator must be initialised.
///
/// Incorrect usage may cause undefined behaviour or allocator corruption.
unsafe fn mallctl_write<T>(name: &[u8], mut value: T) -> Result<(), c_int> {
    // Use `tikv_jemalloc_sys::mallctl` directly since the `jemalloc_ctl::raw`
    // functions artificially limit the `name` values.
    let status = tikv_jemalloc_sys::mallctl(
        name as *const _ as *const c_char,
        ptr::null_mut(),
        ptr::null_mut(),
        &mut value as *mut _ as *mut _,
        mem::size_of::<T>(),
    );

    if status == 0 {
        Ok(())
    } else {
        Err(status)
    }
}

/// Add a C-style `0x00` terminator to the string and return it as a `Vec` of
/// bytes.
#[allow(dead_code)]
fn terminate_string_for_c(s: &str) -> Vec<u8> {
    let mut terminated = vec![0x00_u8; s.len() + 1];
    terminated[..s.len()].copy_from_slice(s.as_ref());
    terminated
}

/// Uses `mallctl` to call `"prof.dump"`.
///
/// This generates a heap profile at `filename`.
#[allow(dead_code)]
pub fn prof_dump(filename: &str) -> Result<(), String> {
    let terminated_filename = terminate_string_for_c(filename);

    unsafe {
        mallctl_write(
            "prof.dump\0".as_ref(),
            terminated_filename.as_ptr() as *const c_char,
        )
    }
    .map_err(|e| format!("Failed to call prof.dump on mallctl: {e:?}"))
}

/// Uses `mallctl` to call `"prof.active"`.
///
/// Controls whether profile sampling is active.
#[allow(dead_code)]
pub fn prof_active(enable: bool) -> Result<(), String> {
    unsafe { mallctl_write("prof.active\0".as_ref(), enable) }
        .map_err(|e| format!("Failed to call prof.active on mallctl with code {e:?}"))
}

#[cfg(test)]
mod test {
    use super::*;
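
`terminate_string_for_c` appends a single `0x00` byte so that the filename can be handed to jemalloc as a C string. C code stops reading at the first NUL byte, so a filename that already contains one would be cut short on the C side. The sketch below contrasts the append-a-terminator approach with the standard library's `CString`, which rejects interior NUL bytes. `checked_c_string` is a hypothetical alternative written for this example, not something the PR uses.

```rust
use std::ffi::CString;

/// Same idea as the PR's helper: copy the bytes and append one `0x00`.
fn terminate_string_for_c(s: &str) -> Vec<u8> {
    let mut terminated = Vec::with_capacity(s.len() + 1);
    terminated.extend_from_slice(s.as_bytes());
    terminated.push(0);
    terminated
}

/// Hypothetical alternative: `CString::new` fails on interior NUL bytes
/// instead of producing a string that C would read as truncated.
fn checked_c_string(s: &str) -> Result<CString, String> {
    CString::new(s).map_err(|e| format!("invalid C string: {e}"))
}
```

For ordinary paths both produce the same bytes; they differ only for input such as `"a\0b"`, which the first function accepts and the second rejects.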
31 changes: 31 additions & 0 deletions common/malloc_utils/src/lib.rs
@@ -44,6 +44,11 @@ pub use interface::*;
mod interface {
    pub use crate::glibc::configure_glibc_malloc as configure_memory_allocator;
    pub use crate::glibc::scrape_mallinfo_metrics as scrape_allocator_metrics;

    #[allow(dead_code)]
    pub use super::prof_active_unsupported as prof_active;
    #[allow(dead_code)]
    pub use super::prof_dump_unsupported as prof_dump;
}

#[cfg(feature = "jemalloc")]
@@ -53,6 +58,8 @@ mod interface {
        Ok(())
    }

    pub use crate::jemalloc::prof_active;
    pub use crate::jemalloc::prof_dump;
    pub use crate::jemalloc::scrape_jemalloc_metrics as scrape_allocator_metrics;
}

@@ -68,4 +75,28 @@ mod interface {

    #[allow(dead_code)]
    pub fn scrape_allocator_metrics() {}

    #[allow(dead_code)]
    pub use super::prof_dump_unsupported as prof_dump;

    #[allow(dead_code)]
    pub use super::prof_active_unsupported as prof_active;
}

#[allow(dead_code)]
pub fn prof_dump_unsupported(_: &str) -> Result<(), String> {
    Err(
        "Profile dumps are only supported when Lighthouse is built for Linux \
        using the `jemalloc` feature."
            .to_string(),
    )
}

#[allow(dead_code)]
pub fn prof_active_unsupported(_: bool) -> Result<(), String> {
    Err(
        "Enabling profiling is only supported when Lighthouse is built for Linux \
        using the `jemalloc` feature."
            .to_string(),
    )
}
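
The `interface` modules give callers a single `prof_active` / `prof_dump` name regardless of which allocator was compiled in, falling back to an error-returning stub when jemalloc is unavailable. Here is a reduced sketch of that pattern, assuming a single `jemalloc` feature flag and omitting the glibc variant. The `Ok(())` body in the jemalloc branch is a stand-in for the real `mallctl` call, and `toggle_profiling` is a hypothetical caller.

```rust
// Compiled only when the `jemalloc` feature is enabled.
#[cfg(feature = "jemalloc")]
mod interface {
    /// Stand-in for the real `mallctl`-backed implementation.
    pub fn prof_active(_enable: bool) -> Result<(), String> {
        Ok(())
    }
}

// Compiled otherwise: the same name resolves to the unsupported stub.
#[cfg(not(feature = "jemalloc"))]
mod interface {
    pub use super::prof_active_unsupported as prof_active;
}

/// Fallback used when profiling is not available in this build.
pub fn prof_active_unsupported(_: bool) -> Result<(), String> {
    Err("profiling requires the `jemalloc` feature".to_string())
}

/// Hypothetical caller: it uses one name and never checks the build flags.
pub fn toggle_profiling(enable: bool) -> Result<(), String> {
    interface::prof_active(enable)
}
```

Because the selection happens at compile time, the HTTP handlers can call `malloc_utils::prof_active` unconditionally and simply surface the error string on unsupported builds.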