| 1 | +# Storing Sample Context in V8 Continuation-Preserved Embedder Data |
| 2 | + |
| 3 | +## What is the Sample Context? |
| 4 | +Datadog's Node.js profiler has the ability to store a custom object that it will |
| 5 | +then associate with collected CPU samples. We refer to this object as the |
| 6 | +"sample context." A higher-level embedding (typically, dd-trace-js) will then |
| 7 | +update the sample context to keep it current with changes in the execution. A
| 8 | +typical piece of data the sample context stores is the tracing span ID, so
| 9 | +whenever the span ID changes, the sample context needs to be updated.
| 10 | + |
| 11 | +## How is the Sample Context stored and updated? |
| 12 | +Before Node 23, the sample context would be stored in a |
| 13 | +`std::shared_ptr<v8::Global<v8::Value>>` field on the C++ `WallProfiler` |
| 14 | +instance. (In fact, because updates must be atomic and shared pointers cannot
| 15 | +effectively be updated atomically, it's actually a pair of fields with an
| 16 | +atomic pointer-to-shared-pointer switching between them, but I
| 17 | +digress.) Because it was a single piece of instance state, it had to be updated
| 18 | +every time the active span changed, possibly on every invocation of
| 19 | +`AsyncLocalStorage.enterWith` and `.run`, but even more importantly on every
| 20 | +async context change, and for that we needed to register a "before" callback
| 21 | +with `async_hooks.createHook`. This meant not only that we had to update the
| 22 | +sample context on every async context change, but more importantly that we
| 23 | +depended on `async_hooks.createHook`, which is being deprecated in Node.
| 24 | +Current documentation for it is not exactly a shining endorsement: |
| 25 | +> Please migrate away from this API, if you can. We do not recommend using the |
| 26 | +> createHook, AsyncHook, and executionAsyncResource APIs as they have usability |
| 27 | +> issues, safety risks, and performance implications. |
| 28 | +
| 29 | +Fortunately, first the V8 engine and then Node.js gave us building blocks for a |
| 30 | +better solution. |
| 31 | + |
| 32 | +## V8 Continuation-Preserved Embedder Data and Node.js Async Context Frame |
| 33 | +In the V8 engine, starting from version 12 (the one shipping with Node 22),
| 34 | +`v8::Isolate` exposes an API to set and get embedder-specific data on it so that
| 35 | +it is preserved across executions that are logical continuations of each other
| 36 | +(essentially: across promise chains; this includes await expressions). Even
| 37 | +though the APIs are exposed on the isolate, the data is stored on a
| 38 | +per-continuation basis and the engine takes care to return the right one when the
| 39 | +`Isolate::GetContinuationPreservedEmbedderData()` method is invoked. We will
| 40 | +refer to continuation-preserved embedder data as "CPED" from now on. |
| 41 | + |
| 42 | +Starting with Node.js 23, CPED is used to implement data storage behind the Node.js
| 43 | +`AsyncLocalStorage` API. This dovetails nicely with our needs as all the |
| 44 | +span-related data we set on the sample context is normally managed in an async |
| 45 | +local storage (ALS) by the tracer. An application can create any number of |
| 46 | +ALSes, and each ALS manages a single value per async context. This value is |
| 47 | +somewhat confusingly called the "store" of the async local storage, making it |
| 48 | +important to not confuse the terms "storage" (an identity with multiple values, |
| 49 | +one per async context) and "store", which is a value of a storage within a |
| 50 | +particular async context. |
| 51 | + |
| 52 | +The new implementation for storing ALS stores introduces an internal Node.js |
| 53 | +class named `AsyncContextFrame` (ACF) which is a map that uses ALSes as keys, |
| 54 | +and their stores as the map values, essentially providing a mapping from an ALS |
| 55 | +to its store in the current async context. (This implementation is very similar |
| 56 | +to how e.g. Java implements `ThreadLocal`, which is a close analogue to ALS in |
| 57 | +Node.js.) ACF instances are then stored in CPED. |
| 58 | + |
| 59 | +## Storing the Sample Context in CPED |
| 60 | +Node.js – as the embedder of V8 – commandeers the CPED to store instances of |
| 61 | +ACF in it. This means that our profiler can't directly store our sample context |
| 62 | +in the CPED, because then we'd overwrite the ACF reference already in there and |
| 63 | +break Node.js. Fortunately, since ACF is "just" an ordinary JavaScript object, |
| 64 | +we can define a new property on it, and store our sample context in it! |
| 65 | +JavaScript property keys can be strings (numeric keys are converted to strings)
| 66 | +or symbols, with symbols being the recommended way to define properties that are
| 67 | +hidden from unrelated code, as symbols are private to their creator and only
| 68 | +compare equal to themselves. We therefore create a private symbol in the profiler
| 69 | +instance for our property key, and our logic for storing the sample context becomes:
| 70 | +* get the CPED from the V8 isolate |
| 71 | +* if it is not an object, do nothing (we can't set the sample context) |
| 72 | +* otherwise set the sample context as a value in the object with our property |
| 73 | + key. |
| 74 | + |
| 75 | +The reality is a bit thornier, though. Imagine what happens if, while we're
| 76 | +setting the property, we get interrupted by a `SIGPROF` signal and the signal handler
| 77 | +tries to read the property value? It could easily observe an inconsistent state |
| 78 | +and crash. But even if it reads a property value, which one did it read? Still |
| 79 | +the old one, already the new one, or maybe a torn value between the two? |
| 80 | + |
| 81 | +Fortunately, we had the exact same problem with our previous approach where we |
| 82 | +only stored one sample context in the profiler instance, and the solution is
| 83 | +the same. We encapsulate the pair of shared pointers to a V8 `Global` and an |
| 84 | +atomic pointer-to-pointer in a class named `AtomicContextPtr`, which looks like |
| 85 | +this: |
| 86 | +``` |
| 87 | +using ContextPtr = std::shared_ptr<v8::Global<v8::Value>>; |
| 88 | +
| 89 | +class AtomicContextPtr { |
| 90 | + ContextPtr ptr1; |
| 91 | + ContextPtr ptr2; |
| 92 | + std::atomic<ContextPtr*> currentPtr = &ptr1; |
| 93 | + ... |
| 94 | +``` |
| 95 | +A `Set` method on this class will first store the newly passed sample context in |
| 96 | +either `ptr1` or `ptr2` – whichever `currentPtr` is _not_ pointing to at the |
| 97 | +moment. Subsequently it atomically updates `currentPtr` to now point to it. |
| 98 | + |
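The `Set` logic described above can be sketched roughly as follows. This is a minimal illustration, not the profiler's actual code: `Payload` is a stand-in for `v8::Global<v8::Value>` so the example compiles without V8 headers, and the `Get` method, the memory orderings, and the use of `std::atomic_signal_fence` are all assumptions made for the sake of the example.

```cpp
#include <atomic>
#include <memory>
#include <string>
#include <utility>

// Stand-in for v8::Global<v8::Value>; an assumption so the sketch is
// self-contained.
using Payload = std::string;
using ContextPtr = std::shared_ptr<Payload>;

class AtomicContextPtr {
  ContextPtr ptr1;
  ContextPtr ptr2;
  std::atomic<ContextPtr*> currentPtr{&ptr1};

 public:
  // Intended to be called only from regular (non-signal-handler) code.
  void Set(ContextPtr value) {
    ContextPtr* current = currentPtr.load(std::memory_order_relaxed);
    // Pick whichever slot currentPtr is NOT pointing to at the moment.
    ContextPtr* inactive = (current == &ptr1) ? &ptr2 : &ptr1;
    // Write the new value into the slot a reader is not looking at.
    *inactive = std::move(value);
    // Keep the compiler from moving the write past the pointer switch, so a
    // signal handler on this thread never sees a half-written slot.
    std::atomic_signal_fence(std::memory_order_release);
    // Publish the new value with a single atomic pointer store.
    currentPtr.store(inactive, std::memory_order_relaxed);
  }

  // Hypothetical reader: one atomic load, then a copy of the shared pointer.
  ContextPtr Get() const {
    ContextPtr* current = currentPtr.load(std::memory_order_relaxed);
    std::atomic_signal_fence(std::memory_order_acquire);
    return *current;
  }
};
```

A reader that interrupts `Set` before the final store still sees the old, fully formed value; a reader that runs after it sees the new one. There is no moment at which the slot being read is also the slot being written.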
| 99 | +Instead of storing the current sample context in the ACF property directly, |
| 100 | +we want to store an `AtomicContextPtr` (ACP.) The only problem? This is a C++ |
| 101 | +class, and properties of JavaScript objects can only be JavaScript values. |
| 102 | +Fortunately, V8 gives us a solution for this as well: the `v8::External` type is |
| 103 | +a V8 value type that wraps a `void *`. |
| 104 | +So now the algorithm for setting a sample context is: |
| 105 | +* get the CPED from the V8 isolate
| 106 | +* if it is not an object, do nothing (we can't set the sample context)
| 107 | +* retrieve the property value; if there is one, it's the `External` wrapping the
| 108 | + pointer to the ACP we use
| 109 | +* if there is none, allocate a new ACP on the C++ heap, create a `v8::External`
| 110 | + to hold its pointer, and store it as a property in the ACF
| 111 | +* set the sample context as a value on the retrieved or newly created ACP.
| 112 | + |
| 113 | +The chain of data now looks something like this: |
| 114 | +``` |
| 115 | +v8::Isolate (from Isolate::GetCurrent())
| 116 | + +-> current continuation (internally managed by V8)
| 117 | +   +-> node::AsyncContextFrame (in continuation's CPED field)
| 118 | +     +-> v8::External (in AsyncContextFrame's private property)
| 119 | +       +-> dd::AtomicContextPtr (in External's data field)
| 120 | +         +-> std::shared_ptr<v8::Global<v8::Value>> (in either AtomicContextPtr::ptr1 or ptr2)
| 121 | +           +-> v8::Global (in shared_ptr)
| 122 | +             +-> v8::Value (the actual sample context object)
| 123 | +``` |
| 124 | +The last three steps were the same in the previous code version as well, except
| 125 | +`ptr1` and `ptr2` were directly represented in the `WallProfiler`, so then it |
| 126 | +looked like this: |
| 127 | +``` |
| 128 | +dd::WallProfiler |
| 129 | + +-> std::shared_ptr<v8::Global<v8::Value>> (in either WallProfiler::ptr1 or ptr2) |
| 130 | +   +-> v8::Global (in shared_ptr)
| 131 | +     +-> v8::Value (the actual sample context object)
| 132 | +``` |
| 133 | +The difference between the two diagrams shows how we encapsulated the |
| 134 | +`(ptr1, ptr2, currentPtr)` tuple into a separate class and moved it out from |
| 135 | +being an instance state of `WallProfiler` to being a property of every ACF we |
| 136 | +encounter. |
| 137 | + |
| 138 | +## Odds and ends |
| 139 | +And that's mostly it! There are a few more small odds and ends to make it work
| 140 | +safely. We still need to guard writing the property value to the ACF against |
| 141 | +concurrent access by the signal handler, but now it happens only once for every |
| 142 | +ACF, when we create its ACP. We guard by introducing an atomic boolean and |
| 143 | +proper signal fencing. |
| 144 | + |
| 145 | +The signal handler code also needs to be prevented from trying to access the |
| 146 | +data while a GC is in progress. With this new model, the signal handler |
| 147 | +unfortunately needs to do a small number of V8 API invocations. It needs to |
| 148 | +retrieve the current V8 `Context`, it needs to obtain a `Local` for the property |
| 149 | +key, and finally it needs to use both in an `Object::Get` call on the CPED. |
| 150 | +Calling a property getter on an object is reentrancy into V8, which is advised |
| 151 | +against, but this being an ordinary property it ends up being a single dependent |
| 152 | +load, which turns out to work safely… unless there's GC happening. For this |
| 153 | +reason, we register GC prologue and epilogue callbacks with the V8 isolate so we |
| 154 | +can know when GCs are ongoing and the signal handler will refrain from touching |
| 155 | +CPED during them. We do, however, grab the current sample context from the CPED
| 156 | +in the GC prologue, store it in a profiler instance field, and use it for any
| 157 | +samples taken during GC.
| 158 | + |
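The GC guard described above could look roughly like the sketch below. Everything in it is hypothetical: the class and method names are invented for illustration, `contextInCped` is a plain field standing in for "whatever the handler would reach through CPED", and in real code the two callbacks would presumably be registered through `v8::Isolate::AddGCPrologueCallback` and `AddGCEpilogueCallback`.

```cpp
#include <atomic>
#include <memory>
#include <string>
#include <utility>

// Stand-in for std::shared_ptr<v8::Global<v8::Value>> (an assumption).
using ContextPtr = std::shared_ptr<std::string>;

class GcAwareContextReader {
  std::atomic<bool> gcInProgress{false};
  ContextPtr contextAtGcStart;  // snapshot taken in the GC prologue
  ContextPtr contextInCped;     // stand-in for the context reachable via CPED

 public:
  void SetContextInCped(ContextPtr ctx) { contextInCped = std::move(ctx); }

  // GC prologue: snapshot the current context first, then raise the flag.
  void OnGcPrologue() {
    contextAtGcStart = contextInCped;
    std::atomic_signal_fence(std::memory_order_release);
    gcInProgress.store(true, std::memory_order_relaxed);
  }

  // GC epilogue: lower the flag first, then drop the snapshot.
  void OnGcEpilogue() {
    gcInProgress.store(false, std::memory_order_relaxed);
    std::atomic_signal_fence(std::memory_order_release);
    contextAtGcStart.reset();
  }

  // What the signal handler would do: stay away from CPED while a GC runs.
  ContextPtr ContextForSample() const {
    if (gcInProgress.load(std::memory_order_relaxed)) {
      std::atomic_signal_fence(std::memory_order_acquire);
      return contextAtGcStart;
    }
    return contextInCped;
  }
};
```

The ordering matters in both callbacks: the snapshot is complete before the flag goes up, and the flag is down before the snapshot is released, so a handler that observes the flag as set always finds a valid snapshot.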
| 159 | +Speaking of GC, we can now have an unbounded number of ACPs – one for each live |
| 160 | +ACF. Each ACP is allocated on the C++ heap, and needs to be deleted eventually. |
| 161 | +The profiler tracks every ACP it creates in an internal set of live ACPs and |
| 162 | +deletes them all when it itself gets disposed. This would still allow for |
| 163 | +unbounded growth, so we additionally register a V8 GC finalization callback for
| 164 | +every ACF. When V8 collects an ACF instance, its finalization callback puts
| 165 | +that ACF's ACP into the profiler's internal vector of ready-to-delete ACPs, and
| 166 | +the profiler processes that vector (deleting each ACP and removing it from the
| 167 | +live set) on each call to `SetContext`.
| 168 | + |
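The bookkeeping described above can be sketched as a small registry. This is an illustration under assumptions rather than the profiler's real code: `AcpRegistry` and its method names are hypothetical, and `AtomicContextPtr` is reduced to an empty struct because only object identity matters here.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the real class; only identity matters here.
struct AtomicContextPtr {};

class AcpRegistry {
  std::unordered_set<AtomicContextPtr*> liveAcps;
  std::vector<AtomicContextPtr*> readyToDelete;

 public:
  AcpRegistry() = default;
  AcpRegistry(const AcpRegistry&) = delete;
  AcpRegistry& operator=(const AcpRegistry&) = delete;

  // Profiler disposal: free every ACP that is still tracked as live.
  // Queued ACPs are still in the live set, so they are freed here too.
  ~AcpRegistry() {
    for (AtomicContextPtr* acp : liveAcps) delete acp;
  }

  // Called when an ACF without an ACP is encountered.
  AtomicContextPtr* Create() {
    auto* acp = new AtomicContextPtr();
    liveAcps.insert(acp);
    return acp;
  }

  // Called from the finalization callback of a collected ACF. It only
  // queues the pointer; nothing is freed inside the GC callback.
  void OnFrameCollected(AtomicContextPtr* acp) { readyToDelete.push_back(acp); }

  // Called on each SetContext: delete queued ACPs and untrack them.
  void ProcessPending() {
    for (AtomicContextPtr* acp : readyToDelete) {
      if (liveAcps.erase(acp) > 0) delete acp;
    }
    readyToDelete.clear();
  }

  std::size_t LiveCount() const { return liveAcps.size(); }
  std::size_t PendingCount() const { return readyToDelete.size(); }
};
```

Deferring the actual deletion to `ProcessPending` keeps the finalization callback trivial and means memory is reclaimed at a point where the profiler is already running ordinary code.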
| 169 | +## Changes in dd-trace-js |
| 170 | +For completeness, we'll describe the changes in dd-trace-js here as well. The |
| 171 | +main change is that with Node 24, we no longer require async hooks. The |
| 172 | +instrumentation points for `AsyncLocalStorage.enterWith` and |
| 173 | +`AsyncLocalStorage.run` remain in place – they are the only ones that are needed |
| 174 | +now. |
| 175 | + |
| 176 | +There are some small performance optimizations that no longer apply with the new |
| 177 | +approach, though. For one, with the old approach we did some data conversions
| 178 | +(span IDs to strings, a tag array to an endpoint string) in a sample when it
| 179 | +was captured. With the new approach, we do these conversions for all sample
| 180 | +contexts during profile serialization. Doing them after each sample capture
| 181 | +amortized their cost, possibly minimally reducing the latency induced at
| 182 | +serialization time. With the old approach we also called `SetContext` only once
| 183 | +per sampling – we'd install a sample context to be used for the next sample, and
| 184 | +then kept updating a `ref` field in it with a reference to the actual data.
| 185 | +Since we no longer have a single sample context (but one per continuation), we
| 186 | +cannot do this anymore, and we need to call `SetContext` on every ACF change.
| 187 | +The cost of this (basically, going into a native call from JavaScript) is still
| 188 | +well offset by not having to use async hooks and do work on every async context
| 189 | +change. We could arguably simplify the code by removing those small |
| 190 | +optimizations. |