
Changes to Property Value Storage

Robert Jacobson edited this page Sep 30, 2025 · 3 revisions


The current scheme for storing property values uses an optimization wherein values that are never read are never stored. It turns out, though, that in our current implementation iterating over a property's values for the entire population (or large subsets thereof) is a pathological access pattern.

First let's make explicit some concepts that are already implicit in the current codebase. Then we apply the new concepts to modify the context.get_person_property() implementation. Finally, we look at consequences for "query result iterators" in the context of queries and sampling.

A generalization of PersonProperty::is_derived()

Some time ago I suggested an optimization of "even lazier" property value initialization, wherein the initial value of a property that has a default is not written upon first access but only upon first write. It turns out this idea doesn't work in general. Consider a property whose initial value depends on the state of the world at the time it is accessed or, more extreme still, is a randomly generated number. Until the property is written to, repeated reads of the property would return different values, yet a change event would never be fired. This contradicts the desired semantics.
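The failure mode can be sketched in a few lines. This is only an illustration, not code from the codebase: `World`, `init_from_world`, `read_without_storing`, and `read_and_store` are hypothetical names, and a plain counter stands in for "the state of the world".

```rust
/// Hypothetical stand-in for the simulation state an initializer may read.
pub struct World {
    pub clock: u32,
}

/// A context-dependent initializer: its result depends on when it is called.
fn init_from_world(world: &World) -> u32 {
    world.clock
}

/// "Even lazier" read: recompute on every access and never store the result.
pub fn read_without_storing(world: &World) -> u32 {
    init_from_world(world)
}

/// Store-on-first-read: compute once, then keep returning the stored value.
pub fn read_and_store(world: &World, slot: &mut Option<u32>) -> u32 {
    *slot.get_or_insert_with(|| init_from_world(world))
}
```

With `read_and_store`, two reads separated by a change to `world.clock` agree. With `read_without_storing` they differ even though nothing ever wrote to the property.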

What I didn't realize at the time is that we already have two distinct ways of specifying the initial value of a property:

  1. define_person_property!(), which accepts an optional function / closure that takes context: &Context and person_id: PersonId as arguments and computes the initial value.
  2. define_person_property_with_default!(), which accepts a constant expression, independent of context and person_id, to which the property's initial value is set.

Thus, we already have the following distinct semantics, which I now encode in this enum:

/// The kind of initialization that a property has.
#[derive(Copy, Clone, Eq, PartialEq, Debug)]
pub enum PropertyInitializationKind {
    /// The property is not derived and has no initial value.
    Normal,
    /// The property is a derived property (its value is computed dynamically from other property values).
    Derived,
    /// The property is given a constant initial value. Its initialization does not trigger a change event.
    Constant,
    /// The property is given an initial value computed by a function of the
    /// `context` and `person_id`. Its initialization does not trigger a change event.
    Dynamic,
}

In the Constant case, it is safe to do "even lazier" property value initialization.

We make the following changes to PersonProperty:

pub trait PersonProperty {
    // ...
    /// The kind of initialization this property has. Implementors (macros) implement
    /// this method instead of `is_derived()`.
    fn property_initialization_kind() -> PropertyInitializationKind;

    /// Whether this property is derived.
    fn is_derived() -> bool {
        Self::property_initialization_kind() == PropertyInitializationKind::Derived
    }
    // ...
}
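As a rough sketch of what implementors would provide, the block below repeats the enum and a reduced version of the trait so that it compiles on its own. The `Age` and `AgeGroup` types are hypothetical, and in practice the macros would generate these impls rather than a human writing them.

```rust
#[derive(Copy, Clone, Eq, PartialEq, Debug)]
pub enum PropertyInitializationKind {
    Normal,
    Derived,
    Constant,
    Dynamic,
}

/// Reduced version of the trait: only the two methods discussed here.
pub trait PersonProperty {
    fn property_initialization_kind() -> PropertyInitializationKind;

    /// Provided method; implementors no longer override it.
    fn is_derived() -> bool {
        Self::property_initialization_kind() == PropertyInitializationKind::Derived
    }
}

/// Hypothetical property with no initial value.
pub struct Age;
impl PersonProperty for Age {
    fn property_initialization_kind() -> PropertyInitializationKind {
        PropertyInitializationKind::Normal
    }
}

/// Hypothetical derived property.
pub struct AgeGroup;
impl PersonProperty for AgeGroup {
    fn property_initialization_kind() -> PropertyInitializationKind {
        PropertyInitializationKind::Derived
    }
}
```

Existing callers of `is_derived()` keep working unchanged, since the default method answers from the initialization kind.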

Old context.get_person_property Algorithm

context.get_person_property(person_id):

  • If Property::is_derived(), return Property::compute(context, person_id).

  • If the PersonId is beyond the bounds of the property vector, OR if the value is not set (vec[person_id.0] == None), then

    1. extend the length of the property vector if needed, and
    2. set the value to the default value computed by Property::compute(context, person_id).

  • Return the stored value.

New context.get_person_property Algorithm

context.get_person_property(person_id):

  • let initialization_kind = Property::property_initialization_kind()

  • If initialization_kind == Derived, return Property::compute(context, person_id).

  • If the PersonId is beyond the bounds of the property vector, OR if the value is not set (vec[person_id.0] == None), then match on initialization_kind:

    • PropertyInitializationKind::Normal: panic (or return None for variants used internally).
    • PropertyInitializationKind::Constant: return Property::compute(context, person_id) without storing it.
    • PropertyInitializationKind::Dynamic (the only other case):
      1. extend the length of the property vector if needed, and
      2. set the value to the default value computed by Property::compute(context, person_id).

  • Return the stored value.
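The steps above can be sketched as follows. This is a simplified model, not the actual implementation: the property vector is passed in explicitly rather than living in the Context, `seed: u32` stands in for `&Context`, values are fixed to `u32`, and `RiskScore` and `EnrollmentTime` are hypothetical properties.

```rust
#[derive(Copy, Clone, Eq, PartialEq, Debug)]
pub enum PropertyInitializationKind {
    Normal,
    Derived,
    Constant,
    Dynamic,
}

#[derive(Copy, Clone, Debug)]
pub struct PersonId(pub usize);

/// Simplified stand-in for the real property trait.
pub trait Property {
    fn property_initialization_kind() -> PropertyInitializationKind;
    /// Initial (or derived) value; `seed` stands in for `&Context`.
    fn compute(seed: u32, person_id: PersonId) -> u32;
}

pub fn get_person_property<P: Property>(
    values: &mut Vec<Option<u32>>,
    seed: u32,
    person_id: PersonId,
) -> u32 {
    use PropertyInitializationKind::*;
    let kind = P::property_initialization_kind();
    // Derived properties are never stored.
    if kind == Derived {
        return P::compute(seed, person_id);
    }
    // In bounds and already set: return the stored value.
    if let Some(Some(value)) = values.get(person_id.0) {
        return *value;
    }
    match kind {
        Normal => panic!("property read before it was set"),
        // "Even lazier": the constant is returned but not written.
        Constant => P::compute(seed, person_id),
        Dynamic => {
            if person_id.0 >= values.len() {
                values.resize(person_id.0 + 1, None);
            }
            let value = P::compute(seed, person_id);
            values[person_id.0] = Some(value);
            value
        }
        Derived => unreachable!("handled above"),
    }
}

/// Hypothetical Constant property: the initial value ignores its arguments.
pub struct RiskScore;
impl Property for RiskScore {
    fn property_initialization_kind() -> PropertyInitializationKind {
        PropertyInitializationKind::Constant
    }
    fn compute(_seed: u32, _person_id: PersonId) -> u32 {
        0
    }
}

/// Hypothetical Dynamic property: the initial value depends on the context.
pub struct EnrollmentTime;
impl Property for EnrollmentTime {
    fn property_initialization_kind() -> PropertyInitializationKind {
        PropertyInitializationKind::Dynamic
    }
    fn compute(seed: u32, _person_id: PersonId) -> u32 {
        seed
    }
}
```

In this model, reading `RiskScore` leaves the vector untouched, while the first read of `EnrollmentTime` grows the vector and pins the value so that later reads agree even if the context has changed.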

Iterators in Querying and Sampling

The Dynamic initialization case is the most pessimistic. Because the query and sample operations require all values to be assigned, and because we hold immutable references during the query/sampling, we must initialize the property for the entire population before we do these operations.

But this isn't new; it happened implicitly before and is now explicit (in the in-progress reimplementation). In fact:

  • Previously, writing a new property triggered this:
if index >= values.len() {
    values.resize(index + 1, None);
}

So when we iterate over monotonically increasing PersonIds, this code resizes the vector by one on every call!

  • We know beforehand that we will be iterating over the population in the query / sampling case. So we modify the code to allocate and initialize the property vector once up front. This guarantees that the Dynamic case in the third bullet under the "New Algorithm" above never occurs during querying and sampling.
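A minimal sketch of the up-front pass, under the assumption that the population size is known before the query starts. The function name `initialize_all` and the closure-based initializer are hypothetical, with the closure standing in for Property::compute.

```rust
/// Ensure every slot for the first `population` people holds a value,
/// resizing at most once and leaving already-set values untouched.
pub fn initialize_all<F>(values: &mut Vec<Option<u32>>, population: usize, mut init: F)
where
    F: FnMut(usize) -> u32,
{
    // One resize for the whole population instead of one per person.
    if values.len() < population {
        values.resize(population, None);
    }
    for (index, slot) in values.iter_mut().enumerate().take(population) {
        if slot.is_none() {
            *slot = Some(init(index));
        }
    }
}
```

After this pass, a query or sampling iterator can read the vector through an immutable reference without ever hitting an unset slot.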

For (unindexed) Derived properties, all Dynamic transitive dependencies would similarly need to be initialized beforehand.
