
homeId may not be unique #282

@eqmooring


Background

Each synthetic person in the synthetic population generated by scripts/create_synthetic_population has a homeId, which is a 17-digit number. The first 11 digits are based on the person's state and PUMA (will be based on state + county + census tract after #281 is merged in). The next 6 digits are a household-level identifier (the house_number) that is defined state-wide as a contiguous set of integers.
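To make the layout concrete, here is a minimal sketch of how a 17-digit homeId could be assembled and split. The function names `make_home_id` and `split_home_id` are hypothetical and are not taken from scripts/create_synthetic_population. The sketch also assumes the identifier is handled as a string so that leading zeros survive, and it treats the 11-digit geography prefix as opaque.

```python
GEO_PREFIX_WIDTH = 11    # digits for the geography component, per the description above
HOUSE_NUMBER_WIDTH = 6   # digits for the household component (house_number)


def make_home_id(geo_prefix: str, house_number: int) -> str:
    """Concatenate a geography prefix and a zero-padded house number.

    Hypothetical helper: raises instead of silently truncating when the
    house number does not fit in the available digits.
    """
    if len(geo_prefix) != GEO_PREFIX_WIDTH or not geo_prefix.isdigit():
        raise ValueError(f"geo_prefix must be {GEO_PREFIX_WIDTH} digits: {geo_prefix!r}")
    if not 0 <= house_number < 10 ** HOUSE_NUMBER_WIDTH:
        raise ValueError(f"house_number does not fit in {HOUSE_NUMBER_WIDTH} digits: {house_number}")
    return f"{geo_prefix}{house_number:0{HOUSE_NUMBER_WIDTH}d}"


def split_home_id(home_id: str) -> tuple[str, int]:
    """Split a homeId back into (geo_prefix, house_number) by fixed widths."""
    return home_id[:GEO_PREFIX_WIDTH], int(home_id[GEO_PREFIX_WIDTH:])
```

Because the split is positional, any change to either width has to be mirrored wherever homeId values are parsed.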

What's the problem?

Because the synthetic population could contain 1 million or more households, and 6 digits can represent at most 1,000,000 distinct values (000000 through 999999), household-level identifiers could in effect be reused. In most cases the overall homeId values will still be unique because of the PUMA / census tract component, but that uniqueness is not guaranteed.
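The following sketch illustrates the failure mode and one way to check for it. How the existing code behaves when a house number exceeds 6 digits is not stated here, so `truncated_home_id` simply assumes the number is reduced to its last 6 digits. Both function names and the household keys are hypothetical.

```python
from collections import defaultdict


def truncated_home_id(geo_prefix: str, house_number: int, width: int = 6) -> str:
    """ASSUMPTION: house numbers wider than `width` digits keep only their last digits."""
    return f"{geo_prefix}{house_number % 10 ** width:0{width}d}"


def find_reused_home_ids(households):
    """Return {home_id: sorted household keys} for ids shared by more than one household.

    `households` is an iterable of (household_key, home_id) pairs.
    """
    seen = defaultdict(set)
    for key, home_id in households:
        seen[home_id].add(key)
    return {hid: sorted(keys) for hid, keys in seen.items() if len(keys) > 1}


households = [
    ("hh-a", truncated_home_id("01234567890", 1)),
    # Same geography prefix: 1_000_001 wraps around to 000001 and collides with hh-a.
    ("hh-b", truncated_home_id("01234567890", 1_000_001)),
    # Different geography prefix: the homeId is still unique overall.
    ("hh-c", truncated_home_id("09876543210", 1_000_001)),
]
reused = find_reused_home_ids(households)
```

A check like `find_reused_home_ids` could be run on a generated population to see whether any collisions actually occur.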

A similar issue may arise with workplaces and, theoretically, with schools.

What are possible solutions?

One solution would be to lengthen the household-identifier component of homeId to, say, 8 digits, which would allow for up to 100 million distinct household identifiers (00000000 through 99999999). This would be trivial to change in the code, but we'd have to check for any downstream consequences for how the homeId values are parsed. A more principled solution would be to create household-level identifiers that are sequential and nested within census tracts, though this would require more code changes to implement.
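As a rough sketch of the second option, the snippet below numbers households sequentially within each census tract rather than state-wide. `assign_nested_home_ids` is a hypothetical name, the tract codes are assumed to be 11-digit strings (state + county + tract), and starting the numbering at 0 is an arbitrary choice. The `width` parameter also shows where the first option (a longer household component) would plug in.

```python
from collections import defaultdict
from itertools import count


def assign_nested_home_ids(household_tracts, width: int = 6):
    """Assign homeIds whose household component restarts in every tract.

    `household_tracts` is an iterable of tract codes, one per household.
    Returns the homeIds in the same order as the input.
    """
    counters = defaultdict(count)  # one counter per tract, each starting at 0
    home_ids = []
    for tract in household_tracts:
        n = next(counters[tract])
        if n >= 10 ** width:
            # Fail loudly instead of reusing an identifier.
            raise OverflowError(f"tract {tract} has more than {10 ** width} households")
        home_ids.append(f"{tract}{n:0{width}d}")
    return home_ids


tracts = ["01001020100", "01001020100", "01001020200"]
ids = assign_nested_home_ids(tracts)
```

With per-tract numbering, the 6-digit component would only overflow if a single tract held more than 1,000,000 households, so the overall length of homeId would not need to change.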
