
aramshiva/nomen


✍️ Nomen

A web viewer and parser for the names listed on US Social Security card applications from 1880 to 2024.

Nomen is a website that lets people explore and visualize name data from the US Social Security Administration (SSA). You can search names, compare them, view charts, and use actuarial data to estimate how many people with a given name are still alive.

The dataset is sourced from the US Social Security Administration's Baby Names from Social Security Card Applications dataset and the US Social Security Administration's Actuarial Tables dataset.

Development

Nomen runs on Next.js and Tailwind CSS. The database is powered by MySQL and Drizzle ORM. The database scripts are written in Python.

Running locally

Creating a database

To run Nomen locally, you first need a MySQL database populated with the name data. Start a MySQL server, then go to the db folder and fill in the .env file with your MySQL credentials (see the .env.example file for example values).

Once you've done that, run the main.py file. It reads through the raw data and populates the database's names tables.

Note

Main names table schema:

name VARCHAR(255),
sex CHAR(1),
amount INT,
year INT
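As a rough illustration of what the loading step does, the Python sketch below turns one yobYYYY.txt file into rows matching the schema above and inserts them in batches through a DB-API cursor. It is not the project's actual main.py: the table name `names`, the helpers `rows_from_file` and `insert_rows`, and the batch size are hypothetical, and the `%s` placeholder style assumes a MySQL driver such as mysql-connector-python or PyMySQL.

```python
import re
from pathlib import Path

# Hypothetical: the year is taken from the file name, e.g. yob1880.txt -> 1880.
YOB_PATTERN = re.compile(r"yob(\d{4})\.txt$")

# Hypothetical table name; columns follow the main names table schema.
INSERT_SQL = "INSERT INTO names (name, sex, amount, year) VALUES (%s, %s, %s, %s)"


def rows_from_file(path):
    """Turn one yobYYYY.txt file into (name, sex, amount, year) tuples."""
    path = Path(path)
    match = YOB_PATTERN.search(path.name)
    if match is None:
        raise ValueError(f"not a yobYYYY.txt file: {path.name}")
    year = int(match.group(1))
    rows = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            name, sex, amount = line.split(",")
            rows.append((name, sex, int(amount), year))
    return rows


def insert_rows(cursor, rows, batch_size=10_000):
    """Insert rows in batches; the caller is responsible for committing."""
    for start in range(0, len(rows), batch_size):
        cursor.executemany(INSERT_SQL, rows[start:start + batch_size])
```

Batching keeps each executemany call to a bounded size, which matters when loading roughly two million rows.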

Once that's done, the script creates the uniquenames and unique_names tables (confusing names, I know lol). This happens automatically, so just leave the script running!

Then go to the actuary folder inside the db folder and run the main.py file there. This sets up the actuarial tables used for death and age predictions.
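The actuarial tables make it possible to estimate how many people with a given name are still alive. A minimal sketch of the idea, assuming you already have per-birth-year counts for one name and sex from the names table and a hypothetical `survival_by_year` mapping (the probability that someone born in a given year is still alive, which would be derived from the SSA cohort life tables): multiply each year's births by its survival probability and sum. The same weighted counts also give a median age for the living. The function names and inputs are illustrative, not Nomen's actual code.

```python
def estimate_living(births_by_year, survival_by_year):
    """Estimate how many people with a name are alive today.

    births_by_year: {birth_year: amount} for one name and sex.
    survival_by_year: hypothetical {birth_year: probability still alive}.
    Years missing from survival_by_year are treated as probability 0.
    """
    return sum(
        amount * survival_by_year.get(year, 0.0)
        for year, amount in births_by_year.items()
    )


def median_age_of_living(births_by_year, survival_by_year, current_year):
    """Age such that about half of the estimated living people are that age or younger."""
    living = {
        year: amount * survival_by_year.get(year, 0.0)
        for year, amount in births_by_year.items()
    }
    total = sum(living.values())
    if total == 0:
        return None
    running = 0.0
    for year in sorted(living, reverse=True):  # youngest birth cohorts first
        running += living[year]
        if running >= total / 2:
            return current_year - year
    return None
```

This simple weighting ignores migration and name changes, so the result is an estimate rather than a count.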

Spinning up the site locally

Once the database is working locally, setting up the website is easy. First install Bun, then run bun i to install the project's dependencies. Finally, run bun dev and open localhost:3000 in your browser, where you will see Nomen running locally!

API

For detailed API documentation, see the API README.


Important Notes About the Data:

  • The database contains approximately 2.1 million records spanning from 1880 to 2024
  • To protect privacy, the SSA only publishes names with at least 5 occurrences within a given sex and year; rarer names are left out of the data
  • The sex field is recorded as a single character: "M" (male) or "F" (female). Any other values caused by errors or unknowns in the SSA's records are not shown
  • The year represents the birth year, not the registration year
  • Raw data is stored in the sql/names folder as annual files (yobYYYY.txt), with one "name,sex,number" record per line
  • Names are sorted by sex and then by occurrences in descending order
  • Ties in occurrences are resolved alphabetically
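The file layout and ordering rules above can be checked with a short Python sketch. It parses the text of one yobYYYY.txt file, drops records whose sex is not "M" or "F", and tests whether the rows follow the SSA ordering (sex, then count descending, then name A to Z). Sorting on the sex letter places F before M, which matches the published files. `parse_yob` and `follows_ssa_order` are hypothetical helpers for illustration; they are not part of the repository.

```python
import csv
import io

VALID_SEXES = {"M", "F"}


def parse_yob(text):
    """Parse yobYYYY.txt contents into (name, sex, number) tuples."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # blank line
        name, sex, number = record
        if sex not in VALID_SEXES:
            continue  # mirror the site: other sex codes are not shown
        rows.append((name, sex, int(number)))
    return rows


def follows_ssa_order(rows):
    """True if rows are sorted by sex, then number descending, then name."""
    return rows == sorted(rows, key=lambda r: (r[1], -r[2], r[0]))
```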

Note

This does not include any Social Security numbers. The only data stored is the name, frequency, sex, and birth year. This is public data published by the Social Security Administration, and no PII is stored.

The data is tabulated from Social Security records as of May 5, 2025, and covers US births from 1880 to 2024 recorded through Social Security card applications, subject to the SSA's privacy rules for rare names.
