Automated ACS-HUD data linking for housing analysis: eligibility determination, protected-class analysis, and analysis-ready county summaries.
hudlink provides reliable, high-quality, reproducible datasets for housing economists, researchers, planners, and policy makers. It automatically downloads, processes, and links American Community Survey (ACS) microdata with HUD's Picture of Subsidized Housing data to produce analysis-ready datasets at both household and county levels.
Key Features:
- Automated data integration from multiple federal sources (IPUMS ACS, HUD PSH, HUD Income Limits)
- Household-level HUD program eligibility determination at 30%, 50%, and 80% AMI thresholds
- Protected class demographic analysis for fair housing research
- County-level program allocation rates and gap analysis
- Reproducible research workflows with comprehensive configuration options
This is the main recommended approach - it allows full customization through the configuration file:
git clone https://github.com/sdabney5/hudlink.git
cd hudlink
pip install -e .
Click the badge above, choose Save a copy in Drive, paste your IPUMS token, select options, then hit Run hudlink.
The simplest installation; options are customized from the command line:
pip install hudlink
hudlink uses the IPUMS USA API to download ACS microdata. You'll need a free IPUMS account:
- Register at https://usa.ipums.org/usa/
- Get your API token from your account dashboard
- Add your token to the existing secrets file:
  - Open the secrets folder in your hudlink directory
  - Open the file ipums_token.txt
  - Replace YOUR TOKEN HERE with your actual IPUMS API token
  - Save the file
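A minimal sketch of how a script might read the token from that file and guard against the unedited placeholder. The helper name `read_ipums_token` is hypothetical and not part of hudlink; the default path and placeholder text follow the steps above, and hudlink's own loader may work differently.

```python
from pathlib import Path

PLACEHOLDER = "YOUR TOKEN HERE"  # placeholder text described in the steps above


def read_ipums_token(path="secrets/ipums_token.txt"):
    """Return the token stored in `path`, or raise if it was never filled in.

    Hypothetical helper for illustration only.
    """
    token = Path(path).read_text(encoding="utf-8").strip()
    if not token or token == PLACEHOLDER:
        raise ValueError(f"No IPUMS token found in {path}; replace the placeholder first.")
    return token
```

The returned string could then be passed wherever a token is expected, for example as the `ipums_api_token` argument shown in the Python API usage.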
If you cloned the repository (recommended), customize your analysis by editing src/hudlink/config.py:
CONFIG = {
    # Geographic and temporal scope
    "states": ["FL", "CA", "NY"],  # State abbreviations
    "ipums_years": [2022, 2023],  # ACS 5-year estimates
    # HUD program selection
    "program_labels": [
        "Summary of All HUD Programs",
        "Housing Choice Vouchers",
        "Public Housing",
        "LIHTC"
    ],
    # Advanced processing options
    "split_households_into_families": True,  # Family vs household analysis
    "exclude_group_quarters": False,  # Remove institutional populations
    "income_limit_agg": "max",  # Multiple income limit handling
    # Custom variable selection
    "additional_ipums_vars": "CLASSWKR,TRANWORK,GRADEATT",  # Employment, commute, education
    # Output settings
    "output_directory": "./outputs",
    "api_settings": {
        "use_ipums_api": True,  # Automatic data download
        "clear_api_cache": True  # Clean up temporary files
    }
}
Then run:
hudlink
hudlink also provides a CLI for quick analyses without editing the config file:
# Basic usage - Florida and Texas, 2023 data
hudlink --states FL, TX --years 2023
# Multiple years and custom output directory
hudlink --states CA, NY --years 2022, 2023 --output-dir ./my_analysis
# Include additional IPUMS variables
hudlink --states FL --years 2023 --additional-vars "CLASSWKR,TRANWORK,GRADEATT"
# Advanced options
hudlink --states FL --years 2023 \
    --split-families \
    --exclude-group-quarters \
    --income-agg median \
    --programs "Housing Choice Vouchers" "Public Housing"
CLI Options:
- --states: State abbreviations (e.g., FL, CA, NY)
- --years: ACS years to process (e.g., 2022 2023)
- --output-dir: Output directory path
- --additional-vars: Additional IPUMS ACS variables (comma-separated)
- --split-families: Split multi-family households for family-level analysis
- --exclude-group-quarters: Exclude group quarters from eligibility counts
- --income-agg: Income limit aggregation method (max, min, mean, median, mode)
- --programs: Specific HUD programs to analyze
- --create-gap-visual: Create choropleth map visualization after processing
- --help: Show detailed help with all options
For programmatic control and integration into larger analysis workflows:
from hudlink import process_eligibility
from hudlink.config import CONFIG

# Basic usage with default settings
eligibility_df, summary_df = process_eligibility(
    states=["FL", "GA"],
    years=[2023],
    ipums_api_token="your_token_here"
)

# Advanced configuration
custom_config = CONFIG.copy()
custom_config.update({
    "split_households_into_families": True,
    "additional_ipums_vars": "CLASSWKR,TRANWORK,GRADEATT",
    "exclude_group_quarters": True,
    "income_limit_agg": "median"
})
eligibility_df, summary_df = process_eligibility(
    states=["FL", "CA", "NY"],
    years=[2022, 2023],
    config=custom_config,
    ipums_api_token="your_token_here"
)
If you cloned the repository, you can customize analysis by editing src/hudlink/config.py:
CONFIG = {
    # Geographic and temporal scope
    "states": ["FL", "CA", "NY"],  # State abbreviations
    "ipums_years": [2022, 2023],  # ACS 5-year estimates
    # HUD program selection
    "program_labels": [
        "Summary of All HUD Programs",
        "Housing Choice Vouchers",
        "Public Housing",
        "LIHTC"
    ],
    "create_gap_visual": True,  # Create a sample viz of allocation rates at 50% AMI for selected states
    "open_visualizations": True,  # Automatically open viz when complete
    # Advanced processing options
    "split_households_into_families": True,  # Family vs household analysis
    "exclude_group_quarters": False,  # Remove institutional populations
    "income_limit_agg": "max",  # Multiple income limit handling
    # Custom variable selection
    "additional_ipums_vars": "CLASSWKR,TRANWORK,GRADEATT",  # Employment, commute, education
    # Output settings
    "output_directory": "./outputs",
    "api_settings": {
        "use_ipums_api": True,  # Automatic data download
        "clear_api_cache": True  # Clean up temporary files
    }
}
Then run:
cd hudlink
hudlink
hudlink produces two primary datasets:
Household/family-level microdata with:
- Complete ACS variables: All demographic and economic characteristics
- Eligibility flags: Eligible_at_30%, Eligible_at_50%, Eligible_at_80%
- Protected class indicators: Race, ethnicity, disability status, veteran status, age
- Geographic identifiers: County, PUMA for spatial analysis
- Survey weights: For population-representative estimates
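As a rough illustration of how the eligibility flags and survey weights combine, the sketch below computes weighted eligible-household counts per county on toy data. The flag name `Eligible_at_50%` comes from the list above; `HHWT` (the IPUMS household weight) and `County_Name` are assumed column names and may not match hudlink's actual output.

```python
import pandas as pd

# Toy stand-in for the eligibility dataset. "HHWT" and "County_Name" are
# assumed column names; the values are made up.
eligibility_df = pd.DataFrame({
    "County_Name": ["Leon", "Leon", "Duval", "Duval"],
    "HHWT": [120, 80, 200, 50],
    "Eligible_at_50%": [1, 0, 1, 1],
})

# Each sampled household represents HHWT households in the population,
# so the weighted count is the sum of weights over eligible rows.
eligible = eligibility_df[eligibility_df["Eligible_at_50%"] == 1]
weighted_counts = eligible.groupby("County_Name")["HHWT"].sum()
print(weighted_counts)
```

Summing weights rather than counting rows is what makes the result an estimate of households in the population instead of households in the sample.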
County-level aggregations with:
- Eligibility counts: Total and demographic-specific weighted counts and percentages
- HUD program data: Linked Picture of Subsidized Housing administrative data
- Allocation rates: Units available / eligible households
- Gap estimates: Unmet housing need by county and demographic group
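The allocation rate defined above (units available divided by eligible households) can be sketched on toy data as follows. All column names are hypothetical, and the gap formula (eligible households minus available units, floored at zero) is one plausible reading of "unmet need", not necessarily the exact calculation hudlink performs.

```python
import pandas as pd

# Toy county summary; column names and values are hypothetical.
summary_df = pd.DataFrame({
    "county": ["A", "B", "C"],
    "eligible_households": [1000, 400, 0],
    "subsidized_units": [250, 300, 10],
})

# Allocation rate = units available / eligible households.
# Counties with zero eligible households get NaN instead of an infinite rate.
denominator = summary_df["eligible_households"].where(summary_df["eligible_households"] > 0)
summary_df["allocation_rate"] = summary_df["subsidized_units"] / denominator

# Assumed gap definition: eligible households without an available unit.
summary_df["gap"] = (
    summary_df["eligible_households"] - summary_df["subsidized_units"]
).clip(lower=0)
print(summary_df)
```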
When create_gap_visual is enabled, hudlink generates an interactive choropleth map showing HUD program allocation rates across all processed states:
- hud_allocation_gap_map_[year].html: Interactive county-level map
- Color-coded by allocation rate (at the 50% AMI threshold)
- Hover tooltips show county name, state, and exact allocation percentage
- Uses "Summary of All HUD Programs" data, or first program in your config list
- Includes all states specified in your configuration
- Opens in any web browser, no internet connection required
- Fully self-contained HTML file suitable for sharing or embedding
Note: Visualizations are created after all state processing is complete and are saved to your main output directory alongside the individual state folders.
- IPUMS ACS Microdata: 50+ demographic and economic variables via API
- HUD Picture of Subsidized Housing: Administrative data on program utilization
- HUD Income Limits: County-specific AMI thresholds for eligibility determination
- Geographic Crosswalks: PUMA-to-county allocation factors for all US counties
- Family vs. Household Analysis: Option to analyze multi-family households at family unit level
- Protected Class Analysis: Comprehensive demographic flags for fair housing research
- Custom Variable Selection: Include any IPUMS ACS variable beyond defaults
- Income Limit Aggregation: Handle multiple income limits per county (max, min, mean, median, mode)
- Group Quarters Handling: Option to exclude institutional populations
In some cases, a multi-family household contains 2 or more families that would individually qualify for housing vouchers. hudlink provides the option to 'split' these households into distinct families and determine their eligibility for HUD programs separately, rather than treating the entire household as a single unit.
This setting is turned off by default but can be enabled in the configuration:
"split_households_into_families": True
This option provides more precise eligibility determination by analyzing each family unit independently, which can significantly impact eligibility counts in areas with high rates of multi-generational or multi-family households.
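To see why splitting can change the result, consider a small sketch with made-up numbers. The income limits below are hypothetical and the `is_eligible` helper is for illustration only; it is not hudlink's implementation.

```python
# Hypothetical 50% AMI income limits by number of persons (illustrative values only).
LIMIT_50_AMI = {1: 30_000, 2: 34_000, 3: 38_000, 4: 42_000, 5: 46_000, 6: 49_000}


def is_eligible(income, size):
    """True when income is at or below the limit for a unit of this size."""
    return income <= LIMIT_50_AMI[size]


# One household containing two families: (family income, family size).
families = [(28_000, 3), (25_000, 2)]

# Treated as a single unit: pooled income against the five-person limit.
household_income = sum(income for income, _ in families)
household_size = sum(size for _, size in families)
print(is_eligible(household_income, household_size))  # False

# Split into families: each family is tested against its own limit.
print([is_eligible(income, size) for income, size in families])  # [True, True]
```

Here the pooled income of 53,000 exceeds the five-person limit, while each family on its own falls below the limit for its size.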
hudlink can exclude households listed in the census as living in group quarters (institutional populations such as nursing homes, prisons, and college dormitories) from eligibility counts.
"exclude_group_quarters": True # Default: False
When enabled, these households will still appear in the eligibility dataset but will not be marked as eligible regardless of their income. This provides a more realistic estimate of households that could actually utilize housing assistance programs.
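The behavior described above (rows are kept, but the flag is forced off) might look like the following sketch. It assumes the IPUMS `GQ` coding in which values 3 and 4 denote group quarters; `income_ok` and `eligible` are hypothetical column names, and hudlink's internal logic may differ.

```python
import pandas as pd

# Toy records. "GQ" is assumed to follow the IPUMS coding (3 = institutions,
# 4 = other group quarters); "income_ok" is a hypothetical column meaning
# the income test at some AMI threshold was passed.
df = pd.DataFrame({
    "GQ": [1, 3, 4, 1],
    "income_ok": [True, True, True, False],
})

exclude_group_quarters = True
in_group_quarters = df["GQ"].isin([3, 4])

# Rows are kept either way; only the eligibility flag changes.
if exclude_group_quarters:
    df["eligible"] = df["income_ok"] & ~in_group_quarters
else:
    df["eligible"] = df["income_ok"]
print(df)
```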
For some states (such as Connecticut or Vermont), HUD assigns multiple income limits for each county. Since hudlink is not equipped to accommodate multiple limits per county, it aggregates them using a specified method:
"income_limit_agg": "max" # Options: "max", "min", "mean", "median", "mode"
The default, "max", applies the highest of a county's income limits. This is the most inclusive threshold and yields the largest eligibility counts. Using "min" applies the most restrictive threshold and yields the smallest counts.
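A sketch of what collapsing several limits per county into one value could look like with pandas. The column names, FIPS codes, and dollar values are hypothetical, and `aggregate_limits` is an illustrative helper rather than a hudlink function.

```python
import pandas as pd

# Toy income limits: the first county has two limit areas, the second has one.
# Column names and dollar values are hypothetical.
limits = pd.DataFrame({
    "county_fips": ["09001", "09001", "12073"],
    "il50_4person": [60_000, 56_000, 41_000],
})


def aggregate_limits(df, method="max"):
    """Collapse multiple limits per county into one value using `method`."""
    grouped = df.groupby("county_fips")["il50_4person"]
    if method == "mode":
        # Series.mode can return several values; take the smallest for determinism.
        return grouped.agg(lambda s: s.mode().iloc[0])
    return grouped.agg(method)  # "max", "min", "mean", or "median"


print(aggregate_limits(limits, "max"))
print(aggregate_limits(limits, "min"))
```

With these toy values the two-area county resolves to 60,000 under "max" and 56,000 under "min", so the choice of method directly moves the threshold that households are tested against.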
Beyond the comprehensive default variables, you can include any IPUMS ACS variable in your analysis:
"additional_ipums_vars": "CLASSWKR,TRANWORK,GRADEATT" # Employment, commute, education variables
This lets you explore relationships between housing eligibility and other socioeconomic characteristics. See the IPUMS variable list for all available options.
- Housing Choice Vouchers (Section 8)
- Public Housing
- Low-Income Housing Tax Credit (LIHTC)
- Section 236/BMIR
- Section 8 New Construction/Substantial Rehabilitation
- 811/PRAC (Disabled)
- 202/PRAC (Elderly)
- Multi-Family Other programs
- Estimate unmet housing assistance need by county and demographic group
- Identify geographic and demographic disparities in program coverage
- Evaluate allocation efficiency across different HUD programs
- Analyze correlations between housing assistance eligibility and economic outcomes
- Study geographic patterns of housing need and program accessibility
- Examine demographic disparities in federal housing program coverage
Fair Housing Analysis: Researchers can filter the eligibility dataset for voucher-eligible households and analyze patterns by protected characteristics such as race, ethnicity, and disability status. For example: "Of all housing voucher-eligible households in Orange County, CA, what percentage are Black non-Hispanic veterans with a college education?" This type of analysis helps identify potential disparities in housing assistance access across demographic groups.
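A question of this kind reduces to a weighted share within the eligible population. The sketch below shows the arithmetic on toy data; `HHWT`, `race_group`, and `veteran` are assumed column names chosen for illustration and may not match the variables in hudlink's output.

```python
import pandas as pd

# Toy eligibility records. "Eligible_at_50%" follows hudlink's flag naming;
# the other column names and all values are assumptions.
df = pd.DataFrame({
    "HHWT": [100, 150, 50, 200],
    "Eligible_at_50%": [1, 1, 1, 0],
    "race_group": [
        "Black non-Hispanic",
        "White non-Hispanic",
        "Black non-Hispanic",
        "Black non-Hispanic",
    ],
    "veteran": [True, False, False, True],
})

eligible = df[df["Eligible_at_50%"] == 1]
in_group = (eligible["race_group"] == "Black non-Hispanic") & eligible["veteran"]

# Weighted share: weight of eligible households in the group
# divided by the weight of all eligible households.
share = eligible.loc[in_group, "HHWT"].sum() / eligible["HHWT"].sum()
print(f"{share:.1%}")
```

Further conditions (county, education, disability status) would be added to the `in_group` mask in the same way.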
Geographic Disparity Analysis: Using the summary dataset, researchers can calculate allocation rates (available units divided by eligible households) and identify counties with the highest unmet housing need. This analysis reveals geographic patterns of housing assistance coverage and can inform policy decisions about resource allocation and program expansion priorities.
Program Efficiency Assessment: By comparing eligibility counts across different AMI thresholds (30%, 50%, 80%) with actual program utilization data from HUD's Picture of Subsidized Housing, researchers can evaluate how effectively different HUD programs are reaching their target populations and identify potential barriers to program access.
- Python: 3.9 or higher
- Memory: 4GB+ RAM recommended for large states
- Storage: ~1-2 GB per state-year combination
- Internet: Required for automated data downloads
- IPUMS Account: Free registration at https://usa.ipums.org/usa/
hudlink integrates data from multiple authoritative federal sources:
- IPUMS USA: American Community Survey microdata with comprehensive demographic and economic variables
- HUD Picture of Subsidized Housing: Administrative records on federal housing assistance programs
- HUD Income Limits: County-specific Area Median Income (AMI) thresholds for program eligibility
- MCDC Crosswalks: Geographic allocation factors for PUMA-to-county mapping
The methodology follows established practices for housing needs assessment and has been peer-reviewed (Dabney, 2024, Cityscape).
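For readers unfamiliar with PUMA-to-county allocation factors, the sketch below shows one common way such a crosswalk is applied: each household's weight is divided among the counties its PUMA overlaps. This is a generic illustration on toy data, not a description of hudlink's exact procedure; the column names `afact` and `county_weight` are assumptions.

```python
import pandas as pd

# Toy household records keyed by PUMA; "HHWT" is the IPUMS household weight.
households = pd.DataFrame({
    "PUMA": ["00100", "00200"],
    "HHWT": [100.0, 40.0],
})

# Toy crosswalk: PUMA 00100 is split across two counties, 00200 lies in one.
# "afact" is the share of the PUMA allocated to the county (name assumed).
crosswalk = pd.DataFrame({
    "PUMA": ["00100", "00100", "00200"],
    "county": ["A", "B", "A"],
    "afact": [0.75, 0.25, 1.0],
})

# Each household appears once per overlapping county, with its weight
# scaled by the allocation factor so that total weight is preserved.
allocated = households.merge(crosswalk, on="PUMA", how="left")
allocated["county_weight"] = allocated["HHWT"] * allocated["afact"]
county_totals = allocated.groupby("county")["county_weight"].sum()
print(county_totals)
```

Because the factors for each PUMA sum to one, the county totals add up to the original total weight.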
We welcome contributions! Please see our Contributing Guidelines for details on:
- Reporting bugs and requesting features
- Contributing code and documentation
- Development setup and testing procedures
If you use hudlink in your research, please cite:
@software{dabney_hudlink_2025,
author = {Dabney, Shane},
title = {hudlink: Automated ACS-HUD Data Linking for Housing Analysis},
version = {3.1.0},
year = {2025},
url = {https://github.com/sdabney5/hudlink},
doi = {10.5281/zenodo.16547053}
}
Related Publications:
- Dabney, Shane. "Calculating County-Level Housing Choice Voucher Gaps: A Methodology." Cityscape 26, no. 2 (2024): 401–12.
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: Report bugs and request features via GitHub Issues
- Email: sdabney@fsu.edu for methodological questions
- Research Assistants: Iris Bui and Mira Scannapieco (UROP interns, Florida State University)
- Funding: Institute for Humane Studies
- Data Providers: IPUMS USA
- Institution: Florida State University
hudlink is developed and maintained by Shane Dabney at Florida State University.