usnistgov/retrofit-cost-ml

Retrofit Cost Prediction Tool

A Python package for predicting seismic retrofit costs using pre-trained machine learning models.

Overview

This tool provides pre-trained machine learning models to predict the cost of seismic retrofits for buildings. Users can select any of the models to make predictions on their own data.

Installation

Stable Release (GitHub)

pip install git+https://github.com/usnistgov/retrofit-cost-ml.git

Development (Internal via NIST GitLab)

pip install git+https://sgitlab.nist.gov/jff/retrofit-cost-tool.git

Requirements

  • Python 3.12
  • numpy==1.26.4
  • pandas==2.3.0
  • scikit-learn==1.7.0
  • joblib==1.5.1
  • matplotlib==3.10.3
  • seaborn==0.13.2

Usage

1. Command-Line Scripts

Making Predictions

# Use default synthetic data
python scripts/predict.py

# Use your own data file
python scripts/predict.py path/to/your/data.csv

# Use specific model
python scripts/predict.py path/to/your/data.csv random_forest_model

Available models: ridge_model, elastic_net_model, random_forest_model, gradient_boosting_model, ols_model, glm_gamma_model, best_model

Training Models

# Train models (saves to package models directory)
python scripts/main.py

2. Programmatic API

from retrofit_cost_tool import load_data, predict, main
import pandas as pd

# Load your data
data = load_data('path/to/your/data.csv')

# Make predictions
predictions = predict(data, model_name='best_model')

# Train new models
best_model_name, best_model, metrics, X_valid, y_valid = main(
    verbose=True,
    save_models=True,
    save_metrics=True
)

3. Interactive Jupyter Notebook

If you prefer a graphical interface, use the interactive notebook:

# Launch Jupyter and open the prediction notebook
jupyter notebook notebooks/retrofit-cost-tool-predict.ipynb

Features:

  • 📁 File upload widget - drag and drop your CSV files
  • 🤖 Model selection dropdown - choose from all available models
  • 📊 Interactive plotting - visualize predictions vs actual values
  • 💾 CSV export - save predictions with original features and metadata
  • 📈 Feature analysis - summary statistics by building characteristics

Data Format

Your input data should be a CSV file with the following columns:

Required Columns

| Column | Description | Type | Example | Valid Range |
|---|---|---|---|---|
| area | Building area (sq ft) | Numeric | 5000 | > 0 |
| bldg_age | Building age (years) | Numeric | 45 | ≥ 0 |
| stories | Number of stories | Numeric | 3 | ≥ 1 |
| seismicity_pga050 | Peak ground acceleration | Numeric | 0.4 | 0-1 |
| p_obj_dummy | Performance objective | Binary | 1 | 0 or 1 |
| bldg_group_dummy | Building type | Binary | 1 | 0 or 1 |
| sp_dummy | Structural performance | Binary | 0 | 0 or 1 |
| occup_cond | Occupancy condition | Numeric | 1 | 1-3 |
| historic_dummy | Historic designation | Binary | 0 | 0 or 1 |
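
The column names and valid ranges in the table can be checked before calling the package. The following is a minimal sketch using pandas; the helper name validate_input and its messages are hypothetical and not part of retrofit_cost_tool.

```python
import pandas as pd

# (min, max) bounds from the Required Columns table; None means no upper bound.
BOUNDS = {
    "area": (0, None),  # additionally must be strictly greater than 0
    "bldg_age": (0, None),
    "stories": (1, None),
    "seismicity_pga050": (0, 1),
    "occup_cond": (1, 3),
}
BINARY = ["p_obj_dummy", "bldg_group_dummy", "sp_dummy", "historic_dummy"]


def validate_input(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found; an empty list means the frame looks usable."""
    required = list(BOUNDS) + BINARY
    missing = [col for col in required if col not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    for col in required:
        # Non-numeric entries become NaN and are reported as a problem.
        values = pd.to_numeric(df[col], errors="coerce")
        if values.isna().any():
            problems.append(f"{col}: missing or non-numeric values")
        if col in BINARY:
            if not values.dropna().isin([0, 1]).all():
                problems.append(f"{col}: values must be 0 or 1")
            continue
        low, high = BOUNDS[col]
        if (values < low).any() or (col == "area" and (values == 0).any()):
            problems.append(f"{col}: values below the valid range")
        if high is not None and (values > high).any():
            problems.append(f"{col}: values above the valid range")
    return problems


# One in-range building, using the example values from the table.
sample = pd.DataFrame({
    "area": [5000], "bldg_age": [45], "stories": [3],
    "seismicity_pga050": [0.4], "p_obj_dummy": [1], "bldg_group_dummy": [1],
    "sp_dummy": [0], "occup_cond": [1], "historic_dummy": [0],
})
print(validate_input(sample))  # []
```

Returning a list of problems rather than raising on the first one lets you report every issue in a file at once.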

Optional Columns

  • ystruct19: Actual retrofit costs (for model validation and comparison)
  • building_id: Building identifier (for tracking and reporting)
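
When ystruct19 is present, predictions can be compared against it with standard error metrics. A minimal sketch in plain Python follows; the function name error_metrics and the numbers are illustrative only, and the package may compute its own metrics differently.

```python
import math


def error_metrics(actual, predicted):
    """Return mean absolute error and root mean squared error for paired values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}


# Hypothetical actual costs (ystruct19) and model predictions.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
metrics = error_metrics(actual, predicted)
print(f"MAE = {metrics['mae']:.2f}, RMSE = {metrics['rmse']:.2f}")
```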

Sample Data

Download example data: synthetic_data.csv

Model Information

The package includes several pre-trained models:

  • Ridge Regression: Linear model with L2 regularization
  • Elastic Net: Linear model with L1 and L2 regularization
  • Random Forest: Ensemble of decision trees
  • Gradient Boosting: Boosted ensemble method
  • OLS: Ordinary Least Squares regression
  • GLM Gamma: Generalized Linear Model with Gamma distribution

The best_model option uses whichever model performed best in cross-validation.
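
Selection by cross-validation can be sketched with scikit-learn as below. The candidate set, the synthetic data, and the RMSE scoring are assumptions for illustration; the package's own selection logic (in model_selection.py) may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression data with a linear signal plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([3.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

# A subset of model types, keyed by names that mirror the package's model names.
candidates = {
    "ols_model": LinearRegression(),
    "ridge_model": Ridge(alpha=1.0),
    "random_forest_model": RandomForestRegressor(n_estimators=50, random_state=0),
}

# 5-fold cross-validated RMSE for each candidate (scikit-learn reports it negated).
cv_rmse = {}
for name, model in candidates.items():
    scores = cross_val_score(
        model, X, y, cv=5, scoring="neg_root_mean_squared_error"
    )
    cv_rmse[name] = -scores.mean()

# The "best" model is the one with the lowest average validation error.
best_name = min(cv_rmse, key=cv_rmse.get)
print(best_name, round(cv_rmse[best_name], 3))
```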

Training Data

Dataset Overview

The models were trained on a dataset of seismic retrofit projects spanning a range of building types and seismic conditions. [1]

Training Dataset Characteristics:

  • Size: 1526 buildings
  • Geographic Coverage: United States
  • Building Types: Office buildings, residential structures, mixed-use, historic buildings
  • Seismic Zones: Range of seismic hazard levels
  • Performance Objectives: Mainly Life Safety (LS) but also some Damage Control (DC) and Immediate Occupancy (IO)
  • Target: Structural retrofit cost (per square foot)

Data Preprocessing

  • Feature scaling and normalization
  • Categorical variable encoding
  • Interaction term creation
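
Two of these steps, scaling and interaction terms, can be sketched with scikit-learn as follows. The feature choice and the order of the steps are assumptions made for illustration, not the package's actual pipeline.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Three hypothetical numeric features per building: area, bldg_age, stories.
X = np.array([
    [5000.0, 45.0, 3.0],
    [8000.0, 30.0, 5.0],
    [12000.0, 60.0, 8.0],
    [7000.0, 20.0, 2.0],
])

# Standardize each column, then append pairwise products (interaction terms).
preprocess = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
)
X_prepared = preprocess.fit_transform(X)

# 3 scaled features + 3 pairwise interactions = 6 columns.
print(X_prepared.shape)  # (4, 6)
```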

Limitations

  • The training data are dated (the source report was published in 1994) and may not reflect current technical practice for seismic retrofits
  • Performance may vary in areas with different building codes
  • Costs scaled to 2019 USD. Users should adjust for inflation and local market factors
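
One common way to adjust a 2019-dollar estimate is to multiply by the ratio of a cost index in the target year to the same index in 2019, optionally times a local market factor. The sketch below assumes that approach; the function name and the index values are placeholders, not published figures.

```python
def adjust_for_inflation(cost_2019_usd, index_2019, index_target, location_factor=1.0):
    """Scale a 2019-dollar cost by a cost-index ratio and an optional local factor."""
    if index_2019 <= 0 or index_target <= 0:
        raise ValueError("index values must be positive")
    return cost_2019_usd * (index_target / index_2019) * location_factor


# Placeholder index values chosen for easy arithmetic; substitute values from
# whichever construction cost index you use.
adjusted = adjust_for_inflation(
    100_000.0, index_2019=100.0, index_target=125.0, location_factor=1.1
)
print(f"${adjusted:,.0f}")  # $137,500
```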

Synthetic Data for Testing

For users who want to test the package without real building data:

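from importlib import resources

from retrofit_cost_tool.data_utils import load_data

# Load the synthetic test data bundled with the package.
# files()/as_file() replace resources.path(), which is deprecated in Python 3.12.
data_file = resources.files('retrofit_cost_tool.data') / 'synthetic_data.csv'
with resources.as_file(data_file) as data_path:
    test_data = load_data(str(data_path))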

Synthetic Data Features:

  • Realistic building parameter distributions, based on original (training) data
  • Covers full range of model inputs
  • Includes ground truth target for validation
  • Safe for public sharing and testing

Examples

Basic Building Analysis

import pandas as pd
from retrofit_cost_tool import predict

# Create sample building data
building_data = pd.DataFrame({
    'area': [5000, 8000, 12000],
    'bldg_age': [45, 30, 60], 
    'stories': [3, 5, 8],
    'seismicity_pga050': [0.4, 0.6, 0.5],
    'p_obj_dummy': [1, 0, 1],
    'bldg_group_dummy': [1, 1, 1],
    'sp_dummy': [0, 1, 0],
    'occup_cond': [1, 2, 2],
    'historic_dummy': [0, 0, 1]
})

# Get predictions
costs = predict(building_data, model_name='best_model')
print(f"Predicted costs: ${costs[0]:,.0f}, ${costs[1]:,.0f}, ${costs[2]:,.0f}")

Portfolio Analysis

from retrofit_cost_tool import load_data, predict

# Load building portfolio
portfolio = load_data('my_buildings.csv')

# Compare different models
for model in ['ridge_model', 'random_forest_model', 'best_model']:
    predictions = predict(portfolio, model_name=model)
    total_cost = sum(predictions)
    print(f"{model}: Total portfolio cost = ${total_cost:,.0f}")
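
The Training Data section lists the target as structural retrofit cost per square foot. If predict returns values in those units (an assumption; check the package output before relying on it), a portfolio total needs each prediction multiplied by the building's area before summing. A sketch with hypothetical numbers:

```python
# Hypothetical per-square-foot predictions (2019 USD per sq ft); in practice
# these would come from predict(portfolio, model_name=...).
cost_per_sqft = [25.0, 40.0, 32.5]
areas_sqft = [5000, 8000, 12000]

# Total cost per building, then the portfolio total.
building_totals = [cost * area for cost, area in zip(cost_per_sqft, areas_sqft)]
portfolio_total = sum(building_totals)
print(f"Total portfolio cost = ${portfolio_total:,.0f}")  # $835,000
```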

Development

Project Structure

retrofit-cost-tool/
├── src/retrofit_cost_tool/
│   ├── __init__.py
│   ├── main.py                          # Model training
│   ├── predict.py                       # Prediction functionality
│   ├── data_utils.py                    # Data loading and preprocessing
│   ├── model_utils.py                   # Model training functions
│   ├── model_io.py                      # Model saving/loading
│   ├── model_selection.py               # Model selection logic
│   ├── plot_utils.py                    # Visualization utilities
│   ├── data/                            # Training and synthetic data
│   └── models/                          # Pre-trained models
├── scripts/
│   ├── main.py                          # Training script
│   └── predict.py                       # Prediction script
├── notebooks/
│   └── retrofit-cost-tool-predict.ipynb # Interactive prediction interface
├── pyproject.toml                       # Package configuration
└── README.md

Building from Source

# Install build dependencies
pip install build

# Build the package
python -m build

# Install in development mode
pip install -e .

Troubleshooting

Common Issues

"Column not found" errors:

  • Ensure your CSV has all required columns with exact names (case-sensitive)
  • Use the interactive notebook for automatic data validation

"Model not found" errors:

  • Available models: ridge_model, elastic_net_model, random_forest_model, gradient_boosting_model, ols_model, glm_gamma_model, best_model
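
A model name can be checked against this list before calling the package, which turns a typo into a clearer message. The helper below is a hypothetical sketch using only the standard library; it is not part of retrofit_cost_tool.

```python
import difflib

# Model names as listed in this README.
AVAILABLE_MODELS = (
    "ridge_model",
    "elastic_net_model",
    "random_forest_model",
    "gradient_boosting_model",
    "ols_model",
    "glm_gamma_model",
    "best_model",
)


def check_model_name(name):
    """Return name unchanged if it is a known model, else raise with a suggestion."""
    if name in AVAILABLE_MODELS:
        return name
    close = difflib.get_close_matches(name, AVAILABLE_MODELS, n=1)
    hint = f" Did you mean '{close[0]}'?" if close else ""
    raise ValueError(f"Unknown model '{name}'.{hint}")


print(check_model_name("best_model"))
```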

Unexpected predictions:

  • Verify data ranges (area > 0, age ≥ 0, dummy variables are 0/1)
  • Check for missing values in your data

Jupyter notebook widget issues:

  • Install widget dependencies: pip install ipywidgets
  • Enable widgets (older classic Notebook installations only; recent Jupyter versions enable them automatically): jupyter nbextension enable --py widgetsnbextension
  • Restart Jupyter after installation

Getting Help

License

NIST-developed software is provided by NIST as a public service. You may use, copy, and distribute copies of the software in any medium, provided that you keep intact this entire notice. You may improve, modify, and create derivative works of the software or any portion of the software, and you may copy and distribute such modifications or works. Modified works should carry a notice stating that you changed the software and should note the date and nature of any such change. Please explicitly acknowledge the National Institute of Standards and Technology as the source of the software.

NIST-developed software is expressly provided "AS IS." NIST MAKES NO WARRANTY OF ANY KIND, EXPRESS, IMPLIED, IN FACT, OR ARISING BY OPERATION OF LAW, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT, AND DATA ACCURACY. NIST NEITHER REPRESENTS NOR WARRANTS THAT THE OPERATION OF THE SOFTWARE WILL BE UNINTERRUPTED OR ERROR-FREE, OR THAT ANY DEFECTS WILL BE CORRECTED. NIST DOES NOT WARRANT OR MAKE ANY REPRESENTATIONS REGARDING THE USE OF THE SOFTWARE OR THE RESULTS THEREOF, INCLUDING BUT NOT LIMITED TO THE CORRECTNESS, ACCURACY, RELIABILITY, OR USEFULNESS OF THE SOFTWARE.

You are solely responsible for determining the appropriateness of using and distributing the software and you assume all risks associated with its use, including but not limited to the risks and costs of program errors, compliance with applicable laws, damage to or loss of data, programs or equipment, and the unavailability or interruption of operation. This software is not intended to be used in any situation where a failure could cause risk of injury or damage to property. The software developed by NIST employees is not subject to copyright protection within the United States.

Author

Juan F. Fung (juan.fung@nist.gov)

Version

1.0.0

Footnotes

  1. FEMA (1994). Typical Costs for Seismic Rehabilitation of Existing Buildings. Vol 1: Summary. FEMA 156, Second Edition.
