This project explores financial loan data to uncover patterns in interest rates, loan amounts, and borrower risk, and to build predictive models of loan default. Using Generalized Linear Models (GLM), LASSO regression, and chi-square tests, we analyze the factors that influence loan approvals, interest rates, and defaults.
- Dataset: Financial Loan Dataset from Kaggle
- Total Observations: 10,000 records
- Selected Features: 26 variables out of 55 total
- Objective: Predict loan default probability and analyze borrower attributes
| Variable | Description |
|---|---|
| interest_rate | Loan interest rate assigned to a borrower |
| annual_income | Borrower’s reported annual income |
| debt_to_income | Debt-to-income ratio affecting loan risk |
| homeownership | Borrower’s homeownership status (Own, Rent, Mortgage) |
| loan_status | Loan repayment status (Default/No Default) |
| loan_purpose | Purpose of the loan (Debt Consolidation, Home Improvement, etc.) |
| total_credit_utilized | Total credit used by the borrower |
| num_mort_accounts | Number of mortgage accounts owned by the borrower |
- What are the trends in loan interest rates?
- Are higher incomes correlated with lower interest rates?
- How does homeownership status affect interest rates?
Higher debt-to-income ratios signal greater borrower risk and are associated with higher interest rates.
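One simple way to probe the questions above is a Pearson correlation between `interest_rate` and a numeric borrower attribute. The sketch below is illustrative only: the `pearson` helper is hypothetical, and the five toy rows are made up rather than taken from the Kaggle dataset.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up toy values for illustration only; NOT rows from the Kaggle dataset.
annual_income = [40_000, 55_000, 70_000, 90_000, 120_000]
debt_to_income = [30.0, 25.0, 20.0, 15.0, 10.0]
interest_rate = [16.0, 14.0, 12.0, 10.0, 8.0]

r_income = pearson(annual_income, interest_rate)
r_dti = pearson(debt_to_income, interest_rate)
print(f"income vs rate: {r_income:.2f}, DTI vs rate: {r_dti:.2f}")
```

A negative coefficient for income and a positive one for debt-to-income would be consistent with the pattern described above, though correlation alone does not show that one variable drives the other.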
- Are certain loan purposes more common among homeowners vs. renters?
- Do homeownership types influence the loan amounts received?
- Statistical Method Used: Chi-Square Test & ANOVA
Debt consolidation is the most common loan purpose across all homeownership statuses.
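To make the chi-square step concrete, here is a minimal sketch of the test of independence between homeownership and loan purpose. The `chi_square_independence` helper and the counts in `toy_table` are assumptions for illustration; they are not the project's code or results.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `table` is a list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Made-up counts for illustration only; NOT taken from the Kaggle dataset.
# Rows: Mortgage, Rent, Own. Columns: debt consolidation, home improvement, other.
toy_table = [
    [50, 30, 20],
    [60, 5, 35],
    [25, 10, 15],
]
statistic, dof = chi_square_independence(toy_table)
print(f"chi-square = {statistic:.2f} with {dof} degrees of freedom")
```

With 4 degrees of freedom, a statistic above roughly 9.49 would reject independence at the 5% level. For a p-value, a library routine such as `scipy.stats.chi2_contingency` could be used instead of this hand-rolled version.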
- Model Used: Logistic Regression with LASSO Regularization
- Predictor Variables: Annual Income, Debt-to-Income Ratio, Homeownership, Employment Length
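The effect of LASSO regularization can be sketched with a small proximal-gradient fit. This is not the project's implementation: `fit_lasso_logistic`, its parameters, and the toy rows are hypothetical, and in practice an established library would normally be used. The sketch assumes standardized features and leaves the intercept unpenalized.

```python
from math import exp

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)

def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso_logistic(X, y, penalty, step=0.1, iterations=2000):
    """L1-penalized logistic regression via proximal gradient descent (ISTA).

    X: list of feature rows (assumed standardized); y: list of 0/1 labels.
    Returns (intercept, coefficients); the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(iterations):
        # Residual = predicted default probability minus observed label.
        residuals = [
            sigmoid(intercept + sum(c * x for c, x in zip(coefs, row))) - label
            for row, label in zip(X, y)
        ]
        grad_intercept = sum(residuals) / n
        grads = [sum(r * row[j] for r, row in zip(residuals, X)) / n for j in range(p)]
        intercept -= step * grad_intercept
        # Gradient step on the log-loss, then shrinkage for the L1 penalty.
        coefs = [soft_threshold(c - step * g, step * penalty) for c, g in zip(coefs, grads)]
    return intercept, coefs

# Made-up standardized toy rows: [debt_to_income, annual_income]; 1 = default.
X_toy = [[-1.5, 1.0], [-1.0, 1.0], [-0.5, 1.0], [0.5, -1.0], [1.0, 1.0], [1.5, -1.0]]
y_toy = [0, 0, 1, 0, 1, 1]

_, weak = fit_lasso_logistic(X_toy, y_toy, penalty=0.01)
_, strong = fit_lasso_logistic(X_toy, y_toy, penalty=10.0)
print("weak penalty:", weak)
print("strong penalty:", strong)
```

The soft-thresholding step is what lets LASSO set coefficients exactly to zero, so a large penalty drops every predictor while a small one keeps the informative ones.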
The ROC curve yields an AUC of 0.72, indicating acceptable discrimination between defaulting and non-defaulting loans.
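For readers unfamiliar with AUC, it can be read as the probability that a randomly chosen defaulted loan receives a higher predicted score than a randomly chosen non-defaulted one. The sketch below computes it directly from that definition; the `roc_auc` helper and the four toy scores are illustrative assumptions, not the project's model output.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up labels (1 = default) and predicted probabilities, for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75: three of four pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The pairwise loop is quadratic in the number of loans, so on 10,000 records a rank-based library routine would be the more practical choice.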
Feel free to fork this repository and suggest improvements.
🚀 Happy Analyzing!