This project uses machine learning to predict NBA players' "Win Rating" based on their game statistics. It demonstrates data preprocessing, exploratory data analysis, and linear regression modeling using Python and popular data science libraries.
The dataset contains statistics for 4000 NBA players, including:
- `season`: The season (yearly) the player played in
- `poss`: Possessions played
- `mp`: Minutes played
- `do_ratio`: A player's ratio of time spent in defense vs. offense; negative values mean more defense positioning
- `pacing`: Player impact on team possessions per 48 minutes
- `win_rating`: Wins Above Replacement rating, how many additional wins a player is worth over a same-level replacement
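As a quick sanity check on the columns described above, the data can be loaded and validated before any modeling. The following is a minimal sketch, not the notebook's actual code: the `load_nba` helper and the `EXPECTED_COLUMNS` list are hypothetical names, and the two sample rows are made-up values standing in for `NBA.csv`.

```python
import io

import pandas as pd

# Column names taken from the dataset description; the real file may
# contain additional columns, which this sketch simply drops.
EXPECTED_COLUMNS = ["season", "poss", "mp", "do_ratio", "pacing", "win_rating"]


def load_nba(source):
    """Read a CSV and verify that the documented columns are present."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df[EXPECTED_COLUMNS]


# Tiny made-up sample so the sketch runs without the real data file.
sample_csv = io.StringIO(
    "season,poss,mp,do_ratio,pacing,win_rating\n"
    "2015,1947,975,-1.72,0.87,1.64\n"
    "2016,3205,1570,0.93,-0.21,0.45\n"
)
df = load_nba(sample_csv)
print(df.dtypes)
```

With the real data, `load_nba("NBA.csv")` would be the equivalent call, assuming the file uses these column names.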
- Data loading and preprocessing
- Exploratory data analysis
- Data preparation for modeling (train/test split, scaling)
- Training a linear regression model
- Model evaluation
- Performance comparison with different feature sets
- Prediction for a new player
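The steps listed above could be sketched roughly as follows. This is an illustration under stated assumptions rather than the notebook's code: it uses synthetic data in place of `NBA.csv`, so the coefficients, the 70/30 split, the random seeds, and the new player's values are all invented for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for NBA.csv (assumption: the real data is not used here).
rng = np.random.default_rng(0)
n = 500
mp = rng.uniform(100, 3000, n)
df = pd.DataFrame({
    "poss": mp * 2 + rng.normal(0, 50, n),
    "mp": mp,
    "do_ratio": rng.normal(0, 2, n),
    "pacing": rng.normal(0, 1, n),
})
# Invented relationship between features and target, plus noise.
df["win_rating"] = 0.002 * df["mp"] + 0.3 * df["pacing"] + rng.normal(0, 1, n)

features = ["poss", "mp", "do_ratio", "pacing"]

# Train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["win_rating"], test_size=0.3, random_state=0
)

# Fit the scaler on the training set only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Train and evaluate a linear regression model (score() returns R^2).
model = LinearRegression().fit(X_train_s, y_train)
r2 = model.score(X_test_s, y_test)
print(f"test R^2: {r2:.2f}")

# Predict for a hypothetical new player, using the same scaler.
new_player = pd.DataFrame(
    [{"poss": 2500.0, "mp": 1200.0, "do_ratio": -0.5, "pacing": 0.2}]
)
pred = model.predict(scaler.transform(new_player[features]))[0]
print(f"predicted win rating: {pred:.2f}")
```

Fitting the scaler on the training split alone keeps test-set statistics out of the model, which is why `fit` and `transform` are called separately here.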
- Python
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn
To run this project, you need to install the following dependencies:
pip install pandas numpy scikit-learn matplotlib seaborn
- Clone this repository
- Ensure the 'NBA.csv' data file is in the same directory as the notebook
- Open and run the Jupyter notebook '2_Your_first_Regression_Challenge.ipynb'
The final model using multiple features (`poss`, `mp`, `do_ratio`, `pacing`) achieved an R² score of 0.64 on the test set, which is a significant improvement over the single-feature model.
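A comparison between feature sets like the one reported above might be set up as in this sketch. It is hedged in several ways: `r2_by_feature_set` is a hypothetical helper, `mp` is only an assumed choice for the single feature, and the data is synthetic, so the printed scores will not reproduce the 0.64 reported for the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def r2_by_feature_set(df, feature_sets, target="win_rating", seed=0):
    """Return the test-set R^2 for each named list of feature columns."""
    # One shared split so every feature set is scored on the same rows.
    train, test = train_test_split(df, test_size=0.3, random_state=seed)
    scores = {}
    for name, cols in feature_sets.items():
        model = make_pipeline(StandardScaler(), LinearRegression())
        model.fit(train[cols], train[target])
        scores[name] = model.score(test[cols], test[target])
    return scores


# Synthetic stand-in for NBA.csv with an invented target relationship.
rng = np.random.default_rng(1)
n = 600
mp = rng.uniform(100, 3000, n)
demo = pd.DataFrame({
    "poss": mp * 2 + rng.normal(0, 50, n),
    "mp": mp,
    "do_ratio": rng.normal(0, 2, n),
    "pacing": rng.normal(0, 1, n),
})
demo["win_rating"] = 0.001 * demo["mp"] + demo["pacing"] + rng.normal(0, 0.5, n)

scores = r2_by_feature_set(
    demo,
    {
        "mp only": ["mp"],
        "all four": ["poss", "mp", "do_ratio", "pacing"],
    },
)
for name, score in scores.items():
    print(f"{name}: R^2 = {score:.2f}")
```

Scoring every feature set on the same held-out rows keeps the comparison fair; differences in R² then reflect the features rather than the split.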
Contributions to this project are welcome. Please feel free to open an issue or submit a pull request.
This project is licensed under the MIT License. See the `LICENSE` file for more details.