A machine learning system that predicts English Second Division match outcomes using Logistic Regression on historical data. It includes data preprocessing, feature engineering (goals, form), and visualizations (bar and pie charts), plus a Streamlit app for interactive predictions. Built with Python, Pandas, Scikit-learn, Matplotlib, Seaborn, and Streamlit.
- Data Preprocessing: Cleans "England 2 CSV.csv" by handling missing values and filtering consistent teams.
- Feature Engineering: Derives features like average goals (`Avg_Home_Goals`, `Avg_Away_Goals`) and recent form (`Home_Form`, `Away_Form`).
- Prediction Model: Logistic Regression predicts match outcomes (home win, draw, away win) with probability scores.
- Visualizations:
  - Bar charts for half-time goals, fouls, corners, and yellow/red cards.
  - Pie charts for historical win/draw/loss distributions.
- Streamlit App: Interactive web interface for team selection, stat adjustments, and visualized predictions.
- User-Friendly Design: Prevents same-team selections and provides intuitive outputs.
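The feature engineering and prediction steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `train_and_predict.py`: the sample rows are made up, the `FT Result` label column (H/D/A) is a hypothetical name, and "form" is assumed to mean the average points (3 for a win, 1 for a draw, 0 for a loss) over a team's previous three home or away matches.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Tiny made-up sample; real data would be read from "England 2 CSV.csv".
# "FT Result" (H/D/A) is an assumed label column, not confirmed by this README.
matches = pd.DataFrame({
    "HomeTeam":  ["A", "B", "C", "A", "B", "C", "A", "C"],
    "AwayTeam":  ["B", "C", "A", "C", "A", "B", "B", "A"],
    "FTH Goals": [2, 1, 0, 3, 1, 2, 0, 1],
    "FTA Goals": [0, 1, 2, 1, 1, 0, 1, 3],
    "FT Result": ["H", "D", "A", "H", "D", "H", "A", "A"],
})

# Preprocessing: drop rows with missing values in the columns used below.
needed = ["HomeTeam", "AwayTeam", "FTH Goals", "FTA Goals", "FT Result"]
matches = matches.dropna(subset=needed).reset_index(drop=True)

# Average goals scored by each team at home and away.
avg_home = matches.groupby("HomeTeam")["FTH Goals"].mean()
avg_away = matches.groupby("AwayTeam")["FTA Goals"].mean()
matches["Avg_Home_Goals"] = matches["HomeTeam"].map(avg_home)
matches["Avg_Away_Goals"] = matches["AwayTeam"].map(avg_away)

# Points earned by the home and away side in each match.
home_points = matches["FT Result"].map({"H": 3, "D": 1, "A": 0})
away_points = matches["FT Result"].map({"H": 0, "D": 1, "A": 3})


def recent_form(points, teams, window=3):
    """Mean points over each team's previous `window` matches.

    shift(1) excludes the current match, so a row never sees its own result.
    A team's first match has no history and gets 0.0.
    """
    return (
        points.groupby(teams)
        .transform(lambda s: s.shift(1).rolling(window, min_periods=1).mean())
        .fillna(0.0)
    )


matches["Home_Form"] = recent_form(home_points, matches["HomeTeam"])
matches["Away_Form"] = recent_form(away_points, matches["AwayTeam"])

# Multiclass logistic regression over the engineered features.
features = ["Avg_Home_Goals", "Avg_Away_Goals", "Home_Form", "Away_Form"]
model = LogisticRegression(max_iter=1000)
model.fit(matches[features], matches["FT Result"])

# Probability for each outcome of the first match, keyed by class label.
probs = model.predict_proba(matches[features].iloc[[0]])[0]
outcome_probs = dict(zip(model.classes_, probs))
print(outcome_probs)
```

Note that the goal averages here are computed over the whole sample, including the match being predicted; a stricter setup would use only earlier matches, as the form features do.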
- Python 3.8+
- Pandas & NumPy (data manipulation)
- Scikit-learn (machine learning)
- Matplotlib & Seaborn (visualization)
- Streamlit (web app)
- Pickle (model serialization)
- `train_and_predict.py`: Processes data, trains model, generates predictions, and creates visualizations.
- `app.py`: Streamlit app for interactive predictions and visualizations.
- `football_prediction_model.pkl`: Pre-trained Logistic Regression model (generate via `train_and_predict.py`).
- `README.md`: Project documentation.
- Note: `England 2 CSV.csv` is not included; users must provide their own dataset.
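Since the model is stored as a `.pkl` file and Pickle is listed among the technologies, saving and loading presumably follow the standard `pickle.dump` / `pickle.load` pattern. The helpers below are a hypothetical sketch of that pattern; the function names are illustrative and not taken from the project scripts.

```python
import pickle

# File name taken from this README; the helper names are illustrative only.
MODEL_PATH = "football_prediction_model.pkl"


def save_model(model, path=MODEL_PATH):
    """Serialize a trained model (or any picklable object) to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path=MODEL_PATH):
    """Load a previously saved model.

    Only unpickle files you created yourself: unpickling untrusted data
    can execute arbitrary code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)
```

In this sketch the training script would call `save_model(model)` after fitting, and the app would call `load_model()` once at startup. Loading a scikit-learn model with a different library version than the one that saved it may produce warnings or errors, so regenerating the file locally is the safer route.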
- Python 3.8 or higher
- A CSV dataset (`England 2 CSV.csv`) with columns like `HomeTeam`, `AwayTeam`, `FTH Goals`, `FTA Goals`, `HTH Goals`, `H Fouls`, `H Corners`, `H Yellow`, `H Red`, etc.
- Git (optional, for cloning)
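Because the dataset is user-supplied, it may help to check the CSV header before training. The snippet below is a small sketch that uses only the standard library; `missing_columns` is a hypothetical helper, and the required list contains only the columns named above (the real scripts may expect more, as the "etc." suggests).

```python
import csv
import os

# Columns named in the prerequisites above; the real file may need more.
REQUIRED_COLUMNS = [
    "HomeTeam", "AwayTeam", "FTH Goals", "FTA Goals", "HTH Goals",
    "H Fouls", "H Corners", "H Yellow", "H Red",
]


def missing_columns(csv_path, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    # utf-8-sig strips a byte-order mark if the file was exported with one.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        header = next(csv.reader(f), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


if __name__ == "__main__":
    path = "England 2 CSV.csv"
    if os.path.exists(path):
        print("Missing columns:", missing_columns(path) or "none")
    else:
        print(f"{path} not found in the current directory")
```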
- Clone the Repository:

      git clone https://github.com/SATTVIKO/football-match-prediction.git
      cd football-match-prediction
- Install Dependencies: Create a virtual environment (optional) and install the required packages:

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate
      pip install pandas numpy scikit-learn matplotlib seaborn streamlit
- Prepare the Dataset:
  - Place `England 2 CSV.csv` in the project root directory.
  - Ensure it matches the expected format (see the prerequisites above for the required columns).
- Train the Model: Run the training script to process data, train the model, and save `football_prediction_model.pkl`:

      python train_and_predict.py
- Run the Streamlit App: Launch the web application:

      streamlit run app.py

  Access it at http://localhost:8501 in your browser.
- Training Script (`train_and_predict.py`):
  - Processes the dataset, trains the Logistic Regression model, and generates predictions for predefined matches (e.g., Blackburn vs. Portsmouth).
  - Outputs visualizations (bar/pie charts) for team stats and saves the model.
- Streamlit App (`app.py`):
  - Open the app in your browser, select home and away teams, and adjust stats if desired.
  - Click "Predict Match" to view the predicted outcome, probability distribution, and visualizations.
  - Explore historical performance via pie charts and stat comparisons via bar charts.
- Prediction:

      Blackburn vs Portsmouth: Home Win (H)
      Probabilities - H: 0.5234, D: 0.2345, A: 0.2421
- Visualizations: Bar charts for half-time goals, fouls, corners, cards; pie charts for win/draw/loss distributions.
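To show how output of this shape might be produced from the model's class probabilities, here is a minimal sketch. `format_prediction` is a hypothetical helper, the "Draw" and "Away Win" labels and the two-line layout are assumptions, and the probabilities are simply the example figures quoted above. It also rejects identical teams, mirroring the same-team check described earlier.

```python
# Labels for D and A are assumed; only "Home Win (H)" appears in the example.
OUTCOME_LABELS = {"H": "Home Win", "D": "Draw", "A": "Away Win"}


def format_prediction(home, away, probs):
    """Build a two-line summary from a mapping of outcome code -> probability."""
    if home == away:
        raise ValueError("Home and away teams must differ")
    # The predicted outcome is the class with the highest probability.
    best = max(probs, key=probs.get)
    headline = f"{home} vs {away}: {OUTCOME_LABELS[best]} ({best})"
    detail = "Probabilities - " + ", ".join(
        f"{code}: {probs[code]:.4f}" for code in ("H", "D", "A")
    )
    return f"{headline}\n{detail}"


print(format_prediction(
    "Blackburn", "Portsmouth", {"H": 0.5234, "D": 0.2345, "A": 0.2421}
))
```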
- Integrate real-time data via sports APIs (e.g., Opta).
- Experiment with advanced models (e.g., XGBoost, neural networks).
- Add player-specific features (e.g., top scorer stats).
- Expand to other leagues (e.g., Premier League).
- Deploy to a cloud platform (e.g., Heroku) for public access.
Contributions are welcome!
For questions or feedback, reach out to sattviky@gmail.com