This project performs an exploratory data analysis (EDA) on a dataset of chess games played on Lichess. The goal is to uncover key factors that influence the outcome of a chess game, such as first-move advantage, player skill, and strategic opening choices. The analysis uses Python with pandas for data manipulation and seaborn for visualization, as well as SQL for data querying and aggregation.
- Source: Chess Game Dataset (Lichess) on Kaggle.
- Content: The dataset contains over 20,000 chess games, with detailed information including player Elo ratings, opening names, number of turns, and the final outcome of each game.
This analysis seeks to answer several key questions about chess strategy:
- Does the player with the White pieces have a statistically significant first-move advantage?
- How much does the skill difference (Elo rating) between players impact the game's outcome?
- Are certain openings played more frequently than others?
- Can a poor opening choice negate the first-move advantage?
- Do games between higher-rated players last longer than those between lower-rated players?
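As a rough illustration of how the first question could be checked, the sketch below runs a two-sided z-test on White's share of decisive games against a 50% baseline. It is not taken from the notebook: the function name `white_advantage_z_test` is hypothetical, the counts are made up, and dropping draws before testing is an assumption about how the comparison is framed.

```python
import math


def white_advantage_z_test(white_wins: int, black_wins: int):
    """Two-sided z-test of White's share of decisive games against 0.5.

    Draws are assumed to have been excluded before counting.
    Returns (white_share, z_statistic, p_value).
    """
    n = white_wins + black_wins
    if n == 0:
        raise ValueError("need at least one decisive game")
    white_share = white_wins / n
    # Under the null hypothesis, wins ~ Binomial(n, 0.5):
    # mean n * 0.5, variance n * 0.25 (normal approximation).
    z = (white_wins - n * 0.5) / math.sqrt(n * 0.25)
    # Two-sided tail probability of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return white_share, z, p_value


# Made-up counts for illustration only, not figures from the dataset.
share, z, p_value = white_advantage_z_test(white_wins=60, black_wins=40)
print(f"White share: {share:.3f}, z = {z:.2f}, p = {p_value:.4f}")
```

With the real data, the two counts would presumably come from the outcome column of `games.csv`. A test like this says nothing about rating differences, so it only addresses the first question in isolation.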
- Prerequisites: Make sure you have Python installed, along with the following libraries: `pandas`, `matplotlib`, `seaborn`.
- Dataset: Download the `games.csv` file from the Kaggle link above and place it in the same directory as the notebook.
- Run the Notebook: Open `ChessEDA.ipynb` in JupyterLab or Jupyter Notebook and run the cells sequentially. The notebook will:
  - Load and clean the dataset.
  - Perform univariate and bivariate analysis.
  - Generate visualizations for key insights.
  - Save a cleaned version of the data to a SQLite database file (`chess_analysis.sqlite`).
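The load, clean, and save steps listed above might look roughly like the following. This is a minimal sketch, not the notebook's actual code: the column names (`winner`, `white_rating`, `black_rating`, `turns`), the table name `games`, and the helper names are all assumptions, and a tiny made-up DataFrame stands in for `pd.read_csv("games.csv")` so the block runs without the dataset.

```python
import sqlite3

import pandas as pd


def clean_games(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows and rows missing an outcome or a rating."""
    required = ["winner", "white_rating", "black_rating"]  # assumed column names
    return df.drop_duplicates().dropna(subset=required).reset_index(drop=True)


def save_to_sqlite(df: pd.DataFrame, conn: sqlite3.Connection, table: str = "games") -> None:
    """Write the DataFrame to a SQLite table, replacing it if it already exists."""
    df.to_sql(table, conn, if_exists="replace", index=False)


# Made-up rows standing in for the real CSV; row 2 duplicates row 0
# and row 3 has no recorded winner.
sample = pd.DataFrame({
    "winner": ["white", "black", "white", None, "white"],
    "white_rating": [1500, 1400, 1500, 1600, 1700],
    "black_rating": [1450, 1480, 1450, 1550, 1650],
    "turns": [40, 62, 40, 30, 55],
})

cleaned = clean_games(sample)

# In-memory database for the sketch; the notebook writes chess_analysis.sqlite.
conn = sqlite3.connect(":memory:")
save_to_sqlite(cleaned, conn)

# Example of the kind of SQL aggregation the project describes.
outcome_counts = pd.read_sql_query(
    "SELECT winner, COUNT(*) AS games FROM games GROUP BY winner ORDER BY games DESC",
    conn,
)
conn.close()
print(outcome_counts)
```

Passing a file path such as `chess_analysis.sqlite` to `sqlite3.connect` instead of `":memory:"` would persist the table to disk.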
- A Slight Edge: The data confirms a small but statistically significant first-move advantage for White.
- Skill is Decisive: This advantage is minor compared to player skill. The higher-rated player is the overwhelming favorite to win.
- Strategy Matters: A player's opening choice can be so critical that it can even reverse the first-move advantage.
- Complexity Increases with Skill: Games between higher-rated players are, on average, significantly longer and more complex.
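One way findings like the skill and game-length results above could be computed is sketched below. It is an illustration under assumptions rather than the notebook's method: the column names, the helper names, and the rating band edges are hypothetical, and the sample rows are made up.

```python
import pandas as pd


def favorite_win_rate(df: pd.DataFrame) -> float:
    """Share of decisive, unequal-rating games won by the higher-rated player."""
    decisive = df[df["winner"].isin(["white", "black"])]
    decisive = decisive[decisive["white_rating"] != decisive["black_rating"]]
    # Label which side was the rating favorite in each remaining game.
    favorite = decisive["white_rating"].gt(decisive["black_rating"]).map(
        {True: "white", False: "black"}
    )
    return float((favorite == decisive["winner"]).mean())


def mean_turns_by_band(df: pd.DataFrame, edges: list) -> pd.Series:
    """Average number of turns, grouped by the band of the players' mean rating."""
    avg_rating = (df["white_rating"] + df["black_rating"]) / 2
    bands = pd.cut(avg_rating, bins=edges)
    return df.groupby(bands, observed=True)["turns"].mean()


# Made-up games for illustration only.
sample = pd.DataFrame({
    "winner": ["white", "white", "black", "draw", "black"],
    "white_rating": [1200, 1300, 1900, 1500, 2000],
    "black_rating": [1100, 1250, 1950, 1500, 1900],
    "turns": [30, 40, 70, 50, 80],
})

rate = favorite_win_rate(sample)
turns = mean_turns_by_band(sample, edges=[0, 1600, 3000])
print(f"Favorite win rate: {rate:.2f}")
print(turns)
```

Draws and equal-rating games are excluded from the favorite win rate here because neither has a clear favorite or winner. That is a modelling choice, and a different treatment would change the number.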