Analysis of Voter Abstention on Brazil's Presidential Elections through Data Clustering

Disclaimer: This is a personal project intended for educational purposes only.

This project applies unsupervised learning to official voter abstention justification data from Brazilian elections. Using clustering, it looks for behavioral patterns in the reasons given for abstention across demographic and regional variables. You can read a paper on the work and our findings here (available only in PT-BR):

Overview

The project cleans and preprocesses the data, transforming it into a form suitable for machine learning. It then applies the K-Prototypes clustering algorithm to uncover structure in the data. Visualizations help interpret the results and show the composition of each identified cluster.

Project Steps

1. Data Loading & Cleaning

Load the .csv dataset containing official voter justification records.
Drop irrelevant or uniform columns (e.g., protocol numbers, columns holding a single repeated value).
Map binary values (SIM/NAO, i.e. yes/no) to numeric format.
Handle missing or ambiguous values such as "NAO INFORMADO" ("not informed").
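
The cleaning steps above can be sketched with Pandas roughly as follows. This is a minimal illustration on a tiny synthetic table, not the project's actual notebook code: every column name (`protocol_number`, `election`, `state`, `education`, `justified_online`) is hypothetical, and treating "NAO INFORMADO" as a missing value is only one possible way of handling it.

```python
import io

import pandas as pd

# Tiny synthetic stand-in for the real .csv; all column names are hypothetical.
raw_csv = io.StringIO(
    "protocol_number,election,state,education,justified_online\n"
    "1001,2022,SP,SUPERIOR COMPLETO,SIM\n"
    "1002,2022,RJ,NAO INFORMADO,NAO\n"
    "1003,2022,BA,ENSINO MEDIO COMPLETO,SIM\n"
    "1004,2022,SP,ENSINO FUNDAMENTAL COMPLETO,NAO\n"
)
df = pd.read_csv(raw_csv)

# Drop identifier-like columns, then any column holding a single repeated value.
df = df.drop(columns=["protocol_number"])
uniform_cols = [col for col in df.columns if df[col].nunique(dropna=False) <= 1]
df = df.drop(columns=uniform_cols)

# Map the binary SIM/NAO flag to 1/0.
df["justified_online"] = df["justified_online"].map({"SIM": 1, "NAO": 0})

# Assumption: "NAO INFORMADO" is treated as missing; mask() blanks matching cells.
df["education"] = df["education"].mask(df["education"] == "NAO INFORMADO")

print(df)
```

Whether unknown values are better dropped, imputed, or kept as their own category depends on the analysis; the sketch only marks them as missing so that the decision can be made explicitly later.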

2. Encoding & Scaling

Ordinal encoding of ordered categories.
One-hot encoding of nominal categorical variables.
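
As a rough sketch of the two encodings, assuming a hypothetical ordered column `education` and a hypothetical nominal column `state`: scikit-learn's `OrdinalEncoder` is given an explicit category order, and `pandas.get_dummies` is used here for the one-hot step (scikit-learn's `OneHotEncoder` would be an alternative). The category labels and their order are illustrative, not taken from the dataset.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Synthetic example rows; column names and category labels are hypothetical.
df = pd.DataFrame(
    {
        "education": ["FUNDAMENTAL", "MEDIO", "SUPERIOR", "MEDIO"],
        "state": ["SP", "RJ", "SP", "BA"],
    }
)

# Ordinal encoding: the explicit list fixes the order (lowest level first),
# so FUNDAMENTAL -> 0, MEDIO -> 1, SUPERIOR -> 2.
education_order = ["FUNDAMENTAL", "MEDIO", "SUPERIOR"]
encoder = OrdinalEncoder(categories=[education_order])
df["education_level"] = encoder.fit_transform(df[["education"]]).ravel()

# One-hot encoding: one 0/1 indicator column per state, with no implied order.
encoded = pd.get_dummies(df.drop(columns=["education"]), columns=["state"], dtype=int)

print(encoded)
```

Passing the order explicitly matters: without it, `OrdinalEncoder` sorts categories alphabetically, which would not necessarily match their real ranking.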

3. Clustering

Apply the Elbow Method to estimate the optimal number of clusters.
KMeans was originally used, but was later replaced with K-Prototypes, which handles mixed numerical and categorical features.
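
The sketch below illustrates both ideas on synthetic data, under stated assumptions. The first part runs an elbow sweep with scikit-learn's `KMeans` (the algorithm the project started with) on three artificial blobs. The second part is a hand-written helper, `mixed_dissimilarity`, that shows the standard K-Prototypes formulation of distance: squared Euclidean distance on numeric features plus a weight `gamma` times the number of mismatching categorical features. The helper and its name are illustrative only and are not part of any library.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic numeric data: three well-separated blobs of 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=1.0, size=(50, 2)) for c in centers])

# Elbow sweep: fit one model per k and record the within-cluster sum of squares.
inertias = []
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)

# The "elbow" is where the improvement from adding one more cluster collapses.
drops = -np.diff(inertias)
for k, drop in zip(range(2, 7), drops):
    print(f"k={k}: inertia reduced by {drop:.1f}")


def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma):
    """Illustrative K-Prototypes-style distance between a point and a prototype."""
    # Numeric part: squared Euclidean distance.
    numeric_part = float(np.sum((np.asarray(x_num) - np.asarray(proto_num)) ** 2))
    # Categorical part: count of features whose values differ.
    mismatches = sum(a != b for a, b in zip(x_cat, proto_cat))
    return numeric_part + gamma * mismatches


print(mixed_dissimilarity([1.0, 2.0], ["A", "SP"], [0.0, 0.0], ["A", "RJ"], gamma=0.5))
```

In the kmodes package, `KPrototypes` takes the indices of the categorical columns through the `categorical` argument of `fit` / `fit_predict` and exposes a `cost_` attribute after fitting, so the same sweep can be repeated with `cost_` in place of `inertia_`. How the project configured it (number of clusters, initialization, `gamma`) is not stated here.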

4. Cluster Analysis

Evaluate and interpret each cluster based on feature distributions.
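
One common way to inspect clusters, sketched here on synthetic data with hypothetical column names (`cluster`, `education_level`, `state`), is to compare per-cluster means of numeric features and per-cluster shares of each category. This is an illustration of the general approach, not the project's actual analysis or results.

```python
import pandas as pd

# Synthetic records with an already-assigned cluster label (hypothetical columns).
clustered = pd.DataFrame(
    {
        "cluster": [0, 0, 1, 1, 1],
        "education_level": [0, 1, 2, 2, 1],
        "state": ["SP", "SP", "RJ", "BA", "RJ"],
    }
)

# Numeric feature: average value within each cluster.
numeric_profile = clustered.groupby("cluster")["education_level"].mean()

# Categorical feature: share of each state within each cluster (rows sum to 1).
state_share = pd.crosstab(clustered["cluster"], clustered["state"], normalize="index")

print(numeric_profile)
print(state_share)
```

Reading the two tables side by side gives a quick profile of each cluster, which can then be turned into the kind of visualizations mentioned in the overview.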

Technologies Used

Pandas for data cleaning, Scikit-learn for encoding and the initial clustering, and KModes for the final clustering.
