Introduction to Data Science
Fundação Getulio Vargas - July 2019
This contest aimed at predicting how high schools would perform in the Brazilian national exam, ENEM.
Four datasets containing information about schools in the city of São Paulo were provided (they are in the Data folder). The file ENEM2015.csv contains the classification of the high schools in ENEM 2015, placing each school on a scale from 0 to 4, where 4 corresponds to the best schools. The goal was to combine information from the different sources to predict the performance of the students in the 2015 ENEM exam.
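As a quick illustration (not taken from the notebooks), the snippet below shows how the target file could be loaded and inspected with pandas. The column name ENEM_CLASS is a placeholder for the actual name of the 0–4 classification column defined in ENEM2015.csv.

```python
import pandas as pd

# Load the target file with the 0-4 classification of each school.
enem = pd.read_csv("Data/ENEM2015.csv")

print(enem.shape)
print(enem.head())

# Distribution of the 0-4 classes used as the prediction target.
# "ENEM_CLASS" is a placeholder; replace it with the actual column name.
print(enem["ENEM_CLASS"].value_counts().sort_index())
```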
In order to build the classification model, the following steps were taken:
- I extracted from each dataset the information related to the schools listed in ENEM2015.csv. Each school has a unique identification number, which was used to locate it in every dataset.
- I built a new dataset (Data/Schools.csv) whose columns are a subset of the columns from the given datasets (a rough sketch of this step follows this list).
- I used the new dataset to train and test a variety of tree-based models.
- I used accuracy as the metric to evaluate the predictive capacity of each model.
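The snippet below is a rough sketch of the wrangling step, not the exact code in DataWrangling.ipynb: the file name some_source.csv, the ID column SCHOOL_ID, and the feature columns are placeholders for the actual names in the provided datasets.

```python
import pandas as pd

# Target file and one of the additional sources (placeholder name).
enem = pd.read_csv("Data/ENEM2015.csv")
extra = pd.read_csv("Data/some_source.csv")

# Keep only the rows of the extra source that refer to schools present in
# ENEM2015.csv, matching on the unique school identification number.
merged = enem.merge(
    extra[["SCHOOL_ID", "FEATURE_A", "FEATURE_B"]],
    on="SCHOOL_ID",
    how="left",
)

# Persist the consolidated table used in the modeling stage.
merged.to_csv("Data/Schools.csv", index=False)
```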
Two notebooks were created to perform the analysis:
- DataWrangling.ipynb: includes all the steps taken to clean and transform the data prior to the modeling stage; it generates a file (Data/Schools.csv) with the dataset to be used in the subsequent analyses.
- Modeling.ipynb: loads and runs the tree-based models; it includes the steps taken to train and test each model, as well as the search for the optimal hyperparameter configurations; its final output is a list of the prediction accuracies obtained with each model (a minimal sketch of this stage is shown below).
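The following is a minimal sketch of what Modeling.ipynb does for one of the tree-based models (a random forest with a small grid search); the target column name ENEM_CLASS and the hyperparameter grid are assumptions, and the notebook itself covers several models.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Load the consolidated dataset produced by DataWrangling.ipynb.
# "ENEM_CLASS" is a placeholder for the actual target column name.
data = pd.read_csv("Data/Schools.csv")
X = data.drop(columns=["ENEM_CLASS"])
y = data["ENEM_CLASS"]

# Hold out a test set, stratified by the 0-4 class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Hyperparameter search for one tree-based model (example grid).
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 5, 10]},
    scoring="accuracy",
    cv=5,
)
grid.fit(X_train, y_train)

# Accuracy on the held-out test set: the score reported for each model.
print(accuracy_score(y_test, grid.predict(X_test)))
```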
All data files are stored in the /Data folder.
To run the notebooks, it is necessary to have pandas, seaborn and scikit-learn installed in a Python 3.6 (or later) environment.