This repository supports understanding and predicting course enrollment dynamics at Universidad de los Andes. The project combines data analysis techniques, survival models, and interactive tools to study seat availability and anticipate when courses will fill up.
The repository is divided into five main components:
- Description: Scripts and notebooks dedicated to exploring and analyzing the academic offerings by semester. Trends, demand, and seat availability in courses are studied.
- Content: Visualizations, descriptive statistics, and temporal analysis of how courses fill up.
- Description: A web page that allows students to interact with the predictive model and simulate enrollment scenarios.
- Technology: Built with Dash/Jupyter, it enables dynamic visualization and personalized queries.
- Model Used: The primary model is Random Survival Forest (RSF), which is based on decision trees for survival data. This model can handle the temporal and censored nature of enrollment data, capturing complex patterns of interactions between variables.
- Source: Data is extracted from the public course offering system at Universidad de los Andes.
- Periods Included: 2024-19, 2024-20, and 2024-21.
- Features: Enrollments were recorded every five minutes, yielding a fine-grained time series of seat availability and enrollment trends.
- Description: RSF was selected for its ability to model event times (course filling) and handle censored data.
- Variables Used:
  - Time (30-minute intervals): Identifies patterns based on the time of day.
  - Days of the Week: One-hot encoding to differentiate enrollments by day.
  - Class (encoded): The course category (e.g., “ADMI”).
  - Cycle (encoded): The semester in which the course is offered, distinguishing between cycles and terms.
  - Course Level: Indicates whether the course is basic, intermediate, or advanced.
- Implementation: The model is trained on historical data and used to simulate and predict seat filling in real time.
- Description: A tool that simulates the course filling process under controlled scenarios, generating synthetic data to validate and visualize the model's performance.
- Usefulness: Allows experimentation with different configurations and validates the accuracy of the system in representative situations.
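The time-based variables and the survival target described above can be sketched in plain Python. This is a minimal illustration, not the code used in the notebooks: the function names, the `capacity` argument, and the snapshot layout (timestamp, enrolled count) are assumptions made for the example.

```python
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def encode_time_features(ts: datetime) -> dict:
    """Encode a timestamp as a 30-minute slot index plus one-hot day flags."""
    # Slot 0 is 00:00-00:29, slot 47 is 23:30-23:59.
    features = {"slot_30min": ts.hour * 2 + ts.minute // 30}
    for i, day in enumerate(DAYS):
        features[f"day_{day}"] = 1 if ts.weekday() == i else 0
    return features


def time_to_fill(snapshots, capacity):
    """Return (duration_minutes, event_observed) for one course section.

    snapshots: chronologically sorted (timestamp, enrolled) pairs.
    capacity: hypothetical seat limit for the section.
    If the section never reaches capacity, the observation is
    right-censored at the last snapshot and event_observed is False.
    """
    start = snapshots[0][0]
    for ts, enrolled in snapshots:
        if enrolled >= capacity:
            return (ts - start).total_seconds() / 60, True
    return (snapshots[-1][0] - start).total_seconds() / 60, False
```

A pair such as `(duration, event_observed)` is the usual input shape for survival models, which is why censored sections (those that never fill) are kept rather than discarded.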
- Python 3.x
- Jupyter Notebook
- Main Packages:
  - pandas
  - numpy
  - scikit-learn
  - matplotlib/seaborn
  - dash (for the dashboard)
  - lifelines (survival model implementation)
  - random-survival-forest
Install the necessary packages with:
pip install -r requirements.txt
- Data Exploration: Open the main analysis notebook to explore the course offering and historical data.
jupyter notebook Analisis_2024-20.ipynb
- RSF Model Training: Run the notebook/model to train the Random Survival Forest.
jupyter notebook Modelo.ipynb
- Interactive Dashboard: To start the dashboard and query the predictive model in real time:
python src/app.py
- Enrollment Simulator: Run the simulator to generate synthetic scenarios and visualize course filling.
Before:
- Download Thunder Client and SQLite.
- Create a virtual environment (Command+Shift+P).
- Verify that the model and scaler are in the Dashboard folder, generated by running the notebook from Modelo/Modelo.ipynb.
Now:
- Each time the server is restarted, delete the file `instance/enrollment.db` (it is generated automatically when running the dashboard).
- Activate the virtual environment if it is not already active (`source .venv/bin/activate`).
- Open two terminals with the virtual environment activated. In the left terminal, navigate to `.venv`.
- In the first terminal, run the simulator with the command `fastapi run`. The simulator starts with an empty dataset and should be executed in the `simulador` folder.
- In the second terminal, run the dashboard with the command `python src/app.py`. This should be executed in the `Dashboard` folder. Wait for it to load.
- Send a request to "simulador reset" by pressing "Send" in Thunder. If this does not work, you can issue this command in a new terminal.
curl -X POST "http://localhost:8000/restart" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"dataset_name\":\"2024-20\",\"tick_interval\":10}"
- Check the `enrollment.db` database to ensure the data is being saved.
- At this point, the Dashboard should display the historical enrollment data at the endpoint (check the Preca example and analytics with Thunder):
http://127.0.0.1:8050/api/history/{NRC}
This is a mock server implementation of the Uniandes Course Offering API. It is used for testing purposes, replicating the same API structure with historical data. Developed using FastAPI. Note that this is a stateful server.
python -m venv .venv
source .venv/bin/activate # or .\.venv\Scripts\activate
pip install -r requirements.txt
fastapi run
There are 2 modes of operation:
- On demand: The server will tick each time a request is made, updating the data with the next available value in chronological order.
- Real time: The server will tick every `tick_interval` seconds, updating the data with the next available value in chronological order.
What is a tick? A tick means going one step forward in time, updating the data with the next available value in chronological order. For example, if the current time is 2021-01-01 14:30 and the next available value is 2021-01-02 14:35, a tick will update the data with the values for 2021-01-02 14:35.
On startup, the server will start in On demand mode with the default dataset. To switch to Real time mode, use the `/restart` endpoint.
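The tick mechanism described above can be sketched as a small stateful class. This is an illustration only, not the server's code: the class name `TickSimulator`, its methods, and the choice to advance before serving a request in On demand mode are assumptions. In Real time mode a background task would call `tick()` every `tick_interval` seconds; that timer is left out here.

```python
class TickSimulator:
    """Steps through (timestamp, value) snapshots in chronological order."""

    def __init__(self, snapshots, tick_interval=0):
        # Sort by timestamp so each tick moves to the next available value.
        self.snapshots = sorted(snapshots, key=lambda s: s[0])
        self.tick_interval = tick_interval
        self.index = 0

    @property
    def on_demand(self):
        # A tick_interval of 0 selects On demand mode.
        return self.tick_interval == 0

    @property
    def is_done(self):
        # True once the last snapshot has been reached.
        return self.index >= len(self.snapshots) - 1

    def current(self):
        return self.snapshots[self.index]

    def tick(self):
        """Advance one step in time, unless the data is exhausted."""
        if not self.is_done:
            self.index += 1
        return self.current()

    def handle_request(self):
        """Serve the current snapshot; in On demand mode, tick first."""
        if self.on_demand:
            self.tick()
        return self.current()
```

With the two timestamps from the example above, a single request in On demand mode moves the simulated time from 2021-01-01 14:30 to 2021-01-02 14:35, regardless of how much wall-clock time has passed.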
The `/restart` endpoint restarts the server data to the initial state with a new configuration. The server will start in On demand mode if `tick_interval` is set to 0; otherwise it will start in Real time mode.
Example request:
curl -X POST "http://localhost:8000/restart" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"dataset_name\":\"2024-20\",\"tick_interval\":300}"
Example response:
{
  "message": "Restarted",
  "dataset_name": "2024-20",
  "tick_interval": 300,
  "change_on_demand": false
}
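The same restart request can be issued from Python with only the standard library. This is a sketch that mirrors the curl example above; the helper names `build_restart_request` and `restart` are made up for illustration, and the base URL assumes the simulator is running locally on port 8000.

```python
import json
import urllib.request


def build_restart_request(dataset_name, tick_interval,
                          base_url="http://localhost:8000"):
    """Build (but do not send) the POST request for /restart."""
    payload = json.dumps(
        {"dataset_name": dataset_name, "tick_interval": tick_interval}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/restart",
        data=payload,
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def restart(dataset_name, tick_interval):
    """Send the request and return the decoded JSON response."""
    request = build_restart_request(dataset_name, tick_interval)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `restart("2024-20", 0)` would request On demand mode, while a positive `tick_interval` requests Real time mode; the call only succeeds while the simulator is running.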
The `/info` endpoint returns the current server configuration and status.
Example request:
curl -X GET "http://localhost:8000/info" -H "accept: application/json"
Example response:
{
  "last_restart": "2024-07-31T17:54:34",
  "simulated_time": "2024-07-30T14:30:00",
  "dataset_name": "2024-20",
  "tick_interval": 300,
  "change_on_demand": false,
  "is_done": false
}
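A client can turn this response into typed values before using it. The sketch below parses the sample payload shown above; the function name `summarize_info` is hypothetical, the mode is derived from `tick_interval` following the rule stated for `/restart`, and reading `is_done` as "the dataset has been exhausted" is an assumption.

```python
import json
from datetime import datetime


def summarize_info(info):
    """Convert the /info payload into typed, easier-to-use values."""
    return {
        # tick_interval == 0 means On demand, anything else Real time.
        "mode": "on demand" if info["tick_interval"] == 0 else "real time",
        "simulated_time": datetime.fromisoformat(info["simulated_time"]),
        "last_restart": datetime.fromisoformat(info["last_restart"]),
        # Assumed meaning: no more snapshots are left to replay.
        "finished": bool(info["is_done"]),
    }


sample = json.loads("""
{
  "last_restart": "2024-07-31T17:54:34",
  "simulated_time": "2024-07-30T14:30:00",
  "dataset_name": "2024-20",
  "tick_interval": 300,
  "change_on_demand": false,
  "is_done": false
}
""")
summary = summarize_info(sample)
```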
The `/api/courses` endpoint returns a list of courses with the number of available seats updated to reflect the status at the current simulated time. The schema is the same as the original API.
Example request:
curl -X GET "http://localhost:8000/api/courses" -H "accept: application/json"
Example response:
[
  {
    "nrc": "39342",
    "class": "ADMI",
    "course": "1101",
    "section": "01",
    "credits": "3",
    "title": "FUNDAMENTALS OF MANAGEMENT AND ADMINISTRATION",
    "enrolled": "84"
  }
]
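Every field in the sample response is a string, including the counts, so a client will usually cast them before doing arithmetic. The sketch below indexes the sample payload by NRC; the helper name `enrolled_by_nrc` is hypothetical, and only the fields shown in the example response are assumed to exist.

```python
import json


def enrolled_by_nrc(courses):
    """Map each course NRC to its enrolled count as an integer."""
    return {course["nrc"]: int(course["enrolled"]) for course in courses}


sample = json.loads("""
[
  {
    "nrc": "39342",
    "class": "ADMI",
    "course": "1101",
    "section": "01",
    "credits": "3",
    "title": "FUNDAMENTALS OF MANAGEMENT AND ADMINISTRATION",
    "enrolled": "84"
  }
]
""")
counts = enrolled_by_nrc(sample)
```

Polling the endpoint after each tick and storing these counts would reproduce the kind of enrollment time series the model is trained on.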
The API documentation generated by FastAPI is served at:
- ReDoc: http://localhost:8000/redoc
- Swagger UI: http://localhost:8000/docs