- Pau López
- Genís López
- Jaume López
- Daniel Gargallo
Note
Here is the summary of the technologies and tools utilized in our project:
- Python: for all the backend and data processing (pandas).
In this project, we used a dataset of real estate listings to predict house prices based on 10 months of historical data. We implemented a Random Forest algorithm to make these predictions. The dataset included features such as location, size, number of rooms, and other relevant attributes.
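As a rough illustration of this approach, here is a minimal sketch of a Random Forest regression using scikit-learn. It is not our actual pipeline: the data is synthetic, and the column names (`size_m2`, `rooms`, `district_code`, `price`) and hyperparameters are assumptions chosen only for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the listings data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
listings = pd.DataFrame({
    "size_m2": rng.uniform(30, 200, n),
    "rooms": rng.integers(1, 6, n),
    "district_code": rng.integers(0, 5, n),
})
# Made-up price signal, used only so the model has something to learn.
listings["price"] = (
    2000 * listings["size_m2"]
    + 10000 * listings["rooms"]
    + rng.normal(0, 5000, n)
)

X = listings.drop(columns="price")
y = listings["price"]

# Hold out 20% of the rows to check the model on unseen listings.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
r2 = model.score(X_test, y_test)  # coefficient of determination on the held-out rows
print(f"R^2 on held-out data: {r2:.3f}")
```

With real data, the feature columns would come from the cleaned dataset rather than from a random generator.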
One of the main challenges we faced was the presence of outliers in the dataset. We had to remove these outliers to improve the accuracy of our model. We also had to deal with missing values in the dataset, which we filled in with the mean value of the corresponding feature.
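The two cleaning steps can be sketched with pandas as follows. This is an illustration under assumptions, not our exact code: the toy data, the column names, and the 1.5 × IQR rule used to flag outliers are all hypothetical choices for the example.

```python
import numpy as np
import pandas as pd

# Toy data with one obvious outlier (5000) and two missing values.
df = pd.DataFrame({
    "size_m2": [50.0, 60.0, np.nan, 80.0, 90.0, 100.0, 5000.0],
    "price": [100.0, 120.0, 140.0, np.nan, 180.0, 200.0, 220.0],
})

# Step 1: drop outliers using quantiles (assumed rule: 1.5 * IQR around Q1 and Q3).
q1, q3 = df["size_m2"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["size_m2"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
# Rows with a missing size are kept so they can be imputed in step 2.
cleaned = df[in_range | df["size_m2"].isna()].copy()

# Step 2: replace each remaining NaN with the mean of its column.
cleaned = cleaned.fillna(cleaned.mean(numeric_only=True))

print(cleaned)
```

Removing outliers before imputing keeps extreme values from distorting the column means used to fill the gaps.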
To run this project, you need to have Python installed on your machine. You can install Python from the official website. You also need to install the required libraries:
pip install -r requirements.txt
To run the project, you can use the following command:
python main.py
To run the predictions, you can use the following command:
python generate_predictions.py
During this datathon, we all learnt a LOT! We are 2 students from Informatics Engineering and 2 from Mathematics. We had some basic knowledge of probability, but not as much as Data Science students have. We learnt how to use Python libraries such as Pandas, NumPy, and Scikit-learn. We also learnt how to use Jupyter Notebooks and how to work with datasets: cleaning them, applying quantiles, and replacing NaN values with the mean. Finally, we learnt how to use a Random Forest algorithm to make predictions, and how to expand categorical features into separate columns while condensing them to reduce the number of columns.
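One way to read the last point is one-hot encoding with rare categories merged into a single bucket. The sketch below shows that idea with pandas; the `neighborhood` column, its values, the `"Other"` label, and the minimum count of 2 are assumptions made for the example, not details of our dataset.

```python
import pandas as pd

# Toy categorical feature; names and values are hypothetical.
df = pd.DataFrame({
    "neighborhood": ["Centre", "Centre", "Nord", "Nord", "Sud", "Est", "Oest"],
})

# Condense: categories seen fewer than 2 times are merged into "Other".
counts = df["neighborhood"].value_counts()
rare = counts[counts < 2].index
df["neighborhood"] = df["neighborhood"].where(
    ~df["neighborhood"].isin(rare), "Other"
)

# Expand: one indicator column per remaining category.
encoded = pd.get_dummies(df, columns=["neighborhood"])

print(sorted(encoded.columns))
```

Here the five original categories become three indicator columns instead of five, which is the kind of column reduction described above.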