This project is a collaboration between the Pacmann Data School's Introduction to Machine Learning (IML) and Machine Learning Process (MLP) courses.
Financial risk prediction is an important topic in financial analysis because it can assist businesses in reducing financial distress and taking appropriate actions in the future. Bankruptcy prediction has been a topic of discussion for nearly a century, and it remains one of the hottest topics in economics. The goal of predicting financial institutions' distress is to create a predictive model that combines various econometric measures and allows one to forecast a firm's financial situation at least 1 year in the future. Predicting corporate bankruptcy is critical for both investors and businesses.
The IML project is primarily concerned with machine learning modeling experiments. The 2 notebooks folder contains the source code for the modeling and testing scenarios that were completed. See: for a detailed modeling
https://docs.google.com/document/d/1wBAJakg23FQ721Ep7cdm6Ab7pohvkP0rBosVS8d8siY/edit?usp=sharing
The MLP project focuses on end-to-end machine learning workflows, beginning with understanding business problems, deriving business metrics, machine learning metrics, and creating fire services for the backend and front end services until they are ready for server deployment. Details about the report can be found at:
https://docs.google.com/document/d/10AAaHIiPcAMV1gkTiMz54yJ8PeRf4Cv_hlIvaSW_DGQ/edit?usp=sharing
- Check that you have Docker installed. If not, search for "How to install Docker on Windows/Ubuntu/Os that you used."
- Make sure you have Git installed as well. If not, search for "How to install git on Windows/Ubuntu/Os that you used."
- Make a clone of this repository :
git clone https://github.com/bimapriambodo/Company_Bankruptcy_Prediction-.git
- Navigate to the cloned folder's directory. Open a CMD terminal and compile the docker file as follows:
docker-compose up
Wait until everything is completed. If your internet connection is fast, the process will be as well.
- To test Api, open a browser and enter the following address :
http://0.0.0.0:8080/docs
- To try Streamlit, launch a browser and type the following address :
http://172.18.0.3:8501/
Congratulations, you are now free to use the application !!!
Input Api uses json format like this:
{
"Attr21": 0.0,
"Attr13": 0.0,
"Attr27": 0.0,
"Attr34": 0.0,
"Attr24": 0.0,
"Attr35": 0.0,
"Attr51": 0.0,
"Attr6": 0.0,
"Attr56": 0.0,
"Attr49": 0.0,
"Attr40": 0.0,
"Attr55": 0.0,
"Attr50": 0.0,
"Attr58": 0.0,
"Attr43": 0.0
}
VARIABLES | DATA TYPE |
---|---|
Attr21 | FLOAT |
Attr13 | FLOAT |
Attr27 | FLOAT |
Attr34 | FLOAT |
Attr24 | FLOAT |
Attr35 | FLOAT |
Attr51 | FLOAT |
Attr6 | FLOAT |
Attr56 | FLOAT |
Attr49 | FLOAT |
Attr40 | FLOAT |
Attr55 | FLOAT |
Attr50 | FLOAT |
Attr58 | FLOAT |
Attr43 | FLOAT |
Many scenarios have not yet been tried and would be quite interesting iterative combinations to investigate as material for future works, for example,
-
We do not remove outliers from the data in this experiment because we believe outliers are a natural phenomenon and provide information. However, what can be tested in future works if the outliers are removed?
-
Standard scalers are not the only option for standardizing data. There is a min/max scaler as well as other methods. What do you think if you compare it with other methods? Is there any other influence? It's possible.
-
Not only are machine learning methods used in this trial, but many are widely used, such as neural networks, Cat-boost, light GBM, and others. Can try to be compared, it appears to be interesting.
-
We do not perform feature engineering or variable reduction based on existing variables. Then, for example, try dimensional reduction using PCA, how does it work?
-
Deploy to Cloud Services !
We hope that this trial will provide an overview of highly influential iterative processes in data science. Matt, stay hungry and foolish!
Chang, Hanxu. 2019. The Application of Machine Learning Models in Company Bankruptcy Prediction. ICSEB: International Conference on Software and e-Business. Tokyo: Japan
Zieba,Maciej. 2016. Ensemble Boosted Trees with Synthetic Features Generation in Application to Bankruptcy Prediction. Expert Systems with Applications: An International Journal
Thi Kha Nguyen. 2018. PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS. JOURNAL OF SCIENCE AND TECHNOLOGY, NO. 12(133). THE UNIVERSITY OF DANANG
https://chirag-sehra.medium.com/decision-trees-explained-easily-28f23241248
https://ai.plainenglish.io/xgboost-in-a-nutshell-211e170e8b48
https://towardsdatascience.com/curse-of-dimensionality-a-curse-to-machine-learning-c122ee33bfeb