pooruss/ML-Framework-for-Diverse-Applications-in-Trading-and-Finance

Framework for ML in Finance

📚 Background

Machine learning in finance: machine learning algorithms are used in risk management, asset management, market analysis, and trading strategies, and have become a key tool in finance and trading.

Goals and Significance

Goals

  • Develop a custom Python machine learning framework
  • Integrate basic machine learning algorithms implemented with packages such as numpy
  • Evaluate the algorithms in a variety of financial scenarios

Significance

  • Provide a machine learning framework tailored to finance, with solutions better suited to financial problems than relying on general-purpose machine learning libraries alone
  • Give machine learning beginners a framework for getting started quickly with basic machine learning algorithms

📈 Applications

| Problem | Solution | Algorithm |
| --- | --- | --- |
| Risk Management | Classification, Regression | Principal Component Analysis (PCA), AdaBoost |
| Financial Fraud Detection | Classification, Clustering | K-Nearest Neighbor (KNN), K-Means, DBSCAN |
| Customer Relationship Management | Classification | Naive Bayes, AdaBoost |
| Financial Forecast | Regression | Support Vector Machine (SVM) |
| Investment and Asset Management | Regression | Linear Regression |

🎮 Other Machine Learning Algorithms

  • Multilayer Perceptron, used for classification / regression, can be applied to promotion, fraud detection and so on.
  • Decision Tree, used for classification, can be applied to direct marketing, risk management and so on.
  • ...

Main Content

✨Here is an overview of this framework.


🔧 Setup

  • Install.
pip install -r requirements.txt

🕹️ Run

  • Init weight. Create a model config YAML file under ./config/ that specifies the name and value of each initial model weight. Examples can be found in the existing config directory.
  • Write the bash command under ./scripts/. Examples can be found in the existing scripts directory.
bash scripts/run_svm.sh

While the script runs, you are prompted to enter a natural-language description of how you would like to preprocess the data. After model training, you are also prompted for the evaluation metric and the visualization method to use.

📁 Main File Structure

├── main.py
├── pipeline.py
├── ...
├── /dataset/
│  ├── base.py
│  ├── risk_management.py
│  ├── investment_and_asset_management.py
│  ├── ...
│  └── finance_prediction.py
├── /algorithms/
│  ├── base.py
│  ├── svm.py
│  ├── linear_regression.py
│  ├── ...
│  └── pca.py
├── /evaluate/
│  └── utils.py
├── /visualization/
│  └── utils.py

📊 Results and Evaluation

Common financial problems that can be solved by applying machine learning methods can be categorized into the following five categories: financial fraud detection, customer relationship management, financial forecasting, risk management, investment and asset management. In each class of problems, a suitable dataset as well as a reasonable method was selected for testing and validation, and the results are as follows.

Financial fraud detection

  • Dataset: Credit Card Fraud Detection Dataset 2023
  • Algorithms: k-Nearest Neighbor

The k-nearest neighbors (KNN) algorithm reached 78.9% accuracy on the credit card fraud dataset. Precision was 74.2% for non-fraudulent transactions and 86.0% for fraudulent ones; recall was 88.9% for non-fraudulent transactions and 68.9% for fraudulent ones. Fraud predictions are therefore usually correct, but roughly 31% of fraudulent transactions go undetected, so precision and recall are not balanced across the two classes. KNN is a promising baseline here, with fraud recall as the main area to improve.
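
The accuracy, precision, and recall figures above are all derived from counts of true and predicted labels. The sketch below shows one way to compute them per class in plain Python; the function name `per_class_report` and the toy labels are illustrative assumptions, not code or data from this repository.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return overall accuracy plus precision and recall for every label.

    Hypothetical helper for illustration; not part of the framework.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count how often each (true label, predicted label) pair occurs.
    pairs = Counter(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))

    correct = sum(pairs[(c, c)] for c in labels)
    report = {"accuracy": correct / len(y_true)}

    for c in labels:
        tp = pairs[(c, c)]
        predicted_as_c = sum(pairs[(t, c)] for t in labels)  # column total
        actually_c = sum(pairs[(c, p)] for p in labels)      # row total
        report[c] = {
            "precision": tp / predicted_as_c if predicted_as_c else 0.0,
            "recall": tp / actually_c if actually_c else 0.0,
        }
    return report


# Toy labels (0 = non-fraud, 1 = fraud); not taken from the project's dataset.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 0, 1, 1]
report = per_class_report(y_true, y_pred)
print(report)
```

Reporting recall per class matters for fraud data: a missed fraudulent transaction lowers fraud recall even when overall accuracy stays high.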

Customer relationship management

  • Dataset: Credit Card Customer Churn Prediction
  • Algorithms: Naive Bayes

The Naive Bayes algorithm for predicting customer churn achieved an accuracy of 82.1%, with precision values of 85.1% for non-churned customers and 57.3% for churned customers. Recall was 94.2% for non-churned customers but only 32.1% for churned customers, so about two thirds of churned customers are missed; identifying churned customers is the main area for improvement.

Financial forecasting

  • Dataset: Stock Price of Netflix
  • Algorithms: SVM

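Given historical volumes, predict whether a stock’s open price is higher or lower than its close price. The label is 0 if the close price is higher than the open price, and 1 if the open price is higher than the close price. Predictions were 56% accurate. However, every prediction is label 1, so the accuracy only reflects the share of label-1 samples in the test data: the model appears to have collapsed to a single class rather than learned a usable signal from volume.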
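
A quick way to put the 56% figure in context is to compare it with a majority-class baseline and to check whether the predictions are constant. The sketch below does both in plain Python. The helper names and the toy prices are illustrative assumptions, and ties (open equal to close), which the description above does not cover, are assigned label 0 here.

```python
from collections import Counter


def make_labels(opens, closes):
    """Label 1 if open > close, else 0 (ties fall into 0 in this sketch)."""
    return [1 if o > c else 0 for o, c in zip(opens, closes)]


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def is_constant(predictions):
    """True if the model emitted a single label for every sample."""
    return len(set(predictions)) <= 1


# Toy prices for illustration only; not the Netflix data used in the project.
opens = [10.0, 11.0, 12.0, 11.5, 13.0]
closes = [9.5, 11.4, 11.0, 11.0, 13.2]
labels = make_labels(opens, closes)

predictions = [1] * len(labels)  # a model that always predicts label 1
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
baseline = majority_baseline_accuracy(labels)

print(labels, accuracy, baseline, is_constant(predictions))
```

When the predictions are constant and the accuracy equals the baseline, the model adds nothing beyond the class frequencies.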

Risk management

  • Dataset: creditcard_2023
  • Algorithms: Principal Component Analysis(PCA)

The Silhouette Coefficient is 0.1366. Since the coefficient ranges from -1 to 1 and values near 0 indicate overlapping clusters, the data after principal component analysis (PCA) shows only weak clustering structure. The two principal components together retain approximately 81.44% of the total variance, a fairly good dimensional compression of the original data.
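
For reference, the silhouette coefficient of a point is (b - a) / max(a, b), where a is its mean distance to the other points in its own cluster and b is the smallest mean distance to the points of any other cluster. The sketch below is a minimal plain-Python version using Euclidean distance. The function name and toy points are assumptions for illustration, and how the project computed its own score is not shown in this README.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    Hypothetical helper for illustration; not part of the framework.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy 2-D points: two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
well_separated = silhouette_score(points, [0, 0, 1, 1])
badly_assigned = silhouette_score(points, [0, 1, 0, 1])
print(well_separated, badly_assigned)
```

Scores close to 1 mean compact, well-separated clusters; scores near 0 or below mean the clusters overlap or the points are assigned to the wrong cluster.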

Investment and asset management

  • Dataset: boston_house_prices
  • Algorithms: Linear Regression

This run fits a prediction model on the dataset and reports four evaluation metrics: MSE = 54.39, MAE = 4.97, RMSE = 7.37, and R² = 0.18. An R² of 0.18 means the model explains only about 18% of the variance in the target, so the predictions are not close to the real data and the fit is weak; there is clear room for improvement.
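
The four metrics reported above are related: RMSE is the square root of MSE, and R² compares the model's squared error with that of always predicting the mean. The sketch below computes all four in plain Python; the function name `regression_metrics` and the toy values are illustrative assumptions, not the framework's own evaluation code.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE and R^2 for two equal-length sequences.

    Hypothetical helper for illustration; not part of the framework.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)                # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")

    mse = ss_res / n
    return {
        "mse": mse,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(mse),
        "r2": 1 - ss_res / ss_tot,
    }


# Toy values for illustration only; not the boston_house_prices data.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 1.5, 3.5, 3.5]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

An R² near 1 indicates a close fit, while a value near 0 means the model does little better than predicting the mean of the target.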

Future Work

  • Develop interactive UI
  • Integrate more algorithms
  • Encapsulate the data preprocessing process to reduce the cost of getting started

About

Final project of COMP 7409 Machine Learning in Trading and Finance – Group 7.
