This project applies machine learning to predict the future trend of the VN30 Futures Index, allowing the strategy to follow the trend actively instead of lagging behind like common momentum strategies. The backtest results show promising performance.
The futures market has high volatility, offering high returns but also substantial risk. Common momentum strategies rely on indicators to make decisions. However, because indicators only reflect past states computed from past data, they are often late to the trend.
With the rise of machine learning in many fields, particularly time-series forecasting, we are motivated to use it to address the lagging problem of momentum strategies.
Prediction model: a customized Time Series Transformer model.
- Context length: 700 tokens (one token = one 5-minute interval -> ~14 trading days).
- Prediction length: 50 tokens (one token = one 5-minute interval -> ~1 trading day).
- Input features:
- OHLCV of VN30 and VN30F1M
- Time features: time of day, day of year, days until next expiration.
- Output features: H and L of VN30F1M
- Output: a Student's t-distribution for each value.
For each predicted series, open at most one position, with only one contract per position.
Unlimited holdings, as long as
Let `maximum` and `minimum` be the extreme predicted H and L values, and `maxima` and `minima` the time steps at which they occur:

```python
if maximum - minimum > FEE:
    # FEE, by default, is 0.47 index points per closed position
    if minima < maxima:          # the low is predicted before the high
        pos_type = "LONG"
        entry_point = minimum
        take_profit_point = maximum
        stop_loss_point = minimum - 0.1
    elif maxima < minima:        # the high is predicted before the low
        pos_type = "SHORT"
        entry_point = maximum
        take_profit_point = minimum
        stop_loss_point = maximum + 0.1
```
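The rule above can be packaged as a self-contained function; the definitions of `maximum`, `minimum`, `maxima`, and `minima` are inferred from the comparisons (the extreme predicted prices and the steps at which they occur), and the function name is ours:

```python
def naive_signal(pred_highs, pred_lows, fee=0.47):
    """Derive at most one position from a predicted H/L series.

    Assumed interpretation: maximum/minimum are the extreme predicted
    prices, maxima/minima are the steps at which they occur.
    """
    maximum = max(pred_highs)
    minimum = min(pred_lows)
    maxima = pred_highs.index(maximum)   # step of the predicted high
    minima = pred_lows.index(minimum)    # step of the predicted low
    if maximum - minimum <= fee:         # spread must at least cover the fee
        return None
    if minima < maxima:                  # low comes first -> buy, then sell
        return {"type": "LONG", "entry": minimum,
                "take_profit": maximum, "stop_loss": minimum - 0.1}
    if maxima < minima:                  # high comes first -> sell, then buy
        return {"type": "SHORT", "entry": maximum,
                "take_profit": minimum, "stop_loss": maximum + 0.1}
    return None
```

For example, predicted lows bottoming out before the predicted highs peak yields a LONG signal; if the spread does not cover the fee, no position is opened.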
- OHLCV of VN30 and VN30F1M
- Interval: 1-minute
- Source: SSI FastConnect API
- Range: 20/12/2019 - 16/01/2025
- Align records of VN30 and VN30F1M by trading time
- Merge each day's records into 5-minute intervals.
- For each record, add:
- time of day
- day of year
- days until next expiration
- Split:
- 60% for model training
- 20% for model validation and in-sample backtesting
- 20% for model testing and out-of-sample backtesting
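As an illustration of the three added time features, a minimal standard-library sketch, assuming VN30F1M expires on the third Thursday of each month (the function names are ours):

```python
from datetime import date, datetime

def third_thursday(year, month):
    """Third Thursday of a month (assumed VN30F1M expiration day)."""
    first = date(year, month, 1)
    offset = (3 - first.weekday()) % 7       # Thursday is weekday 3
    return first.replace(day=1 + offset + 14)

def time_features(ts: datetime):
    """Compute the three per-record time features for one timestamp."""
    expiry = third_thursday(ts.year, ts.month)
    if ts.date() > expiry:                   # roll to next month's contract
        y, m = (ts.year + 1, 1) if ts.month == 12 else (ts.year, ts.month + 1)
        expiry = third_thursday(y, m)
    return {
        "time_of_day": ts.hour * 60 + ts.minute,   # minutes since midnight
        "day_of_year": ts.timetuple().tm_yday,
        "days_to_expiry": (expiry - ts.date()).days,
    }
```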
The results in this document were produced on an Nvidia RTX 3080Ti.
We also provide results from a CPU-only device with an Intel Core Ultra 7 265K at `./results_cpu`.
Set a debug environment variable to ensure reproducibility on CUDA devices:

```shell
export CUBLAS_WORKSPACE_CONFIG=:4096:8      # Linux
set CUBLAS_WORKSPACE_CONFIG=:4096:8         # Windows Command Prompt
$env:CUBLAS_WORKSPACE_CONFIG = ":4096:8"    # Windows PowerShell
```
Create and activate a virtual environment:

```shell
python3 -m venv .venv
source .venv/bin/activate        # Linux/macOS
.\.venv\Scripts\activate.bat     # Windows Command Prompt
.\.venv\Scripts\Activate.ps1     # Windows PowerShell
which python3                    # verify the interpreter comes from .venv
```
Upgrade pip:
```shell
pip3 install --upgrade pip
```
Install PyTorch:

```shell
# Linux and Windows, CUDA
pip3 install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126

# Linux and Windows, CPU-only
pip3 install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cpu

# macOS
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0
```
Install other dependencies:
```shell
pip3 install -r requirements.txt
```
The processed data is available at `data/data.json`.
Put your SSI FastConnect Data API information into `configs/fc_data_config.py`:

```python
consumerID = 'fb20f607926a447fa50c83xxxxxxxxxx'
consumerSecret = '478c1923481c48858c8b3dxxxxxxxxxx'
```
Run the script to collect and process data:
```shell
python3 collect_data.py
```

The data will be stored in `data/data.json`.
The trained model is available at `model_checkpoint/checkpoint.pt`.
```shell
python3 train.py
```

The model will be stored at `model_checkpoint/checkpoint.pt`.
Go to `configs/config.py` and set

```python
backtest_optimized_algo = False
```

if you want to use the naive algorithm, or

```python
backtest_optimized_algo = True
optimized_algo_params = {
    "p_highs": 0.39,
    "p_lows": 0.66,
    "p_stoploss": 0.01,
    "using_dp": False
}
```

to use the optimized algorithm and adjust its parameters.
Run the Python script for backtesting:

```shell
python3 backtest.py
```
In `configs/config.py`, set the number of parallel processes, the number of startup trials, and the total number of trials:

```python
TOTAL_PROCESSES = 8
n_startup_trials = 160
n_trials = 1600
```
Run the Python script to start optimizing:

```shell
python3 optimization.py
```
```python
# trading agent configuration
BALANCE = 1000
FEE = 0.47
MARGIN_RATIO = 0.175
ASSEST_RATIO = 0.8

# where the output results from backtesting and optimization go
results_dir = "./results"
```
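The comparison `maximum - minimum > FEE` in the naive rule suggests that `FEE` is charged in index points per closed position. Under that assumption, the net result of a single one-contract round trip can be sketched as (the function name is illustrative):

```python
FEE = 0.47  # index points charged per closed position (from the config above)

def round_trip_points(entry, exit_price, pos_type, fee=FEE):
    """Net index points for one one-contract round trip, after the fee."""
    gross = exit_price - entry if pos_type == "LONG" else entry - exit_price
    return gross - fee
```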
In-sample backtest (naive algorithm):

ROI | Trades | Win rate | MDD | Sharpe ratio |
---|---|---|---|---|
0.05% | 88 | 3.41% | -3.04% | -1.03 |

Out-of-sample backtest (naive algorithm):

ROI | Trades | Win rate | MDD | Sharpe ratio |
---|---|---|---|---|
-0.49% | 55 | 1.82% | -3.10% | -1.09 |
Let $dp[i]$ be the best total return achievable using only the first $i$ time steps.

Let $f(j, i)$ be the return of a single trade entered at step $j$ and exited at step $i$.

Pseudo code:

```python
dp[0] = 0
for i in range(1, n):
    dp[i] = dp[i - 1]                            # no trade ends at step i
    for j in range(1, i + 1):
        dp[i] = max(dp[i], dp[j - 1] + f(j, i))  # a trade from step j to step i
```
After that, we run backtracking to recover the list of trades that yields the best total return.
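Combining the recurrence with the backtracking step gives a runnable sketch (the function name and the exact interface of `f(j, i)` are assumptions; as in the pseudo code, trades entered at step 0 are not considered):

```python
def best_trades(n, f):
    """Best total return over n steps, plus the trades that achieve it.

    f(j, i) is the return of one trade entered at step j and
    exited at step i (1 <= j <= i < n).
    """
    dp = [0.0] * n               # dp[i]: best return using steps 0..i
    choice = [None] * n          # entry step of a trade ending at step i
    for i in range(1, n):
        dp[i] = dp[i - 1]        # option: no trade ends at step i
        for j in range(1, i + 1):
            cand = dp[j - 1] + f(j, i)
            if cand > dp[i]:
                dp[i], choice[i] = cand, j
    trades, i = [], n - 1        # backtrack the trades behind dp[n - 1]
    while i > 0:
        if choice[i] is None:
            i -= 1
        else:
            trades.append((choice[i], i))
            i = choice[i] - 1
    return dp[n - 1], trades[::-1]
```

With prices `[5, 1, 4, 2, 6]` and `f(j, i)` defined as the price difference, the optimum picks two non-overlapping trades, (1, 2) and (3, 4).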
The model's predictions provide probability distributions, which we can use to make adjustments.
For a cumulative distribution function (cdf):
We have the inverse cdf:
Let
And the stoploss is adjusted by:
in case of SHORT position, or:
in case of LONG position.
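The surviving prose above points to quantile adjustments via the inverse cdf. As one plausible illustration only, not the project's actual formulas: take the trade points as quantiles at the searched parameters $p_H$, $p_L$, and $p_{stoploss}$, using the standard library's `NormalDist` as a stand-in for the model's Student's t output (all names here are ours):

```python
from statistics import NormalDist

def adjusted_points(mu_high, sigma_high, mu_low, sigma_low,
                    p_highs=0.39, p_lows=0.66, p_stoploss=0.01):
    """Quantile-based trade points (illustrative interpretation only)."""
    high = NormalDist(mu_high, sigma_high)
    low = NormalDist(mu_low, sigma_low)
    return {
        "take_profit_long": high.inv_cdf(p_highs),      # F_H^{-1}(p_H)
        "entry_long": low.inv_cdf(p_lows),              # F_L^{-1}(p_L)
        # stop-loss at an extreme tail quantile of the adverse distribution:
        "stop_long": low.inv_cdf(p_stoploss),           # LONG position
        "stop_short": high.inv_cdf(1 - p_stoploss),     # SHORT position
    }
```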
- Objective: maximize the Sharpe ratio
- Sampler: TPESampler
- Number of startup trials: 160
- Number of trials: 1600
- Number of parallel processes: 8
- Parameters to search for:
  - $p_H$
  - $p_L$
  - $p_{stoploss}$
  - using_dp
Parameter | Value |
---|---|
$p_H$ | 0.39 |
$p_L$ | 0.66 |
$p_{stoploss}$ | 0.01 |
using_dp | False |
In-sample backtest:

Algorithm | ROI | Trades | Win rate | MDD | Sharpe ratio |
---|---|---|---|---|---|
Naive | 0.05% | 88 | 3.41% | -3.04% | -1.03 |
Optimized | 26.93% | 139 | 8.63% | -4.00% | 2.57 |

Out-of-sample backtest:

Algorithm | ROI | Trades | Win rate | MDD | Sharpe ratio |
---|---|---|---|---|---|
Naive | -0.49% | 55 | 1.82% | -3.10% | -1.09 |
Optimized | 12.95% | 139 | 7.19% | -2.87% | 1.45 |