This project provides a framework for generating realistic synthetic equity option market data using Generative Adversarial Networks (GANs). It is an implementation based on the methodologies presented in the paper "Deep Hedging: Learning to Simulate Equity Option Markets" by Wiese et al. (2019) [arXiv:1911.01700].
Financial modeling and the backtesting of trading strategies often require large datasets that capture complex market dynamics. Real-world data can be limited or costly. This framework aims to address this by providing tools to simulate option market behavior, specifically focusing on the generation of realistic Discrete Local Volatility (DLV) surfaces over time.
- GAN-based Simulation: Leverages Generative Adversarial Networks to learn the complex dynamics of multivariate financial time series (option surfaces).
- Paper Replication: Implements key components described in Wiese et al. (2019):
- LSTM-GAN: A recurrent architecture for sequence generation.
- TCN-GAN: A temporal convolutional network architecture for sequence generation.
- PCA Compression: Uses Principal Component Analysis (PCA) to reduce the dimensionality of high-dimensional option surface data (DLVs) before feeding it to the GANs.
- Modular Design: Code is organized into distinct modules for:
- Data Loading & Preprocessing (
src/data
) - Generative Models (GANs) (
src/models
) - Evaluation Metrics & Visualization (
src/evaluation
) - Utility Functions (
src/utils
)
- Data Loading & Preprocessing (
- Synthetic Data Generation: Includes functionality to create an initial synthetic dataset for demonstration and testing purposes.
- Evaluation Framework: Provides tools to quantitatively and qualitatively assess the realism of the generated data compared to the source data (using metrics like ACF, distribution comparison, etc.).
- Extensibility: Designed with the potential to incorporate new generative models, different data sources (e.g., real market data), other asset classes (e.g., futures), or custom evaluation metrics.
A Conda environment is recommended for managing dependencies.
-
Clone the repository:
git clone <repository-url> cd <repository-directory>
-
Create and activate Conda environment:
conda create -n gansynth python=3.11 # Or your preferred Python 3.x version conda activate gansynth
-
Install dependencies:
- Install core dependencies:
pip install -r requirements.txt
- TensorFlow Note: Ensure you have a compatible version of TensorFlow installed for your system (CPU/GPU). The
requirements.txt
specifiestensorflow>=2.4.0
. Installation might vary based on your OS and hardware (especially on M1/M2 Macs). You might have installed this separately in your conda environment already. - Other Dependencies: The
requirements.txt
should cover other necessary packages likenumpy
,pandas
,scikit-learn
,matplotlib
,seaborn
,statsmodels
. If you encounterModuleNotFoundError
during runtime, install the missing package using pip or conda.
- Install core dependencies:
The main script to run the simulation pipeline is run.py
.
-
Activate your environment:
conda activate gansynth
-
Run the simulation:
python run.py [options]
Example: Train both LSTM-GAN and TCN-GAN for 50 epochs using the generated synthetic data:
python run.py --train_lstm --train_tcn --epochs 50
Key Command-Line Arguments:
--train_lstm
: Train the LSTM-GAN model.--train_tcn
: Train the TCN-GAN model.--epochs INT
: Number of training epochs (default: 10).--n_samples INT
: Number of time samples for the synthetic dataset (default: 1000).--n_strikes INT
: Number of option strikes (default: 8).--n_maturities INT
: Number of option maturities (default: 4).--n_components INT
: Number of PCA components for dimension reduction (default: 5).--sequence_length INT
: Input sequence length for models (default: 10).--noise_dim INT
: Dimension of the noise vector for GAN generators (default: 32).--generate_length INT
: Length of the sequence to generate during evaluation (default: 100).--log_dir PATH
: Directory to save logs and model weights (default: 'logs').--plots_dir PATH
: Directory to save evaluation plots and metrics (default: 'plots').--seed INT
: Random seed for reproducibility (default: 42).
Use
python run.py --help
to see all available options. -
Outputs:
- Logs & Models: Training logs and saved model weights (
.weights.h5
files) are stored in timestamped subdirectories within the specified--log_dir
(e.g.,logs/20231027-120000/LSTM-GAN/
). - Evaluation Results: Comparison metrics (
model_comparison.csv
) and visualization plots (.png
files comparing statistical properties) are saved in a timestamped subdirectory within the specified--plots_dir
(e.g.,plots/20231027-120000/
).
- Logs & Models: Training logs and saved model weights (
-
Demo Notebook:
- Explore the framework interactively using the Jupyter notebook:
notebooks/demo.ipynb
.
- Explore the framework interactively using the Jupyter notebook:
.
├── logs/ # Stores training logs and model weights
├── notebooks/
│ └── demo.ipynb # Interactive demo notebook
├── plots/ # Stores evaluation results (metrics, plots)
├── src/ # Source code
│ ├── data/ # Data loading, transformation, PCA
│ ├── evaluation/ # Evaluation metrics and visualization
│ ├── models/ # GAN model implementations (Base, LSTM, TCN)
│ ├── utils/ # Helper functions
│ ├── __init__.py
│ └── main.py # Main simulation pipeline logic
├── README.md # This file
├── requirements.txt # Python dependencies
└── run.py # Script to execute the main pipeline
The framework evaluates the quality of the generated time series by comparing their statistical properties to the original data. This includes:
- Visual comparison of time series paths.
- Distribution analysis (histograms, Q-Q plots).
- Autocorrelation Function (ACF) comparison.
- Other relevant financial time series metrics (defined in
src/evaluation/metrics.py
).
Results are saved in the --plots_dir
for each run, allowing comparison between different models (LSTM vs. TCN) and configurations.
Contributions are welcome! Please feel free to submit pull requests or open issues for bugs, feature requests, or suggestions.
[Specify License Here - e.g., MIT License]
If you use this code or build upon the concepts in your research, please cite the original paper:
@misc{wiese2019deep,
title={Deep Hedging: Learning to Simulate Equity Option Markets},
author={Magnus Wiese and Lianjun Bai and Ben Wood and Hans Buehler},
year={2019},
eprint={1911.01700},
archivePrefix={arXiv},
primaryClass={q-fin.CP}
}
Link: https://arxiv.org/abs/1911.01700
This project is based on the work of Magnus Wiese, Lianjun Bai, Ben Wood, and Hans Buehler.