This project implements a time series forecasting pipeline that predicts the closing prices of the IBOVESPA index, Brazil’s main stock market benchmark. It combines scikit-learn preprocessing (normalization and inverse transformation) with a deep learning architecture that stacks convolutional layers (Conv1D), recurrent layers (LSTM and GRU), and dense layers. The model uses L2 regularization and dropout to reduce overfitting. Training experiments and results are tracked with MLflow to support reproducibility and hyperparameter tuning.
flowchart LR
A["📥 Input (DataFrame)"] --> B["⚙️ Preprocessor (preprocessor.pkl)"]
B --> C["🤖 Model (Keras)"]
C --> D["🔄 Postprocessor (postprocessor.pkl)"]
D --> E["📤 Transformed Output"]
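The flow above can be read as a scale, predict, inverse-scale round trip. The sketch below is a minimal pure-Python illustration, assuming min-max scaling; the actual contents of preprocessor.pkl and postprocessor.pkl are not described here, and `MinMaxScaler1D` and `naive_model` are hypothetical stand-ins rather than the project's classes.

```python
from dataclasses import dataclass


@dataclass
class MinMaxScaler1D:
    """Hypothetical stand-in for a fitted scaler (assumption: min-max scaling)."""
    lo: float = 0.0
    hi: float = 1.0

    def fit(self, values):
        self.lo, self.hi = min(values), max(values)
        return self

    def transform(self, values):
        span = (self.hi - self.lo) or 1.0  # avoid dividing by zero on constant input
        return [(v - self.lo) / span for v in values]

    def inverse_transform(self, values):
        span = (self.hi - self.lo) or 1.0
        return [v * span + self.lo for v in values]


def naive_model(scaled_window):
    """Placeholder for the Keras model: repeats the last scaled value."""
    return [scaled_window[-1]]


def predict_close(prices, scaler, model):
    scaled = scaler.transform(prices)             # preprocessor step
    scaled_pred = model(scaled)                   # model works in scaled space
    return scaler.inverse_transform(scaled_pred)  # postprocessor step


closes = [100.0, 110.0, 105.0, 120.0]  # made-up prices, not real IBOVESPA data
scaler = MinMaxScaler1D().fit(closes)
prediction = predict_close(closes, scaler, naive_model)
print(prediction)
```

The point of the postprocessor step is that the model's output lives in scaled space, so it has to be mapped back to price units before it can be interpreted.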
Open the notebook 0.0-amp-main.ipynb and run all cells.
flowchart LR
A["🔄 Start"]
B["📦 dataset.py"]
C["🧮 features.py"]
D["🎯 tuning.py"]
E["🤖 train.py"]
F["🔮 predict.py"]
G["🏁 End"]
A --> B --> C
C --> D
C --> E
D --> F
E --> F
F --> G
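The stage graph above implies an execution order. As a minimal sketch, the standard-library `graphlib` module can derive one valid order from the edges; the dictionary below only restates the arrows in the diagram and is not taken from the project's own orchestration code.

```python
from graphlib import TopologicalSorter

# Each key depends on the stages listed in its value (edges copied from the diagram).
dependencies = {
    "dataset.py": set(),
    "features.py": {"dataset.py"},
    "tuning.py": {"features.py"},
    "train.py": {"features.py"},
    "predict.py": {"tuning.py", "train.py"},
}

# static_order() yields every stage only after all of its prerequisites.
order = list(TopologicalSorter(dependencies).static_order())

for step, stage in enumerate(order, start=1):
    print(f"{step}. {stage}")
```

Because tuning.py and train.py both depend only on features.py, either may come first; only their position between features.py and predict.py is fixed.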
- Data normalization and inverse transformation with scikit-learn for effective model training and interpretability.
- A hybrid neural network architecture designed for sequential financial data.
- Use of MLflow for experiment tracking and model management.
- Modular codebase enabling experimentation with training parameters and architectures.
- Modular architecture with strategy and template patterns for easy extensibility and maintenance.
- Reproducible ML pipelines orchestrated and tracked with MLflow.
- Decoupled configuration using .yaml files.
- Integration with notebooks and dashboards for data exploration and results presentation.
- Clear separation of concerns between ingestion, transformation, modeling, and visualization.
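As an illustration of the template pattern mentioned in the list above, the sketch below fixes the order of the pipeline steps in a base class and lets subclasses supply each step. All names (`PipelineTemplate`, `MeanChangeBaseline`, `load`, `transform`, `fit`) are hypothetical and are not taken from the project's source.

```python
from abc import ABC, abstractmethod


class PipelineTemplate(ABC):
    """Template method: run() fixes the step order, subclasses fill in the steps."""

    def run(self):
        data = self.load()
        features = self.transform(data)
        return self.fit(features)

    @abstractmethod
    def load(self):
        ...

    @abstractmethod
    def transform(self, data):
        ...

    @abstractmethod
    def fit(self, features):
        ...


class MeanChangeBaseline(PipelineTemplate):
    """Toy subclass: the 'model' is just the average step-to-step change."""

    def load(self):
        return [1.0, 2.0, 3.0, 4.0]  # made-up values

    def transform(self, data):
        # First differences between consecutive observations.
        return [b - a for a, b in zip(data, data[1:])]

    def fit(self, features):
        return sum(features) / len(features)


result = MeanChangeBaseline().run()
print(result)
```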
Run the following commands in your terminal. The project requires Python 3.10.x (requires-python = "~=3.10.0"):
git clone https://github.com/moreira-and/time-series-prediction-rnn.git
cd time-series-prediction-rnn
python -m venv .venv
.venv\Scripts\activate.bat # On Linux and macOS: source .venv/bin/activate
pip install -r requirements.txt
This file maps external data series by source. It supports:
- yfinance: Financial tickers from Yahoo Finance (e.g., ^BVSP, BTC-USD).
- bcb: Economic indicators from Brazil's Central Bank (SGS). Find series IDs at www3.bcb.gov.br/sgspub.
- DataReader: Macroeconomic series (e.g., FRED/USA). Use fred.stlouisfed.org to find codes like DEXBZUS, CPIAUCSL.
The strategies load data dynamically based on these IDs. To add or change a series, update the YAML; no code changes are needed.
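The source-to-strategy dispatch described above might look like the following sketch. It uses a plain dictionary in place of the parsed YAML, and the loader functions are hypothetical stubs that only record what they would fetch; the project's real strategy classes and YAML schema are not shown here and may differ.

```python
from typing import Callable, Dict, List

# Assumed shape of the parsed YAML: source name -> list of series IDs.
config: Dict[str, List[str]] = {
    "yfinance": ["^BVSP", "BTC-USD"],
    "bcb": [],  # SGS series IDs would be listed here
    "DataReader": ["DEXBZUS", "CPIAUCSL"],
}


# Stub loaders: a real strategy would download the series instead of
# returning a label. They make no network calls.
def load_yfinance(series_id: str) -> str:
    return f"yfinance:{series_id}"


def load_bcb(series_id: str) -> str:
    return f"bcb:{series_id}"


def load_datareader(series_id: str) -> str:
    return f"DataReader:{series_id}"


# Registry mapping each source name to its loading strategy.
LOADERS: Dict[str, Callable[[str], str]] = {
    "yfinance": load_yfinance,
    "bcb": load_bcb,
    "DataReader": load_datareader,
}


def load_all(cfg: Dict[str, List[str]]) -> List[str]:
    """Dispatch every configured series ID to the loader for its source."""
    results: List[str] = []
    for source, series_ids in cfg.items():
        if source not in LOADERS:
            raise ValueError(f"No loader registered for source {source!r}")
        results.extend(LOADERS[source](sid) for sid in series_ids)
    return results


loaded = load_all(config)
print(loaded)
```

With this shape, adding a series only means adding an ID to the configuration; a new source would additionally need one entry in the registry.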
After running the notebook 0.0-amp-main.ipynb, all experiments and metrics are tracked using MLflow.
To open the MLflow tracking UI locally, activate the virtual environment and run the command below from the project root directory:
mlflow ui
├── LICENSE <- Project license.
├── Makefile <- Utility commands for automation (e.g., make train).
├── README.md <- Main project description.
├── configs/ <- YAML configuration files for datasets and models.
│ ├── dataset.yaml <- Input parameters, normalization, and splits.
│ └── model.yaml <- Model architecture and hyperparameters.
├── data/ <- Data organized by processing stage.
│ ├── raw/ <- Raw, original data.
│ ├── processed/ <- Data prepared for modeling (arrays, pickles).
│ └── predicted/ <- Prediction results in production.
├── docs/ <- Project documentation (e.g., mkdocs).
├── mlruns/ <- Directory managed by MLflow (experiment tracking).
├── models/ <- Trained and serialized models (.h5, .pkl, etc.).
├── notebooks/ <- Notebooks for experimentation and exploration.
│ └── 0.0-amp-main.ipynb <- Main project execution pipeline.
├── pyproject.toml <- Python package metadata and configurations.
├── references/ <- Data dictionaries and supporting materials.
├── reports/ <- Analytical outputs (reports, charts, dashboards).
│ ├── figures/ <- Automatically generated figures.
│ └── pbi/ <- Power BI dashboards (e.g., amp-fynance.pbip).
├── requirements.txt <- Python dependencies.
├── src/ <- Core project source code.
│ ├── config.py <- Global variables and configuration loading.
│ ├── dataset.py <- Data loading and orchestration logic.
│ ├── features.py <- Feature extraction from data.
│ ├── modeling/ <- Model training, tuning, and prediction.
│ │ ├── train.py <- Script to train the machine learning model.
│ │ ├── tune.py <- Script to optimize model hyperparameters.
│ │ └── predict.py <- Script to make predictions with the trained model.
│ ├── plots.py <- Custom plotting and visualization.
│ └── utils/ <- Domain-specific strategies and utilities.
│   ├── dataset/ <- Cleaning, calendar, and loading strategies.
│   ├── features/ <- Pre/post processors, splitters, transformers.
│   ├── log/ <- Custom logging strategies.
│   ├── predict/ <- Templates and wrappers for prediction and MLflow.
│   ├── train/ <- Callbacks, compilation, and training templates.
│   └── tune/ <- Hyperparameter tuning strategies.
└── tests/ <- Automated unit and integration tests.
Tinker With a Neural Network Right Here in Your Browser: playground.tensorflow