Tensor Symbolic Regression
Symbolic Regression for Predicting Stock Jumps and Streaks
This project uses symbolic regression to discover interpretable mathematical expressions that can predict rare but meaningful stock price behaviors—like jumps (sudden increases) or upward streaks—based on historical price patterns. The goal is not just prediction, but understanding.
🔍 Project Purpose
Predict price jumps — e.g., stock closes 1%+ higher within 5 days.
Predict streaks — e.g., 3+ consecutive daily increases.
Evolve readable equations that generalize well.
Compare to logistic regression to evaluate interpretability vs. raw performance.
Focus on recall and precision for rare events rather than accuracy.
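To illustrate why accuracy is a poor yardstick for rare events, here is a minimal sketch in plain Python. The labels are made up for illustration (5 events in 100 days), and the helper names `confusion_counts` and `rare_event_metrics` are hypothetical, not part of this repo.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def rare_event_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1, guarding against empty denominators."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 5 jump days out of 100.
y_true = [1] * 5 + [0] * 95

# A model that never predicts a jump still scores 95% accuracy.
always_negative = [0] * 100

# A model that catches 4 of the 5 jumps at the cost of 6 false alarms.
noisy_detector = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89

print(rare_event_metrics(y_true, always_negative))
print(rare_event_metrics(y_true, noisy_detector))
```

The first model has the higher accuracy but zero recall, while the second finds most of the events. That gap is what precision and recall expose and accuracy hides.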
⚙️ How It Works
Downloads data using yfinance for selected tickers (e.g., NVDA, QQQ).
Adds target labels:
JumpTarget: future price rise exceeds threshold (e.g., 1%).
StreakTarget: streak of consecutive daily closes higher than opens.
Builds time-windowed input tensors (flattened sequences).
Runs ensemble symbolic regression models using PySR.
Outputs top formulas (hall_of_fame.csv) for inspection.
Compares with logistic regression baseline (accuracy, F1, confusion matrix).
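The labeling and windowing steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the code in make_targets.py or make_windows.py: the function names, the default parameters, and the exact target definitions (a jump means any close in the next `horizon` days is at least `threshold` above today's close; a streak means the next `length` days each close above their open) are guesses at one reasonable reading of the descriptions.

```python
import numpy as np


def jump_target(close, horizon=5, threshold=0.01):
    """1 if any close in the next `horizon` days is >= today's close * (1 + threshold).

    Assumed definition. The last `horizon` rows have no complete future
    window; they are left at 0 here and should be dropped before training.
    """
    close = np.asarray(close, dtype=float)
    target = np.zeros(len(close), dtype=int)
    for t in range(len(close) - horizon):
        future_max = close[t + 1 : t + 1 + horizon].max()
        target[t] = int(future_max >= close[t] * (1.0 + threshold))
    return target


def streak_target(open_, close, length=3):
    """1 if each of the next `length` days closes above its open (assumed definition)."""
    up = np.asarray(close, dtype=float) > np.asarray(open_, dtype=float)
    target = np.zeros(len(up), dtype=int)
    for t in range(len(up) - length):
        target[t] = int(up[t + 1 : t + 1 + length].all())
    return target


def make_windows(features, window=10):
    """Flatten the last `window` rows of a (days, columns) array into one row per day.

    Row i of the result describes day i + window - 1, so targets must be
    sliced with [window - 1:] to stay aligned.
    """
    features = np.asarray(features, dtype=float)
    n_rows = features.shape[0]
    rows = [features[t - window + 1 : t + 1].ravel() for t in range(window - 1, n_rows)]
    return np.array(rows)


# Made-up prices for illustration only.
open_ = [99.0, 100.0, 101.0, 102.0, 101.0, 100.0, 99.0, 99.4]
close = [100.0, 100.5, 102.0, 101.0, 100.0, 99.0, 99.5, 99.2]

window = 3
X = make_windows(np.column_stack([open_, close]), window=window)
y = jump_target(close, horizon=2, threshold=0.01)[window - 1 :]
print(X.shape, y.shape)
```

Each row of `X` holds `window * n_columns` values, which is why the evolved formulas refer to inputs by flat index (x137, x173, and so on) rather than by name.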
SAMPLE OUTPUT
JumpTarget ~ sin(x173 / (x173 - x137))
Where:
x137 = ^DJI_Low_6
x173 = ^IXIC_Close_8
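Because the output is a closed-form expression, it can be evaluated by hand without the trained model. The sketch below does this for the sample formula. The input values are invented, and the 0.5 cutoff in `to_label` is a hypothetical decision rule, since the README does not say how a formula's output is turned into a class label.

```python
import math


def sample_formula(x137, x173):
    """Evaluate sin(x173 / (x173 - x137)).

    Returns NaN when x173 == x137, where the expression is undefined.
    """
    denom = x173 - x137
    if denom == 0:
        return math.nan
    return math.sin(x173 / denom)


def to_label(score, cutoff=0.5):
    """Hypothetical rule: predict a jump when the score is at least `cutoff`."""
    return int(not math.isnan(score) and score >= cutoff)


# Invented inputs, not real index levels.
score = sample_formula(x137=100.0, x173=300.0)  # sin(300 / 200) = sin(1.5)
print(score, to_label(score))
```

Note that the formula is undefined wherever the two inputs are equal and oscillates rapidly when they are close, so it is worth checking how often that happens in the data before trusting it.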
📁 File Overview
main.py — pipeline driver
load_data.py — loads multi-ticker historical data
make_targets.py — creates classification targets (jump, streak)
make_windows.py — turns time-series into windowed flat input
train_model.py — symbolic regression and logistic baseline
outputs/ — symbolic formulas, metrics, and intermediate results
🛠️ To Run
Clone the repo
Install requirements
Run: python main.py
🚧 Future Improvements
Address class imbalance with smarter sampling or weighting
Add support for real-time prediction
Tune symbolic regression settings for interpretability vs. F1
Visualize equation surfaces and decision boundaries
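As one possible starting point for the weighting idea, the sketch below computes inverse-frequency class weights, n_samples / (n_classes * class_count), which is the same heuristic scikit-learn uses for `class_weight="balanced"`. The function name `balanced_class_weights` and the labels are hypothetical. Nothing here is implemented in the project yet.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)} for the given labels."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up labels: 5 jump days out of 100.
labels = [1] * 5 + [0] * 95
weights = balanced_class_weights(labels)
print(weights)

# Per-sample weights that could be passed to a loss or a fit routine.
sample_weights = [weights[label] for label in labels]
```

With these weights each class contributes the same total weight, so the rare positive days are not drowned out by the common negative ones.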
📜 License
MIT License