Official implementation of "Symbolic Regression via MDLformer-guided Search: From Minimizing Prediction Error to Minimizing Description Length" (ICLR 2025) as well as its extended version "An MDL-oriented Search Framework for Symbolic Regression" (submitting to TPAMI)

SR4MDL

Official implementation of the paper "Symbolic Regression via MDLformer-guided Search: From Minimizing Prediction Error to Minimizing Description Length" (ICLR 2025), as well as its extended journal version, "An MDL-oriented Search Framework for Symbolic Regression" (in submission to TPAMI).

NOTE: We are organizing the code and data for the extended version (An MDL-oriented Search Framework for Symbolic Regression), which will be updated in a week (before Sep. 18, 2025).

Installation

Before starting, you may want to create a virtual environment to avoid conflicts with other packages:

conda create --prefix ./venv python=3.12 -y
conda activate ./venv

Our code is based on the nd2py library, a symbolic system written in pure Python. You can install it via pip or by cloning the repo:

# Install via pip
pip install git+https://github.com/yuzhTHU/nd2py

# Or clone the repo
git clone https://github.com/yuzhTHU/nd2py nd2py_package
pip install ./nd2py_package
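
A quick way to confirm that the installation succeeded is to check that the interpreter can locate the package. The sketch below assumes the import name is `nd2py` (matching the repository name); `is_installed` is a hypothetical helper for illustration, not part of the library.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be located by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "nd2py" is assumed to be the import name of the package installed above.
    print("nd2py installed:", is_installed("nd2py"))
```

If this prints `False` inside the activated environment, the pip step most likely ran against a different interpreter.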

Train

To train the MDLformer model, you can run the following command:

python train.py --name demo

This trains the model on the synthetic dataset and saves it in the ./results/train/demo/ directory.

Test

To test the trained MDLformer model, you can run the following command:

python test.py --name demo --load_model ./results/train/demo/checkpoint.pth

Symbolic Regression

To use the trained MDLformer model for symbolic regression, you have to:

  1. Move the trained model to ./weights/checkpoint.pth. (A trained model is also available on the GitHub release page and on Dropbox.)
  2. Run the following command:
python search.py --load_model ./weights/checkpoint.pth --name demo --function "f=x1+x2*sin(x3)"

The results are printed to the terminal and saved in the ./results/search/demo/ directory and the ./results/aggregate.csv file.
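
Step 1 above can also be done from Python. The following is a minimal sketch, assuming the checkpoint was produced by the training command shown earlier (./results/train/demo/checkpoint.pth); `stage_checkpoint` is a hypothetical helper, and it copies the file rather than moving it so that the training output stays intact.

```python
import shutil
from pathlib import Path


def stage_checkpoint(src: str, dst: str = "./weights/checkpoint.pth") -> Path:
    """Copy a trained checkpoint to the location expected by search.py."""
    src_path, dst_path = Path(src), Path(dst)
    if not src_path.is_file():
        raise FileNotFoundError(f"No checkpoint found at {src_path}")
    # Create ./weights/ (or any other target directory) if it does not exist yet.
    dst_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src_path, dst_path)
    return dst_path


# Example (paths taken from the commands in this README):
# stage_checkpoint("./results/train/demo/checkpoint.pth")
```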

To test this model on the Feynman and Strogatz datasets, you have to:

  1. Install the PMLB package from https://github.com/EpistasisLab/pmlb (pip install pmlb is not recommended since it does not contain these datasets, see https://epistasislab.github.io/pmlb/using-python.html):
cd data
git clone https://github.com/EpistasisLab/pmlb pmlb
pip install ./pmlb
cd ..
  2. Run the following command:
python search.py --load_model ./weights/checkpoint.pth --name demo --function "Feynman_II_27_18"

The results are printed to the terminal and saved in the ./results/search/demo/ directory and the ./results/aggregate.csv file.
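
To inspect one of these datasets directly, the gzipped TSV files in the cloned PMLB repository can be read with the standard library alone. This is a sketch under two assumptions: the file layout is ./data/pmlb/datasets/<name>/<name>.tsv.gz (as used in the SRBench script in this README), and the label column is named `target` (the PMLB convention). `load_pmlb_tsv` is a hypothetical helper, not part of this repository or of PMLB.

```python
import csv
import gzip


def load_pmlb_tsv(path):
    """Read a gzipped PMLB .tsv file into (feature_names, X, y)."""
    with gzip.open(path, "rt", newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        header = next(reader)
        # Skip empty lines; every remaining cell is parsed as a float.
        rows = [[float(v) for v in row] for row in reader if row]
    target_idx = header.index("target")  # assumed name of the label column
    feature_names = [h for i, h in enumerate(header) if i != target_idx]
    X = [[v for i, v in enumerate(row) if i != target_idx] for row in rows]
    y = [row[target_idx] for row in rows]
    return feature_names, X, y


# Example (directory name in lowercase, as in the SRBench script):
# names, X, y = load_pmlb_tsv(
#     "./data/pmlb/datasets/feynman_II_27_18/feynman_II_27_18.tsv.gz")
```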

Run in SRBench

To test our method on the SRBench benchmark, you have to:

  1. Clone the SRBench repo into the ./benchmark/srbench/ directory:
git clone https://github.com/cavalab/srbench ./benchmark/srbench/
  2. Create the ./benchmark/srbench/experiment/methods/sr4mdl directory with an empty __init__.py file in it:
mkdir -p ./benchmark/srbench/experiment/methods/sr4mdl
touch ./benchmark/srbench/experiment/methods/sr4mdl/__init__.py
  3. Copy regressor.py to ./benchmark/srbench/experiment/methods/sr4mdl/, then replace /path/to/weights/checkpoint.pth in it with the path to the trained model:
cp ./regressor.py ./benchmark/srbench/experiment/methods/sr4mdl/regressor.py
# In the copied regressor.py, replace /path/to/weights/checkpoint.pth with, for example, ./weights/checkpoint.pth
  4. Run the following script from the ./benchmark/srbench/experiment/ directory:
#!/bin/bash
method=sr4mdl
for seed in 29910 14423 28020 23654 15795 16850 21962 4426 5390 860; do
for noise in 0.000 0.001 0.01 0.1; do
for exp in strogatz_vdp2 feynman_I_6_2a strogatz_bacres2 strogatz_bacres1 feynman_II_27_18 feynman_II_3_24 feynman_I_6_2 feynman_II_8_31 feynman_I_12_1 feynman_I_12_5 feynman_I_14_4 feynman_I_39_1 strogatz_vdp1 feynman_I_25_13 feynman_I_26_2 feynman_I_29_4 strogatz_barmag1 feynman_II_11_28 feynman_II_38_14 strogatz_glider1 feynman_III_12_43 strogatz_shearflow2 strogatz_shearflow1 strogatz_predprey2 strogatz_barmag2 strogatz_predprey1 strogatz_lv2 strogatz_lv1 feynman_I_34_27 strogatz_glider2 feynman_I_12_4 feynman_III_17_37 feynman_I_43_31 feynman_I_14_3 feynman_III_15_27 feynman_I_15_10 feynman_I_16_6 feynman_I_18_12 feynman_I_39_11 feynman_III_15_14 feynman_III_15_12 feynman_II_13_34 feynman_II_13_23 feynman_I_27_6 feynman_II_10_9 feynman_I_30_3 feynman_I_30_5 feynman_I_37_4 feynman_I_34_1 feynman_III_8_54 feynman_I_47_23 feynman_I_10_7 feynman_II_15_4 feynman_II_34_2 feynman_II_34_29a feynman_II_34_2a feynman_test_10 feynman_II_37_1 feynman_I_48_2 feynman_III_7_38 feynman_II_4_23 feynman_I_34_14 feynman_I_6_2b feynman_II_27_16 feynman_II_24_17 feynman_II_8_7 feynman_II_15_5 feynman_I_43_16 feynman_test_5 feynman_I_34_8 feynman_I_50_26 feynman_test_3 feynman_I_38_12 feynman_I_39_22 feynman_test_15 feynman_test_11 feynman_I_8_14 feynman_I_43_43 feynman_test_8 feynman_III_10_19 feynman_I_24_6 feynman_II_13_17 feynman_II_34_11 feynman_II_11_27 feynman_I_32_5 feynman_III_4_33 feynman_III_21_20 feynman_II_38_3 feynman_II_6_11 feynman_II_6_15b feynman_I_12_2 feynman_III_4_32 feynman_I_29_16 feynman_I_13_4 feynman_I_15_3t feynman_I_18_4 feynman_III_13_18 feynman_I_18_14 feynman_I_15_3x feynman_I_12_11 feynman_II_2_42 feynman_test_7 feynman_test_4 feynman_II_34_29b feynman_II_11_3 feynman_II_11_20 feynman_test_18 feynman_II_35_18 feynman_I_44_4 feynman_test_14 feynman_test_13 feynman_test_12 feynman_II_35_21 feynman_test_9 feynman_I_41_16 feynman_III_19_51 feynman_I_13_12 feynman_III_14_14 feynman_II_21_32 feynman_III_9_52 feynman_I_32_17 feynman_test_2 feynman_test_19 feynman_test_17 feynman_II_6_15a feynman_I_11_19 feynman_I_40_1 feynman_test_16 feynman_test_20 feynman_test_1 feynman_test_6 feynman_II_36_38 feynman_I_9_18; do
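    # The dataset path is relative to ./benchmark/srbench/experiment/ (PMLB was cloned into ./data/pmlb)
    python evaluate_model.py ../../../data/pmlb/datasets/$exp/$exp.tsv.gz \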
        -ml $method \
        -seed $seed \
        -target_noise $noise \
        -results_path "./results-$method-$noise"
done
done
done
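
The nested loops above enumerate every (seed, noise level, dataset) combination. As a sketch of the same grid in Python, the hypothetical helper below builds one argument list per run using the same flags as the script; it only constructs the commands, and each one could then be passed to `subprocess.run`. The `data_root` argument is a placeholder that must point at the PMLB `datasets` directory on your machine.

```python
from itertools import product


def build_commands(datasets, seeds, noises, data_root, method="sr4mdl"):
    """Build one evaluate_model.py command per (seed, noise, dataset) triple."""
    commands = []
    # product() iterates in the same order as the nested bash loops:
    # seed (outer), noise, dataset (inner).
    for seed, noise, exp in product(seeds, noises, datasets):
        commands.append([
            "python", "evaluate_model.py", f"{data_root}/{exp}/{exp}.tsv.gz",
            "-ml", method,
            "-seed", str(seed),
            "-target_noise", str(noise),
            "-results_path", f"./results-{method}-{noise}",
        ])
    return commands


if __name__ == "__main__":
    cmds = build_commands(
        datasets=["strogatz_vdp2", "feynman_I_6_2a"],
        seeds=[29910, 14423],
        noises=["0.000", "0.001"],  # strings keep the formatting used in the script
        data_root="/path/to/pmlb/datasets",  # placeholder path
    )
    print(len(cmds), "runs;", "first:", " ".join(cmds[0]))
```

Building the list first makes it easy to see how many runs the full grid implies before launching any of them.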

Alternatively, run SRBench's analyze.py from the repository root:

# Strogatz datasets, 15-minute time limit, sr4mdl method, 2 cores;
# results are saved to ./results/srbench/
python ./benchmark/srbench/experiment/analyze.py \
    ./data/pmlb/datasets/strogatz_* \
    -time_limit 00:15 \
    -ml sr4mdl \
    -n_jobs 2 \
    -results ./results/srbench/ \
    -sym_data \
    --local \
    -script ./benchmark/srbench/experiment/evaluate_model

# Feynman datasets
python ./benchmark/srbench/experiment/analyze.py \
    ./data/pmlb/datasets/feynman_* \
    -ml sr4mdl \
    --local \
    -n_jobs 2 \
    -results ./results/srbench/ \
    -script ./benchmark/srbench/experiment/evaluate_model \
    -sym_data
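
Once the runs finish, the outputs can be gathered for inspection. The sketch below assumes that SRBench writes one JSON file per run somewhere under the results directory; it makes no assumption about the field names inside each file. `collect_results` is a hypothetical helper, not part of SRBench or this repository.

```python
import json
from pathlib import Path


def collect_results(results_dir):
    """Load every *.json file under `results_dir` into a list of dicts."""
    records = []
    for path in sorted(Path(results_dir).rglob("*.json")):
        with open(path, encoding="utf-8") as fh:
            record = json.load(fh)
        if isinstance(record, dict):
            # Remember where each record came from for later debugging.
            record["_source_file"] = str(path)
            records.append(record)
    return records


# Example:
# runs = collect_results("./results/srbench/")
# print(len(runs), "result files found")
```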
