Inspired by the power of evolutionary search to find non-obvious solutions (as in systems such as AlphaEvolve), EvoEquation asks: what if we don't assume the form of the solution, and instead let evolution discover it for complex, real-world problems?
This project applies a miniature evolutionary algorithm for symbolic regression to uncover unknown mathematical relationships within complex global energy data. Instead of presupposing a model structure (e.g., linear or polynomial), EvoEquation attempts to evolve mathematical expressions that best describe the observed phenomena.
One of the most pressing global challenges is understanding and mitigating carbon emissions. The carbon intensity of electricity generation (`carbon_intensity_elec`) is a key metric, influenced by a multitude of interacting factors like economic activity (GDP), population dynamics, and the share of different energy sources (coal, renewables, nuclear, gas).
Traditional modeling often requires making assumptions about how these factors relate to carbon intensity. EvoEquation takes a different path.
EvoEquation uses an evolutionary algorithm where:
- Individuals are mathematical expressions (trees of operators, variables, and constants).
- Fitness is determined by how well an expression predicts `carbon_intensity_elec` from features like `gdp`, `population`, `coal_share_energy`, `renewables_share_energy`, etc., using real-world data from Our World in Data.
- Evolution proceeds through generations, using selection, crossover, and mutation to refine expressions, while penalizing complexity to favor more interpretable solutions.
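The three ingredients listed above can be sketched as a tiny genetic-programming loop. This is a minimal illustration under stated assumptions, not EvoEquation's actual code: the nested-tuple tree encoding, the operator set (with "protected" `sqrt` and `exp`), RMSE as the error measure, the per-node `penalty`, the tournament size, the mutation rate, and the `max_size` bloat limit are all hypothetical choices. (The sample run also shows domain-specific primitives such as `saturation_curve` and `gdp_intensity`, whose definitions are not given here.)

```python
import random

import numpy as np

# Hypothetical primitive set; EvoEquation's real operators may differ.
BINARY = {"+": np.add, "-": np.subtract, "*": np.multiply}
UNARY = {"sqrt": lambda x: np.sqrt(np.abs(x)),          # protected: no NaN
         "exp": lambda x: np.exp(np.clip(x, -50, 50)),  # protected: bounded
         "cos": np.cos}
LEAVES = ("var", "const")


def random_tree(features, depth):
    """Grow ("var", name), ("const", v), (op, child) or (op, left, right)."""
    if depth == 0 or random.random() < 0.3:
        if random.random() < 0.7:
            return ("var", random.choice(features))
        return ("const", round(random.uniform(-2.0, 2.0), 2))
    if random.random() < 0.5:
        return (random.choice(list(UNARY)), random_tree(features, depth - 1))
    return (random.choice(list(BINARY)),
            random_tree(features, depth - 1), random_tree(features, depth - 1))


def evaluate(tree, X):
    """Evaluate a tree on X, a dict of feature name -> NumPy array."""
    kind = tree[0]
    if kind == "var":
        return X[tree[1]]
    if kind == "const":
        return tree[1]
    if kind in UNARY:
        return UNARY[kind](evaluate(tree[1], X))
    return BINARY[kind](evaluate(tree[1], X), evaluate(tree[2], X))


def size(tree):
    """Node count, used as the complexity measure."""
    return 1 if tree[0] in LEAVES else 1 + sum(size(c) for c in tree[1:])


def fitness(tree, X, y, penalty=0.01):
    """RMSE plus a hypothetical per-node parsimony penalty; lower is better."""
    with np.errstate(all="ignore"):
        rmse = float(np.sqrt(np.mean((y - evaluate(tree, X)) ** 2)))
    return rmse + penalty * size(tree) if np.isfinite(rmse) else float("inf")


def subtrees(tree, path=()):
    """Yield (path, subtree) pairs; a path is a tuple of child indices."""
    yield path, tree
    if tree[0] not in LEAVES:
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))


def replace(tree, path, new):
    """Return a copy of tree with the subtree at path swapped for new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]


def tournament(pop, scores, k=3):
    """Return the fittest of k randomly sampled individuals."""
    return pop[min(random.sample(range(len(pop)), k), key=scores.__getitem__)]


def evolve(X, y, generations=40, pop_size=200, max_size=40, seed=0):
    """Elitism + tournament selection + subtree crossover and mutation."""
    random.seed(seed)
    features = list(X)
    pop = [random_tree(features, 3) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(t, X, y) for t in pop]
        children = [pop[int(np.argmin(scores))]]  # elitism: keep the best
        while len(children) < pop_size:
            a, b = tournament(pop, scores), tournament(pop, scores)
            path, _ = random.choice(list(subtrees(a)))
            child = replace(a, path, random.choice(list(subtrees(b)))[1])  # crossover
            if random.random() < 0.2:  # subtree mutation
                path, _ = random.choice(list(subtrees(child)))
                child = replace(child, path, random_tree(features, 2))
            if size(child) <= max_size:  # reject bloated offspring
                children.append(child)
        pop = children
    scores = [fitness(t, X, y) for t in pop]
    best = int(np.argmin(scores))
    return pop[best], scores[best]
```

Lower fitness is better, which matches the decreasing "Best Fitness" values in the sample run. Raising `penalty` favors shorter, more interpretable expressions at some cost in accuracy.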
Using real data from 20 countries over 22 years (2000-2021), EvoEquation has yielded intriguing results. One of the discovered equations suggests that coal dominance drives carbon intensity through exponential relationships.
```text
(base) ➜ genetic-symbolic-regression python energy_discovery.py
EvoEquation: Real Energy Data Discovery
============================================================
Loading energy data from owid-energy-data.csv
Loaded 21812 rows, 130 columns
Filtered to 20 countries: 2314 rows
Filtered to years 2000-2021: 440 rows
Removed 0 rows with missing target values (carbon_intensity_elec)
Final dataset after numeric conversion and final NA drop: 440 rows
Dataset Analysis:
Target variable: carbon_intensity_elec
Feature variables: gdp, population, renewables_share_energy, coal_share_energy,
nuclear_share_energy, gas_share_energy, energy_per_capita
Number of unique countries: 20
Year range: 2000 - 2021
Target Variable (carbon_intensity_elec):
Range: 52.15 - 813.48
Mean: 464.95
Std: 207.65
Correlations with carbon_intensity_elec:
renewables_share_energy: -0.551
coal_share_energy: 0.710
nuclear_share_energy: -0.448
Starting evolutionary discovery...
Training on 352 samples, testing on 88 samples
Generation 0: Best Fitness = 2.98
Variables: energy_per_capita
Equation: sqrt(energy_per_capita)
Generation 10: Best Fitness = 2.23
Variables: renewables_share_energy, energy_per_capita, nuclear_share_energy
Equation: (energy_per_capita ^ cos(exp(gdp_intensity(...))))
Generation 40: Best Fitness = 1.99
Variables: energy_per_capita
Equation: (energy_per_capita ^ saturation_curve(0.25))
============================================================
DISCOVERY RESULTS
============================================================
Discovered Equation for carbon_intensity_elec:
(energy_per_capita ^ saturation_curve(0.25))
Variables Utilized: energy_per_capita
Training Fitness: 1.99 | Test Fitness: 2.72
Generalization Ratio: 1.37 (Good generalization)
```
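The preprocessing steps reported in the log (country and year filtering, numeric coercion, feature correlations, an 80/20 split, and the generalization ratio) can be sketched with pandas. This is a hedged sketch, not the project's code: the feature and target names come from the log, the `country` and `year` identifier columns are assumed from the OWID CSV, and `prepare`, `split`, and `seed` are hypothetical. The log does not say whether the split is by row or by country; 88 test rows is also exactly 4 countries × 22 years, so the sketch supports both.

```python
import numpy as np
import pandas as pd

# Column names as printed in the run log; "country" and "year" are assumed OWID columns.
TARGET = "carbon_intensity_elec"
FEATURES = ["gdp", "population", "renewables_share_energy", "coal_share_energy",
            "nuclear_share_energy", "gas_share_energy", "energy_per_capita"]


def prepare(df, countries, start=2000, end=2021):
    """Keep the chosen countries and years, coerce to numeric, drop incomplete rows."""
    df = df[df["country"].isin(countries) & df["year"].between(start, end)]
    cols = FEATURES + [TARGET]
    df = df.assign(**{c: pd.to_numeric(df[c], errors="coerce") for c in cols})
    return df.dropna(subset=cols)


def target_correlations(df):
    """Pearson correlation of each feature with the target."""
    return df[FEATURES].corrwith(df[TARGET])


def split(df, test_frac=0.2, by_country=False, seed=0):
    """Hold out a random 20% of rows, or of whole countries if by_country."""
    rng = np.random.default_rng(seed)
    if by_country:
        names = df["country"].unique()
        held_out = rng.choice(names, size=round(len(names) * test_frac), replace=False)
        mask = df["country"].isin(held_out).to_numpy()
    else:
        mask = np.zeros(len(df), dtype=bool)
        mask[rng.choice(len(df), size=round(len(df) * test_frac), replace=False)] = True
    return df[~mask], df[mask]


def generalization_ratio(train_fitness, test_fitness):
    """Test error relative to training error; 1.0 means no degradation."""
    return test_fitness / train_fitness


print(round(generalization_ratio(1.99, 2.72), 2))  # 1.37, as in the log
```

With `by_country=True`, the test set contains only countries the model never saw during training, which is the stricter check for a claim of generalization across countries.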
One of the most promising equations discovered shows a strong, non-linear dependence on coal share, alongside contributions from population and per-capita energy consumption:

```text
carbon_intensity = ((coal_share_energy + sqrt(exp(coal_share_energy))) +
                    (coal_share_energy + sqrt(exp(energy_per_capita))) +
                    (coal_share_energy + sqrt(exp(population))))
```
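Using $\sqrt{e^{x}} = e^{x/2}$, the expression simplifies considerably. The symbols $I$, $c$, $E$, and $P$ are shorthand introduced here for `carbon_intensity`, `coal_share_energy`, `energy_per_capita`, and `population`:

$$
\begin{aligned}
I &= \left(c + \sqrt{e^{c}}\right) + \left(c + \sqrt{e^{E}}\right) + \left(c + \sqrt{e^{P}}\right) \\
  &= 3c + e^{c/2} + e^{E/2} + e^{P/2}, \\
\frac{\partial I}{\partial c} &= 3 + \tfrac{1}{2}\,e^{c/2}.
\end{aligned}
$$

Coal share enters three times linearly and once inside an exponential, and its marginal effect $\partial I / \partial c$ grows exponentially with $c$. That is the sense in which reducing coal has "exponential benefits" within this model.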
Key Discoveries:
- Coal share appears 4 times in the equation, suggesting an overwhelming impact on carbon intensity
- Exponential terms (`sqrt(exp(variable))` ≡ `exp(variable/2)`) capture non-linear compounding effects
- Coal reduction shows exponential benefits, a potentially significant insight for climate policy
- Explains 29.3% of variance across diverse countries (US, China, Germany, Nigeria, Ghana, etc.)
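Evaluating the reported expression literally raises a practical point: `np.exp` overflows to `inf` for arguments above roughly 709.78, and raw populations are in the millions, so `sqrt(exp(population))` can only be finite if the inputs were rescaled before evolution. The README does not say how (or whether) EvoEquation scales its features; the sketch below assumes z-score standardization through a hypothetical `zscore` helper and also checks the coal-share derivative. The population values are illustrative magnitudes, not OWID data.

```python
import numpy as np


def discovered_equation(coal, energy, pop):
    """The evolved expression, written exactly as reported."""
    return ((coal + np.sqrt(np.exp(coal)))
            + (coal + np.sqrt(np.exp(energy)))
            + (coal + np.sqrt(np.exp(pop))))


def coal_sensitivity(coal):
    """Partial derivative w.r.t. coal share: 3 + exp(coal / 2) / 2."""
    return 3.0 + 0.5 * np.exp(coal / 2.0)


def zscore(x):
    """Hypothetical scaling step; EvoEquation's actual preprocessing is undocumented."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


# Illustrative population magnitudes, not real data.
pop_raw = np.array([3.3e8, 1.4e9, 8.3e7])
with np.errstate(over="ignore"):
    print(discovered_equation(0.5, 0.5, pop_raw))  # [inf inf inf]: exp overflows
print(discovered_equation(0.5, 0.5, zscore(pop_raw)))  # finite once standardized
print(coal_sensitivity(np.array([-1.0, 0.0, 1.0])))  # effect grows with coal share
```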
This demonstrates the potential of evolutionary approaches to automatically discover candidate mathematical laws governing complex systems like global carbon intensity, directly from data.
Just as AlphaEvolve discovered new algorithms by evolving code, EvoEquation discovers new scientific relationships by evolving mathematical expressions:
| System | Evolves | Breakthrough |
|---|---|---|
| AlphaEvolve | Algorithms | 4×4 complex-valued matrix multiplication using 48 scalar multiplications |
| EvoEquation | Equations | Coal-dominance carbon intensity law |
Both use the same core principle: evolutionary search through solution spaces to find patterns humans haven't explicitly programmed.
Performance (coal-dominance equation):
- Low training error (fitness: 0.67, lower is better), indicating a good fit to the training data
- Moderate positive correlation (0.559) with actual values on test data
- Explains 29.3% of variance (R²=0.293) on unseen test data
- Good generalization across diverse countries (US, China, Germany, Nigeria, Ghana, etc.)
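The two test-set metrics above can be computed as follows; this is a generic NumPy sketch, not EvoEquation's evaluation code. A useful sanity check: on the same data, the R² of raw predictions can never exceed the squared Pearson correlation between predictions and targets, because r² is what the best linear recalibration of the predictions would achieve. The reported pair (r = 0.559, R² = 0.293) passes that check.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def pearson_r(y_true, y_pred):
    """Pearson correlation between actual values and predictions."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


print(round(0.559 ** 2, 3))  # 0.312, an upper bound on the reported R^2 of 0.293
```

Note that R² can be negative when predictions are worse than always predicting the mean, while a perfectly correlated but mis-scaled predictor still has r = 1.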
Policy Implications:
- Quantifies exponential benefits of coal reduction for carbon intensity
- Reveals population-energy scaling laws across development stages
- Offers quantitative insights that could help inform climate policy decisions
Getting started:
- Download the energy dataset: `wget https://github.com/owid/energy-data/raw/master/owid-energy-data.csv`
- Install dependencies: `pip install pandas numpy matplotlib seaborn`
- Run the discovery: `python energy_discovery.py`
Files:
- `energy_discovery.py`: main evolutionary algorithm for energy data
- `owid-energy-data.csv`: global energy dataset (Our World in Data)
- `results/evolution_results.png`: visualization of algorithm performance
This demonstrates that evolutionary algorithms can discover candidate scientific relationships in complex real-world data. The same principles that let AlphaEvolve revolutionize algorithm design could uncover mathematical laws governing climate, economics, biology, and beyond.
The question isn't whether evolutionary discovery works; it's what other scientific breakthroughs are waiting to be evolved.
- AlphaEvolve: https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
- Energy Data: https://github.com/owid/energy-data/blob/master/owid-energy-data.csv
- Symbolic Regression: Classical technique enhanced with modern evolutionary approaches
I'm just learning, don't come for me LOL