Left: posterior sharpening over time. Right: hidden ground-truth minefield
A robot with a noisy sensor explores minefield using Bayesian inference and expected free energy (EFE). The posterior over mines sharpens step by step as the agent balances risk and information gain.
This repo contains a simple demo of a robot exploring a noisy minefield using expected free energy (EFE) and Bayesian statistics.
The robot holds a generative model of the world (beta-bernoulli over mine presence and a noisy mine sensor). At each stage we compute the posterior mine probabilities (the risk) and take off the difference between the prior entropy and the evidence wieghted entropy of each outcome of a given move. The move with the lowest expected free energy is then selected.
The robot balances:
- Epistemic gain: information gain from probing new cells.
- Extrinsic risk: risk of stepping on a mine.
We consider two models:
- naieve_EFE_no_revisit.ipynb — the robot samples each tile at most once. Revisits confer no additional information.
- EFE_revisit.ipynb — the robot can re-probe tiles. Revisits confer additional information and yield a sharper posterior.
Written in haste.
Each cell (s_i \in {0,1}) (mine or no mine) has a bernoulli parameter of 0.1
The joint distribution of the grid is therefore:
Let a be probability of detecting a mine when there is a mine present (0.95):
Let 1-a be the probability of not detecting a mine when there is a mine present 0.05:
Let b be the probability of detecting a mine when there is no mine present (0.3)
Let 1-b be the probability of not detecting a mine when there is no mine present (0.7)
In summary the sensor has a confusion matrix:
Now we can define the likelihood of performing n mine detections with k detects and n-k non detects for a single cell given there is a mine there:
and define the likelihood of performing n mine detections with k detects and n-k non detects for a single cell given there is no mine there:
We can now use Bayes' rule with those likelihoods to form a posterior:
Substituting the likelihoods:
For an action (a) (moving in a particular direction compass direction N,S,W,E,NE,NW,SE,SW):
Entropy of a Bernoulli with parameter (p):
The agent picks the action (a^*) with lowest EFE, trading off exploration (epistemic gain) against exploitation (low risk).
Posterior PDFs
Left: single-visit posterior stays diffuse. Right: revisits drive probabilities sharply toward 0/1.
Boards
Ground truths of the boards.
You can view the notebooks on GitHub or run them in Colab: