-
Hello @juliohm, thanks for the question. As far as I understand, you are trying to create a discrete Markov random field. Technically, RxInfer can handle the problem of obtaining the associated Markov random process with a few hacks. Once the posterior is obtained, sampling should not be an issue. However, in order to scale this problem you would need the proprietary algorithms that are in RxInferPro.
-
Hi @juliohm, thanks for trying out RxInfer! Let's start with the model structure, which relies on the `DiscreteTransition` node:

```julia
using RxInfer

n = 10
@model function markov_random_field(y, B, n)
    for i in 1:n
        for j in 1:n
            x[i, j] ~ DiscreteTransition(y[i, j], diageye(3)) # Identity likelihood, can always be changed
        end
    end
    for i in 1:n
        for j in 1:n
            # Handle edges and corners
            if i == 1 && j == 1 # Top-left corner
                x[i, j] ~ DiscreteTransition(x[i+1, j], B[i, j], x[i, j+1], x[i+1, j+1])
            elseif i == 1 && j == n # Top-right corner
                x[i, j] ~ DiscreteTransition(x[i+1, j], B[i, j], x[i, j-1], x[i+1, j-1])
            elseif i == n && j == 1 # Bottom-left corner
                x[i, j] ~ DiscreteTransition(x[i-1, j], B[i, j], x[i, j+1], x[i-1, j+1])
            elseif i == n && j == n # Bottom-right corner
                x[i, j] ~ DiscreteTransition(x[i-1, j], B[i, j], x[i, j-1], x[i-1, j-1])
            elseif i == 1 # Top edge
                x[i, j] ~ DiscreteTransition(x[i, j-1], B[i, j], x[i+1, j], x[i, j+1], x[i+1, j-1], x[i+1, j+1])
            elseif i == n # Bottom edge
                x[i, j] ~ DiscreteTransition(x[i, j-1], B[i, j], x[i-1, j], x[i, j+1], x[i-1, j-1], x[i-1, j+1])
            elseif j == 1 # Left edge
                x[i, j] ~ DiscreteTransition(x[i-1, j], B[i, j], x[i, j+1], x[i+1, j], x[i-1, j+1], x[i+1, j+1])
            elseif j == n # Right edge
                x[i, j] ~ DiscreteTransition(x[i-1, j], B[i, j], x[i, j-1], x[i+1, j], x[i-1, j-1], x[i+1, j-1])
            else # Interior points
                x[i, j] ~ DiscreteTransition(x[i-1, j-1], B[i, j], x[i-1, j], x[i, j-1], x[i+1, j], x[i, j+1], x[i-1, j+1], x[i+1, j-1], x[i+1, j+1])
            end
        end
    end
end
```

This model definition is quite big, and we can probably come up with something smarter, but I'm shooting from the hip here. The main point is that the `DiscreteTransition` node can condition a cell on an arbitrary number of neighbours, with the `B` tensors acting as the conditional probability tables. For now we generate random ones:

```julia
# Generate random transition matrices, doesn't really matter now
B = Matrix{Array{Float64}}(undef, n, n)
for i in 1:n
    for j in 1:n
        # Determine number of connections based on position
        if i == 1 && j == 1 # Top-left corner
            B[i, j] = rand(3, 3, 3, 3) # Self + right + bottom + diagonal
        elseif i == 1 && j == n # Top-right corner
            B[i, j] = rand(3, 3, 3, 3) # Self + left + bottom + diagonal
        elseif i == n && j == 1 # Bottom-left corner
            B[i, j] = rand(3, 3, 3, 3) # Self + top + right + diagonal
        elseif i == n && j == n # Bottom-right corner
            B[i, j] = rand(3, 3, 3, 3) # Self + top + left + diagonal
        elseif i == 1 # Top edge
            B[i, j] = rand(3, 3, 3, 3, 3, 3) # Self + left + right + bottom + 2 diagonals
        elseif i == n # Bottom edge
            B[i, j] = rand(3, 3, 3, 3, 3, 3) # Self + left + right + top + 2 diagonals
        elseif j == 1 # Left edge
            B[i, j] = rand(3, 3, 3, 3, 3, 3) # Self + top + bottom + right + 2 diagonals
        elseif j == n # Right edge
            B[i, j] = rand(3, 3, 3, 3, 3, 3) # Self + top + bottom + left + 2 diagonals
        else # Interior points
            B[i, j] = rand(3, 3, 3, 3, 3, 3, 3, 3, 3) # Self + 8 neighbors
        end
        B[i, j] ./= sum(B[i, j]) # Normalize
    end
end
```
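To tie this back to the original question: instead of random tensors, the entries of `B` could be filled from a user-supplied conditional probability function like the `p(i, j)` mentioned in the question. The sketch below is only illustrative: it assumes a hypothetical `p_table(i, j)` that returns a 3×3×…×3 array with the cell's own state on the first axis and its 8 neighbours on the remaining axes, and it assumes that this axis ordering matches the argument order used in the model above (worth double-checking for `DiscreteTransition`).

```julia
# Hypothetical stand-in for the p(i, j) from the question: it should return the
# conditional probabilities of x[i, j] (first axis) given its 8 neighbours
# (remaining axes). Here it just builds a random, properly normalized table.
function p_table(i, j)
    T = rand(3, 3, 3, 3, 3, 3, 3, 3, 3)
    return T ./ sum(T, dims = 1)   # normalize over the cell's own state
end

# Fill the interior cells of B from the conditional tables (interior only in this sketch)
for i in 2:(n - 1), j in 2:(n - 1)
    B[i, j] = p_table(i, j)
end
```

Either way, next we generate some observations: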
```julia
# Generate some data, a 50% chance of a datapoint being missing
y = Matrix{Union{Missing, Vector{Float64}}}(undef, n, n)
for i in 1:n
    for j in 1:n
        if rand() < 0.5
            y[i, j] = zeros(3)
            y[i, j][rand(1:3)] = 1
        else
            y[i, j] = missing
        end
    end
end
```

Note that we inject `missing` values for the unobserved cells. Now on to the actual inference. Now that we've set up this model and data, we can do inference with just a couple of lines:

```julia
initialization = @initialization begin
    μ(x) = vague(Categorical, 3)
end

result = infer(model = markov_random_field(B=B, n=n), data=(y=y,), initialization=initialization, iterations=10, options = (limit_stack_depth=500,))
```

This gives us a posterior probability distribution over all missing values. Then, we can use either the Gibbs sampling algorithm or sequential sampling with message passing to obtain samples out of this MRF (see the sketch at the end of this reply). Let's benchmark our solution:

```julia
callbacks = RxInferBenchmarkCallbacks()
for i in 1:10
    result = infer(model = markov_random_field(B=B, n=n), data=(y=y,), initialization=initialization, iterations=10, callbacks=callbacks, options = (limit_stack_depth=500,))
end
```

Which (on my machine) gives:
I think that, when writing the model down this naively, about 99% of the computations are entirely redundant, and I'm fairly confident that, with both a smarter model structure and a specialized inference algorithm (the `DiscreteTransition` node is a general implementation that can definitely be made 10x faster in high-dimensional settings), this inference can be made 100+ times faster. Let me know if you have any additional questions!
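As a rough illustration of the "sampling from the posterior" idea above, here is a minimal sketch of drawing one (approximate) sample of the field from the inferred marginals. It assumes that `last(result.posteriors[:x])` holds the final-iteration marginals as an n×n matrix of `Categorical` distributions; sampling each marginal independently ignores correlations between cells, which is exactly what a Gibbs-style sweep over the neighbourhoods would add.

```julia
# Sketch only: draw one state per cell from its inferred marginal.
# Assumption: `last(result.posteriors[:x])` is an n×n matrix of Categorical
# distributions (one marginal per grid cell, final iteration). Observed cells
# keep their observed value because of the identity likelihood in the model.
marginals = last(result.posteriors[:x])

sampled_field = map(marginals) do q
    rand(q)   # an integer in 1:3, drawn from this cell's marginal
end
```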
-
Suppose I have a function `p(i, j)` that returns the conditional probabilities of `k` categorical values at a grid cell `(i, j)` given its 8 neighboring cells. Additionally, suppose I have observed values in `n` cells of the grid, `z(i1, j1), z(i2, j2), ..., z(in, jn)`. Is there an efficient method in RxInfer.jl to sample the associated Markov random process `Z(i, j)` given the conditional probability function and observed values? Each sample is a full image where observed values are honored and where the conditional probability function is utilized to build the full distribution over all cells of the grid.