Supporting more MCMCChains variable names #211

Open
@sethaxen

Currently from_mcmcchains assumes that all variable names are single-bracket-delimited or dot-delimited. However, quite complicated names are possible:

julia> using Turing, LinearAlgebra

julia> @model function foo()
           a ~ Normal()
           bar = (b=Matrix{typeof(a)}(undef, 2, 3), c=Vector{typeof(a)}(undef, 3))
           bar.b[1, :] .~ Normal(a)
           bar.b[2, 1:2] ~ MvNormal(I(2))
           bar.b[2, 3:end] .~ Normal(a)
           bar.c[end:-1:1] .~ Normal(a)
       end
foo (generic function with 2 methods)

julia> chns = sample(foo(), NUTS(), 1_000)
┌ Info: Found initial step size
└   ϵ = 1.6
Sampling 100%|█████████████████████████████████████████████████████████████████████████| Time: 0:00:01
Chains MCMC chain (1000×22×1 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 1.21 seconds
Compute duration  = 1.21 seconds
parameters        = a, bar.b[1,:][1], bar.b[1,:][2], bar.b[1,:][3], bar.b[2,1:2][1], bar.b[2,1:2][2], bar.b[2,3:3][1], bar.c[3:-1:1][1], bar.c[3:-1:1][2], bar.c[3:-1:1][3]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
        parameters      mean       std   naive_se      mcse         ess      rhat   ess_per_sec 
            Symbol   Float64   Float64    Float64   Float64     Float64   Float64       Float64 

                 a    0.0278    0.9966     0.0315    0.0724    139.5528    1.0152      115.1426
     bar.b[1,:][1]    0.0178    1.4143     0.0447    0.0841    205.0380    1.0096      169.1733
     bar.b[1,:][2]    0.0268    1.3678     0.0433    0.0765    292.3916    1.0072      241.2472
     bar.b[1,:][3]    0.0395    1.3866     0.0438    0.0852    234.2846    1.0061      193.3041
   bar.b[2,1:2][1]    0.0395    1.0091     0.0319    0.0260   1017.5267    0.9990      839.5435
   bar.b[2,1:2][2]   -0.0270    0.9839     0.0311    0.0260   1495.1411    0.9990     1233.6147
   bar.b[2,3:3][1]    0.0506    1.3798     0.0436    0.0724    347.4877    1.0032      286.7061
  bar.c[3:-1:1][1]    0.0156    1.4357     0.0454    0.0835    275.8889    1.0079      227.6311
  bar.c[3:-1:1][2]   -0.0010    1.4186     0.0449    0.0720    362.6178    1.0072      299.1896
  bar.c[3:-1:1][3]    0.0146    1.4061     0.0445    0.0783    333.0312    1.0069      274.7783

Quantiles
        parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
            Symbol   Float64   Float64   Float64   Float64   Float64 

                 a   -1.8465   -0.6502   -0.0164    0.7037    2.0664
     bar.b[1,:][1]   -2.7699   -0.9711    0.0028    0.9771    2.7813
     bar.b[1,:][2]   -2.5305   -0.9451    0.0408    0.9232    2.7366
     bar.b[1,:][3]   -2.6779   -0.9212    0.0311    1.0146    2.7213
   bar.b[2,1:2][1]   -1.8999   -0.6486    0.0397    0.7054    2.1453
   bar.b[2,1:2][2]   -1.9572   -0.7017   -0.0031    0.6454    1.8877
   bar.b[2,3:3][1]   -2.6416   -0.8997   -0.0014    0.9453    3.0213
  bar.c[3:-1:1][1]   -2.8470   -0.9299   -0.0128    0.9509    2.8362
  bar.c[3:-1:1][2]   -2.8268   -0.9619    0.0671    0.9812    2.7441
  bar.c[3:-1:1][3]   -2.7313   -0.9267    0.0001    0.9474    2.8913

If we call from_mcmcchains on this, we get an uninformative error.

Ideally we would like to get an InferenceData with bar.b and bar.c as variables. However, since the modeler can arbitrarily index and reindex and call getproperty to make arbitrarily complicated types, always doing the right thing is probably not possible. Also, MCMCChains's own machinery for combining flattened parameters into parameter arrays doesn't do a great job here.

For the short term, then, I think it makes the most sense to raise an informative error if splitting by brackets produces anything more complicated than a tuple of integer indices. If users find this constraining and open issues, we can discuss supporting slightly more complicated indexing syntaxes.
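
As a rough illustration of the check proposed above, here is a minimal sketch of a name parser that accepts only a base name followed by at most one bracketed tuple of integer indices and throws a descriptive `ArgumentError` otherwise. The function name `parse_simple_varname` and the error wording are hypothetical; this is not how `from_mcmcchains` is currently implemented, and the sketch does not cover the dot-delimited form.

```julia
# Hypothetical helper for illustration; not part of ArviZ.jl or MCMCChains.jl.
# Splits a flattened name such as "x[1,2]" into a base name and a tuple of
# integer indices, and throws an ArgumentError for anything more complicated.
function parse_simple_varname(name::AbstractString)
    # No brackets at all: treat as a scalar variable such as "a".
    occursin('[', name) || occursin(']', name) || return (String(name), ())

    # Require exactly one trailing bracket group, with no nested or chained brackets.
    m = match(r"^([^\[\]]+)\[([^\[\]]*)\]$", name)
    m === nothing && throw(ArgumentError(
        "Cannot parse variable name \"$name\": expected a base name followed " *
        "by at most one bracketed tuple of integer indices, e.g. \"x[1,2]\"."
    ))

    base = String(m.captures[1])
    # tryparse returns `nothing` for ranges, colons, and other non-integer entries.
    inds = [tryparse(Int, strip(s)) for s in split(m.captures[2], ',')]
    any(isnothing, inds) && throw(ArgumentError(
        "Cannot parse variable name \"$name\": indices must be integers, " *
        "got \"$(m.captures[2])\"."
    ))
    return (base, Tuple(inds))
end
```

With this sketch, a name like `x[1,2]` parses to a base name and an integer index tuple, while names from the chain above such as `bar.b[1,:][1]` or `bar.c[3:-1:1][2]` are rejected with a message that names the offending parameter instead of failing somewhere deeper in the conversion.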
