@@ -25,3 +25,29 @@ evaluations, with greater reductions for more challenging posteriors.
25
25
While the evaluations in @zhang_pathfinder:2022 found that
26
26
single-path and multi-path Pathfinder outperform ADVI for most of the models in the PosteriorDB evaluation set,
27
27
we recognize the need for further experiments on a wider range of models.
28
+
29
+ ## Diagnosing Pathfinder
30
+
31
+ Pathfinder diagnoses the accuracy of the approximation by computing the density ratio of the true posterior and
32
+ the approximation and using the Pareto-$\hat{k}$ diagnostic (Vehtari et al., 2024) to assess whether these ratios can
33
+ be used to improve the approximation via resampling. If the estimated Pareto-$\hat{k}$ is smaller than 0.7, the
34
+ normalization for the posterior can be estimated reliably (Section 3, Vehtari et al., 2024), which is the
35
+ first requirement for reliable resampling. If the estimated Pareto-$\hat{k}$ for the ratios is smaller than 0.7,
36
+ importance sampling estimates still need further diagnosis that also takes into account the function whose
37
+ expectation is being estimated (Section 2.2, Vehtari et al., 2024). If the estimated Pareto-$\hat{k}$ is larger than 0.7, then the
38
+ estimate for the normalization is unreliable and any Monte Carlo estimate may have a large error. The resampled draws
39
+ can still contain some useful information about the location and shape of the posterior, which can be used in the early
40
+ parts of a Bayesian workflow (Gelman et al., 2020).
41
+
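The quantities discussed above can be written out as follows. This is a sketch in notation that is assumed here rather than taken from the text: $q(\theta)$ is the density of the Pathfinder approximation, $p(\theta, y)$ is the unnormalized posterior density, $\theta^{(1)}, \dots, \theta^{(S)}$ are draws from $q$, and $h$ is the function whose expectation is wanted.

$$
r_s = \frac{p(\theta^{(s)}, y)}{q(\theta^{(s)})},
\qquad
\hat{Z} = \frac{1}{S} \sum_{s=1}^{S} r_s,
\qquad
\mathbb{E}[h(\theta) \mid y] \approx \frac{\sum_{s=1}^{S} r_s \, h(\theta^{(s)})}{\sum_{s=1}^{S} r_s}.
$$

The Pareto-$\hat{k}$ diagnostic is computed from the ratios $r_s$. The estimate $\hat{Z}$ of the normalization depends only on the ratios, while the last estimate also depends on $h$, which is why a small Pareto-$\hat{k}$ for the ratios alone does not settle the reliability of every estimate.
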
42
+ ## Using Pathfinder for initializing MCMC
43
+
44
+ If the estimated Pareto-$\hat{k}$ for the ratios is smaller than 0.7, the resampled posterior draws are almost as
45
+ good for initializing MCMC as independent draws from the posterior would be. If the estimated Pareto-$\hat{k}$ for the
46
+ ratios is larger than 0.7, the Pathfinder draws are not reliable for posterior inference directly, but they are still
47
+ very likely better for initializing MCMC than random draws from an arbitrary pre-defined distribution (e.g., uniform from
48
+ -2 to 2, which Stan uses by default). If Pareto-$\hat{k}$ is larger than 0.7, it is likely that one of the ratios is much larger
49
+ than the others, so the default resampling with replacement would mostly produce copies of a single draw. For initializing several
50
+ Markov chains, it is better to use resampling without replacement to guarantee a unique initialization for each chain. At the
51
+ moment, Stan allows turning off the resampling completely, after which resampling without replacement can be done outside of
52
+ Stan.
53
+
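As a rough illustration of doing the resampling without replacement outside of Stan, the sketch below picks distinct draws with probability proportional to their ratios at each successive pick, using the Gumbel top-k trick. It is a minimal sketch under stated assumptions, not Stan's implementation: the function name `resample_without_replacement` and the input `log_ratios` are hypothetical, the log density ratios are assumed to have been computed already from Pathfinder output with resampling turned off, and no Pareto smoothing of the ratios is applied.

```python
import math
import random


def resample_without_replacement(log_ratios, num_draws, seed=None):
    """Return indices of `num_draws` distinct draws.

    Hypothetical helper: each successive pick is made with probability
    proportional to exp(log_ratio) among the draws not yet picked
    (Gumbel top-k trick). Working with log ratios avoids overflow.
    """
    if num_draws > len(log_ratios):
        raise ValueError("cannot pick more distinct draws than are available")
    rng = random.Random(seed)
    keys = []
    for index, log_ratio in enumerate(log_ratios):
        u = rng.random()
        while u == 0.0:  # random() can return exactly 0.0; avoid log(0)
            u = rng.random()
        gumbel_noise = -math.log(-math.log(u))
        keys.append((log_ratio + gumbel_noise, index))
    # The draws with the largest perturbed log ratios are the selected ones.
    keys.sort(reverse=True)
    return [index for _, index in keys[:num_draws]]


# Made-up log ratios for six approximate draws; select four distinct
# draws, one per Markov chain.
log_ratios = [-3.2, -0.1, -7.5, -1.4, -2.8, -0.9]
init_indices = resample_without_replacement(log_ratios, num_draws=4, seed=1)
```

Each selected index identifies one approximate draw to use as the initial value of one chain. Because every index appears at most once, the chains start from different points even when a single ratio dominates.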