Calculate/make use of weights in sampling distribution #1105

@nucleosynthesis

I'm creating this issue here to hopefully inspire someone to do the development work in combine.

For significance calculations with toys, we could improve the toy generation step by using importance sampling. For the test statistic that combine uses for this, we would need to do the following:

  1. Generate a test-statistic distribution for $\mu=\mu^{\prime}$ where $\mu^{\prime}$ is some fixed chosen value of the signal strength, in addition to the distribution that we already calculated for $\mu=0$.
  2. Calculate weights for each entry in the distribution based on the following,

For the toys generated with $\mu=0$, the weight of each toy is

$$ w = 1, \text{ if } L(0,\hat{\theta}) > L(\mu^{\prime},\hat{\theta})\text{, and } w=0~\text{ otherwise } $$

and for the toys where $\mu=\mu^{\prime}$,

$$ w = \dfrac{L(0,\hat{\theta})}{L(\mu^{\prime},\hat{\theta})}, \text{ if } L(0,\hat{\theta}) < L(\mu^{\prime},\hat{\theta})\text{, and } w=0~\text{ otherwise } $$

  3. Merge the two sets of toys into a single distribution, weighted with $w$ when calculating p-values.
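
The three steps can be sketched on a toy problem. The following is a minimal illustration, assuming a single-bin Poisson counting experiment with expected signal `s`, expected background `b` and no nuisance parameters, so there is no $\hat{\theta}$ to profile. The function names (`log_lik`, `q0`, `importance_sampled_pvalue`) are hypothetical and are not part of combine. The sketch also assumes that each sample is normalised by its own number of toys before merging, which the description above does not spell out.

```python
import numpy as np

def log_lik(n, mu, s, b):
    """Poisson log-likelihood without the n-dependent constant (it cancels in ratios)."""
    lam = mu * s + b
    return n * np.log(lam) - lam

def q0(n, s, b):
    """Discovery test statistic for the counting experiment (assumed toy model)."""
    mu_hat = np.maximum((n - b) / s, 0.0)  # best-fit signal strength, bounded at 0
    return 2.0 * (log_lik(n, mu_hat, s, b) - log_lik(n, 0.0, s, b))

def importance_sampled_pvalue(q_obs, s, b, mu_prime, n_toys, rng):
    # Step 1: toys under mu = 0 and under mu = mu_prime
    n_bkg = rng.poisson(b, n_toys)
    n_sig = rng.poisson(mu_prime * s + b, n_toys)

    # log L(0) - log L(mu_prime) for every toy
    d_bkg = log_lik(n_bkg, 0.0, s, b) - log_lik(n_bkg, mu_prime, s, b)
    d_sig = log_lik(n_sig, 0.0, s, b) - log_lik(n_sig, mu_prime, s, b)

    # Step 2: weights; ties get weight 0 in both samples, as in the formulas above
    w_bkg = np.where(d_bkg > 0, 1.0, 0.0)
    w_sig = np.where(d_sig < 0, np.exp(np.minimum(d_sig, 0.0)), 0.0)

    # Step 3: merge, normalising each sample by its number of toys (assumption)
    q = np.concatenate([q0(n_bkg, s, b), q0(n_sig, s, b)])
    w = np.concatenate([w_bkg, w_sig]) / n_toys
    return np.sum(w[q >= q_obs])
```

Calling `importance_sampled_pvalue(q_obs, s, b, mu_prime, n_toys, np.random.default_rng())` returns the weighted tail fraction of the merged distribution. In this toy model the toys generated at $\mu^{\prime}$ populate the high tail of the test statistic, which toys generated at $\mu=0$ would rarely reach.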

This needs to be done in https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/src/HybridNew.cc and would require setting the sampling distribution and sampling distribution weights appropriately.

The example below shows the distribution of the square root of the usual discovery test statistic, calculated using MultiDimFit instead, as a proof of concept that the method appears to work well.

The black points are the standard toys we already calculate in combine; the blue points use the procedure above with the importance sampling weights.

[Image: test-statistic distribution from standard toys (black) and from importance-sampled toys (blue)]
