Commit 4318102

Merge remote-tracking branch 'origin/main' into update-tutorial-for-113x
2 parents 250357f + b08e1f3 commit 4318102

2 files changed: +46 -45 lines changed

docs/part5/longexercise.md

Lines changed: 18 additions & 18 deletions
@@ -44,7 +44,7 @@ Topics covered in this section:
4444
- A: Computing limits using the asymptotic approximation
4545
- Advanced section: B: Computing limits with toys
4646

47-
We will begin with a simplified version of a datacard from the MSSM $\phi\rightarrow\tau\tau$ analysis that has been converted to a one-bin counting experiment, as described above. While the full analysis considers a range of signal mass hypotheses, we will start by considering just one: $m_{\phi}$=800GeV. Click the text below to study the datacard (`datacard_part1.txt` in the `cms-das-stats` directory):
47+
We will begin with a simplified version of a datacard from the MSSM $\phi\rightarrow\tau\tau$ analysis that has been converted to a one-bin counting experiment, as described above. While the full analysis considers a range of signal mass hypotheses, we will start by considering just one: $m_{\phi}$=800GeV. Click the text below to study the datacard (`datacard_part1.txt` in the `longexercise` directory):
4848

4949
<details>
5050
<summary><b>Show datacard</b></summary>
@@ -68,11 +68,8 @@ CMS_eff_t_highpt lnN 1.1 1.1 1.1 -
6868
acceptance_Ztautau lnN - - 1.08 - -
6969
acceptance_bbH lnN - - - - 1.05
7070
acceptance_ttbar lnN 1.005 - - - -
71-
lumi_13TeV lnN 1.025 1.025 1.025 - 1.025
7271
norm_jetFakes lnN - - - 1.2 -
73-
xsec_Ztautau lnN - - 1.04 - -
7472
xsec_diboson lnN - 1.05 - - -
75-
xsec_ttbar lnN 1.06 - - - -
7673
```
7774
</details>
7875

@@ -93,7 +90,7 @@ combine -M [method] [datacard] [additional options...]
9390

9491
### A: Computing limits using the asymptotic approximation
9592

96-
As we are searching for a signal process that does not exist in the standard mode, it's natural to set an upper limit on the cross section times branching fraction of the process (assuming our dataset does not contain a significant discovery of new physics). Combine has dedicated method for calculating upper limits. The most commonly used one is `AsymptoticLimits`, which implements the CLs criterion and uses the profile likelihood ratio as the test statistic. As the name implies, the test statistic distributions are determined analytically in the asymptotic approximation, so there is no need for more time-intensive toy throwing and fitting. Try running the following command:
93+
As we are searching for a signal process that does not exist in the standard model, it's natural to set an upper limit on the cross section times branching fraction of the process (assuming our dataset does not contain a significant discovery of new physics). Combine has dedicated method for calculating upper limits. The most commonly used one is `AsymptoticLimits`, which implements the [CLs criterion](https://inspirehep.net/literature/599622) and uses the profile likelihood ratio as the test statistic. As the name implies, the test statistic distributions are determined analytically in the [asymptotic approximation](https://arxiv.org/abs/1007.1727), so there is no need for more time-intensive toy throwing and fitting. Try running the following command:
9794

9895
```shell
9996
combine -M AsymptoticLimits datacard_part1.txt -n .part1A
@@ -128,7 +125,7 @@ Run the following command:
128125
```shell
129126
combine -M HybridNew datacard_part1.txt --LHCmode LHC-limits -n .part1B --saveHybridResult --fork 0
130127
```
131-
In contrast to `AsymptoticLimits` this will only determine the observed limit, and will take around five minutes. There will not be much output to the screen while combine is running. You can add the option `-v 1` to get a better idea of what is going on. You should see combine stepping around in `r`, trying to find the value for which CLs = 0.05, i.e. the 95% CL limit. The `--saveHybridResult` option will cause the test statistic distributions that are generated at each tested value of `r` to be saved in the output ROOT file.
128+
In contrast to `AsymptoticLimits` this will only determine the observed limit, and will take a few minutes. There will not be much output to the screen while combine is running. You can add the option `-v 1` to get a better idea of what is going on. You should see combine stepping around in `r`, trying to find the value for which CLs = 0.05, i.e. the 95% CL limit. The `--saveHybridResult` option will cause the test statistic distributions that are generated at each tested value of `r` to be saved in the output ROOT file.
132129

133130
To get an expected limit add the option `--expectedFromGrid X`, where `X` is the desired quantile, e.g. for the median:
134131

@@ -145,13 +142,13 @@ Calculate the median expected limit and the 68% range. The 95% range could also
145142

146143
Next plot the test statistic distributions stored in the output file:
147144
```shell
148-
python $CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/test/plotTestStatCLs.py --input higgsCombine.part1B.HybridNew.mH120.root --poi r --val all --mass 120
145+
python3 $CMSSW_BASE/src/HiggsAnalysis/CombinedLimit/test/plotTestStatCLs.py --input higgsCombine.part1B.HybridNew.mH120.root --poi r --val all --mass 120
149146
```
150147

151148
This produces a new ROOT file `cls_qmu_distributions.root` containing the plots, to save them as pdf/png files run this small script and look at the resulting figures:
152149

153150
```shell
154-
python printTestStatPlots.py cls_qmu_distributions.root
151+
python3 printTestStatPlots.py cls_qmu_distributions.root
155152
```
156153

157154
#### Advanced exercises
@@ -209,7 +206,7 @@ There are two other differences with respect to the one-bin card:
209206

210207
The syntax of the "shapes" line is: `shapes [process] [channel] [file] [histogram] [histogram_with_systematics]`. It is possible to use the `*` wildcard to map multiple processes and/or channels with one line. The histogram entries can contain the `$PROCESS`, `$CHANNEL` and `$MASS` place-holders which will be substituted when searching for a given (process, channel) combination. The value of `$MASS` is specified by the `-m` argument when combine. By default the observed data process name will be `data_obs`.
211208

212-
Shape uncertainties can be added by supplying two additional histograms for a process, corresponding to the distribution obtained by shifting that parameter up and down by one standard deviation. These shapes will be interpolated quadratically for shifts below $1\sigma$ and linearly beyond. The normalizations are interpolated linearly in log scale just like we do for log-normal uncertainties.
209+
Shape uncertainties can be added by supplying two additional histograms for a process, corresponding to the distribution obtained by shifting that parameter up and down by one standard deviation. These shapes will be interpolated (see the [template shape uncertainties](../part2/settinguptheanalysis.md#template-shape-uncertainties) section for details) for shifts within $\pm1\sigma$ and linearly extrapolated beyond. The normalizations are interpolated linearly in log scale just like we do for log-normal uncertainties.
213210

214211
![](images/shape_morphing.jpg)
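
As a rough illustration of what "interpolated linearly in log scale" means for a normalization, the sketch below scales a yield by $\kappa^{\theta}$, the form used for a symmetric log-normal (`lnN`) uncertainty. The helper name `lognormal_scale` is hypothetical and not part of combine, the value 1.2 is borrowed from the `norm_jetFakes` line of the one-bin card, and the treatment of asymmetric up/down normalizations is not covered here.

```python
import math


def lognormal_scale(kappa, theta):
    """Yield multiplier for a symmetric log-normal uncertainty.

    kappa: the lnN value from the datacard (e.g. 1.2 for a 20% effect).
    theta: the nuisance parameter value in units of standard deviations.
    The logarithm of the multiplier is linear in theta.
    """
    return math.exp(theta * math.log(kappa))


# Hypothetical scan of the nuisance parameter for kappa = 1.2
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print("theta = %+.1f  ->  scale = %.4f" % (theta, lognormal_scale(1.2, theta)))
```

Because the interpolation is linear in the logarithm, the multiplier stays positive for any value of the nuisance parameter, and a shift of $-1\sigma$ gives exactly the inverse of the $+1\sigma$ factor.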
215212

@@ -252,7 +249,7 @@ We will now explore one of the most commonly used modes of combine: `FitDiagnost
252249
- A "background-only" (b-only) fit: first POI (usually "r") fixed to zero
253250
- A "signal+background" (s+b) fit: all POIs are floating
254251

255-
With the s+b fit combine will report the best-fit value of our signal strength modifier `r`. As well as the usual output file, a file named `fitdiagnostics.root` is produced which contains additional information. In particular it includes two `RooFitResult` objects, one for the b-only and one for the s+b fit, which store the fitted values of all the **nuisance parameters (NPs)** and POIs as well as estimates of their uncertainties. The covariance matrix from both fits is also included, from which we can learn about the correlations between parameters. Run the `FitDiagnostics` method on our workspace:
252+
With the s+b fit combine will report the best-fit value of our signal strength modifier `r`. As well as the usual output file, a file named `fitDiagnosticsTest.root` is produced which contains additional information. In particular it includes two `RooFitResult` objects, one for the b-only and one for the s+b fit, which store the fitted values of all the **nuisance parameters (NPs)** and POIs as well as estimates of their uncertainties. The covariance matrix from both fits is also included, from which we can learn about the correlations between parameters. Run the `FitDiagnostics` method on our workspace:
256253

257254
```shell
258255
combine -M FitDiagnostics workspace_part2.root -m 800 --rMin -20 --rMax 20
@@ -421,6 +418,7 @@ rate 198.521 683.017
421418
----------------------------------------------------------------------------------------------------------------------------------
422419
CMS_eff_b lnN 1.02 1.02 1.02 1.02 - - - - - - - - - - - -
423420
CMS_eff_e lnN - - - - - 1.02 - - 1.02 1.02 - - - - - -
421+
...
424422
```
425423
</details>
426424

@@ -445,7 +443,7 @@ is perfectly valid and only one `rateParam` will be created. These parameters wi
445443

446444
**Tasks and questions:**
447445

448-
- Run `text2workspace.py` on this combined card and then use `FitDiagnostics` on an Asimov dataset with `r=1` to get the expected uncertainty. Suggested command line options: `--rMin 0 --rMax 2`
446+
- Run `text2workspace.py` on this combined card (don't forget to set the mass and output name `-m 200 -o workspace_part3.root`) and then use `FitDiagnostics` on an Asimov dataset with `r=1` to get the expected uncertainty. Suggested command line options: `--rMin 0 --rMax 2`
449447
- Using the RooFitResult in the `fitDiagnosticsTest.root` file, check the post-fit value of the rateParams. To what level are the normalisations of the DY and ttbar processes constrained?
450448
- To compare to the previous approach of fitting the SR only, with cross section and acceptance uncertainties restored, an additional card is provided: `datacard_part3_nocrs.txt`. Run the same fit on this card to verify the improvement of the SR+CR approach
451449

@@ -480,7 +478,7 @@ Another thing the `FitDiagnostics` mode can help us with is visualising the dist
480478
To produce these distributions add the `--saveShapes` and `--saveWithUncertainties` options when running `FitDiagnostics`:
481479

482480
```shell
483-
combine -M FitDiagnostics workspace_part3.root -m 200 --rMin -1 --rMax 2 --saveShapes --saveWithUncertainties
481+
combine -M FitDiagnostics workspace_part3.root -m 200 --rMin -1 --rMax 2 --saveShapes --saveWithUncertainties -n .part3B
484482
```
485483

486484
Combine will produce pre- and post-fit distributions (for fit_s and fit_b) in the fitDiagnosticsTest.root output file:
@@ -489,8 +487,8 @@ Combine will produce pre- and post-fit distributions (for fit_s and fit_b) in th
489487

490488
**Tasks and questions:**
491489

492-
- Make a plot showing the expected background and signal contributions using the output from `FitDiagnostics` - do this for both the pre-fit and post-fit. You will find a script `postFitPlot.py` in the `data/tutorials/longexercise` directory that can help you get started.
493-
The bin errors on the TH1s in the fitdiagnostics file are determined from the systematic uncertainties. In the post-fit these take into account the additional constraints on the nuisance parameters as well as any correlations.
490+
- Make a plot showing the expected background and signal contributions using the output from `FitDiagnostics` - do this for both the pre-fit and post-fit. You will find a script `postFitPlot.py` in the `longexercise` directory that can help you get started.
491+
The bin errors on the TH1s in the fitDiagnostics file are determined from the systematic uncertainties. In the post-fit these take into account the additional constraints on the nuisance parameters as well as any correlations.
494492

495493
- Why is the uncertainty on the post-fit so much smaller than on the pre-fit?
496494

@@ -616,7 +614,7 @@ Now the plot should look correct:
616614

617615
![](images/freeze_second_attempt.png)
618616

619-
We added the `--breakdown Syst,Stat` option to the plotting script to make it calculate the systematic component, which is defined simply as $\sigma_{\text{syst}} = \sqrt{\sigma^2_{\text{tot}} - \sigma^2_{\text{stat}}}.
617+
We added the `--breakdown Syst,Stat` option to the plotting script to make it calculate the systematic component, which is defined simply as $\sigma_{\text{syst}} = \sqrt{\sigma^2_{\text{tot}} - \sigma^2_{\text{stat}}}$.
620618
To split the systematic uncertainty into different components we just need to run another scan with a subset of the systematics frozen. For example, say we want to split this into experimental and theoretical uncertainties, we would calculate the uncertainties as:
621619

622620
$\sigma_{\text{theory}} = \sqrt{\sigma^2_{\text{tot}} - \sigma^2_{\text{fr.theory}}}$
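
The quadrature subtraction used in these definitions can be sketched numerically as follows. This is only an illustration: the helper `quadrature_difference` is a hypothetical name, and the input uncertainties are made-up numbers rather than results from any scan in this exercise.

```python
import math


def quadrature_difference(sigma_larger, sigma_smaller):
    """Return sqrt(sigma_larger^2 - sigma_smaller^2).

    sigma_larger: uncertainty from the scan with more parameters floating.
    sigma_smaller: uncertainty from the scan with a group of parameters frozen.
    """
    diff = sigma_larger ** 2 - sigma_smaller ** 2
    if diff < 0:
        # Freezing parameters should not increase the uncertainty; a negative
        # difference usually points to a fit problem or swapped inputs.
        raise ValueError("frozen-scan uncertainty exceeds the total uncertainty")
    return math.sqrt(diff)


# Made-up example inputs (assumptions, not tutorial results)
sigma_tot = 0.25        # all nuisance parameters floating
sigma_stat = 0.15       # all nuisance parameters frozen
sigma_fr_theory = 0.22  # only the theory nuisance parameters frozen

sigma_syst = quadrature_difference(sigma_tot, sigma_stat)
sigma_theory = quadrature_difference(sigma_tot, sigma_fr_theory)

print("syst   = %.3f" % sigma_syst)
print("theory = %.3f" % sigma_theory)
```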
@@ -669,21 +667,23 @@ An example physics model that just implements a single parameter `r` is given in
669667
<details>
670668
<summary><b>Show DASModel.py</b></summary>
671669
```python
672-
from HiggsAnalysis.CombinedLimit.PhysicsModel import *
670+
from HiggsAnalysis.CombinedLimit.PhysicsModel import PhysicsModel
671+
673672

674673
class DASModel(PhysicsModel):
675674
def doParametersOfInterest(self):
676675
"""Create POI and other parameters, and define the POI set."""
677676
self.modelBuilder.doVar("r[0,0,10]")
678-
self.modelBuilder.doSet("POI", ",".join(['r']))
677+
self.modelBuilder.doSet("POI", ",".join(["r"]))
679678

680679
def getYieldScale(self, bin, process):
681680
"Return the name of a RooAbsReal to scale this yield by or the two special values 1 and 0 (don't scale, and set to zero)"
682681
if self.DC.isSignal[process]:
683-
print 'Scaling %s/%s by r' % (bin, process)
682+
print("Scaling %s/%s by r" % (bin, process))
684683
return "r"
685684
return 1
686685

686+
687687
dasModel = DASModel()
688688
```
689689
</details>
