Exercise C: Warping Studies

aolivier23 edited this page Jun 4, 2021 · 14 revisions

Once we have subtracted estimated backgrounds from the data, we are left with a selected event distribution in reconstructed variables. The migration matrix we found with the event loop maps our best estimate of true variables to reconstructed variables, so we want to use its inverse for unfolding. Unfortunately, experimental migration matrices are almost never invertible because of effects like smearing during reconstruction and statistical fluctuations in the Monte Carlo. Regularization procedures use extra information from our Monte Carlo simulation to find a minimally biased estimate of the inverse of the migration matrix.
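In symbols, the folding step and the inverse we would like to apply can be written as below. This is only a sketch of the notation: it assumes the convention that M_{ij} is the probability for an event in true bin j to be reconstructed in bin i (the histograms from the event loop may store the transpose), with r the background-subtracted reconstructed distribution and t the true distribution.

```latex
r_i = \sum_j M_{ij}\, t_j
\qquad\Longrightarrow\qquad
\hat{t} = M^{-1} r \quad \text{(only if } M \text{ is invertible)}
```

When M is singular or nearly singular, small fluctuations in r turn into large swings in the estimate of t, which is why we regularize instead of inverting directly.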

But suppose that our Monte Carlo simulation predicts the wrong event distribution. For example, it might not predict enough 2p2h interactions. Then, unfolding our background-subtracted data could further bias our published result away from the physics process we're trying to measure.

Warping studies look for this effect and help us figure out how little regularization bias we can get away with. We change the underlying Monte Carlo simulation in ways that we suspect could better simulate some aspect of the natural process we're measuring, and apply that change to the unmodified MC selected distribution to produce what we call "fake data". We then perform a chi squared test between the unfolded "fake data" and the unmodified selected signal truth distribution (the efficiency numerator). As we increase the number of times the iterative d'Agostini algorithm is applied, which sets how strongly the unfolding is regularized, we want this chi squared statistic to approach the number of degrees of freedom (number of bins).
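The procedure above can be sketched with a toy example. This is a minimal illustration in plain Python, not what TransWarpExtractor does internally: the 3-bin migration matrix and the bin contents are made up, the chi squared ignores bin-to-bin correlations that a real unfolding covariance matrix would include, and the fake data are compared to the truth distribution they were folded from (an assumption of this sketch). Because no statistical fluctuations are thrown, the toy chi squared falls toward zero instead of toward the number of bins.

```python
def fold(migration, truth):
    """Reco prediction: reco[i] = sum_j migration[i][j] * truth[j]."""
    return [sum(m * t for m, t in zip(row, truth)) for row in migration]


def dagostini_unfold(migration, data, prior, n_iterations):
    """Iterative Bayesian (d'Agostini) unfolding.

    migration[i][j] is assumed to be P(reco bin i | true bin j).
    Every true bin must have non-zero efficiency.
    """
    n_true = len(prior)
    efficiency = [sum(row[j] for row in migration) for j in range(n_true)]
    estimate = list(prior)
    for _ in range(n_iterations):
        folded = fold(migration, estimate)
        # Bayes' theorem: share of each reco bin's content assigned to true bin j
        estimate = [
            estimate[j] / efficiency[j]
            * sum(migration[i][j] * data[i] / folded[i]
                  for i in range(len(data)) if folded[i] > 0)
            for j in range(n_true)
        ]
    return estimate


def chi2(unfolded, truth):
    """Diagonal chi squared with Poisson variances (no correlations)."""
    return sum((u - t) ** 2 / t for u, t in zip(unfolded, truth) if t > 0)


# Toy inputs, invented for illustration only
migration = [[0.80, 0.15, 0.00],
             [0.20, 0.70, 0.20],
             [0.00, 0.15, 0.80]]
cv_truth = [100.0, 200.0, 100.0]      # unmodified MC truth, used as the prior
warped_truth = [150.0, 200.0, 50.0]   # truth after a hypothetical warp
fake_data = fold(migration, warped_truth)

chi2_by_iteration = [
    chi2(dagostini_unfold(migration, fake_data, cv_truth, n), warped_truth)
    for n in range(21)
]
for n in (0, 1, 2, 5, 10, 20):
    print(n, round(chi2_by_iteration[n], 3))
```

With zero iterations the "unfolded" result is just the prior, so the chi squared is at its largest; each further iteration pulls the estimate away from the prior and toward what the fake data prefer.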

How to Interpret Warping Study Results

We want to choose the minimum number of d'Agostini iterations that produces an acceptable chi squared statistic across multiple warped models. It's also desirable for the chi squared statistic to be stable against small changes in the number of iterations around the value we pick. A tool called TransWarpExtractor produces a chi squared versus iterations histogram like this:

TODO: example TransWarp histogram

It's a little more complicated because TransWarpExtractor also simulates (Poisson-distributed) statistical fluctuations on each bin. We usually use the truncated mean chi squared to choose a number of iterations for a given warp. If the chi squared seems unusually large or the truncated mean is very far from the mean (a different histogram in the same TDirectory), then we also look at a heat map of all statistical throws like this:
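To see why a truncated mean is useful here, the sketch below throws Poisson fluctuations on a toy histogram and compares the plain mean with a truncated mean of a list that contains one outlier. The truncation rule (drop the same fraction of values from each tail) and all numbers are assumptions made for illustration; TransWarpExtractor's own definition may differ.

```python
import math
import random


def poisson_throw(mean, rng):
    """Draw one Poisson-distributed count (Knuth's method, fine for modest means)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def truncated_mean(values, fraction):
    """Mean after dropping `fraction` of the values from each tail."""
    ordered = sorted(values)
    n_drop = int(len(ordered) * fraction)
    kept = ordered[n_drop:len(ordered) - n_drop]
    return sum(kept) / len(kept)


rng = random.Random(101)

# One statistical throw of a toy 3-bin histogram (bin contents are invented)
toy_histogram = [150.0, 180.0, 70.0]
one_throw = [poisson_throw(content, rng) for content in toy_histogram]
print("fluctuated bins:", one_throw)

# A single wild throw drags the plain mean but barely moves the truncated mean
toy_chi2_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 1000.0]
print("mean:", sum(toy_chi2_values) / len(toy_chi2_values))
print("truncated mean:", truncated_mean(toy_chi2_values, 0.25))
```

A large gap between the two numbers is the kind of signal that suggests looking at the heat map of individual throws.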

TODO: example TransWarp heat map

Finally, since the statistical throws make the variance in the mean chi squared depend on total number of events, we need a standard for the sample size used in warping studies. Our standard is 12e20 Monte Carlo POT to match the data exposure. We don't have time to process 12e20 MC POT during MINERvA101 (although FermiGrid might help), so this tutorial is only a first look at a warping study for an inclusive analysis.

Your Warps

  • Turn the 2p2h tune off. This is a very aggressive tune that most results violently disagree with. But the inclusive analysis is so generic that it can survive this test.
  • Change our pion model. MnvTunev2, the successor to MnvTunev1, has a pion-based reweight that we'll use. The low Q2 pion suppression reweight is an empirical correction for the data/MC discrepancy we see in our coherent pion analysis.
  • Change our DIS model. In the Low Energy 2D inclusive result, Amy pointed out that DIS dominates in the high pz, high pT regions where the data and MC disagree most violently. Our colleagues from Aligarh Muslim University had just developed a new model for neutrino Deep Inelastic Scattering that we compared to MnvTunev1. We're going to reweight GENIE to this AMU model to simulate different DIS kinematics in nature.
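Conceptually, each warp above amounts to changing the list of reweights whose product gives an event's weight. The sketch below illustrates that idea only: the class names, the event dictionary, and the weight values are all hypothetical and are not the interface of the Model and Reweighter objects in runEventLoop, which is C++.

```python
class ChannelReweighter:
    """Hypothetical reweight: scales events of one interaction channel."""

    def __init__(self, name, channel, weight):
        self.name = name
        self.channel = channel
        self.weight = weight

    def get_weight(self, event):
        return self.weight if event["channel"] == self.channel else 1.0


class ToyModel:
    """Hypothetical model: the event weight is the product of all reweights."""

    def __init__(self, reweighters):
        self.reweighters = list(reweighters)

    def get_weight(self, event):
        weight = 1.0
        for reweighter in self.reweighters:
            weight *= reweighter.get_weight(event)
        return weight


# Invented weights, for illustration only
tune_2p2h = ChannelReweighter("2p2h tune", "2p2h", 1.5)
pion_suppression = ChannelReweighter("low Q2 pion suppression", "pion", 0.8)

cv_model = ToyModel([tune_2p2h])                         # central value
no_2p2h_warp = ToyModel([])                              # warp: tune removed
pion_warp = ToyModel([tune_2p2h, pion_suppression])      # warp: reweight added

event = {"channel": "2p2h"}
print(cv_model.get_weight(event), no_2p2h_warp.get_weight(event))
```

Filling the same histograms with each model's weights gives one set of warped distributions per warp.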

Your Task

Modify runEventLoop to produce warped migration matrices. Look for the Model object set up in main() and add/remove Reweighters to set up Your Warps. TransWarpExtractor only uses the CV migration matrix, so use runEventLoop with systematics turned off for the warps to make sure you can finish this exercise in time.

The runTransWarp.sh script maps runEventLoop's histograms to TransWarpExtractor's command line arguments. Run it like runTransWarp.sh fakeData.root warped.root. It will produce a file named TODO each time it is run. Open these files like this:

root -l TODO.root
TBrowser tb; //Graphical display of directories, histograms, and other ROOT objects

Now, look at the truncated mean chi2 versus iteration histograms in directory TODO, which are named TODO, and choose the minimum number of iterations for each warp. We usually publish a result using the largest minimum number of iterations among 3-4 warps.
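The selection rule can be written down as a short sketch. Everything here is an assumption for illustration: the chi2 values are invented rather than read from any TransWarpExtractor output, and "acceptable" is reduced to a single threshold that you would have to justify for your own analysis (for example, something close to the number of bins).

```python
def minimum_iterations(chi2_by_iteration, max_acceptable_chi2):
    """Smallest iteration count whose chi2 is acceptable, or None if none is."""
    for n_iterations, chi2 in sorted(chi2_by_iteration.items()):
        if chi2 <= max_acceptable_chi2:
            return n_iterations
    return None


def iterations_to_publish(warps, max_acceptable_chi2):
    """Largest per-warp minimum; None if any warp never becomes acceptable."""
    minima = [minimum_iterations(chi2s, max_acceptable_chi2)
              for chi2s in warps.values()]
    if any(minimum is None for minimum in minima):
        return None
    return max(minima)


# Invented truncated mean chi2 values, keyed by number of iterations
toy_warps = {
    "no 2p2h tune": {1: 60.0, 2: 25.0, 3: 12.0, 4: 9.5, 5: 9.0},
    "pion reweight": {1: 14.0, 2: 9.8, 3: 9.2, 4: 9.1, 5: 9.0},
    "AMU DIS": {1: 30.0, 2: 16.0, 3: 11.0, 4: 10.0, 5: 9.4},
}
n_bins = 10  # toy number of degrees of freedom

for name, chi2s in toy_warps.items():
    print(name, minimum_iterations(chi2s, n_bins))
print("use:", iterations_to_publish(toy_warps, n_bins))
```

A threshold alone does not check that the chi squared is stable around the chosen value, so the histograms still need to be inspected by eye.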

Solution

We'll discuss in person what number of iterations we should choose and whether our warps are reasonable. I'll fill in some plots here later.