SQ2.50 Procedural Framework: Part 7 (Group Me)
Part 6 ends with completion of the preliminary "StandardData" and "SampleData" sheets, and at this point in SQUID 2.50, data-processing can be temporarily (or permanently) halted, and the SQUID-book can be saved and closed for further processing later. (Of course, as an ordinary XLS, it can also be subjected to "offline" manipulation in the meantime.)
In SQUID 2.50, further processing of the SampleData sheet is performed using the green "Group Me" button, which is placed in the top-left corner of the sheet. Pressing the button triggers subroutine "GroupThis", which creates separate sheets of fully-reduced data for different samples. This process essentially involves specifying sample-name prefixes for as many distinct samples as were measured (or grouping all analyses together irrespective of prefix, if desired), tailoring the final common-Pb correction for the SampleData, and specifying the 'preferred' common-Pb corrected age-type for which an attempt will be made to "group" analyses for each of the separated samples.
Because "Group Me" can be pressed in a previously-saved SQUID-book opened in a newly-initiated instance of Excel, SQUID 2.50 must perform a series of validations: ensuring that the workbook contains a sheet named "SampleData", making sure its format and structure are valid, and re-initialising fundamental variables. This documentation attempts to skip over all of that code, extracting only "data-oriented" validation checks potentially applicable to Squid3 (e.g. the number of analyses on the SampleData sheet), rather than "platform-oriented" checks stemming from the VBA environment.
Essentially, the preliminary checks involve (1) finding a worksheet named "SampleData" in the active Excel workbook (else End), and (2) checking that it is formatted correctly. In SQUID 2.50, this involves cycling through all the visible worksheets to find the text "SquidSampleData" in cell E1. When it finds that text, it then searches for the row containing the column headings (formatted with a double underline in the Excel worksheet). In a SampleData sheet unaffected by "offline" modification, the column headings are in row 6.
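For a Squid3-style reimplementation, the two checks can be expressed without any Excel machinery. The following is a minimal sketch, assuming a hypothetical in-memory `Sheet` object (cells keyed by 1-based (row, column), plus a set of row numbers carrying double-underline formatting) in place of Excel's object model; `Sheet` and `find_sample_data` are illustrative names only, not SQUID 2.50 or Squid3 code, and taking the topmost double-underlined row as the header row is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Sheet:
    """Hypothetical stand-in for a worksheet; not an Excel or Squid3 API."""
    name: str
    visible: bool
    cells: dict = field(default_factory=dict)                 # (row, col) -> value, 1-based
    double_underlined_rows: set = field(default_factory=set)  # rows formatted with double underline


def find_sample_data(sheets):
    """Return (sheet, header_row) for the first visible sheet tagged "SquidSampleData" in E1."""
    for sheet in sheets:
        if not sheet.visible:
            continue                                      # only visible sheets are searched
        if sheet.cells.get((1, 5)) != "SquidSampleData":  # cell E1 = row 1, column 5
            continue
        if not sheet.double_underlined_rows:
            raise ValueError(f"{sheet.name}: no double-underlined column-header row")
        return sheet, min(sheet.double_underlined_rows)   # row 6 on an unmodified sheet
    raise LookupError('no visible sheet has "SquidSampleData" in cell E1')
```

A hidden sheet carrying the tag is skipped, so only the visible SampleData sheet is returned.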
As illustrated in the Grouping video (a copy of which is available for download from https://github.com/CIRDLES/Squid/issues/349), primary functionality of the Grouping algorithm in SQUID 2.50 is controlled by four mostly-independent controls:
- Integer piOverCtCorrType (portrayed as 3-option radio-button)
- Boolean pbGrpAll (portrayed as 2-option radio-button)
- Boolean pbExtractAgeGroups (portrayed as check-box)
- Boolean pbGrpCommPbSpecific (portrayed as check-box)
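The four controls map naturally onto a small settings object. The sketch below is hypothetical (`GroupMeControls` and its field names are illustrative, not Squid3 classes). Only the piOverCtCorrType default of 0 is documented as a default for SQUID 2.50; the other defaults merely mirror the settings this page describes as "usual".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupMeControls:
    """Hypothetical container for the four Project-wide Group Me controls."""
    pi_over_ct_corr_type: int = 0          # 0 = none, 1 = force 207/235-206/238, 2 = force 208/232-206/238
    pb_grp_all: bool = False               # True = ignore sample-prefixes, group everything together
    pb_extract_age_groups: bool = True     # attempt to extract a coherent age group
    pb_grp_comm_pb_specific: bool = False  # True = one user-specified common-Pb composition

    def __post_init__(self):
        # piOverCtCorrType is a 3-option radio-button, so only 0, 1 or 2 are valid.
        if self.pi_over_ct_corr_type not in (0, 1, 2):
            raise ValueError("piOverCtCorrType must be 0, 1 or 2")
```

Freezing the object reflects that, in SQUID 2.50, one set of choices applies to the whole Project for a given press of Group Me.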
SQUID 2.50 applies these controls across ALL unknowns analysed concurrently with the set of StandardData analyses (in Squid3, we would say that all of these controls apply "Project-wide"). That's not necessarily the most sensible way for ALL of them to operate, and there is more about that in the individual descriptions below.
The integer piOverCtCorrType controls whether any attempt will be made to correct ALL SampleData analyses for apparent overcounts (or undercounts) of 204Pb, based on analyses of the primary 206Pb/238U (or 208Pb/232Th) reference material (i.e. StandardData). Such miscounts can arise because count-rates at mass 204 are extremely low, so measurements there are vulnerable to inaccuracy. For example, if ~0.1 counts/second of the total count-rate at mass 204 is contributed by some nuclide that has mass 204 but is not actually 204Pb, the 204Pb-based correction for common Pb will carry a significant systematic error and give inaccurate results. One way to assess the accuracy of 204Pb counts at session scale is to monitor the (biweight mean) 204Pb-corrected 207Pb/206Pb date measured on the reference material.
The "Perm1 0725" SQUID-book we prepared for the Korea Workshop is an excellent example. ID-TIMS data for the GSC reference zircon Z6266 are truly concordant: 206Pb/238U age = 207Pb/206Pb age = 559 Ma. But if you scroll to the right on the StandardData sheet and look at the base of the spot-rows, you’ll see that the biweight mean 204Pb-corrected 207Pb/206Pb date measured for the session is 544 +/- 9 Ma (95% confidence). This is not within error of the ID-TIMS reference value, so there is a problem, which can often be traced to the accuracy of the 204Pb correction. The fact that the measured mean is 'young' relative to the reference value suggests that 204Pb has been "overcounted" (i.e. 'too much' signal has been measured at mass 204, and some of it is not 204Pb). The spot-specific magnitude of this overcount is given by the ["204 overcts/sec (fr. 207)"] column on the StandardData sheet, with the biweight mean for the session coming out at 0.083 +/- 0.06 counts/second (95% confidence).
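A biweight mean is used for these session statistics because it down-weights outlying spots. The following is a generic Tukey biweight location estimate, offered only as an illustration. It is not SQUID 2.50's routine: the tuning constant of 9, the MAD scale held fixed at its initial value, and the name `biweight_location` are all assumptions, and the sketch does not compute the 95% confidence interval.

```python
from statistics import median


def biweight_location(values, tuning=9.0, tol=1e-12, max_iter=100):
    """Tukey biweight location estimate by iterative reweighting.

    Scale is the median absolute deviation (MAD) about the median, held fixed;
    values more than tuning * MAD from the current estimate get zero weight.
    """
    m = median(values)
    mad = median(abs(x - m) for x in values)
    if mad == 0:
        return m                          # at least half the values equal the median
    for _ in range(max_iter):
        num = den = 0.0
        for x in values:
            u = (x - m) / (tuning * mad)
            if abs(u) < 1.0:
                w = (1.0 - u * u) ** 2    # biweight weight
                num += w * x
                den += w
        new_m = num / den
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m
```

Applied to a column of spot-wise overcount values, a single wild spot would barely shift the estimate, whereas it could drag an arithmetic mean substantially.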
Some analysts might be concerned about the effect of these "extra" counts at mass 204 on the "204Pb-corrected" data obtained from their unknown samples. One way to correct for this effect is essentially to "subtract" 0.083 +/- 0.06 counts/second from the measured count-rate at mass 204 for each and every unknown analysis. The "Force concordance of 207Pb/235U - 206Pb/238U ages" radio-button enforces this subtraction for EVERY analysis across ALL Grouped-samples. Similarly, the "Force concordance of 208Pb/232Th - 206Pb/238U ages" radio-button enforces an analogous subtraction for EVERY analysis across ALL Grouped-samples, but based on 204Pb-corrected 208Pb/232Th rather than 204Pb-corrected 207Pb/206Pb.
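The actual arithmetic is given in Part 7a and is not reproduced here. Purely to illustrate the idea of subtracting a session-wide overcount rate from one spot, the sketch below makes several hypothetical assumptions. Background is subtracted from both the mass-204 and mass-206 count rates. The spot's original 1-sigma uncertainty on the net 204 count rate (`sigma_net204`) is known. The overcount's confidence interval is added to that uncertainty in quadrature. The 206 uncertainty is neglected. The function and parameter names are illustrative.

```python
import math


def overcount_corrected_204_206(cps204, cps_bkg, cps206, overcount, overcount_ci, sigma_net204):
    """Return (204/206, %err) after subtracting a session-wide 204 overcount rate.

    Hypothetical sketch: background is assumed to be subtracted from both peaks,
    sigma_net204 is the spot's 1-sigma uncertainty on the net 204 count rate
    (counts/second), and the uncertainty on the 206 count rate is neglected.
    """
    net204 = cps204 - cps_bkg - overcount
    net206 = cps206 - cps_bkg
    ratio = net204 / net206
    sigma = math.hypot(sigma_net204, overcount_ci)   # quadratic addition
    pct_err = 100.0 * sigma / abs(net204) if net204 != 0 else math.inf
    return ratio, pct_err
```

For the Perm1 0725 session one would pass `overcount=0.083` and `overcount_ci=0.06`. The corrected ratio is lower than the uncorrected one and its percentage error is larger, which is the qualitative behaviour the correction is meant to produce.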
In arithmetical terms, the 204Pb-overcount correction potentially involves replacing the originally calculated ["204/206"] for all analyses on the SampleData sheet (i.e. across all sample-prefixes). It is important to note that, in theory, the analyst cannot "choose" to apply the 204-overcount correction to some Grouped-samples but not others: if the StandardData set indicates the presence of a discernible overcount, the correction ought to be applied to all the Grouped-samples in the same fashion (i.e. piOverCtCorrType is truly a "Project-wide" control, in Squid3 terms). In SQUID 2.50, the user must specify one of three options:
- piOverCtCorrType = 0: "No overcount correction - use measured 204Pb" indicates that ["204/206"] as originally calculated for the SampleData sheet should be retained through all subsequent calculations. This is also the default.
- piOverCtCorrType = 1: "Force concordance of 207Pb/235U - 206Pb/238U ages" indicates that the originally calculated ["204/206"] should be recalculated spot-by-spot for all sample-prefixes (i.e. every row in SampleData). This recalculation uses the original ["204Cps"], ["BackgroundCps"] and ["206Cps"] values, and institutes the subtraction of the biweight mean ["204 overcts/sec (fr. 207)"] value from each and every spot-value. The associated spot-specific ["204/206 %err"] value is also recalculated via quadratic addition of the biweight 95% confidence interval of ["204 overcts/sec (fr. 207)"] to the other sources of uncertainty. The arithmetic is detailed in Part 7a.
- piOverCtCorrType = 2: "Force concordance of 208Pb/232Th - 206Pb/238U ages" indicates that the originally calculated ["204/206"] should be recalculated spot-by-spot for all sample-prefixes (i.e. every row in SampleData). This recalculation uses the original ["204Cps"], ["BackgroundCps"] and ["206Cps"] values, and institutes the subtraction of the biweight mean ["204 overcts/sec (fr. 208)"] value from each and every spot-value. The associated spot-specific ["204/206 %err"] value is also recalculated via quadratic addition of the biweight 95% confidence interval of ["204 overcts/sec (fr. 208)"] to the other sources of uncertainty. The arithmetic is detailed in Part 7a.
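To make the Project-wide scope concrete, the following sketch lets the option select one StandardData summary column and then applies the same subtraction to every SampleData row, irrespective of sample-prefix. The row and summary dictionaries are hypothetical stand-ins. The net-count ratio formula is an assumption (the real arithmetic is in Part 7a), and only the central value is recalculated.

```python
# Column labels as quoted in the option descriptions; the mapping itself is illustrative.
OVERCOUNT_COLUMN = {
    0: None,                          # no overcount correction
    1: "204 overcts/sec (fr. 207)",   # force 207Pb/235U - 206Pb/238U concordance
    2: "204 overcts/sec (fr. 208)",   # force 208Pb/232Th - 206Pb/238U concordance
}


def apply_overcount_project_wide(pi_over_ct_corr_type, std_summary, sample_rows):
    """Recalculate "204/206" for EVERY SampleData row, regardless of sample-prefix.

    std_summary maps a column label to its (biweight mean, 95% conf.) pair;
    each row is a dict holding "204Cps", "BackgroundCps", "206Cps" and "204/206".
    Rows are copied, so the input list is left unchanged.
    """
    column = OVERCOUNT_COLUMN[pi_over_ct_corr_type]
    if column is None:
        return [dict(row) for row in sample_rows]      # keep measured 204/206
    overcount, _ci95 = std_summary[column]
    out = []
    for row in sample_rows:
        new = dict(row)
        new["204/206"] = ((row["204Cps"] - row["BackgroundCps"] - overcount)
                          / (row["206Cps"] - row["BackgroundCps"]))
        out.append(new)
    return out
```

Because the function never looks at spot names, there is no way to exempt a particular sample-prefix, which matches the "Project-wide" character of piOverCtCorrType.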
pbGrpAll controls whether ANY attempt will be made to break SampleData out into sample-specific worksheets for subsequent calculations, based on 'sample-prefixes' specified using the first few characters of the spot-names. Meaningful sample-prefixes almost always exist in real SHRIMP data, so pbGrpAll is FALSE the vast majority of the time. (It follows that pbGrpAll is necessarily a Project-wide control; it cannot be applied sensibly at any smaller scale.)
However, it is necessary to account for the pbGrpAll = TRUE case (which is most commonly used when a mount or analytical session analyses only a single unknown, and the 'sample-prefix' is omitted because it is unnecessary). Another important use-case for pbGrpAll = TRUE relates to "legacy" SHRIMP data, in this case meaning pre-2000 SHRIMP data which predated the advent of SQUID-1. At that time, SHRIMP mounts containing multiple samples were not very common, and the concept of 'sample-prefixes' was not widely applied by analysts. (In terms of practical usage by analysts, 'sample-prefixes' were in fact defined and popularised by SQUID itself, rather than having evolved independently.)
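The split by sample-prefix, and the pbGrpAll = TRUE bypass, can be sketched as below. This is a hypothetical illustration. It assumes case-sensitive matching, assigns a spot to the first prefix its name starts with, and reports unmatched spots separately; the group label "ALL" and the function name are also illustrative. SQUID 2.50's own handling of overlapping or unmatched prefixes is not described here.

```python
def group_by_prefix(spot_names, prefixes, grp_all=False):
    """Return ({group_label: [spot names]}, [unmatched spot names]).

    Hypothetical sketch: a spot belongs to the first prefix its name starts
    with (case-sensitive). With grp_all=True, prefixes are ignored and every
    spot falls into a single group.
    """
    if grp_all:
        return {"ALL": list(spot_names)}, []
    groups = {p: [] for p in prefixes}
    unmatched = []
    for name in spot_names:
        for p in prefixes:
            if name.startswith(p):
                groups[p].append(name)
                break
        else:                       # no prefix matched this spot
            unmatched.append(name)
    return groups, unmatched
```

For legacy data without sample-prefixes, `grp_all=True` simply treats the whole SampleData sheet as one sample.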
pbExtractAgeGroups controls whether any attempt will be made to extract a statistically coherent population of analyses of ONE specified "Group Date Type" (see list below), per iteration of Group Me. The SQUID 2.50 master-list of Group Date Types, indexed by the integer piGrpDateType, is as follows:
- 0 = Total (uncorr) 206Pb/238U Age (by default, this is not available to users)
- 1 = 204corr 206Pb/238U Age
- 2 = 207corr 206Pb/238U Age
- 3 = 208corr 206Pb/238U Age
- 4 = 204corr 207Pb/206Pb Age
- 5 = 204corr 208Pb/232Th Age
- 6 = 207corr 208Pb/232Th Age
- 7 = 208corr 207Pb/206Pb Age
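In a typed reimplementation, the piGrpDateType index is a natural enumeration. The member names below are illustrative (only the integer values and descriptions come from the SQUID 2.50 list). Type 0 is excluded from the user-selectable set to reflect its default unavailability.

```python
from enum import IntEnum


class GroupDateType(IntEnum):
    """SQUID 2.50 Group Date Types indexed by piGrpDateType (member names are illustrative)."""
    TOTAL_206_238 = 0      # Total (uncorr) 206Pb/238U Age
    CORR204_206_238 = 1    # 204corr 206Pb/238U Age
    CORR207_206_238 = 2    # 207corr 206Pb/238U Age
    CORR208_206_238 = 3    # 208corr 206Pb/238U Age
    CORR204_207_206 = 4    # 204corr 207Pb/206Pb Age
    CORR204_208_232 = 5    # 204corr 208Pb/232Th Age
    CORR207_208_232 = 6    # 207corr 208Pb/232Th Age
    CORR208_207_206 = 7    # 208corr 207Pb/206Pb Age


# By default, the total (uncorrected) age is not offered to users.
USER_SELECTABLE = [t for t in GroupDateType if t != GroupDateType.TOTAL_206_238]
```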
So the mathematical procedure governed by pbExtractAgeGroups is essentially the calculation of a one-dimensional, inverse variance weighted mean similar to that calculated for StandardData via "WtdMeanA", with the important difference that usually no attempt is made to measure or quantify the "external error"; the procedure attributes excess scatter (relative to the specified probability-of-equivalence threshold) to geological disturbance, and therefore seeks to identify and eliminate "outliers" from the 1D WtdAv calculations.
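For reference, the standard textbook forms of the quantities involved are given below, for $n$ spot dates $t_i$ with 1-sigma uncertainties $\sigma_i$. They are the weighted mean, its uncertainty, the weighted sum of squares $S$, the MSWD, and the probability of fit $p$ from the chi-square distribution with $n-1$ degrees of freedom. Whether SQUID 2.50's weighted-mean routines use exactly these expressions is not confirmed here.

$$
\begin{aligned}
\bar{t} &= \frac{\sum_{i=1}^{n} t_i/\sigma_i^{2}}{\sum_{i=1}^{n} 1/\sigma_i^{2}}, &
\sigma_{\bar{t}} &= \Big(\sum_{i=1}^{n} 1/\sigma_i^{2}\Big)^{-1/2},\\
S &= \sum_{i=1}^{n} \frac{(t_i-\bar{t})^{2}}{\sigma_i^{2}}, &
\mathrm{MSWD} &= \frac{S}{n-1}, \qquad p = \Pr\big(\chi^{2}_{n-1} \ge S\big)
\end{aligned}
$$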
For each sample-prefix (noting that when pbGrpAll = TRUE, the 'sample-prefix' corresponds to the entire contents of the SampleData sheet), the ExtractAgeGroups process finds the largest population of spot-analyses for which a weighted mean with a probability-of-fit above a user-specified threshold (usually 0.05) can be calculated. In addition, this largest population must meet a user-specified 'proportion threshold' relative to the total number of analyses with that sample-prefix (usually 0.20).
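One simple way to exercise the two thresholds is sketched below. It is a greedy stand-in, not SQUID 2.50's FindCoherentGroup (see Part 7c). It repeatedly rejects the spot with the largest weighted residual until the probability of fit reaches the threshold, so it is not guaranteed to find the *largest* qualifying population. Probability of fit uses the chi-square distribution with n-1 degrees of freedom, computed from the series for the regularized incomplete gamma function. All function names are illustrative, and treating groups of fewer than two spots as unqualified is an assumption.

```python
import math


def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom >= x), via the lower incomplete-gamma series."""
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    if h > a + 40.0 * math.sqrt(a) + 40.0:
        return 0.0                              # far upper tail: probability negligible
    term = math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
    lower, n = term, 0
    while term > 1e-17 * lower and n < 100000:
        n += 1
        term *= h / (a + n)
        lower += term
    return min(1.0, max(0.0, 1.0 - lower))


def weighted_mean(ages, sigmas):
    """Inverse-variance weighted mean, its 1-sigma error, MSWD and probability of fit."""
    w = [1.0 / s ** 2 for s in sigmas]
    sw = sum(w)
    mean = sum(wi * t for wi, t in zip(w, ages)) / sw
    s = sum(wi * (t - mean) ** 2 for wi, t in zip(w, ages))
    df = len(ages) - 1
    mswd = s / df if df > 0 else 0.0
    prob = chi2_sf(s, df) if df > 0 else 1.0
    return mean, 1.0 / math.sqrt(sw), mswd, prob


def extract_group(ages, sigmas, min_prob=0.05, min_fraction=0.20):
    """Greedy outlier rejection; returns (kept indices, stats) or None if no group qualifies."""
    keep = list(range(len(ages)))
    while len(keep) >= 2:
        stats = weighted_mean([ages[i] for i in keep], [sigmas[i] for i in keep])
        if stats[3] >= min_prob:
            # Proportion threshold is relative to ALL spots with this sample-prefix.
            return (keep, stats) if len(keep) >= min_fraction * len(ages) else None
        mean = stats[0]
        worst = max(keep, key=lambda i: abs(ages[i] - mean) / sigmas[i])
        keep.remove(worst)
    return None
```

The defaults `min_prob=0.05` and `min_fraction=0.20` correspond to the "usual" threshold settings described above.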
SQUID 2.50 exerts some control (via 'spinners' on the Group Me form) over the options available to the user for both thresholds. For probability-of-fit, the minimum value is 0.01, followed by 0.05, then increments of 0.05 up to 0.40 (which broadly corresponds to MSWD ~ 1), with the most commonly used value being 0.05 (for "95% confidence"). For population proportion, the minimum value is 0.20 (i.e. 20% of the total number of spots with that sample-prefix), then increments to 1.00, with the most commonly used value being 0.20 (essentially because it encompasses all higher values).
The most usual setting for pbExtractAgeGroups is TRUE, because the extraction of a coherent age for the specified Group Date Type "does no harm" (i.e. an analyst is always free to subsequently modify the Group, or ignore it completely).
One quirk of SQUID 2.50 is that pbExtractAgeGroups must be set to TRUE in order to access the spot-specific 'second iteration' common-Pb corrections corresponding to pbGrpCommPbSpecific = FALSE (see below). Furthermore, when pbExtractAgeGroups = FALSE, SQUID 2.50 unilaterally sets pbGrpCommPbSpecific = TRUE, with the "user-defined" common-Pb ratios returned to their 'default' values (sComm64, sComm74, etc.) through the entire data-calculation process. Neither of these Boolean linkages makes sense in an "isotopic" context; essentially, there is no reason why the two controls should not be completely independent of each other, and in the following sections, I have recast the SQUID 2.50 code in an effort to reflect this.
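The difference between the SQUID 2.50 linkage and the independent controls proposed here amounts to a small truth table. In the sketch below, the function name and the `squid250_linkage` flag are hypothetical. The second returned value indicates whether the common-Pb ratios are reset to their defaults.

```python
def effective_comm_pb_settings(pb_extract_age_groups, pb_grp_comm_pb_specific,
                               squid250_linkage=True):
    """Return (pbGrpCommPbSpecific actually used, reset_to_default_ratios).

    With squid250_linkage=True this mimics the SQUID 2.50 behaviour: unless
    pbExtractAgeGroups is TRUE, pbGrpCommPbSpecific is forced to TRUE and the
    user-defined common-Pb ratios revert to their defaults. With
    squid250_linkage=False the two controls are independent.
    """
    if squid250_linkage and not pb_extract_age_groups:
        return True, True      # forced: constant composition, default ratios
    return pb_grp_comm_pb_specific, False
```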
In SQUID 2.50, the user-specified choices of Group Date Type, pbExtractAgeGroups and pbGrpCommPbSpecific are ALL applied at Project scale, to ALL user-defined sample-prefixes (as per the Grouping video). This can be a useful feature, particularly when all the sample-prefixes in the Project derive from samples with a high degree of geological similarity. However, if the geological samples are diverse enough that some sample-prefixes would be better "grouped" using a different combination of Group Date Type, pbExtractAgeGroups and pbGrpCommPbSpecific values, a second iteration of the Group Me process is necessary.
A functional improvement that could be made in Squid3 would be to give the user the option of "devolving" the preferred combination of Group Date Type, pbExtractAgeGroups and pbGrpCommPbSpecific to each sample-prefix. The question (for any SQUID user reading this!) is whether the option would be particularly attractive: manually selecting a combination of Group Date Type, pbExtractAgeGroups and pbGrpCommPbSpecific for each of several sample-prefixes in a single Project might be unduly onerous. The reason it needs to be considered is that Squid3 might not have an equivalent to SQUID 2.50's multiple iterations of Group Me with different combinations of these parameters.
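One lightweight way to offer devolution without making it onerous is a Project-wide default plus optional per-prefix overrides, so the user only touches the prefixes that differ. The following is a hypothetical sketch. `GroupingChoice`, `resolve_choices` and the optional `comm_pb_ratios` field are illustrative names, not Squid3 classes.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class GroupingChoice:
    """Hypothetical per-group settings (names are illustrative, not Squid3's)."""
    group_date_type: int                # piGrpDateType index (0-7)
    extract_age_groups: bool            # pbExtractAgeGroups
    comm_pb_specific: bool              # pbGrpCommPbSpecific
    comm_pb_ratios: Optional[Tuple[float, float, float]] = None  # user 206/204, 207/206, 208/206


def resolve_choices(project_default, prefixes, overrides):
    """Give every sample-prefix the Project-wide choice unless it has an override."""
    return {p: overrides.get(p, project_default) for p in prefixes}


# Example: SAMP2 uses 204corr 207Pb/206Pb ages; SAMP1 inherits the Project default.
default = GroupingChoice(group_date_type=2, extract_age_groups=True, comm_pb_specific=False)
choices = resolve_choices(default, ["SAMP1", "SAMP2"],
                          {"SAMP2": replace(default, group_date_type=4)})
```

With an empty `overrides` mapping this reduces exactly to the SQUID 2.50 behaviour, where one combination applies to every sample-prefix.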
In SQUID 2.50, pbGrpCommPbSpecific controls the nature of the common-Pb correction applied to the set of Grouped-samples. When pbGrpCommPbSpecific = TRUE, every analysis across ALL of the Grouped-sample sheets is corrected using a single, constant common-Pb isotopic composition (206Pb/204Pb, 207Pb/206Pb and 208Pb/206Pb) explicitly specified by the user. In the absence of suitable input, SQUID 2.50 defaults these values to present-day (0 Ma) values derived from the Stacey & Kramers (1975) model.
When pbGrpCommPbSpecific = FALSE (and in SQUID 2.50, this option is only available when pbExtractAgeGroups = TRUE, for reasons that are not clear; see above), every analysis across all of the Grouped-sample sheets determines its own, spot-specific common-Pb composition for the common-Pb correction, based on the 'preliminary' age of the analysis (as determined using sComm64, sComm76 and sComm86 during initial calculation of the SampleData sheet) and the Stacey & Kramers (1975) model. In practice, this involves a spot-specific, iterative calculation that continues until the Stacey-Kramers age used to derive the common-Pb composition matches the common-Pb-corrected age of that spot.
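The iteration can be illustrated for the simplest case, a 204Pb-corrected 206Pb/238U age. This is a sketch under stated assumptions, not SQUID 2.50's SetSKageForCPb or the intended SingleStagePbR (both covered in Part 7b). It uses the commonly cited Stacey & Kramers (1975) second-stage parameters and the Steiger & Jäger (1977) decay constants, and it converges by plain fixed-point iteration. The function names are hypothetical.

```python
import math

# Assumed constants: Steiger & Jager (1977) decay constants (per year) and
# commonly cited Stacey & Kramers (1975) second-stage parameters.
LAMBDA_238, LAMBDA_235, LAMBDA_232 = 1.55125e-10, 9.8485e-10, 4.9475e-11
U238_U235 = 137.88
T0 = 3.7e9                                    # start of the second stage (years)
R64_T0, R74_T0, R84_T0 = 11.152, 12.998, 31.23
MU, W = 9.735, 36.837                         # 238U/204Pb and 232Th/204Pb


def stacey_kramers(age_ma):
    """Model common-Pb (206/204, 207/204, 208/204) for a given age in Ma."""
    t = age_ma * 1e6
    r64 = R64_T0 + MU * (math.exp(LAMBDA_238 * T0) - math.exp(LAMBDA_238 * t))
    r74 = R74_T0 + MU / U238_U235 * (math.exp(LAMBDA_235 * T0) - math.exp(LAMBDA_235 * t))
    r84 = R84_T0 + W * (math.exp(LAMBDA_232 * T0) - math.exp(LAMBDA_232 * t))
    return r64, r74, r84


def age_204corr_206_238(total_206_238, meas_204_206, comm_206_204):
    """204Pb-corrected 206Pb/238U age (Ma) for an assumed common-Pb 206/204."""
    f206 = meas_204_206 * comm_206_204        # fraction of 206Pb that is common
    radiogenic = (1.0 - f206) * total_206_238
    return math.log1p(radiogenic) / LAMBDA_238 / 1e6


def iterate_sk_age(total_206_238, meas_204_206, prelim_age_ma, tol_ma=1e-6, max_iter=50):
    """Iterate until the SK model age reproduces the corrected age it yields."""
    age = prelim_age_ma
    for _ in range(max_iter):
        new_age = age_204corr_206_238(total_206_238, meas_204_206, stacey_kramers(age)[0])
        if abs(new_age - age) < tol_ma:
            return new_age
        age = new_age
    return age
```

With these constants, `stacey_kramers(0.0)` gives approximately 18.70, 15.63 and 38.63 for 206/204, 207/204 and 208/204, i.e. the kind of present-day values used as defaults when pbGrpCommPbSpecific = TRUE. Because the model 206/204 changes only slowly with age, the iteration typically settles within a few passes.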
In general, most geological applications are best served by pbGrpCommPbSpecific = FALSE, because even in scenarios dominated by analyses of similar-aged/cogenetic samples, the overall behaviour of pbGrpCommPbSpecific = FALSE converges towards that established by pbGrpCommPbSpecific = TRUE, while continuing to flexibly accommodate isolated departures from the assumption of a cogenetic suite of analyses.
However, when pbGrpCommPbSpecific = TRUE, there is no geological reason why the same specific values need be used across all the Grouped-samples in a Project (unless the Grouped-samples happened to have very similar geological character). Usually it would be desirable to choose the applicable specific values on a Grouped-sample-by-sample basis. So this is another parameter for which "devolution" from Project-scale to Grouped-sample-level might be useful.
Part 7 (Group Me) comprises three fairly distinct segments of code and calculations, in order:
- Part 7a (204-overcounts)
- Part 7b (Common Pb correction), including subroutine SetSKageForCPb and documentation for intended LudwigLibrary function SingleStagePbR
- Part 7c (ExtractGroup), which mostly just ties together subroutines ExtractGroup, FindCoherentGroup, SimpleWtdAv, and sqConcAge