SortByStatistics
This utility sorts a set of images based on simple image statistics (mean, stddev, means in quadrants etc.). The output is a sorted selection file that may be used to quickly identify junk particles in the data set.
$ sort_by_statistics [input] ...
Parameters
- `-i` Input selfile of the images to be sorted
- `` Rootname for output files (`sort_junk` by default)
- `-train` Selfile with "nice" particles to use for training the sorting
- `-zcut` Cut-off value for Z-scores (`1` by default; use a negative number to disable the cutoff)
This program calculates a number of different features for each image. Currently, these features are: mean, stddev, maximum, minimum, number of pixels with intensity > 1 stddev, number of pixels with intensity < 1 stddev, stddev of average densities within 4 quadrants, and the values of the 1st, 2nd, 3rd and 4th component of the radial average of the image power spectrum. This list could potentially be expanded with other (better?) features. If you have any good ideas, use the form below.
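As a rough illustration of the simpler features listed above, here is a minimal Python sketch. It is not the program's actual code. The function name `image_features` and the dictionary keys are hypothetical, the image is assumed to be a plain list of pixel rows, the population standard deviation is an assumption, and "intensity > 1 stddev" is interpreted here as "more than one stddev above the image mean" (and likewise below), which the description above does not spell out. The power-spectrum components are left out.

```python
from statistics import mean, pstdev


def image_features(image):
    """Return simple statistics for a 2D image given as a list of rows.

    Hypothetical helper for illustration only; the real program may
    define these features differently.
    """
    pixels = [p for row in image for p in row]
    mu = mean(pixels)
    sigma = pstdev(pixels)  # assumption: population stddev

    # Split the image into four quadrants and average each one.
    half_y = len(image) // 2
    half_x = len(image[0]) // 2
    quadrants = [
        [p for row in image[:half_y] for p in row[:half_x]],
        [p for row in image[:half_y] for p in row[half_x:]],
        [p for row in image[half_y:] for p in row[:half_x]],
        [p for row in image[half_y:] for p in row[half_x:]],
    ]
    quadrant_means = [mean(q) for q in quadrants]

    return {
        "mean": mu,
        "stddev": sigma,
        "max": max(pixels),
        "min": min(pixels),
        # assumption: counts are taken relative to the image mean
        "n_high": sum(1 for p in pixels if p > mu + sigma),
        "n_low": sum(1 for p in pixels if p < mu - sigma),
        "quadrant_stddev": pstdev(quadrant_means),
    }
```

An image whose left half is dark and whose right half is bright gets a large `quadrant_stddev`, which is the kind of uneven density that often marks a junk particle.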
For each feature one calculates the mean (\mu) and the stddev (\sigma) of that feature over the entire data set. Then, one calculates a Z-score of that feature for each image:
Z^{\text{feature}}_{i} = \frac{|\text{feature}_i - \mu_{\text{feature}}|}{\sigma_{\text{feature}}}
If one uses a Z-score cutoff (option `-zcut`), then: if Z^{\text{feature}}_{i} < \text{cutoff} then Z^{\text{feature}}_{i} = 0. If one uses a training set (option `-train`), then the \mu and \sigma for all features are calculated over the training set only, instead of using the entire input set.
The number used for the sorting is just the summation over all features:
Z_i = \sum_{\text{feature}} Z^{\text{feature}}_{i}
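The scoring described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program's implementation: the function names `summed_z_scores` and `sort_by_summed_z` are hypothetical, each image is assumed to be represented by a dictionary of feature values, the population standard deviation is used, and features with zero spread are simply skipped.

```python
from statistics import mean, pstdev


def summed_z_scores(features, train=None, zcut=1.0):
    """Return the summed Z-score Z_i for every image.

    features: list of dicts (one per image), feature name -> value.
    train:    optional list of dicts; if given, mu and sigma come from
              this set only (mirrors the -train option).
    zcut:     per-feature Z-scores below this are set to 0; a negative
              value disables the cutoff (mirrors the -zcut option).
    """
    reference = train if train is not None else features
    totals = [0.0] * len(features)
    for name in features[0]:
        ref_values = [f[name] for f in reference]
        mu = mean(ref_values)
        sigma = pstdev(ref_values)  # assumption: population stddev
        if sigma == 0:
            continue  # assumption: a constant feature contributes nothing
        for i, f in enumerate(features):
            z = abs(f[name] - mu) / sigma
            if zcut >= 0 and z < zcut:
                z = 0.0
            totals[i] += z
    return totals


def sort_by_summed_z(names, features, **kwargs):
    """Pair image names with Z_i and sort ascending (nicest images first)."""
    totals = summed_z_scores(features, **kwargs)
    return sorted(zip(names, totals), key=lambda pair: pair[1])
```

With the cutoff enabled, only features on which an image deviates strongly add to its score, so a typical image ends up with a score of 0 and outliers sink to the bottom of the sorted list.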
Sort all images in a selfile:
$ sort_by_statistics -i all_images.sel
Or using a selfile with pre-selected "nice" particles:
$ sort_by_statistics -i all_images.sel -train my_nice_images.sel
And then visualize the results using:
$ show -sel sort_junk.sel
The nicest particles will typically be at the top, while the bad particles (outliers) will be at the bottom. The sum of all Z-scores used in the sorting will be stored in a file called `sort_junk.sumZ`, which may be visualized using your favorite 2D plotting software (gnuplot, xmgr, etc.). If you want even more control, you may have a look at the `sort_junk.indZ` file, which stores the individual Z-scores for each of the features calculated. The program also generates a `sort_junk_good.sel` with those images whose Z-score is smaller than the Z-cut.
--Main.SjorsScheres - 11 Mar 2008