
SortByStatistics

Adrian Quintana edited this page Dec 11, 2017 · 1 revision

sort_by_statistics

Purpose

This utility sorts a set of images based on simple image statistics (mean, stddev, means in quadrants etc.). The output is a sorted selection file that may be used to quickly identify junk particles in the data set.

Usage


$ sort_by_statistics [input] ...


Parameters

  • `-i` Input selfile of the images to be sorted
  • `` Rootname for output files (sort_junk by default)
  • `-train` Selfile with "nice" particles to use for training the sorting
  • `-zcut` Cut-off value for Z-scores (1 by default, use negative numbers to use no cutoffs)

How does it work?

This program calculates a number of different features for each image. Currently, these features are: mean, stddev, maximum, minimum, number of pixels with intensity > 1 stddev, number of pixels with intensity < 1 stddev, stddev of the average densities within the 4 quadrants, and the values of the 1st, 2nd, 3rd and 4th components of the radial average of the image power spectrum. This list could potentially be expanded with other (better?) features. If you have any good ideas, use the form below.
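As a rough illustration of the kind of features listed above, here is a minimal Python sketch. It is not the program's actual code: the function name `simple_features` and the dictionary keys are hypothetical, the image is assumed to be a plain list of rows, the two pixel counts are assumed to mean "more than one stddev above (or below) the image mean", and the power-spectrum components are left out.

```python
import statistics

def simple_features(image):
    """Compute a few simple statistics for a 2D image given as a list of rows."""
    pixels = [p for row in image for p in row]
    mean = statistics.fmean(pixels)
    stddev = statistics.pstdev(pixels)

    # Assumption: count pixels more than one stddev above / below the mean.
    n_high = sum(1 for p in pixels if p > mean + stddev)
    n_low = sum(1 for p in pixels if p < mean - stddev)

    # Average density in each of the four quadrants, then their spread.
    half_h, half_w = len(image) // 2, len(image[0]) // 2
    quadrants = [
        [p for row in image[:half_h] for p in row[:half_w]],
        [p for row in image[:half_h] for p in row[half_w:]],
        [p for row in image[half_h:] for p in row[:half_w]],
        [p for row in image[half_h:] for p in row[half_w:]],
    ]
    quadrant_means = [statistics.fmean(q) for q in quadrants]

    return {
        "mean": mean,
        "stddev": stddev,
        "max": max(pixels),
        "min": min(pixels),
        "n_high": n_high,
        "n_low": n_low,
        "quadrant_stddev": statistics.pstdev(quadrant_means),
    }
```

An image with one unusually bright or dark quadrant gets a large `quadrant_stddev`, which is the sort of asymmetry these features are meant to pick up.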

For each feature one calculates the mean (\mu) and the stddev (\sigma) of that feature over the entire data set. Then, one calculates a Z-score of that feature for each image:

Z^{\text{feature}}_{i} = \frac{|\text{feature}_i - \mu_{\text{feature}}|}{\sigma_{\text{feature}}}

If one uses a Z-score cutoff (option -zcut), then: if Z^{\text{feature}}_{i} < \text{cutoff} then Z^{\text{feature}}_{i} = 0. If one uses a training set (option -train), then the \mu and \sigma for all features are calculated over the training set only, instead of using the entire input set.

The number used for the sorting is just the summation over all features:

Z_i = \sum_{\text{feature}} Z^{\text{feature}}_{i}
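The scoring described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the program's implementation: the names `sum_z_scores`, `sort_by_sum_z`, `train_indices` and `zcut` are hypothetical, the population standard deviation is used, and features with zero spread are simply skipped (the text does not say how the program handles them).

```python
import statistics

def sum_z_scores(features, train_indices=None, zcut=None):
    """Return the summed Z-score per image.

    features: one list of feature values per image.
    train_indices: optional indices of "nice" images used to compute mu and sigma.
    zcut: optional cutoff; Z-scores below it are set to 0 (negative means no cutoff).
    """
    reference = features if train_indices is None else [features[i] for i in train_indices]
    totals = [0.0] * len(features)
    for k in range(len(features[0])):
        column = [row[k] for row in reference]
        mu = statistics.fmean(column)
        sigma = statistics.pstdev(column)
        if sigma == 0:
            continue  # assumption: a constant feature contributes nothing
        for i, row in enumerate(features):
            z = abs(row[k] - mu) / sigma
            if zcut is not None and zcut >= 0 and z < zcut:
                z = 0.0
            totals[i] += z
    return totals

def sort_by_sum_z(features, **kwargs):
    """Return image indices ordered from lowest to highest summed Z-score."""
    totals = sum_z_scores(features, **kwargs)
    return sorted(range(len(features)), key=lambda i: totals[i])
```

With a training set, an outlier no longer inflates \mu and \sigma, so its own Z-score becomes much larger and it is pushed further towards the end of the sorted list.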

Examples and notes

Sort all images in a selfile:


$ sort_by_statistics -i all_images.sel 


Or using a selfile with pre-selected "nice" particles:


$ sort_by_statistics -i all_images.sel -train my_nice_images.sel


And then visualize the results using:


$ show -sel sort_junk.sel


The nicest particles will typically be at the top, while the bad particles (outliers) will be at the bottom. The sum of all Z-scores used in the sorting will be stored in a file called sort_junk.sumZ, which may be visualized using your favorite 2D plotting software (gnuplot, xmgr, etc.). If you want even more control, you may have a look at the sort_junk.indZ file, which stores the individual Z-scores for each of the features calculated. The program also generates a sort_junk_good.sel with those images whose Z-score is smaller than the Z-cut.

USER SUGGESTIONS

--Main.SjorsScheres - 11 Mar 2008
