MLalign2D
This utility performs (multi-reference) 2D alignment using a maximum-likelihood (ML) target function. The mathematical details of this approach are explained in
Scheres et al. (2005) J. Mol. Biol. 348(1), pp. 139-149
Please cite this paper if this program is of use to you!
Our recommended way of performing ML alignment is to introduce as little bias as possible in the initial reference(s). This can be done by calculating average images of random subsets of the (unaligned!) input experimental images, using the `-nref` option. Note that the standard deviations of the noise and of the origin offsets are re-estimated every iteration, so the initial values should not matter too much, as long as they are "reasonable". For Xmipp-normalized images, the standard deviation of the noise can be assumed to be 1. For reasonably centered particles, the default value of 3 for the offsets should do the job.
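The idea of building unbiased starting references from random subsets can be sketched in a few lines of Python. This is only an illustration of the principle, not the program's actual code: the function name, the flat-list image representation and the `seed` argument are all assumptions made for the example.

```python
import random


def random_subset_averages(images, nref, seed=0):
    """Average nref (nearly) equally sized random subsets of unaligned images.

    images: list of images, each a flat list of pixel values of equal length
            (a hypothetical stand-in for real image data).
    Returns a list of nref average images.
    """
    if not 1 <= nref <= len(images):
        raise ValueError("nref must be between 1 and the number of images")
    order = list(range(len(images)))
    random.Random(seed).shuffle(order)
    # Deal the shuffled indices round-robin, so subset sizes differ by at most one.
    subsets = [order[k::nref] for k in range(nref)]
    npix = len(images[0])
    averages = []
    for subset in subsets:
        avg = [sum(images[i][p] for i in subset) / len(subset) for p in range(npix)]
        averages.append(avg)
    return averages
```

Because no alignment is applied before averaging, each starting reference is a featureless blob that carries essentially no bias towards any particular view.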
The output of the program consists of the refined reference images (weighted averages over all experimental images). The experimental images are not altered at all. In the ML approach, optimal transformations and reference assignments for each image do not play the same role as in the conventional cross-correlation (or least-squares) approach. By default, this program does not output optimal transformation and reference assignment parameters. However, with the `-output_docfile` option these values can be output, and the headerinfo program may subsequently be used to assign them to the headers of the experimental images. Note that if the mirror flag was set in MLalign2D, it should also be used in the headerinfo program!
This program can also be used for reference-free 2D alignment using only a single reference: just supply `-nref 1` and save the optimal transformations using `-output_docfile`. The program thereby becomes a (more robust?) alternative to the pyramidal combination of single images used to create an initial reference structure in the align2d program.
Although the calculations can be rather time-consuming (especially for many, large experimental images and a large number of references), we strongly recommend letting the calculations converge. In our experience this takes on the order of 10-100 iterations, depending on the number of images, the amount of noise, etc. The default stopping criterion has yielded satisfactory results in our experience. A parallel version of this program has been implemented in MPI_MLAlign2D.
$ ml_align2d ...
Parameters
- `` Input selfile with experimental images
- __OR__ Selfile with initial reference images / single reference image, OR number of reference images. The latter will automatically generate `nref` references by averaging over `nref` equally sized random subsets of the input selfile
- `-o [rootname=MLalign2D]` Rootname for all output files
- `-noise [float=1]` Initial estimate for the standard deviation of the pixel noise. This parameter is re-estimated every iteration. The default initial value of 1 is adequate if you normalized the images to a standard deviation of 1
- `-offset [float=3]` Initial estimate for the standard deviation in the origin offsets. This parameter is re-estimated every iteration
- `` If this flag is set, the mirrored versions of the images will also be checked
- `` If this flag is set, part of the integration over all references, rotations and translations is skipped. The program will store all (N_imgs*N_refs) origin offsets that yield the maximum probability of observing each experimental image, given each of the references. In the first iteration, a complete integration over all references, rotations and translations is performed for all images. In all subsequent iterations, for all combinations of experimental images, references and rotations, the probability of observing the image given the optimal origin offsets from the previous iteration is calculated. If this probability is not considered "significant", we assume that none of the other translations will be significant either, and we skip the integration over the translations. A combination of experimental image, reference and rotation is considered "significant" if the probability at the corresponding optimal origin offsets is larger than C times the maximum of all these probabilities for that experimental image and reference (by default C=1e-12). This version may run up to ten times faster than the original, complete-search approach, while yielding practically identical results. Please cite this paper if this option is of use to you: Scheres et al. (2005) Bioinformatics 21 (Suppl 2), ii243-244
- `-C [float=1e-12]` This flag sets the "significance criterion" C, as explained for the `-fast` option. The default value of 1e-12 gave satisfactory results in our experiments. Larger values (e.g. 1e-8) will result in faster program execution, but at an increased risk of deviating from the complete-search approach
- `` Use all-zero optimal origin offsets in the first iteration. This will greatly speed up the first iteration if the `-fast` option is also given. Note that using this option may be a bad idea for particles that are poorly centered.
- `-psi_step [float=5]` In-plane rotation sampling interval [deg]
- `-frac [docfile=""]` Docfile with expected model fractions. The first column of this docfile should contain the expected model fraction for each reference image in the corresponding selfile (in the same order). Note that the expected fractions for all models should sum to 1. If the `-mirror` option is given, then the second column should contain the expected fraction (`[0,1]`) of mirror-related images for that model. By default (no docfile given), an even distribution over the references is initially assumed, each with a 0.5 fraction of mirror-related images. In any case, these distributions are re-estimated every iteration
- `-eps [float=5e-5]` Stopping criterion. If the power of the changes in ALL reference images with respect to the previous iteration is less than eps times the power of the corresponding reference images, then the program will stop. In our tests, the default value of 5e-5 gave good results
- `-iter [int=100]` Maximum number of iterations to be performed. It is strongly recommended to let the calculations converge until the eps stopping criterion is reached!
- `` Restart a run that has ended too soon
- `` Do not re-estimate the standard deviation in the pixel noise
- `` Do not re-estimate the standard deviation in the origin offsets
- `` Do not re-estimate the model fractions
- `` If this flag is set, then no document file is output. The default is to output a docfile that lists, for each experimental image, the optimal transformation and reference (i.e. those that give maximum probability)
- `` If this flag is set, no selection file is output for each reference image. The default is to output these selfiles, containing the experimental images that are "assigned" to these references, i.e. they score the highest probability weight for that reference. These last two parameters may be useful to perform a classical classification: in terms of which image belongs to which reference, and with which alignment parameters
- `` If this flag is set, then no intermediate files are written after each iteration
- `` Save memory by trading it for (quite a lot of) CPU time: in the `-fast` version of the algorithm, this option will recalculate all real-space rotations on the fly.
- `` Cheaper way of saving memory (recommended over -save_memA and -save_memC): limit probability-weighted averages up to translations of 3 times the estimated standard deviation of all translations. (By default, up to 6 standard deviations are calculated).
- `` Only for the very desperate: save a lot of memory by rotating the experimental images instead of keeping rotated versions of all references in memory. NOT RECOMMENDED, as rotating noisy images leads to strong interpolation artefacts.
- `` Use internal normalization (MLn2D algorithm) for data with normalization errors.
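The significance test described above for the `-fast` option, with its threshold `C`, can be sketched as follows. This is a minimal illustration under stated assumptions, not the program's implementation: the function name and the plain list of per-rotation probabilities are hypothetical, and a real implementation would most likely work with log-probabilities to avoid numerical underflow.

```python
def significant_rotations(probs_at_prev_offsets, C=1e-12):
    """Flag which rotations need a full integration over translations.

    probs_at_prev_offsets: for ONE experimental image and ONE reference, the
        probability of each rotation, evaluated at the optimal origin offsets
        stored in the previous iteration (hypothetical input layout).
    Returns one boolean per rotation; False means the translational search
    is skipped for that rotation.
    """
    pmax = max(probs_at_prev_offsets)
    return [p > C * pmax for p in probs_at_prev_offsets]


# A larger C discards more rotations, which is faster but riskier.
probs = [1.0, 1e-5, 1e-20, 0.3]
print(significant_rotations(probs))          # default C=1e-12
print(significant_rotations(probs, C=1e-4))  # more aggressive pruning
```

With the default threshold only the third rotation is pruned; with `C=1e-4` the second one is pruned as well, which illustrates the speed-versus-accuracy trade-off mentioned for the `-C` flag.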
A typical use of this program would be:
$ ml_align2d -i g0u.sel -nref 3 -o MLg0u -mirror
This performs a maximum-likelihood 2D alignment of the experimental images in `g0u.sel`, using 3 reference images. The program will output reference images called `MLg0u_it?????_ref0000[1-3].xmp` and a logfile called `MLg0u_it?????.log` after each iteration. A selfile called `MLg0u_alliter.sel`, containing all refined reference images of all iterations, is updated continuously. After the program has stopped, final reference images `MLg0u_ref0000[1-3].xmp`, as well as a selfile `MLg0u.sel` (containing only the final references) and a logfile `MLg0u.log`, will be output. The MLalign2D logfiles are actually Xmipp-style document files, with the following fixed format:
; MLalign2D-logfile: Number of images= 4494 LL= -2.61e+07 <Pmax/sumP>= 0.72077
; -noise 0.9959504 -offset 2.128339 -istart 101
; -ref refs.sel -i g0u.sel -nref 3 -o MLg0u -mirror
; columns: model fraction (1); mirror fraction (2); 1000x signal change (3)
; MLg0u_it00001_ref00001.xmp
1 3 0.01694 0.40617 0.06089
; MLg0u_it00001_ref00002.xmp
2 3 0.01959 0.52284 0.04798
; MLg0u_it00001_ref00003.xmp
3 3 0.00752 0.53236 0.03550
In the first line, `LL` is the total log-likelihood. This value should increase during the optimization process. `<Pmax/sumP>` is the average over all images of the Pmax/sumP value (also output for each image separately if the `-output_docfile` option is given). The Pmax/sumP value for each experimental image is the maximum probability divided by the sum of all probabilities (for all rotations, translations and references). If this value approaches 1, then almost all probability weights are assigned to a single transformation and reference, and thus the difference between the ML and LSQ target functions disappears.
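As a small worked illustration of the Pmax/sumP statistic, the sketch below computes it for a single image and averages it over several images. The function names and the flat list of probability weights are assumptions for the example; the program itself accumulates these quantities internally.

```python
def pmax_over_sump(probabilities):
    """Pmax/sumP for one image: the largest probability weight divided by the
    sum of all weights over rotations, translations and references."""
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    return max(probabilities) / total


def mean_pmax_over_sump(per_image_probabilities):
    """Average of Pmax/sumP over all images, analogous to <Pmax/sumP>."""
    values = [pmax_over_sump(p) for p in per_image_probabilities]
    return sum(values) / len(values)
```

A sharply peaked distribution such as `[1.0, 0.0, 0.0]` gives a value of 1, whereas four equally likely assignments give 0.25, so lower values indicate that the probability weights are spread over many transformations and references.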
In the second line, `-noise` is the re-estimated value for the standard deviation of the noise in the pixel values, `-offset` is the re-estimated standard deviation in the origin offsets, and `-istart` is the next iteration number.
The third line of the logfile contains all command line arguments that were given to run this program.
The fourth line is a comment explaining the structure of the rest of the file: for each reference image (filename as a comment on the line above), a document line gives the re-estimated model fraction, the mirror fraction and 1000x the relative signal change, respectively. If the relative signal change drops below eps (default 5e-5) for all references, then the calculation is stopped.
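The stopping rule can be sketched as below. Note the assumptions: "power" is taken here to mean the sum of squared pixel values, and the change is normalized by the power of the current reference; the text does not spell out either detail, and the function names are hypothetical.

```python
def relative_signal_change(previous_ref, current_ref):
    """Power of the change between two iterations of one reference image,
    relative to the power of the current reference (assumed definition)."""
    if len(previous_ref) != len(current_ref):
        raise ValueError("reference images must have the same size")
    change_power = sum((c - p) ** 2 for p, c in zip(previous_ref, current_ref))
    signal_power = sum(c ** 2 for c in current_ref)
    if signal_power == 0:
        raise ValueError("current reference has zero power")
    return change_power / signal_power


def has_converged(previous_refs, current_refs, eps=5e-5):
    """True only if ALL references changed by less than eps (relative power)."""
    return all(
        relative_signal_change(p, c) < eps
        for p, c in zip(previous_refs, current_refs)
    )
```

Requiring every reference to pass the test means that a single slowly changing class is enough to keep the iterations going.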
The specific structure of this logfile allows it to be used as the `-frac` argument, OR for restarting a run that has ended too soon (for example, in the logfile given above the first reference image has not yet reached the default stopping criterion).
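To check from a logfile which references have not yet converged, the fixed format can be parsed with a short script. This is a sketch under assumptions: the first two numbers on each data line are treated as the docfile line index and the column count (they are not explained in the text), and the helper names are hypothetical.

```python
SAMPLE_LOG = """\
; MLalign2D-logfile: Number of images= 4494 LL= -2.61e+07 <Pmax/sumP>= 0.72077
; -noise 0.9959504 -offset 2.128339 -istart 101
; -ref refs.sel -i g0u.sel -nref 3 -o MLg0u -mirror
; columns: model fraction (1); mirror fraction (2); 1000x signal change (3)
; MLg0u_it00001_ref00001.xmp
1 3 0.01694 0.40617 0.06089
; MLg0u_it00001_ref00002.xmp
2 3 0.01959 0.52284 0.04798
; MLg0u_it00001_ref00003.xmp
3 3 0.00752 0.53236 0.03550
"""


def parse_mlalign2d_log(text):
    """Return (reference name, model fraction, mirror fraction,
    1000x signal change) for every data line in the logfile text."""
    rows = []
    last_comment = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(";"):
            # The comment directly above a data line holds the reference name.
            last_comment = line[1:].strip()
            continue
        fields = line.split()
        # Assumed layout: line index, column count, then the three values.
        model_frac, mirror_frac, change_x1000 = (float(v) for v in fields[-3:])
        rows.append((last_comment, model_frac, mirror_frac, change_x1000))
    return rows


def unconverged_references(rows, eps=5e-5):
    """Names of references whose relative signal change is still >= eps."""
    return [name for name, _, _, change in rows if change / 1000.0 >= eps]


print(unconverged_references(parse_mlalign2d_log(SAMPLE_LOG)))
# prints ['MLg0u_it00001_ref00001.xmp']
```

For the logfile shown above, only the first reference is reported (6.089e-5 is still above 5e-5), in agreement with the remark that this run has not yet reached the default stopping criterion.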
The syntax for restarting this run would be:
$ ml_align2d -restart MLbentffG01_it00002.log
Any argument given on the second line of the logfile will overrule arguments given on the third line. So the maximum number of iterations to be performed could be increased by setting `-iter 200` on the second line.
Conventional classification
To perform a conventional "alignment and classification":
$ ml_align2d -i g0u.sel -nref 3 -o MLg0u -mirror
The program will also output a document file called `MLg0u.doc` and three selection files called `MLg0u_ref0000[1-3].sel`. The document file contains all assignment parameters (based on maximum probability weights). The angles and origin offsets can be applied using the headerinfo program (note the `-mirror` flag, which is required if it was also used in MLalign2D!):
$ headerinfo -assign -i MLg0u.doc -mirror
The selection files `MLg0u_ref0000[1-3].sel` contain the experimental images assigned to the corresponding references. Now, if the R-value has converged to a value close to one, the following command should yield an average image closely resembling the (first) refined reference image:
$ average -i MLg0u_ref00001.sel
--Main.AlfredoSolano - 25 Jan 2007