ML3DClassificationParameters
Adrian Quintana edited this page Dec 11, 2017 · 1 revision
- Initial reference map: this map does not need to contain high frequencies (we recommend filtering it anyway, see below), but its low-resolution features ARE very important. An incorrect map may cause the classification to fail, because the refinement protocol can suffer from model bias. A typical sign of model bias is a run in which the reference maps hardly change their overall features and only build up noise. In that case, create an alternative model (preferably from the data themselves and not from a pdb or a different dataset) and try again. In a number of experimental cases encountered so far, the quality of the initial model was crucial to the successful classification of the data.
- Correct the grey scale?: This is only necessary if your initial reference comes from a different package and is not on the absolute grey scale of the experimental images (for example, because it has been normalized). This step is relatively fast and usually does no harm, so switch it on if in doubt. A sign that your map was not on the right grey scale is a very elongated reference after running ML3D classification without the grey-scale correction.
- Low-pass filter initial reference?: Apart from the quality of the initial reference map, this is the second most important parameter for setting up the ML3D classification! Its importance lies in removing bias towards a predominant conformation in your data set or towards an incorrect feature in your initial reference. Therefore, we recommend filtering "as much as you can", that is, as strongly as possible while still allowing correct convergence. Try different values (50, 60, 80 Å) and see whether ML3D classification still converges to something reasonable, or whether your map only becomes "a ball".
- Number of seeds to generate: We strongly recommend following this protocol to generate random (bias-free!) seeds. The number depends on how much variability you expect in the sample and is ultimately limited by the size of your data set. Usually, one runs multiple jobs with different numbers of seeds.
- Angular sampling: For computational reasons, we usually recommend not going below 10 degrees.
- Number of CPUs to use in parallel: Once inside the ML3D classification (i.e. the most computationally intensive part), the algorithm consists of three separate steps: 1. project the reference maps, 2. calculate probabilities for all orientations/translations of each image, and 3. reconstruct the new references. Keep in mind that ONLY step 2 is performed in parallel. Step 1 is usually very fast (because of the relatively coarse angular samplings), but the reconstruction step (wlsART) can be relatively slow, especially for large (>64x64 pixels) images. Therefore, especially for small data sets with large images, it is sometimes not worth using a very large number of CPUs, as they will all be waiting during the reconstruction step. You may have a look at the stderr to see how long each step takes.
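
To get a feeling for when extra CPUs stop paying off, the three steps above can be put into a rough timing estimate. The sketch below is only an illustration, not part of the ML3D protocol: the function names and the example timings are hypothetical, and it assumes that step 2 scales ideally with the number of CPUs while steps 1 and 3 run serially. In practice you would take the per-step timings from the stderr of a short test run.

```python
def iteration_time(t_project, t_expectation, t_reconstruct, n_cpus):
    """Estimated wall-clock time (seconds) of one ML3D iteration.

    Assumption: only step 2 (the probability calculation) is divided
    over the CPUs; projection and reconstruction run serially.
    """
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return t_project + t_expectation / n_cpus + t_reconstruct


def speedup(t_project, t_expectation, t_reconstruct, n_cpus):
    """Speed-up of one iteration relative to running on a single CPU."""
    single = iteration_time(t_project, t_expectation, t_reconstruct, 1)
    multi = iteration_time(t_project, t_expectation, t_reconstruct, n_cpus)
    return single / multi


if __name__ == "__main__":
    # Hypothetical timings in seconds: fast projection, slow reconstruction.
    t_project, t_expectation, t_reconstruct = 10.0, 600.0, 300.0
    for n_cpus in (1, 4, 16, 64):
        s = speedup(t_project, t_expectation, t_reconstruct, n_cpus)
        print(f"{n_cpus:3d} CPUs: estimated speed-up {s:.2f}x")
```

With these made-up numbers the speed-up can never exceed (10 + 600 + 300) / (10 + 300), i.e. just under 3x, no matter how many CPUs are used, which is why a slow reconstruction step makes very large CPU counts unattractive.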
-- Main.SjorsScheres - 19 Oct 2007