Adrian Quintana edited this page Dec 11, 2017 · 1 revision

classify_sammon_projection

Purpose

Sammon is a non-linear mapping technique that maps a set of high-dimensional input data into a lower-dimensional space (usually two-dimensional) while trying to preserve the distances and local geometric relations of the original space. The Sammon program receives a set of feature vectors as input and produces a set of output vectors in a low-dimensional space whose dimension is set by the -dim parameter.
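
How well the distances are preserved is usually measured with the Sammon stress. The sketch below computes the textbook stress (Sammon, 1969) for a set of original vectors and their mapped counterparts. It is only an illustration: the function name `sammon_stress` is hypothetical, and it is an assumption that the program reports exactly this quantity.

```python
import math


def sammon_stress(high, low):
    """Textbook Sammon stress between original vectors and their mapped images.

    Illustrative helper only (hypothetical name, not part of the program).
    """
    if len(high) != len(low):
        raise ValueError("both sets must contain the same number of vectors")
    total = 0.0  # sum of pairwise distances in the original space
    error = 0.0  # squared distance mismatch, weighted by the original distance
    for i in range(len(high)):
        for j in range(i + 1, len(high)):
            d_orig = math.dist(high[i], high[j])
            d_map = math.dist(low[i], low[j])
            if d_orig == 0.0:
                # Identical input vectors would cause a division by zero,
                # so this sketch simply skips such pairs.
                continue
            total += d_orig
            error += (d_map - d_orig) ** 2 / d_orig
    if total == 0.0:
        raise ValueError("all input vectors are identical")
    return error / total
```

A stress of 0 means every pairwise distance is reproduced exactly; larger values mean a larger mapping error.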

Usage


$ classify_sammon_projection ...


Parameters

  • `-i` The input data file (raw file). It should be a text file in which each row represents a data item and each column represents a variable. It should have the following format:

     3 1000
     12 34 54
     -12 45 76
     ...
     32 45 76

The first line indicates the dimension of the vectors (in this case 3) and the number of vectors (in this case 1000). Note that vector components (variables) are separated by spaces. Optionally, an extra last column can hold a label for the vector. Example:

     3 1000
     12 34 54     labelA
     -12 45 76    labelB
     ...
     32 45 76     labelN

  • `-o` The output code vectors file. This parameter sets the base name for the generated output files. Sammon produces two output files: `basename.sam`, where the mapped input vectors are stored, and `basename.inf`, with information about the parameters used and the resulting Sammon stress (mapping error). The generated data follows the same format as the input data. Example:

     2 1000
     0.4 0.2     labelA
     -0.1 0.3    labelB
     ...
     0.2 0.5     labelN

The first line indicates the dimension of the vectors (in this case 2, but it depends on the -dim parameter) and the number of vectors (in this case 1000). The remaining lines are the mapped input vectors. The number of output vectors is the same as the number of input vectors, unless two input vectors are identical, in which case one of them is removed.

  • `-dim` This parameter sets the dimension of the output space. Sammon takes a set of vectors in some high-dimensional space and produces a set of corresponding vectors in the dim-dimensional space, trying to preserve the distances and local geometric relations of the original space. By default 2 is used.

  • `` This parameter sets the number of iterations used by the algorithm. By default 100000 is used. (NOTE: a very high number of iterations is desirable for acceptable statistical accuracy.)
  • `` If this parameter is given (by default it is not), the input data will be normalized.
  • `` Level of information printed while running:
    • `` No information (default)
    • `` Progress bar with the elapsed time and estimated time to finish.
    • `` Sammon stress (mapping error) in each iteration.
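
The input layout described above can be produced with a few lines of Python. This is a minimal sketch, assuming one vector per line after the header; the helper name `write_sammon_input` is hypothetical and not part of the program.

```python
def write_sammon_input(path, vectors, labels=None):
    """Write vectors as a 'dimension count' header followed by one vector per line.

    Illustrative helper only (hypothetical name). Labels, if given, are
    appended as an extra last column.
    """
    if not vectors:
        raise ValueError("at least one vector is required")
    dim = len(vectors[0])
    if any(len(vector) != dim for vector in vectors):
        raise ValueError("all vectors must have the same dimension")
    if labels is not None and len(labels) != len(vectors):
        raise ValueError("need exactly one label per vector")
    with open(path, "w") as handle:
        # Header: vector dimension, then number of vectors.
        handle.write(f"{dim} {len(vectors)}\n")
        for index, vector in enumerate(vectors):
            fields = [str(component) for component in vector]
            if labels is not None:
                fields.append(str(labels[index]))
            # Components (and the optional label) are separated by spaces.
            handle.write(" ".join(fields) + "\n")
```

For instance, `write_sammon_input("test.dat", [[12, 34, 54], [-12, 45, 76]], labels=["labelA", "labelB"])` writes a header line `3 2` followed by the two labelled vectors.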

Examples and notes

Example 1: Map the data stored in the "test.dat" file into a two-dimensional space


$ classify_sammon_projection -i test.dat -o test.sam


In this case the program runs with the following settings (those not given on the command line take their default values):


Input data file : test.dat
Output file : test.sam
Algorithm information output file : test.inf
Output space dimension = 2
Input data not normalized
Total number of iterations = 100000


A two-dimensional output file (test.sam) is generated, storing the new mapped data. The algorithm information is stored in test.inf.
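
To inspect the result, the mapped vectors can be read back from test.sam. The following is a minimal sketch, assuming the file follows the layout described under Parameters (a "dimension count" header, then one vector per line with an optional trailing label). The function name `read_sammon_output` is hypothetical.

```python
def read_sammon_output(path):
    """Parse a mapped-vectors file into (dimension, count, vectors, labels).

    Illustrative helper only (hypothetical name). A vector without a label
    gets None in the labels list.
    """
    with open(path) as handle:
        rows = [line.split() for line in handle if line.strip()]
    if not rows or len(rows[0]) < 2:
        raise ValueError("missing 'dimension count' header line")
    dim, count = int(rows[0][0]), int(rows[0][1])
    vectors, labels = [], []
    for fields in rows[1:]:
        if len(fields) < dim:
            raise ValueError("vector line has fewer components than the header states")
        # The first `dim` fields are the coordinates; anything after is the label.
        vectors.append([float(value) for value in fields[:dim]])
        labels.append(fields[dim] if len(fields) > dim else None)
    return dim, count, vectors, labels
```

A call such as `dim, count, vectors, labels = read_sammon_output("test.sam")` then gives the coordinates as lists of floats, ready for plotting.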

--Main.AlfredoSolano - 26 Jan 2007
