FSOM
FuzzySOM stands for "Fuzzy Self-Organizing Map". It maps a set of high-dimensional input vectors onto a two-dimensional grid using a modification of the well-known Fuzzy c-means functional. For more information, see the following reference:
Smoothly Distributed Fuzzy c-Means: a New Self-Organizing Map.
Pascual-Marqui RD. Pascual-Montano AD. Kochi K. Carazo JM.
Pattern Recognition 34:2395-2402, 2001
$ classify_fsom -i ...
Parameters
- `` The input data file (raw file). It should be a text file in which each row is a data item and each column is a variable. It should have the following format:

      3 1000
      12 34 54
      -12 45 76
      ...
      32 45 76

  The first line gives the dimension of the vectors (here 3) and the number of vectors (here 1000). Vector components (variables) are separated by spaces. Optionally, a last column can hold a label for each vector. Example:

      3 1000
      12 34 54 labelA
      -12 45 76 labelB
      ...
      32 45 76 labelN

- `` The output code vectors file. This parameter sets the base name for the generated output files. FuzzySOM produces several files with different information, all sharing this base name but with different extensions. The generated files are:
  - `[basename].cod`: The resulting code vectors. The file follows the same format as the input data, except that some extra information is stored in the first line. Example:

        3 rect 10 7 gaussian
        11 31 52 labelA
        -10 43 71 labelB
        ...
        29 39 71 labelN

    The first line gives the dimension of the vectors (here 3), the topology of the map (here rectangular), the X and Y dimensions (here 10x7) and a "gaussian" label that is only there for full compatibility with Kohonen's SOM_PAK package (http://www.cis.hut.fi/research/som_lvq_pak.shtml).
  - `[basename].inf`: Information file with the parameters used and the resulting quantization error. It looks like this:

        Kohonen BatchSOM algorithm
        Input data file : test.dat
        Code vectors output file : test.cod
        Algorithm information output file : test.inf
        Number of feature vectors: 150
        Number of variables: 4
        Horizontal dimension (Xdim) = 10
        Vertical dimension (Ydim) = 7
        Hexagonal topology
        Initial neighborhood radius (radius) = 10
        Total number of iterations = 1000
        Input data not normalized
        Quantization error : 0.349357

  - `[basename].his`: The number of input vectors assigned to each code vector, like a histogram of the resulting code vectors. The file contains two columns: the first is the number of the code vector and the second is the number of input vectors assigned to it.
  - `[basename].err`: Average quantization error for each code vector. The file contains two columns: the first is the number of the code vector and the second is its average quantization error.
- `` The input code vectors file. This parameter is optional and is useful when the code vectors are to be initialized with a set of predefined values, typically when several runs of the algorithm are chained and the output of one run is used as input to the next.
- `` Save a file for each code vector with a list of the input items assigned to it. Each file contains the indexes of the input vectors assigned to that code vector. Example: if a 10x7 map is used, 70 files named `[basename].[Codevector Index]` (`basename.0`, `basename.1`, etc.) will be generated.
- `` Horizontal size of the map
- `` Vertical size of the map
- `` Rectangular Topology (Default)
- `` Hexagonal Topology. The following picture helps in understanding the differences between both topologies and the map axis convention:

      Xdim is ------>

      HEXAGONAL:
       O O O O O O O O O
      O O O & & & O O O
       O O & @ @ & O O O
      O O & @ + @ & O O
       O O & @ @ & O O O
      O O O & & & O O O
       O O O O O O O O O

      RECTANGULAR:
      O O O O O O O O O
      O O O O & O O O O
      O O O & @ & O O O
      O O & @ + @ & O O
      O O O & @ & O O O
      O O O O & O O O O
      O O O O O O O O O

- `` Use truly randomized codevectors. The code vectors are initialized to real random values
- `` Smoothness factor (Default = 0.5). Note that when this parameter is set to zero, FuzzySOM reverts to the well-known Fuzzy c-means clustering algorithm.
- `` Iterations number (Default = 1000)
- `` Stopping criterion. The algorithm stops when the code vectors change by no more than eps between iterations or when the maximum number of iterations is reached. By default a value of `1e-7` is used.
- `` Normalize input data (Default = No)
- `` Information level while running:
- `` No information (default)
- `` Progress bar with the elapsed time and estimated time to finish
- `` Code vectors changes between iterations
- `` Number of deterministic annealing steps. It should be a value > 1. By default 1000 is used
- `` Initial fuzzy constant (Default = 2). It should be a value > 1
- `` Final fuzzy constant (Default = 1.01). It should be a value > 1
If deterministic annealing steps are used, a different fuzzy constant is used for each step, varying from `m0` down to `m1`. This prevents FuzzySOM from falling into an undesirable local minimum of the cost function it optimizes. If no deterministic annealing steps are used (`steps` = 0), the algorithm performs a single run and `m1` is used as the fuzzy constant.
Additionally, if `reg` = 0 and `m0` and `m1` are close to 1, the FuzzySOM algorithm reverts to the classical k-means algorithm.
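The examples below all read a `test.dat` file in the raw input format described under Parameters. As a minimal sketch of that layout (the helper names `format_raw_data` and `parse_raw_data` are hypothetical and not part of `classify_fsom`), such a file could be produced and checked from Python like this:

```python
def format_raw_data(vectors, labels=None):
    """Return text in the raw layout: a 'dimension count' header, then one vector per line.

    Hypothetical helper written from the format description only.
    """
    if not vectors:
        raise ValueError("at least one vector is required")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    if labels is not None and len(labels) != len(vectors):
        raise ValueError("need exactly one label per vector")

    lines = [f"{dim} {len(vectors)}"]
    for i, vec in enumerate(vectors):
        fields = [f"{x:g}" for x in vec]      # components separated by spaces
        if labels is not None:
            fields.append(str(labels[i]))     # optional label in the last column
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


def parse_raw_data(text):
    """Parse the same layout back into (dimension, vectors, labels)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    dim, count = int(rows[0][0]), int(rows[0][1])
    vectors, labels = [], []
    for fields in rows[1:]:
        vectors.append([float(x) for x in fields[:dim]])
        labels.append(fields[dim] if len(fields) > dim else None)
    if len(vectors) != count:
        raise ValueError(f"header announces {count} vectors, found {len(vectors)}")
    return dim, vectors, labels
```

Writing the result with `pathlib.Path("test.dat").write_text(format_raw_data(vectors, labels))` would give a file that can be passed to `-i`.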
Example 1: Maps a set of data stored in "test.dat" file into a 10x7 rectangular map
$ classify_fsom -i test.dat -o test -xdim 10 -ydim 7
In this case the following parameters are set by default:
Input data file : test.dat
Output file name : test
Horizontal dimension (Xdim) = 10
Vertical dimension (Ydim) = 7
Rectangular topology
Initial Fuzzy constant (m0) = 2
Final Fuzzy constant (m1) = 1.01
Smoothness factor (reg) = 0.5
Deterministic annealing steps = 1000
Total number of iterations = 1000
Stopping criteria (eps) = 1e-07
verbosity level = 0
Do not normalize input data
So, we are going to generate a 10x7 (`-xdim 10` and `-ydim 7`) output map using 1000 iterations (`-iter 1000`). A rectangular topology is used (`-rect`). The algorithm stops when the code vectors change by no more than 1e-7 between iterations (`-eps 1e-7`). No textual information is printed to the console (`-verb 0`).
As a result, the FuzzySOM application will generate the following output files:
- `test.cod`: The final code vector file in the format described above
- `test.inf`: Information file with the parameters used and the resulting quantization error
- `test.his`: The number of input vectors assigned to each code vector, like a histogram
- `test.err`: Average quantization error for each code vector
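To inspect the resulting code vectors outside the tool, the `.cod` layout described above (a header line with dimension, topology, Xdim, Ydim and the "gaussian" label, followed by one code vector per line with an optional trailing label) can be read with a few lines of Python. This is only a sketch based on that description; `CodeBook` and `parse_cod` are hypothetical names, and the order in which code vectors map to grid positions is not assumed here.

```python
from dataclasses import dataclass, field


@dataclass
class CodeBook:
    """Contents of a .cod file as described in the text (hypothetical container)."""
    dim: int
    topology: str
    xdim: int
    ydim: int
    kernel: str
    vectors: list = field(default_factory=list)
    labels: list = field(default_factory=list)


def parse_cod(text):
    """Parse the text of a .cod file into a CodeBook."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header = rows[0]
    book = CodeBook(
        dim=int(header[0]),
        topology=header[1],                           # e.g. "rect"
        xdim=int(header[2]),
        ydim=int(header[3]),
        kernel=header[4] if len(header) > 4 else "",  # e.g. "gaussian"
    )
    for fields in rows[1:]:
        book.vectors.append([float(x) for x in fields[:book.dim]])
        book.labels.append(fields[book.dim] if len(fields) > book.dim else None)
    return book
```

For a complete map, `len(book.vectors)` would be expected to equal `book.xdim * book.ydim` (70 for the 10x7 map of this example).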
Example 2: Maps a set of data stored in "test.dat" file into a 10x7 hexagonal map using no deterministic annealing steps
$ classify_fsom -i test.dat -o test -xdim 10 -ydim 7 -steps 0 -reg 4.5 -verb 1 -m1 2.5 -hexa
In this case the following parameters are used:
Input data file : test.dat
output file name : test
Horizontal dimension (Xdim) = 10
Vertical dimension (Ydim) = 7
Hexagonal topology
Fuzzy constant (m1) = 2.5
Smoothness factor (reg) = 4.5
Deterministic annealing steps = 0 (no deterministic annealing)
Total number of iterations = 1000
Stopping criteria (eps) = 1e-07
verbosity level = 1
Do not normalize input data
In this case we are going to generate a 10x7 (`-xdim 10` and `-ydim 7`) output map with a regularization factor of 4.5 (`-reg 4.5`). A hexagonal topology is used (`-hexa`). The algorithm stops when the code vectors change by no more than 1e-7 between iterations (`-eps 1e-7`). A progress bar with the elapsed and estimated time is shown in the console (`-verb 1`).
The following files are going to be generated:
- `test.cod`: The final code vector file in the format described above
- `test.inf`: Information file with the parameters used and the resulting quantization error
- `test.his`: The number of input vectors assigned to each code vector, like a histogram
- `test.err`: Average quantization error for each code vector
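The `.his` and `.err` files are both described as two-column text files keyed by code vector number, so they can be combined to spot code vectors that received no input vectors or that have the largest average error. The following is a sketch under that assumption only; `parse_two_columns` and `summarize_map` are hypothetical helpers, and nothing is assumed about whether numbering starts at 0 or 1.

```python
def parse_two_columns(text):
    """Read 'code_vector_number value' pairs into a dict, skipping malformed lines."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        table[int(fields[0])] = float(fields[1])
    return table


def summarize_map(his_text, err_text):
    """Return (code vectors with no assigned inputs, code vector with the largest error)."""
    counts = parse_two_columns(his_text)
    errors = parse_two_columns(err_text)
    empty = sorted(unit for unit, n in counts.items() if n == 0)
    worst = max(errors, key=errors.get) if errors else None
    return empty, worst
```

A typical call would pass the contents of `test.his` and `test.err`, for example `summarize_map(open("test.his").read(), open("test.err").read())`.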
Example 3: Use Fuzzy c-means (no regularization) to cluster a set of data stored in "test.dat" file into 5 clusters
$ classify_fsom -i test.dat -o test -xdim 5 -ydim 1 -steps 0 -reg0 0 -verb 1 -saveclusters -m1 2.5
In this case the following parameters are used:
Input data file : test.dat
output file name : test
Horizontal dimension (Xdim) = 5
Vertical dimension (Ydim) = 1
Fuzzy constant (m1) = 2.5
Smoothness factor (reg) = 0
Deterministic annealing steps = 0 (no deterministic annealing)
Total number of iterations = 1000
Stopping criteria (eps) = 1e-07
verbosity level = 1
Do not normalize input data
In this case we are going to generate 5 cluster centers (`-xdim 5` and `-ydim 1`). The algorithm stops when the code vectors change by no more than 1e-7 between iterations (`-eps 1e-7`). No deterministic annealing is used (`-steps 0`). The fuzzy constant used is 2.5 (`-m1 2.5`). Since the `-saveclusters` parameter is used, a list of the input data assigned to each cluster is stored in the `test.0` to `test.4` files. A progress bar with the elapsed and estimated time is shown in the console (`-verb 1`).
In this case, the following files are going to be generated:
- `test.cod`: The final code vector file in the format described above
- `test.inf`: Information file with the parameters used and the resulting quantization error
- `test.his`: The number of input vectors assigned to each code vector, like a histogram
- `test.err`: Average quantization error for each code vector
- `test.0` to `test.4`: Each file is a list of the input data vectors assigned to one code vector (cluster)
Note that since no regularization is used, the algorithm reverts to the clustering technique known as Fuzzy c-means, so the code vectors in this case are cluster centers.
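To make the role of the fuzzy constant more concrete, here is a small sketch of the textbook Fuzzy c-means membership rule, in which the membership of a point in cluster i is proportional to its distance to center i raised to the power -2/(m-1). It is the standard formula from the Fuzzy c-means literature, not code taken from `classify_fsom`, and the function name `fcm_memberships` is hypothetical.

```python
import math


def fcm_memberships(point, centers, m=2.5):
    """Textbook Fuzzy c-means memberships of one point with respect to each center.

    m is the fuzzy constant and must be > 1; the returned values sum to 1.
    """
    if m <= 1:
        raise ValueError("the fuzzy constant must be greater than 1")
    dists = [math.dist(point, c) for c in centers]

    # A point lying exactly on one or more centers belongs entirely to them.
    on_center = [i for i, d in enumerate(dists) if d == 0.0]
    if on_center:
        share = 1.0 / len(on_center)
        return [share if i in on_center else 0.0 for i in range(len(centers))]

    power = 2.0 / (m - 1.0)
    weights = [d ** -power for d in dists]   # closer centers get larger weights
    total = sum(weights)
    return [w / total for w in weights]
```

With `m` = 2.5 a point between two centers is shared between them, while with `m` close to 1 (for example 1.01) nearly all of the membership goes to the nearest center, which is the hard assignment behaviour of k-means mentioned earlier.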
--Main.AlfredoSolano - 24 Jan 2007