One of the original approaches to explain the Self Organizing Maps (SOM) is to say that SOM is a topology preserving transformation from high dimensional space to a low dimensional space. This low dimensional space is usually a two dimension or one dimensional grid. From this point of view SOM can be compared to other dimensionality reduction and data unfolding methods such as LLE, ISOMap, t-SNE and other methods.
However, from the point of view of Topological Data Analysis,TDA, what is missing in majority of dimensionality reduction methods is the the role of Abstraction from the observed data. In fact, all the above mentioned methods create a one to one relation between data points from high-dimensional observations to the their transformed low dimensional representation. In applied topology, what is more important than transformation is to discover what is called as “the shape of data” in terms of topology and geometry. While, this idea seemed to me to be very old school science, where one is interested to find “the nature” and “universal properties” of the natural phenomena, it leads to very successful methods in exploratory data analysis tasks. For example, one of the leading companies in this field is http://www.ayasdi.com/ that is purely based on this ideas.
For a while, I was very curious to know more about TDA and its applications, but what I realized that although it is backed very well with strong mathematics of topological studies, there are not many practical algorithms in this area. However, when I was reading an article by Gunnar Carlsson about TDA and topological networks, I found out that SOM can be perfectly used for TDA. For example, to discover the shape of data as Gunnar Carlsson explains here.
Finally, after finishing up my thesis, during last weekend I decided to do a series of experiments and now I am more convinced than before that SOM can be used perfectly to make an abstract shape of the data. This abstract shape has an important property that if there are distinct clusters in the observed phenomena, regardless of the dimensionality of the data, they can be visualized with a graph. Instead of explaining more, in what follows I would like to show some examples. Imagine that we have a two dimensional data as it is shown in the next figure.
In this synthetic case we have four distinct clusters. Now If I train a one dimensional SOM to this data sets, it folds itself as follows:
A trained SOM will get to this:
As we could expect, SOM will keep the shape of this data set. However, the problem is that this is a two dimensional space that can be easily be visualized. What if we had 100 dimensions. From this point, TDA is playing an important role by using the abstract language of graph theory. In classical SOM nodes are connected by a given topology. In other words, nodes of SOM builds a fixed graph, where only adjacent nodes are connected together directly and during the training we try to give similar contexts (i.e. similar weight vectors) to the neighboring nodes. However, this rigid topology puts some constraints on the final graph. Although there are methods such as Neural Gas, where there is no fixed topology for the SOM, here my hypothesis was that this is not that important. Instead what is missing in the current SOM is a kind of post processing. What I did was to assume that after training of a SOM we have a set of nodes each with a unique high dimensional weight vector. Therefore, we can simply construct a similarity matrix of these nodes and then based on the ideas of TDA such as persistence diagram, with different similarity thresholds we will have different graphs, where if the similarity of two nodes is equal or larger than a given threshold, there will be an edge between those two nodes. Therefore, with different thresholds we expect to see different graphs, while there should be some consistent patterns in all the graphs. I implemented this procedure and it turned out that SOM perfectly identifies these patterns. For example, the following figure is for the first case with four distinct clusters:
Along the same line, we can use this method for other high dimensional data with unknown topological shapes. For example the next figure is the result of the same method applied to a collection of several elections in the state of Minnesota in US. As we could expect here, the structure of the identified patterns is not as simple as the previous synthetic data sets and needs further explorations. For this post I am going to stop here, but later I will write about other complimentary and of course necessary analysis techniques which come along this topological analysis.