As I mentioned in my previous post, I am currently working on the idea of how to see a dynamic spatio-temporal system as a meaningful image. I should note that the word "meaningful" is vague here.
In many systems and applications where lots of sensors work in parallel, each measuring a certain feature of the system, we can expect to see such an image in every snapshot, at every time frame. For example, suppose we are measuring surface temperature on a spatial grid of 10 by 10 rectangular cells. If we put a sensor in each cell, we expect to see spatial correlations between the sensor measurements, and if we are a bit lucky we can assume that these spatial correlations follow a Gaussian distribution.
For many physical systems, this spatial correlation is very helpful for static interpolation, or in applications where we are interested in the dynamics of time-varying fields.
But what if we have a system where there is no spatial correlation between the parallel streams beforehand? For example, we can think of companies' stock prices on the NYSE as a system with thousands of parallel streams. Each stream indicates the price of a specific company, and in total we have a kind of time-varying random field. Let's say we have more than 5000 different stock values fluctuating in parallel over time. How can a person interested in this market see what is going on?
Of course, one idea would be to look at aggregate indexes such as NASDAQ, Russell and so on, or to look at the averages of each sector or industry. But what if we are not interested in these top-down features or categories? Is there any way to get a better image of the market? Is it possible to simply put similar stocks together to construct a random field similar to the random field of temperature measurements?
The following diagram shows the average weekly price of 10 companies (sorted alphabetically) from 2000 to 2014.
Or, if we map the weekly price changes (the first difference, or momentum, of the prices), we get something like this:
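The weekly price change mentioned above is just the difference between consecutive weekly averages. A minimal sketch with NumPy follows; the function name and the tiny price table are made up for illustration and are not the data used in this post.

```python
import numpy as np

def weekly_changes(prices):
    """First difference along the time axis.

    prices: array of shape (n_companies, n_weeks) of average weekly prices.
    Returns an array of shape (n_companies, n_weeks - 1).
    """
    prices = np.asarray(prices, dtype=float)
    return np.diff(prices, axis=1)

# Made-up prices for two hypothetical companies over four weeks.
prices = np.array([[10.0, 11.0, 10.5, 12.0],
                   [50.0, 49.0, 49.0, 51.0]])
changes = weekly_changes(prices)
print(changes)
```

Note that differencing loses one time step, so K weekly prices give K - 1 weekly changes.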
But that was only 10 symbols. The next diagram shows the average weekly price of 1000 companies on the NYSE.
If we plot one snapshot of the 1000 company time series as an arbitrary image (let's say 25 by 40 pixels, 1000 pixels in total), where the value of each pixel corresponds to the weekly price change of one specific company, we get one image of the market for each week. Within each week (i.e. each image) I normalized the values to get a better result. In order to check whether there is a correlation within each sector, I sorted the symbols (companies) by their sectors, so that adjacent pixels show the behavior of companies in the same sector. The following figure shows the first 100 weeks of the market (including 1000 companies), starting from the first business day of 2000. As we could expect, there is still no spatial correlation, and the images, although each has its own distinct background, look to be full of small random dots.
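As a rough sketch of how one such weekly image can be built: sort the companies by sector, scale the week's values, and reshape them into 25 by 40 pixels. The min-max scaling is my assumption here (any per-week normalization would do), and the random data and sector labels are synthetic stand-ins.

```python
import numpy as np

def week_to_image(changes_week, order, shape=(25, 40)):
    """Turn one week's price changes into a normalized 2-D image.

    changes_week: 1-D array, one value per company.
    order: permutation of company indices (e.g. sorted by sector).
    shape: image shape; rows * cols must equal the number of companies.
    """
    values = np.asarray(changes_week, dtype=float)[order]
    lo, hi = values.min(), values.max()
    # Min-max scaling within the week (an assumed normalization).
    scaled = (values - lo) / (hi - lo) if hi > lo else np.zeros_like(values)
    return scaled.reshape(shape)

rng = np.random.default_rng(0)
n_companies = 1000
changes_week = rng.normal(size=n_companies)       # synthetic stand-in data
sectors = rng.integers(0, 10, size=n_companies)   # hypothetical sector labels
order = np.argsort(sectors, kind="stable")        # same-sector companies become adjacent
image = week_to_image(changes_week, order)
print(image.shape)
```

The same `order` has to be used for every week, otherwise a pixel would not refer to the same company from one image to the next.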
Jigsaw Puzzle And Spatio-temporal Similarity
If we look closely at one of the above images, we can think of it as a jigsaw puzzle. Our goal, in order to get spatial correlation between pixels, is then to move the pixels (i.e. the companies assigned to those pixels) so that values are locally similar or, let's say, to get a map where adjacent pixels have similar colors.
The only problem is that if we have, for example, 100 weeks and consequently 100 images, we need to solve 100 jigsaw puzzles at the same time! If in one image we move a company close to another similar company, we should make sure that this results in good neighborhoods in the other images too. The way I have explained it, this problem seems really hard to solve. I must say that from a computational point of view it is indeed a hard problem, but at least I know a very nice solution for it.
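One way to make "adjacent pixels have similar colors, in all images at once" concrete is to score a layout by the average absolute difference between neighboring pixels, taken over every week. The sketch below is only an illustration of that idea, not something I used for the figures; the function names and the synthetic one-factor data are assumptions.

```python
import numpy as np

def roughness(images):
    """Mean absolute difference between adjacent pixels, over all weeks.

    images: array of shape (n_weeks, rows, cols).
    Lower values mean neighboring pixels are more alike.
    """
    images = np.asarray(images, dtype=float)
    horizontal = np.abs(np.diff(images, axis=2))
    vertical = np.abs(np.diff(images, axis=1))
    return (horizontal.sum() + vertical.sum()) / (horizontal.size + vertical.size)

def images_for_layout(data, layout, shape):
    """Place companies on the grid using one layout shared by every week.

    data: array of shape (n_companies, n_weeks).
    layout: permutation of company indices, read row by row into the grid.
    """
    return data[layout].T.reshape((-1,) + shape)

rng = np.random.default_rng(0)
n_companies, n_weeks, shape = 100, 20, (10, 10)
loading = rng.normal(size=n_companies)   # hidden "type" of each synthetic company
market = rng.normal(size=n_weeks)        # one common synthetic driver
data = np.outer(loading, market)         # shape (n_companies, n_weeks)

random_layout = rng.permutation(n_companies)
sorted_layout = np.argsort(loading)      # similar companies placed next to each other

rough_random = roughness(images_for_layout(data, random_layout, shape))
rough_sorted = roughness(images_for_layout(data, sorted_layout, shape))
print(rough_random, rough_sorted)
```

In this toy case the good layout is known in advance because the data has a single hidden factor. With real data no such ordering is given, and searching over all N! layouts directly is hopeless, which is why a different kind of solution is needed.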
Suppose we are interested in having an image of the market and we have access to market data for K periods of time. In each time step we have N parallel streams, so we can represent the data as a matrix of N*K cells. In order to solve the above-mentioned jigsaw puzzle, we have two parallel goals: 1- find groups of similar streams (companies) based on the similarity of their behavior over the K time steps (considering all the images), and 2- arrange these groups on a spatial map in such a way that similar groups are placed in nearby locations. Note that the final image is not necessarily the same size as the original image if we let similar streams sit in the same pixel.
These two goals are the underlying ideas of the Self Organizing Map (SOM), which I have talked about before in my other posts. It simply solves this jigsaw problem by optimizing the two goals at the same time and, as I will show you, it gives a much better image of the market from the same data used for the previous images.
Using the weekly average momentums (weekly price differences) of 1000 companies for 756 business weeks, from the beginning of 2000 to the end of July 2014, I trained a SOM with a 10*10 rectangular grid.
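For readers who have not seen a SOM before, here is a minimal online SOM written directly in NumPy. It is a sketch of the general technique, not the implementation I used for the results in this post; the learning-rate and neighborhood schedules, the parameter values, and the synthetic data are all assumptions.

```python
import numpy as np

def train_som(data, rows=10, cols=10, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small online SOM; returns weights of shape (rows * cols, n_features)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Grid coordinates of every node, used to measure distances on the map.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize the weights from randomly chosen samples.
    weights = data[rng.integers(0, len(data), size=rows * cols)].copy()
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5   # shrinking neighborhood radius
        x = data[rng.integers(0, len(data))]  # one randomly chosen company
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best matching unit
        grid_dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))       # Gaussian neighborhood
        weights += lr * h[:, None] * (x - weights)
    return weights

def best_matching_units(data, weights):
    """Index of the closest node for every row of data."""
    data = np.asarray(data, dtype=float)
    d2 = ((data ** 2).sum(axis=1)[:, None]
          + (weights ** 2).sum(axis=1)[None, :]
          - 2.0 * data @ weights.T)
    return d2.argmin(axis=1)

# Synthetic stand-in: 200 "companies" observed over 30 "weeks".
rng = np.random.default_rng(1)
data = rng.normal(size=(200, 30))
weights = train_som(data)
bmus = best_matching_units(data, weights)
week_image = weights[:, 0].reshape(10, 10)  # map image for the first week
print(weights.shape, bmus[:5])
```

Each row of `data` is one company's whole history, so a node's weight vector has one entry per week. Taking a single column of `weights` and reshaping it to the grid gives the map's image for that week.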
The following figure shows a 2D histogram of the distribution of companies over the 10*10 grid. The size of each circle is proportional to the number of companies located in that pixel.
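Such a histogram can be computed by counting how many companies are assigned to each node of the map. The sketch below assumes each company has already been assigned the index of its closest node, numbered row by row; the seven assignments are made up for illustration.

```python
import numpy as np

def hit_map(bmus, rows=10, cols=10):
    """Count how many companies land on each node of a rows x cols map."""
    counts = np.bincount(np.asarray(bmus), minlength=rows * cols)
    return counts.reshape(rows, cols)

# Hypothetical node assignments for seven companies on a 10 x 10 map.
bmus = np.array([0, 0, 5, 99, 99, 99, 42])
hits = hit_map(bmus)
print(hits[0, 0], hits[9, 9], hits[4, 2])
```

Nodes with a count of zero are empty pixels, which is how the final map can hold fewer occupied cells than there are companies.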
If our hypothesis about spatial correlation is true, we expect to see smoothly changing patterns on top of the trained SOM. Interestingly, SOM works very well for this data set. Rendering the weight vectors of the trained SOM for the working weeks of 2000 and 2001 shows how SOM was able to solve our jigsaw problem.
A full picture of 14 years is available here.
As we can see, the original random images have now been transformed into images with clear spatial correlations. I must say that at first I was amazed by the shape of the patterns identified by SOM. It would be very strange if the whole market behaved like this. As you can see, there are a few companies (the circle in the center of the image) behaving in the opposite direction to all the other companies: whenever they are hot, all the others are cold, and vice versa. These are the kind of images I was talking about. You can see a two-dimensional pattern, one spatial and one temporal.
This is the power of SOM, by which we have now created an imaginary field where space has a meaning of its own. I will stop here, but I must say I am not normally a fan of visualizations alone. I believe we can do more with these results. There are several steps that I will implement soon:
1- Analyzing the correlations between the identified categories of companies and their other attributes, such as sector and category.
2- Now that we have an image for each time step, similar to the idea of contextual numbers, we can predict the behavior of the whole market. This would be an extremely interesting application that I will focus on soon.
3- According to the underlying algorithm of SOM, each image can be seen as a Gaussian random field. This gives us a lot of powerful functional abilities.
4- If we are interested in a specific company, predictions of its price can be supported by the predictions for all of its neighboring companies in the SOM.