Discussion with Davide 10 Sep 2010
Had a brainstorm with Davide about what we should analyse. We decided to use the Geolife dataset to test out some ideas:
First, I will do a meta-analysis of the data to see which nodes we do and do not have data for in a given time period, by counting the number of readings per day over the dataset period.
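A minimal sketch of the daily count, assuming each reading has already been parsed into a (node_id, timestamp) pair. That record layout, the function name and the per-node breakdown are my assumptions, not something we fixed in the discussion.

```python
from collections import Counter
from datetime import datetime


def readings_per_day(readings):
    """Count readings per (node, calendar day).

    `readings` is an iterable of (node_id, timestamp) pairs, where each
    timestamp is a datetime. This layout is assumed for illustration.
    """
    counts = Counter()
    for node_id, ts in readings:
        counts[(node_id, ts.date())] += 1
    return counts


# Made-up sample readings, only to show the shape of the input.
sample = [
    ("a", datetime(2008, 10, 23, 2, 53)),
    ("a", datetime(2008, 10, 23, 2, 54)),
    ("b", datetime(2008, 10, 24, 9, 0)),
]
counts = readings_per_day(sample)
```

Any (node, day) key missing from `counts` is a day with no data for that node, which is what the meta-analysis is meant to expose.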
We want to identify the co-location between nodes. For each pair of nodes a and b we define an indicator Cab: if a and b are within distance λ of each other at time t (where x is the location of a and y is the location of b), Cab is 1; otherwise Cab is 0.
Averaging this indicator over all time periods gives the co-locatedness of a and b (the sum of the co-location of a and b over all time periods, divided by the number of time periods).
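Written out, the indicator and its average might look like the following. The notation is my assumption, inferred from the description: x(t) and y(t) are the locations of a and b at time t, d is some distance function, T is the number of time periods, and "within λ" is taken to include the boundary.

```latex
C_{ab}(t) =
\begin{cases}
  1 & \text{if } d\bigl(x(t),\, y(t)\bigr) \le \lambda \\
  0 & \text{otherwise}
\end{cases}
\qquad
C_{ab} = \frac{1}{T} \sum_{t=1}^{T} C_{ab}(t)
```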
We use this to construct a graph in which a and b are connected by a weighted edge labelled with Cab. This network is the structure of co-locatedness, and is exactly the same as a proximity network: it does not care where you meet, only that you do meet.
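A rough sketch of how the edge weights could be computed, assuming positions have already been bucketed into discrete time periods and projected to planar coordinates so that Euclidean distance is meaningful. The `positions` layout and the function name are hypothetical, and raw latitude/longitude would need a proper geodesic distance instead.

```python
import math
from itertools import combinations


def colocation_weights(positions, lam):
    """Return {(a, b): weight} for every pair that is ever co-located.

    `positions` maps each time period to {node: (x, y)} in planar
    coordinates (an assumed layout). The weight is the number of periods
    in which a and b are within `lam` of each other, divided by the total
    number of periods. Pairs that never meet get no edge.
    """
    periods = len(positions)
    hits = {}
    for snapshot in positions.values():
        # Sorting keeps each pair in a canonical (a, b) order.
        for a, b in combinations(sorted(snapshot), 2):
            if math.dist(snapshot[a], snapshot[b]) <= lam:
                hits[(a, b)] = hits.get((a, b), 0) + 1
    return {pair: n / periods for pair, n in hits.items()}


# Made-up positions for three nodes over two time periods.
positions = {
    0: {"a": (0, 0), "b": (0, 1), "c": (10, 10)},
    1: {"a": (0, 0), "b": (5, 5), "c": (10, 10)},
}
weights = colocation_weights(positions, lam=2.0)
```

The resulting dictionary is an edge list for the weighted graph: each key is an edge and each value is its label.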
We calculate this in 1 month(?) blocks to see how the graph evolves over time.
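One way to get the blocks is to group the records by calendar month and then build one graph per group. This is only a sketch: the (timestamp, node, location) record layout and the function name are assumptions, and the block length is still undecided.

```python
from collections import defaultdict
from datetime import datetime


def split_by_month(records):
    """Group records into {(year, month): [records]}.

    Each record is assumed to be a tuple whose first element is a
    datetime; the remaining fields are passed through untouched.
    """
    blocks = defaultdict(list)
    for record in records:
        ts = record[0]
        blocks[(ts.year, ts.month)].append(record)
    return dict(blocks)


# Made-up records, only to show the grouping.
records = [
    (datetime(2008, 10, 23), "a", "L1"),
    (datetime(2008, 10, 30), "b", "L1"),
    (datetime(2008, 11, 2), "a", "L2"),
]
blocks = split_by_month(records)
```

Iterating over `sorted(blocks)` then gives the blocks in chronological order, so the per-block graphs can be compared from one month to the next.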
We are also interested to find out if nodes are influential over locations (e.g. if particular nodes visit a location, which then becomes popular).
We plot the popularity of x over time, where the popularity is the number of people that visit a location in a day, and the set X is derived from all known locations. We hope to find a pattern where the graph tails off or increases over time; flat graphs are uninteresting.
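A minimal sketch of the popularity count, assuming visits are already available as (node, location, timestamp) tuples with a known location identifier. Both the layout and the function name are hypothetical, and I am assuming that a person who visits the same location several times in one day counts once.

```python
from collections import defaultdict
from datetime import date, datetime


def daily_popularity(visits):
    """Return {location: {day: number of distinct visitors}}.

    `visits` is an iterable of (node, location, timestamp) tuples
    (an assumed layout). Repeat visits by the same node on the same
    day are counted once.
    """
    visitors = defaultdict(set)
    for node, location, ts in visits:
        visitors[(location, ts.date())].add(node)

    popularity = defaultdict(dict)
    for (location, day), nodes in visitors.items():
        popularity[location][day] = len(nodes)
    return dict(popularity)


# Made-up visits to a single location over two days.
visits = [
    ("a", "L1", datetime(2008, 10, 23, 9, 0)),
    ("a", "L1", datetime(2008, 10, 23, 17, 0)),
    ("b", "L1", datetime(2008, 10, 23, 12, 0)),
    ("a", "L1", datetime(2008, 10, 24, 9, 0)),
]
popularity = daily_popularity(visits)
first_day = popularity["L1"][date(2008, 10, 23)]
```

Each inner dictionary is one time series per location, which is the curve we want to plot and inspect for a rise or a tail-off.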
For this I will need to calculate locations from the data.