Home > Datasets, Thoughts, what i've been doing > Location clustering so far.

Location clustering so far.

October 21st, 2010 Leave a comment Go to comments

I have had a chance to look at the code that Derek sent me, namely the CFinder software for finding k-Cliques, and an implementation of the MDS  algorithm, I struggled to get these to make any sense of the data that I have, and started to think about an alternative approach.

Despite the decision we made in a previous meeting, to consider location readings globally, regardlessof order/time, I decided to consider using the nature of the data to inform the clustering algorithm, namely, using the time ordering to indicate the relationship between readings. This meant going back to the idea of using readings from  individuals, to determine the places that are relevant to them. This of course may have issues when comparing places from different individuals, but no more than when implementing a simulator (or real world) system, where individuals track their own movement patterns.

By taking time ordering into consideration, we can implicitly detect when a node is moving from one place to another, by measuring the distance between the current reading, and the centroid of previous readings from a known place.

- take each location reading for a person, and measure the distance
            between it and the center of the current cluster
    - if the distance of reading from the center of current cluster is greater
        than the mean distance of points in the cluster to the center + alpha,
        then it is outside of the cluster.
        - read ahead by X (to make sure this is not just an erroneous reading)
        - if X readings are beyond the threshold of Current cluster
          - then we have left the location,
              - record the current cluster if it has more than
                  N location readings
              - start a new cluster with the current reading
        - else
          - discard the erroneous reading
    - else
      - add to list of cluster points
      - re-calculate cluster centre

Where alpha is a metric that allows the cluster to grow, alpha can be fixed (say 50m) or a function of the mean distance to the centre of the cluster. The number of readings to read ahead (X) should probably be fixed at say, 10, so that if the person goes away and comes back very quickly, they are considered to have not left.

The minimum number of readings required to consider a cluster valid should be relatively large, say 10 or 20, so that only places where the user lingers for some time, are considered. If we use a number too high, then we will miss some important places. A number too small would mean that places where users just pause for a while will be recorded, which may not be important places.

The algoritm still need improvement, so that it also considers the time between readings; for example, rather that considering the number of readings in a cluster, consider the amount time between the first and most recent reading, and use this to determine whether a place is recorded. This means that we should be able to record important places even if readings were not taken for some reason.

  1. No comments yet.
  1. No trackbacks yet.