### Archive

Archive for November, 2010

## Seminar/Presentation 29 Nov 2010

November 29th, 2010 1 comment

Presentation summary:

Routing in Human Interaction Networks

The goal of this seminar is to get some ideas from the participants, about the feasibility, and possible directions for this research.

We believe that by detecting the underlying patterns of human interaction involving movement and contacts between individuals, we can design a delay tolerant mechanism for passing messages between actors in the system (for example using mobile phones) that does not rely on fixed infrastructure. A system such as this should exhibit low cost (hop count, update overhead, delivery latency) and high efficiency (delivery ratio).

We believe that we can drive a routing policy by calculating and sharing metrics and meta-data at node level; from the contacts they make and the places they visit, without the need for a centralised system. In particular we are interested in using location to drive the messaging scheme.

We will show recent, related results of Persistence Based Routing vs existing schemes, some initial results from our simple schemes, and will ask for suggestions regarding the direction of our research into Location Based Routing.

Present: Pádraig Cunningham, Davide Cellai, Derek Green, Conrad Lee, Fergal Reid, Neil Hurley

Presented some background, and ideas about routing in human interaction networks (Slides – 1.2MB), someone noted tha Delivery Ratio and Latency are directly linked, with the suggested that the requirement for high delivery ratio and low latency, may not be intuitive in some situations. e.g. when someone who will cause a high latency, may be the only one to get the message across.

The presentation seemed to go well, however some parts I might not have delivered properly, meaning that I seemed to be skimming over a few bits.

• Consider the use of some sort of altered vector clocks, which keep a history, and can be shared with other nodes
• Partition the graph in the Reality Mining data, to identify the communities, then test the algorithms for only those communities
• Strong consensus to start using periodic patterns for predicting routes
• Neil suggested that I try to generate a dataset that does work well for LBR
• Ferhgal suggested what we had talked about before: finding out at what places, messages get sent

Reference to chase up:

1. PNAS 2005/2006 – LIBEN-NOWELL – ?GeoRouting in social networks?
2. SneakerNet
3. Talk to Martin Harrigan – who is using Vector Clocks in some clever way

There may have been other things I don’t remember – feel free to comment with additions!

Categories:

## Rotating the data

Met with Pádraig and Davide briefly to discuss the poor performance of the LBR algorithm. Pádraig had previously suggested that we run multiple simulations over different time slots, and Davide and I had talked about and planned to do split the runs up as follows:

Measuring the delivery ratio, and cost for PBR, LBR, Prophet, Random 1.0, Random 0.2,  Random 0.0 and Flooding, will give us a good picture about the nature of the data, and we will be able to see if the previous plots are representative of the data.

I will also try to adapt the ranking mechanism, to include only the top (most seen) 10% or 20% of cell towers, to see if that makes any difference to the results.

Update:
Using the method above, I ran the simulations for LBR, PBR, Unlimited Flood, Random 1.0,0.0,0.2 and Prophet. Below is the average delivery ratio over 18 runs, for each  protocol in 28 day time windows, at 7 day intervals,  between  27 Aug 2004 and 21 Jan 2005.

Average delivery ratio over 18 runs for each protocol, between Aug 2004 and Jan 2005

There does seem however, to be a few issues with the number of lines returned in the logs, this suggests that somehow, I have mis-calculated, probably just for the end  time period used, the start and end time period, but this seems to only account for the last few hours of each run (out of the month). So until I find the error, I think these results are roughly correct.

Categories:

## LBR, PBR, Flooding and Random comparison

This graph shows the delivery ratios from the start of the simulation for MIT-OCT dataset. It would appear that in fact, LBR performs poorly against the random algorithm, where the P is the probability of passing on the message (1 copy) to the met node. Interestingly though, when P = 1, the delivery ratio increases. The graphs below show the delivery cost for the different algorithms, spread across two plots, so that the very high costs of Random and Prophet do not dwarf the lower costs of LBR and PBR. These three plots together show that whilst the delivery ratio of  prophet and random are high, the cost is very high.

Delivery costs for three routing algorithms on the MIT-OCT dataset, spread over two plots for clarity.

Perhaps a combined metric for comparison is  cost over delivery ratio, this would perhaps give a better idea of general quality of the algorithm.

Categories: Uncategorized

## Location Based Ranking, effects on routing

Using the simple formula for calculating the ranking of users, as follows:

Popularity of user $i = \sum_j \tau_{ij} ( p_j )$
where $\tau_{ij}$ is the time (or number of times) user $i$ has visited the tower $j$$p_j$ is the popularity of tower $j$.

Create a list of towers, with an associated number of times the tower is reported in the dataset, call this the set $p$,

SELECT count(celltower_oid) as count, celltower_oid FROM cellspan GROUP by celltower_oid ORDER BY count DESC

then for each user, get a list of the cell towers, with an associated number of reading for that user, and call this set  $\tau$

SELECT count(celltower_oid) as count, celltower_oid FROM cellspan WHERE person_oid = {USER_ID} GROUP by celltower_oid ORDER BY count DESC

then, for each member of  $\tau$, multiply the number of time the tower is seen in  $\tau$  by the number of time the tower is seen in total from  $p$.

This yields a number for each user, which can be used when initialising nodes in ContactSim (aka LocationSim). I added new properties into the the configuration files, which reflected the ranking of each node, and created new classes in LocationSim, which reflected a new protocol called LBR. When a node is instantiated, is looks up its ranking metric with an oracle class (which reads the data from the configuration file). During the operation of the simulator, when a node get’s an onConnect() event call, it checks it messages to see:

1. if it has any message for the node it is connected to, it passes them on
2. if the node it is connected to has a higher ranking, then it passes all messages on

The initial results show a good delivery ratio compared to PBR

Delivery COST of messages for LBR and PBR using 7 day periodicity, where messaging starts at the first timestep (PBR 7 day Period Start, LBR Start), or after 1 week.

Delivery Ratio of messages for LBR and PBR using 7 day periodicity, where messaging starts at the first timestep (PBR 7 day Period Start, LBR Start), or after 1 week.

This suggest that simply knowing the rank of the users, based on location, significantly improves routing. However, I need to check the algorithm is working before I can say the results are accurate:

• Does the node remove it from it’s buffer when it passes the message on? Yes it does
• Is the node correctly recording deliveries?

Currently working on: Graph of all connected users. Taking a long time……

Categories:

## Meeting 12 Nov 2010

We discussed progress since last week, and I showed the plots from the MIT-Reality dataset, and decided that it probably is OK for determining whether nodes are co-located or not, and give some extra information about movements. I said that I would generate a graph of people connected during the same time period to the same cell tower.

Pádraig suggested a simple policy for routing, based on ranking of locations and consequently people that visit them, where a persons rank is based on the rank of the towers (places) he visits, and how many different towers he visits, making a person that moves around to different, highly ranked towers, more important for routing that someone that only visits low ranked towers, or perhaps even someone who only visits one highly ranked tower.

Davide said that he might have made some notes about this in the past and would look them up, and Pádraig said that it could be calculated within some probablistic framework. I noted, that this sort of scheme would really be routing towards the center of the ranking network, which Pádraig suggested was because it was routing based on the centrality of the network. But we decided that it is worth using this to at least test this idea out against a random routing protocol.

With regards the actual implementation, Pádraig explained his idea that we could do the ranking (at least to start with) before the simulation starts, and simple gives nodes fixed ranked positions, meaning that we don’t actually have to change any of the simulator code, the rank is just a property of the node itself. So, a routing policy is as follows:

on meeting another node:

foreach message in buffer

   if node is destination node
pass message
else if node has a higher rank and visits a
location that the destination node does
pass message
else
keep message

The idea being the the protocol uses the existing contact events to drive communications, but uses the fixed location ranking to form the routing policy.

Pádraig also said that we really need to have a conference target in mind, and we couldn’t really think of a suitable one, Davide suggested Ubicomp, but it has a low acceptance rate.

So, for next time, I will have generated a graph of connected individuals based on connecting to cell towers during the same time period, made sure I can generate the same results as Graham, generated  a ranking for individuals based on cell towers, developed a routing scheme based on this, and hopefully identified conference targets.

UPDATE:

Davide suggested the following calculation for determining the popularity of users, which can be used as a metric for routing.

Popularity of user $i = \sum_j \tau_{ij} ( p_j + c )$
where $\tau_{ij}$ is the time (or number of times) user $i$ has visited the tower $j$$p_j$ is the popularity of tower $j$ and $c$ is a parameter which tunes the importance of user mobility

Pádraig suggested that $c$ should be 0.

Categories:

## MIT-Reality Mining Dataset – location analyis

I have spent some time looking at the reality mining dataset, to see how useful it is for inferring location about individuals. Firstly I generated a matrix, People over visited cell towers, which was huge, due to the large number of cell towers in the dataset (32, 000). Then I used this data to generate a graph  of people in the dataset, where an edge between two people, if they have ever visited the same cell tower, independantly of whether they were there at the same time. This produced the plot below:

This graph shows people connected based on visiting the same cell towers, but not necessarily at the same time. (full size)

The two plots show the same data, but with well connected node 22 removed on the right. This data implies that most people are using the same cell towers at some points in the dataset, which is promising for using cell tower a  location metric. However, this does not show any information about whether nodes were connected at the same time.

To further clarify the data, I processed the information provided for cell towers, which is stored in two columns in the dataset – oid and name, for example:

OID Name
35 40342, T-Mobile
36 40772, T-Mobile
37
38 0,
39 883, AT&T Wirel
40 2353, AT&T Wirel

I seperated out the cell ids and carrier names  into two columns, which enabled me to get the distribution of carrier names as below:

This shows the number of cells for each known carrier reported in the dataset (30 Others refers to 30 distinct reported carrier names, who’s sum totat was 429 reported towers) (full size)

This suggests that many of the towers are provided by different carriers, so this suggests that, we might have distinct clusters of users, that for some reason are connected by a few – perhaps they ‘roamed’ onto a new network, or changed provider at some point. This needs further clarification, and may only be able to be measured using ‘brute force’, by constructing the graph of people connected based on c0-locations of cell towers.

So far, the analysis is inconclusive, which means it will need a little more investigation. However, my feeling is that we WILL have enough overlap in the dataset to make it feasible to infer location from cell towers records.

Categories: Uncategorized

## Meeting with Pádraig and Davide

Met initially over lunch then in the office. Pádraig had made the following observations:

I think the “Routing with Location” post is very useful for making concrete how location information might help routing. This is the key principle:
“the more locations it visits in common with the destination node, the more likely it will be to pass the message to the destination.”
To be devil’s advocate, the worry would be that the useful information entailed in location is already compiled into the existing contact information. If that is true then the uplift due to the inclusion of location criteria in the routing will not be significant.

Davide and I had been discussing this earlier in the day, and I explained my idea of measuring the importance of a location in terms of how ‘active’ it was; i.e. how many messages exchanges happen there. In this way, it is distinct from contact knowledge alone.

Davide suggested a similar mechanism, which was to measure the popularity of locations, based on, for example the number of different people that visit there. I misssed slightly the meaning when he said an instantaneous value. We talked about concept of maintaining a global table, which tracks the current popularity of locations, and routing to nodes that visit these locations most often.

We discussed the possibility of using the cell tower information contained within the MIT reality mining dataset, and suggested mechanisms for implying location from this. The first, assuming that the nodes are connected to the same network operator, simply being when two nodes are at the same cell tower T1 and T1, they are co-located, else they are not (e.g. they see T1 and T2 respectively)

Another situation to test for however, is when two nodes can ‘see’ each other, but they do not report the same cell tower (T1 ad T2). I suggested a hybrid location, e.g. T12 which is a location where two people have been seen to be connected, but were not connected to the same tower. Initially, we decided to analyse the data to see what it looks like. Pádraig suggested using a simple matrix – People over cell tower IDs with a count of how many times they occur.

One thing we didn’t factor for however, is whether a node can see multiple cell towers at one…. this will become clearer when I have a proper look at the dataset.

Incidentally, this got me thinking about a generic way to represent locations, as 1d, 2d, 3d etc. The MIT dataset cell tower locations, providing all nodes are using the same co-ordindate scheme (i.e. network operator), can be described as a 1d co-ordindate system. The ‘distance’ between points, whilst not meaningful for physical location, could be based on some metric of common movements between locations.

We also talked about ContactSim, and Pádriag agrees that we should probably use this if it is suitable. He asked me to ask Graham for a copy of his chapter that deals with his PBR protocol (Chapter 7) for him to read. We decided that I should try to replicate Graham’s results, to ensure I had a proper handle on what the simulator does.

Pádraid would also like me and Davide(?) to hold a seminar for few people, in which we at least show the results from Grahams work, and discuss the ideas we have about routing with location, I suggested about a year from now, Pádraig was thinking about 2 weeks from now, so in next week’s meeting we will fix the details for the following week.

For next time, I will have replicated Graham’s results, analysed the MIT dataset for applicability for our needs, will have started planning for the seminar, and will have started to adapt Graham’s simulator to use location information.

Categories: Uncategorized

## Routing with location

I have started to think again about how to route using location, ignoring contact traces for the moment, I considered what the simplest metric for routing might be. The simplest method, as I seem to remember talking with Davide about previously, is simply comparing knowledge about the set of places visited by each node.

For example, in the above scenario, where the numbered nodes are places visited by nodes, and the lines build the network of places visited by each node (a line represents the route between two places, direction of the arrows shows any ordering of places), if node A has a message for node C, and at some time, meets node B, he can consider whether to pass the message to node B, or keep it to himself, by comparing the intersection (?) of the set of places that each node has with that of the destination node C. Based on the assumption that the more locations it visits in common with the destination node, the more likely it will be to pass the message to the destination.

So, in this case where:

$A \cap B = (3,7)$
$A \cap C = (3,7)$
$B \cap C = (3,7,8)$

Node B has more places in common with C than node A does, so node A passes the message to B, because, B is more likely to visit the same location as C, than A is. This simplistic scheme may work for some situations, but does not consider when two nodes do not meet in the same places, so how to bridge the gap. This is something to think about later on, but an initial guess is to use some other scheme to find a node that does know the destination node, then use this simple scheme.

This scheme is probably not sophisticated enough to be efficient in large networks, and in previous thoughts about routing schemes for location (in which I was mostly using the term vector clocks) I considered a number of things that can be used to inform, or be considered in a routing protocol:

• Locations visited
• The times locations are visited
• The length of time spent at a location
• The length of time between visits
• The ordering of visits to locations
• Common popular locations
• Use of sinks/post offices
• Use of data mules (proxy nodes that visit popular locations frequently)
• The number of routes that pass through a place (degree)
• The mobility of a node, e.g. the number of places it knows about
• Prediction of future visits to locations based on historic knowledge
• Sharing of knowledge between nodes

The edges in the graph, could easily be labelled with the times nodes travelled between places, such that we can build a (historic?) vector clock of all traversals in the graph.

I have started to think that using location for routing may be helpful because, knowing WHERE a meeting takes place, might help to consider the importance of that meeting to the routing protocol, whereas when we only know that a meeting has taken place, but not in what context, it is hard to distinguish whether it is an important meeting. However, I haven’t fully formed this idea.

Categories:

## Understanding ContactSim

In the last post, I mentioned Graham’s simulator, ContactSim, which he descibes as follows:

ContactSim is a discrete-time event simulator which is specifically designed to facilitate the playback of contact traces. It is flexible, allows simple addition of new datasets and swapping of existing datasets for the underlying contact patterns, and has a powerful XML configuration language.

I have managed to have a look through it, and with the help of Graham’s appendices notes and emails, I have managed to get it working, and have started to understand how it works. Graham mentioned that he has some ideas about how I might be able to adapt it to also consider location, and automatically run the kind of simulations we have been thinking about.

Grahams Appedix A lists in more detail how the simulator works. He suggests adding a LocationUpdate event, which informs nodes (aka processes in ContactSim) of any changes in their location, which can be used by the routing algorithms.

As a test, I formatted the co-location data that we generated from the SocialSensing study into the CSV format used in the simulator (I need to get a bit more information about the correct way to do this for Graham, but for the time being I copied the formatting of the MIT Reality Mining data format used by Graham), and ran two algorithms over it, Persistance Based Routing  and Prophet.  Whilst I am still trying to work out what the results mean(!) I produced the two plots below:

This plot simply shows the delivery ratio of the PBR prototcol using the MIT dataset with 103 users, vs the Social Sensing Study data with 26 users. It covers different time periods (both of 1  month), but clips some data at the end of the MIT dataset. However, it does show that the datasets are in some way comparable.

I am continuing to investigate the software, and will speak with Graham (who is in Canada until Christmas) about how best to adapt the software for our needs. It would seem wasteful not to adopt this software which does such a similar thing, without first testing its feasability. Another simulator mentioned by Graham is the ONE simulator, which I have not yet looked at.

What I am also planning to do, is to find out whether the MIT Dataset contains some notion of location, even if that is just cell tower data, that way, it would make it easier to compare any future location based DTN algorithm that we come up with, to Graham’s work, and any other work that uses this dataset.

Categories: