Archive

Archive for the ‘What Matt Says’ Category

Why HGCE does well?

July 27th, 2012 No comments

I just realise that it’s probably doing well because its agglomerative, a bit like how the interaction networks form over time (node by node from an ego centric view) and also it’s robust to perturbations – as characterised by the network itself. This would make over great choice for an algorithm to adapt for a distributed version, where each node learns its network over time and has to account for perturbations in the graph. Might discuss this in conclusions.

Dynamic Dataset Visualisation for DTN Simulations

June 13th, 2012 No comments

I was thinking about a way to visualise the flow of messages around a DTN network during simulations runs.

At the moment, we (PC and I) visualise the contact networks we are working with using the aggregate graph of the period of interest, nodes are sized based on betweenness centrality, edge thickness represents its weight, based on  the total connected time or connected time ratio between nodes. This network is laid out using some force directed layout, usually using the Gephi software’s Force Atlas (1 or 2) layout, or OpenOrd, and a combination of techniques such as non-overalap of nodes, and expansion/contraction. We sometimes remove low weight edges for clarity. Often, we also colour the nodes based on their community assignment.

This works well, and with a bit of manipulation, results in nice enough looking graphs.

I would like to have a nice way to visualise the flow of messages around this network for a given routing algorithm. I imagine a vizualisation much as the above, but with some adaptions. The adaptions allow for other information to be displayed:

  • The temporal connections of the edges – at some points in time, edges do not exist, when nodes are not in contact.
  • The data transmitted across edges – as time goes on, edges may be used to transmit messages
  • The data held by nodes at a given point in time – the buffers of nodes fill and empty over time, as data is transmitted

I would like to see a vizualisation, where the nodes are initially laid out in some appropriate manner e.g. force directed, based on the aggregate graph, so that nodes can be tracked over time, if they happen to move (based on some criteria). The nodes are sized based on Betweenness Centrality (or some other ranking/partition), along with some measure of their data storage, for example, a dotted border, where each dot represents a message. Perhaps a shadowed version records the accumulated count so far of all messages the node has seen.

In this visualisation, edges are only shown when they are active, and fade (with some function) as they become inactive (perhaps proportional to their connection time), perhaps they are still shown using their aggragate graph weight as a thickness, or perhaps this is the accumulated weight (perhaps showing the total aggregate as a shadow). Edges are decorated with information, such as dashes, each dash representing a message transmitted along the path. Each Dash is coloured based in how recently it was transmitted, perhaps in a binary fashion, where recently (tunable) is colours, say red, and non recent messages fade to blue as they reach a time limit. (do we keep edges greyed out, but vizible when they are inactive?) Dashes are located at end of the edge where the sending node is, to indicate the direction of the transmission – perhaps these series of dashes is terminates with an arrow.

When edges become active, they are emphasised in some way, perhaps flashing or briefly pulsing. An alternative orthogonal emphasis is shown when a message is transmitted, so that it is obvious when an edge is activated/de-activated, when a message is transmitted, or when both occur at the same time (i.e. they are easily distinguishable).

In the case of nodes and edges, dashes/dots represent messages, and should be displayed in a uniform size, and where the number of dashes exceeds the length of the edge, or circumference of the circle, they flow outwards from the line, so they appear stacked. In cases where the number of dashes exceeds the limits of the edge more than, say, 3 times, dashes are concatenated into an alternative symbol (e.g. a block) representing a larger number of messages e.g. 10.  Alternatively, a stacked buffer style viz. can be displayed beside the edge/node.

Dynamic interactions:

When a given dash is selected, representing a message, the path it has taken through the network is highlighted, perhaps a side panel of messages, nodes and edges is shown, with numerical statistics with sorting, so that important items are easily identifiable. These can be highlighted on the main viz. when selected (e.g. hovered over).

When an edge, node, or series of edges is drawn across, different modes cause different effects:

  • All Messages, All paths – the paths of any message traversing the edges is highligted
    • Only most recent N messages across edge
    • Only messages within time period
  • All messages transmitted/received by this node
    • All messages originating/terminating at this node
    • Recent/N/Time period of Messages
The layout parameters can be tweaked during vizulisation, perhaps switching between layout based on aggregate graph, accumulative graph and instantaneous graph.
Something enabling start/stop/pause/rewind/fast-forward/fast-reverse/repeat-period should also be implemented. It is also quite important to be able to drill down into the viz. to see interactions in small sections.

Data driving the viz:

The source data for the viz. could come in the form of three pieces of information:

  1. The list of aggregate edges between nodes (inlcuding weightings), and (maybe) a list of nodes – e.g. in some graph format which allows meta data for nodes and edges
  2. The list of contacts between nodes – start time, end time, node id, node id
  3. The list of movements of messages between nodes. message id, Source node, target node, transmission start time, transmission end time (assuming there may be some duration of transmission), and maybe including a list of all messages with source and destination. (for message initialisation a start time might be included).

It would be up to the viz. software to calculate the aggregate graphs, order the contacts, arrange for messages to be transmitted at correct times.

 

 

Categories: Ideas, Thoughts, What Matt Says

Temporal Networks

May 31st, 2012 2 comments

Pádraig and  I discussed Time Respecting Paths  in the datasets we have based on the paper – Temporal Networks[1].

We want to detect the shortest/quickest paths between pairs of nodes, which respect the the temporal paths, so that we can derive a betweeness centrality measure.

The existing implementation we have, based on the BubbleRAP paper, floods the network, and calculates the betweeness centrality using the number of times a node is used in ANY path  – the BubbleRAP implementation uses only the SHORTEST paths. So there is some discrepancy here. The BubbeRAP paper reads as follows:

To calculate the individual centrality value for each node, we take an numerical approach. First we carry out a large number of emulations of unlimited flooding with different uniformly distributed traffic patterns created using the HaggleSim emulator, which can replay the collected mobility traces and emulate different forwarding strategies for every contact event. Then we count the number of times a node acts as a relay for other nodes on all the shortest delay deliveries. Here the shortest delay delivery refers to the case when the same message is delivered to the destination through different paths, and we only count the de- livery with the shortest delay. We call this number the betweenness centrality of this node in this temporal graph. Of course, we can normalise it to the highest value found. Here we use unlimited flooding since it can explore the largest range of delivery alternatives with the shortest delay. This definition captures the spirit of the Freeman centrality[2]

 

The Freeman centrality paper, which I only skimmed, doesn’t appear to refer to temporal graphs directly –  further inspection may yield a better understanding 🙂

There is a question over what observation period we should use, and how to decide which is really the ‘shortest path’  when paths may start/end at different times. In the Bubble RAP paper, they seem to have chosen the path with the least latency, but not considering the start time of the path. In the temporal networks paper [1], the authors talk about a Saw-tooth pattern which governs latency (as regular connections create a latency lag between connections), and mention a method to normalise over this pattern – but not really for the Betweeness Centrality measure.

e.g. If we start at time 0, and measure the time it takes to get to the destination node, that is fine. however, what if there is an intermediate time, 4, where, if we measure the path from there, we find a new shortest path – in Bubble RAP, this is taken as the shortest path, but is this correct?

I will implement the Time Respecting, shortest path (latency) betweeness centrality, (and maybe the shortest path – hops), as per BubbleRAP, and I will also investigate the normalised version as per the Temporal Networks paper.

[1]Holme P, Saramäki J. Temporal Networks. 2011:1-28. Available at: http://arxiv.org/abs/1108.1780. Accessed May 15, 2012..

[2]Freeman L. A set of measures of centrality based on betweenness. Sociometry. 1977. Available at: http://www.jstor.org/stable/10.2307/3033543. Accessed May 31, 2012.

Measuring Hierarchy

February 4th, 2012 No comments

Pádraig and I discussed a mechanism for measuring hierarchy, to give a given structure a metric. Pádraig’s suggestion was to measure the average distance to the root, and I came up with a normalised version that should also take into account the relative size of a node (in this case a community), as follows:

The root is always assumed to be a node with a weight of exactly the total size. e.g. for community hierarchy, the size of the node represents the number of members of the community. The root node in this instance contains all members of of the network.

We measure the distance from the root to each community, and multiply the size of the community by the distance. We divide the sum of these values by the size of the root community. If the results is 1, then the community structure is shallow, if the result it less than 1, then the community structure does not contain all nodes, the further the result is upwards of 1 the more depth the structure has.

$depth_sum = 0;
$levels = array();
foreach($data as $commid=>$commdata){
  $levels[$commdata['level']] +=1;
  $depth_sum += (($commdata['level'] + 1) * count($commdata['members']));
}
$depth_metric = $depth_sum / count($nodes);

It still needs something to balance for breadth, so,

At each level of the dendogram, we take the size of each node, multiply it by the number of nodes at this level, and divide by the size of the root node, then multiply by the depth. The sum of the results for each node divided by the size of the root node gives us the breadth metric.

$breadth_sum = 0;
foreach($data as $commid=>$commdata){
  $breadth_sum += (
                    ($commdata['level']+1) *
                    (
                    ($levels[$commdata['level']] *
                    count($commdata['members']))
                    / count($nodes)
                    )
                    );
    }
$breadth_metric = $breadth_sum / count($nodes);

Average Overlap is simply the sum of the number of communities each node is a member of divided by the number of communities.

 

 

It will be interesting to calculate these metrics for each dataset and each threshold value, we can plot the delivery ratio (and other metrics)  in a scatter plot for breadth and depth, and see if there is a trend.

Cogs

November 30th, 2011 No comments

I have been thinking alot recently about why we might be doing this research into delay tolerant networking, and what possible applications of the research are, and whilst waiting for a simulation to run, I thought I would write up one idea I came up with, which I call – Cogs.

The idea is probably a little too over the top for reality.

‘Cogs’ (as in cognition) are little packets of thought, that encompass the idea of something, either a feeling/mood, a specific piece of information, or a directed message. These ‘cogs’ are shared between people who meet, and are spread around an area by people,  and depending on the ‘cog’, are received by like minded people.

The idea being that knowledge about what is going on around people, is passed around, giving other people the opportunity to join in, get a feel for the mood in the area, and perhaps actively seek out places based on the cogs eminating from them. Some examples:

  • I’m walking though temple bar, and a come across a really good band playing in the open, who I enjoy listening to, and I think others will to. I emit a ‘Cog’ with a feeling (happy), location (place: temple bar, range: 500meters),  interests (music,band,awesome,[genre],[song title]), and likely lifetime (1 hour) attached to it, this general ‘cog’ gets broadcast to those nearby, who may also be enjoying the band, they in turn broadcast this to others, and so on, until the time runs out, or the message travels further than is useful for that message (e.g. 500m). Some users may actively subscribe to certain interests, places, or people (senders), others may just browse the incoming stream of cogs for interesting things.
  • I remember something I have to tell someone, but I’m not in a rush to get it there, so a emit a direct ‘cog’ with a message attached, and make it private with key, and the key of the recipient(s). This cog, rather than be broadcast to everyone, is directed through only those people I meet, that are on the shortest path (in time) to the recipient.
  • I think of a witty joke, decide leave it in the form of some digital graffiti in my favourite pub, I emit a cog, with a fixed location, and a very long time-out, I emit it to those people nearby who are willing to accept such items (fixed and long term), perhaps the bar staff and regular patrons. This cog is then transmitted only to those people nearby, and is not carried any great distance away.
Categories: Ideas, Thoughts, What Matt Says

Update 27 Sep 2011

September 27th, 2011 No comments

In our last meeting I took down these actions:

  • Ressurect NGMAST paper and use Enron + MIT-NOV – argue that MIT is not big enough to make assumptions

I have started another paper in the Clique SVN here: https://erdos.ucd.ie/svn/clique/papers/BubbleH – at the moment its just the same as the NGMAST paper.

  • Write Thesis outline and get started on background chapters – message dissemination, and social networks side (from milgram onwards etc)

I created a thesis document in the repository too: https://erdos.ucd.ie/svn/clique/papers/MattStabelerThesis – I have started to compile some notes into background chapters, based on my transfer report. I have also started a rough outline of chapters. The latest version in PDF is here.

  • Speak to Conrad about finding a non-overlapping Hierarchical CFA

Since the feedback in CLIQUE mini-workshop, I asked Fergal, Conrad and Aaron again about community finding algoriths. They mentioned the Blondel one and the InfoMap one again.

  • Get a decent sub-set of STUDIVZ dataset

To get a sub-set of nodes in the STUDIVZ dataset I started by picking N nodes randomly, and included their immediate neighbours. I do the L more times to get the nodes a depth of L hops away from the source.  Using 10 random nodes, with a depth of 2 yields a network of around 3500 nodes (12% of all nodes).  When reduced to 5 seed nodes, we get ~1000 nodes (~4%). Going the other way, 100 seed nodes, with a depth of 1 gives 14571 nodes covering ~50% of the network. These figures change depending on which nodes are selected at random initially.

Currently, i’m testing the setup with 5 seed nodes and 2 levels of network, with the hope that there will be some overlap.

Conrad suggests that we take them non-randomly – by first reducing our set of nodes to those with high activity (either number of posts, or total length of posts), then using the network L hops from the remaining nodes.

Odd results – an update

July 13th, 2011 No comments

Just before Pádraig went away, we realised that the results we were getting for Bubble and BubbleH were exactly the same in some cases. We also wanted to see whether hierarchy was making any difference to results, and so we set these goals for the period whilst Pádraig was away:

  • work out why bubbleH and bubble are the same in some cases
  • get enron and wall posts dataset
  • pick the most hierarchical looking clustering  in these
  • see whether BubbleH works better than BubbleRap
  • (concentrate on HGCE and KCLIQUE)

The first question: Is Bubble (no global parent) always the same result as BubbleH (global parent)?

  InfoMap KCLIQUE HGCE LinkClustering
MIT-NOV Yes No No No
CAMB Yes Yes Yes No
IC05 Yes Yes No No
IC06 Yes Yes No No
SS Yes Yes No No *
Enron Yes No No No
Studivz ? ? ? ?

* in the case of Social Sensing Study LinkClustering all are same for Delivery Ratio, apart from where the threshold is 0.0. (with Social Sensing we used multiple tuning values, hence multiple results, the others used only one set of parameters)

Answer: Not always.

Is the answer overlap?
I think that these results are down to the structure of the communities. InfoMap ALWAYS produces the same results for BubbleH and BubbleRAP. I wonder if this is this down to the fact that InfoMap partitions the network, and therefore there are no overlaps? Could it be that for the most part, KCLIQUE creates non-overlapping communities and hence the results? HGCE creates a large number of highly overlapping communities, which are also hierarchical. LinkClustering also creates a large number of communities and whilst edges cannot overlap, nodes can belong to multiple communities. Or is it really that the inherent Hierarchy in community structure is causing results to differ?

The question is also, why do BubbleH and BubbleRAP get EXACTLY the same results if there is no overlap? Well, this is because in the end, there is no difference between them when there is no complicated structure to the network. Even if BubbleH is using the global parent community, that is EXACTLY the same as using the Global Ranking in BubbleRAP, so when there is no overlap, each node belongs to EXACTLY one community, and has a local rank in both BubbleH and BubbleRap. The global parent in BubbleH is the same as using the global rank in BubbleRAP. In fact, we could re-write BubbleH to incorporate this explicitly, and do away with a forced global parent, but this is just implementation detail, the end results will be the same.

Second Part: get enron and wall posts dataset

I used Conrad’s version of the Enron Dataset, as he had taken the time to remove irregularities, and in case of future papers, he would have an in depth knowledge of how he processed the data, saving me a lot of time!

The connected time graph is below, showing a decent number of clusters, hopefully with some nice hierarchy!

Connected Time Graph of the Enron dataset.

Connected Time Graph of the Enron dataset.

I explored this dataset in the same way as the previous ones, by experimenting with settings for the different algorithms, InfoMap is easy to visualise, and so below is the InfoMap clustering of the dataset:

Connected Time Graph for Enron Dataset, coloured with InfoMap clustering

Connected Time Graph for Enron Dataset, coloured with InfoMap clustering

I also used Conrads Studivz wall post dataset, (see here), this dataset is huge, and so I haven’t worked out how to run the full set of simulations. I was able to create a connected time edgelist (connected time is based on wall post length, 1000ms per character). Below is the graph of all connections, with node size and colour related to degree, edges removed for clarity.

network_degree_colournetwork_degree

network_degree_colour_section

Studivz connected time graph close-up with edges

Enron Dataset

In order to get any clusters at all out of KCLIQUE and LinkClustering, I had to plug in some very small threshold values, this is probably due to the way in which I created the events (An event occurs when an email is sent, the duration of every event is set to 2 seconds), so for the most part, nodes are not connected, and therefore there are small overall connected times. (Conrad’s dataset did not include the content of the messages, so I was not able to give contact events any length based on message size). To simplify things,  I used the interesting parameters from KCLIQUE and LinkClustering, to inform the parameters for HGCE, specifically the –minCliqueEdgeWeight parameter (more info about HGCE parameters), which excludes edges based on weight, effectively thresholding the graph edges as with KCLIQUE and LinkClustering.

To recap, the threshold means that (in the case of KCLIQUE and LinkClustering and now HGCE) edges are removed where the connected time between individuals is lower than the threshold.

Threhold parameters used for the Enron dataset:

0.0, 0.0000001, 0.0000002, 0.0000003, 0.0000004, 0.0000005, 0.0000006, 0.0000007, 0.0000008, 0.0000009, 0.000001, 0.000002, 0.000003, 0.000004, 0.000005, 0.00001

The plot below shows the results for Delivery Ratio for BubbleRAP using no global parent (NGP), and BubbleH using global parent (GP) (in future, we can safely ignore the global parents part, as BubbleRAP should always be run without a global parent and BubbleH with a global parent).

BubbleRAP vs BubbleH on the Enron Dataset, showing results for multiple parameters (of M). BubbelH beats BubbleRAP in all cases

BubbleRAP vs BubbleH on the Enron Dataset, showing results for multiple parameters (of M). BubbelH beats BubbleRAP in all cases

This is a good result, it shows that in this dataset, for these parameters, BubbleH beats BubbleRAP, but now we need to consider why. I had done an earlier exploration of the enron dataset with default values for M ( 0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.9), and so I looked back the results for that, and was surprised to see that BubbleRAP does much better in some cases. Below is a plot with the new data included (Solid red BubbleRAP line and blue dot dash BubbleH show this data).

BubbleRAP vs BubbleH for Enron Dataset, with extra (default) values for HGCE parameters (for M). Showing stronger results for BubbleRAP

BubbleRAP vs BubbleH for Enron Dataset, with extra (default) values for HGCE parameters (for M). Showing stronger results for BubbleRAP

So, BubbleRAP does better in some situations, (PS. NEED TO CHECK THIS AGAIN TO MAKE SURE BUBBLEH IS PROPERLY PLOTTED).

I started to look at then next step, hoping it will give some answers: pick the most hierarchical looking clustering  in [datasets].

I picked the best results for BubbleH, and mapped the communities to a hierarchical tree structure, shown below:

Community structure for nest run of BubbleH on Enron dataset where M = 0.000001.

Community structure for best run of BubbleH on Enron dataset where M = 0.000001

So far so good it seems to have a broad structure with some specific hierarchical clusters, I also mapped the worst run:

Community structure for best run of BubbleH on Enron dataset where M = 0.0000007

Community structure for best run of BubbleH on Enron dataset where M = 0.0000007

This too has some good structure to it, note however, that this is a plot of the  worst performing run, in a set of best parameters, (we didn’t run a whole range of values – so, this is worst of the best)

The next to show is the best BubbleRAP run, as below:

0.0000005

Community structure for best run of BubbleRAP on Enron dataset where M = 0.0000005

Interestingly, this has broad set of high level communities (as with the previous plots, but they had a global parent), but less broadness lower in the hierarchy (i.e. fewer lower communities).

TODO: plot the worst run of BubbleRAP

UPDATE: I found a better way to show the Dendograms of the Community hierarchy, and have automated the process. This page shows all of the plots for Enron without global parent and this page WITH global parent. To get a better idea of the relationship between hierarchical structure and results, I need to combine the results from runs, with the structure in the same place – so we can compare side by side what happens when there is different types of community.

BubbleH and SMW11

April 19th, 2011 No comments

Our paper for SMW11 was accepted, and now we will be able to include the results from the bug-fixed version of  BubbleH, which performs very well. An addition that Pádraig suggested, was to ignore level information in the hierarchy, and simply use community size. This has the benefit of making the algorithm simpler, but still incorporating hierarchical structure. I ran the two versions side by side and found that they perform almost identically, with some cases ignoring level being slightly better!

MIT-NOV Dataset, with levels included and not included for multiple parameters to H-GCE

MIT-NOV Dataset, with levels included and not included for multiple parameters to H-GCE

This plot is very hard to read, but it is possible to see the similarities at a broad level. The best performing run was with H-GCE parameters of K=3, E=0.15, ST=0.9, MAP=0.9 and Z=2. It’s structure seems to be relatively flat, with a large number of communities:

click to view

  0(90)
  |
   - 1(5)
  |
   - 2(5)
  |
   - 3(8)
  |
   - 4(8)
  |
   - 5(8)
  |
   - 6(4)
  |
   - 7(28)
  |
   - 8(27)
  |
   - 9(22)
  |
   - 10(17)
  |
   - 11(13)
  |
   - 12(20)
  |
   - 13(16)
  |
   - 14(26)
  |  |
  |   - 15(12)
  |
   - 16(7)
  |
   - 17(4)
  |
   - 18(4)
  |
   - 19(3)
  |
   - 20(8)
  |
   - 21(8)
  |
   - 22(3)
  |
   - 23(13)
  |
   - 24(8)
  |
   - 25(27)

This gives us the following graphic for the SMW11 paper.

bubbleh-bubble-prophet-flood-ignorelevelstrue-bestrun

MIT-NOV Dataset, for multiple parameters to H-GCE, compared to Bubble, PROPHET and Flood

The latency is shown below, which seems to follow the same trend as the previous version, but with BubbleH actually beating all but Flood at the end.

Latency in MIT-NOV dataset for Bubble, BubbleH, PROPHET and Flood.

Latency in MIT-NOV dataset for Bubble, BubbleH, PROPHET and Flood.

This is a positive result, but whilst doing some work towards the next paper for NGMAST11, I realised that we should be doing runs for multiple parameters to K-Clique, however for this paper, probably don’t need to worry so much. Also, for reference, the average value for BubbleH is included in the plot below.

bubbleh-bubble-prophet-flood-ignorelevelstrue-bestrun-withavg

We need to consider whether to evaluate multiple parameters to BubbleRAP, and see whether this affects the algorithm. Also, we need to consider whether the hierarchy is really making things better or is it the sheer number of communities, because the best performing run has a quite flat structure.

BubbleRAP using Heirarchical Community Structure

March 3rd, 2011 2 comments

The existing BubbleRAP mechanism works as follows:

on N connected with encountered node P
for each message M (for destination D) in buffer of N
if P == D
    pass message M to P
else
    if N and P share a community with D
        if LocalRank(P) > LocalRank(N)
            pass message M to P (and delete from buffer)
    else
        if (P shares a community with D) OR (GlobalRank(P) > GlobalRank(N))
            pass message M to P (and delete from buffer)
// keep message

The proposed version of Bubble-H (Bubble Heirarchical) works as follows

on N connected with encountered node P
for each message M (for destination D) in buffer of N
if P == D
    pass message
else
    if P is in a CLOSER community, that also contains D
        pass message M to P
    else if P and N share CLOSE community with D
            if(LocalRank(P) > LocalRank(N))
                pass message M to P
            else
                keep message
    // we need to still push messages to the top when there is no overlap
    else if(P is in a BRIDGING community with D, that N is not)
      pass message M to P
    else
      keep message

Bubble-H as above uses the notion of CLOSE(R) communities, and BRIDGING communities:

heirarchicalclustering-1

  • A CLOSER community, is the one that is lower down in the hierarchical structure of communities, for example, when N has the message destined for O, on meeting P, he would pass the message, as P is in a community lower in the hierarchy. Being CLOSER suggests that P is somehow more likely to be connected to O.
  • The shared CLOSE community is one that is low down in the heirarchy scale, that the destination is a member of, but that both nodes also share. They compare local ranks to decide who should have the message. For example. N and M share a CLOSE community with O and P.
  • A BRIDGING community, is at a lowest point that  bridges a gap between branches of the structure, in the example, the community C2 containing CDEA, GF and H, would be considered a bridge. (not the best example?). This is handy for when a node who is not in a low level community with the destination needs to get the message closer.

I think there might be something missing from this new algorithm, and I am not convinced it is any different to the existing BubbleRAP protocol, especially as nodes are not clustered hierarchically.

A note on hierarchical GCE: this algorithm produces communities that are hierarchically organised, however, the nodes within these communities can and will overlap, so that a node in a community in one partition of the tree, may also appear in a community on the other side, which means I have now realised that the illustration above is not a correct representation.

UPDATE (3/3/11): Pádraig’s comments make sense, and so the following is a better algorithm:

on N connected with encountered node P
for each message M (for destination D) in buffer of N
Identify bridging community (BC), smallest community containing N and D.
IF P ==D
   then pass the message
ELSE IF P shares a community with D that is smaller than BC
    then pass the message.
ELSE IF P is more central in BC than N
    then pass the message.

UPDATE (7/3/11): I have managed to get Conrad’s code to run today  and output the correct data, and it seems that the output is very different from what I had imagined; the MIT-Nov-Training dataset produces lots of non-heirarchical communities, and one or two parent-child structures (depending on parameters) – this means that in some/most cases, there will nodes who do not have bridging communities.

I am currently implementing BubbleH in the simulator and need to decide what to do in the case of no bridging community; two choices as I see it are: When a bridging community is not found either

  1. use global rank (as per original bubble rap), or
  2. Keep message until a bridging/smaller community is found.

My favourite is option 2, as this helps to hold the message by default, until a useful encounter comes along (a small world property?).

Vector clocks to drive delay tolerant networking

March 1st, 2011 1 comment

Vector Clocks can be used to build a knowledge of the network surrounding each node, from the local perspective of that node. This means that it would be possible to calculate some metrics about the global network, at each individual node. However, we already know [1] that in human networks, the 6 hour degree approximates the betweeness centrality of the node, which can be used as a useful metric for routing [2], so why should we complicate things further?

Benefits of vector clocks

Maintaining a vector clock [3] for each node encountered can give us extra information about the network. Assuming in our vector clock updates, we also transmit information about the network as seen by each node, we can tell both the structure of the network, and a number of extra things about the network that simply counting degree cannot.

Each node can know how out of date other nodes are, simply by checking it’s records, and seeing when an update was last received about that node, this gives some notion of distance from itself to other nodes (a.k.a. ball of radius, latency).

Indirect paths and essential links can be deduced at a global level by looking at the routes that are used to update vector clocks; where an update is received about an unknown node during an encounter, we can mark the encountered node as a possible carrier for the unknown node. And where one node is always used as a conduit for information updates about certain nodes, we know that it connects some other portion of the network.

The rate that other nodes update us with information (update rate) gives us a notion of how often contact is made with that node, the update amount, tells us how much knowledge about the network that node delivers . A node that has a high update rate is one that is encountered often, and one that has a high update amount may be well connected to other portions of the network.

Derived knowledge

Using these metrics, we can start to make other observations about the network. We now have sufficient information to attempt to predict future events [4][5]; Using the knowledge of update rate and out-of-dateness We may anticipate future encounters, based on the notion of periodicity [1], or perhaps even by simple Markov chains [6].

We can also try to calculate a notion of distance from one node to another using the update amount, out-of-dateness and out knowledge of the network structure. A nodes knowledge of the network also allows it to calculate things like it’s own centrality, degree and community membership, as well as giving it hints as to the same metrics for other nodes.

Vector clocks for location

Extending the idea further, it would be possible to share information about node movements along with network structure. Assuming nodes could detect their locations using some globally known scheme (e.g. simply using a grid-based approach to specifying location based on Latitude/Longitude), the vector clock updates can pass along updates about the last time a node visited a location, and perhaps the duration of the visit. This, combined with updates from other other nodes about their movements can give each node a picture of movements of other nodes. This in turn would allow us to make predictions[4][5] about where nodes will be in the future.

Usage

The combination of these metrics, gives us rich information to provide to a routing scheme, for example, it may be possible to adapt CAR [7] to use these metrics in it’s calculations, or a hybrid approach to BubbleRAP [2], where the global and/or local rank are based on these metrics. We may also want to devise our own scheme for routing based on this specific knowledge, however there are a large number of schemes already proposed, and it would seem  sensible to improve an existing scheme, rather than create yet another one.

Refs

1. Williamson G, Cellai D, Dobson S, Nixon P. Self-management of Routing on Human Proximity Networks Spyropoulos T, Hummel KA, eds. 2009;5918:1-12. Available at: http://www.springerlink.com/index/10.1007/978-3-642-10865-5 [Accessed July 7, 2010].

2. Hui P, Crowcroft J, Yoneki E. BUBBLE Rap: Social-based Forwarding in Delay Tolerant Networks. Networks. 2008:241-250. Available at: http://portal.acm.org/citation.cfm?doid=1374618.1374652.

3. Kossinets G, Kleinberg J, Watts D. The structure of information pathways in a social communication network. In: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining – KDD ’08. New York, New York, USA: ACM Press; 2008:435. Available at: http://portal.acm.org/citation.cfm?doid=1401890.1401945.

4. Song C, Qu Z, Blumm N, Barabási A-L. Limits of predictability in human mobility. Science (New York, N.Y.). 2010;327(5968):1018-21. Available at: http://www.ncbi.nlm.nih.gov/pubmed/20167789.

5. Ashbrook D, Starner T. Using GPS to learn significant locations and predict movement across multiple users. Personal and Ubiquitous Computing. 2003;7(5):275-286. Available at: http://www.springerlink.com/openurl.asp?genre=article&id=doi:10.1007/s00779-003-0240-0 [Accessed July 31, 2010].

6. Musolesi M, Piraccini M, Fodor K, Corradi A, A. Supporting Energy-Efficient Uploading Strategies for Continuous Sensing Applications on Mobile Phones. Pervasive. 2010. Available at: http://www.springerlink.com/index/WH71427029706513.pdf [Accessed September 3, 2010].

7. Musolesi M, Mascolo C. CAR: Context-Aware Adaptive Routing for Delay-Tolerant Mobile Networks. IEEE Transactions on Mobile Computing. 2009;8(2):246-260. Available at: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4585387.