Archive for November, 2011

Selected Nodes – with fixed HGCE output

November 30th, 2011

We identified a problem with the way HGCE was being used: for all other CFAs the community structure used was identical apart from the global parent used by BubbleH, but HGCE outputs a different structure on every run. I adapted my code to make sure that BubbleRAP and BubbleH use the SAME community structure. In the results below, selected pairs of nodes are chosen based on their contact in the training period. Also, the overall results are reported for the best run on a given metric (i.e. the best run for delivery ratio is used to show the corresponding results for cost, latency, etc.). In this case, all use delivery ratio as the primary metric.
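
As a minimal sketch of that best-run selection (assuming each run is recorded as an associative array of metric scores; all names are illustrative):

// Pick the run that scores best on the primary metric, then report
// cost, latency, etc. from that same run.
function best_run(array $runs, $primary = 'delivery_ratio') {
    $best = null;
    foreach ($runs as $run) {
        if ($best === null || $run[$primary] > $best[$primary]) {
            $best = $run;
        }
    }
    return $best;
}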

Studivz

Social Sensing

MIT

Hypertext2009

Enron

Enron still shows a loss for BubbleH under HGCE. Detailed plot compared to previous results:

Delivery Ratio for Enron, BubbleH vs Bubble - old results vs new results - no great change

Still working on coding up an alternative to unlimited flood that allows us to accurately measure delivery ratio for selected pairs.

Cogs

November 30th, 2011

I have been thinking a lot recently about why we might be doing this research into delay tolerant networking, and what its possible applications are. Whilst waiting for a simulation to run, I thought I would write up one idea I came up with, which I call Cogs.

The idea is probably a little over the top for reality.

‘Cogs’ (as in cognition) are little packets of thought that encompass the idea of something: a feeling or mood, a specific piece of information, or a directed message. These ‘cogs’ are shared between people who meet, are spread around an area by the people carrying them, and, depending on the ‘cog’, are received by like-minded people.

The idea is that knowledge about what is going on around people gets passed around, giving others the opportunity to join in, get a feel for the mood in the area, and perhaps actively seek out places based on the cogs emanating from them. Some examples:

  • I’m walking through Temple Bar and come across a really good band playing in the open, who I enjoy listening to and think others will too. I emit a ‘Cog’ with a feeling (happy), location (place: Temple Bar, range: 500 metres), interests (music, band, awesome, [genre], [song title]) and likely lifetime (1 hour) attached to it. This general ‘cog’ gets broadcast to those nearby, who may also be enjoying the band; they in turn broadcast it to others, and so on, until the time runs out or the message travels further than is useful (e.g. 500m). Some users may actively subscribe to certain interests, places or people (senders); others may just browse the incoming stream of cogs for interesting things. (A rough sketch of such a cog follows this list.)
  • I remember something I have to tell someone, but I’m not in a rush to get it there, so I emit a direct ‘cog’ with a message attached, and make it private with my key and the key of the recipient(s). This cog, rather than being broadcast to everyone, is directed only through those people I meet who are on the shortest path (in time) to the recipient.
  • I think of a witty joke and decide to leave it in the form of some digital graffiti in my favourite pub. I emit a cog with a fixed location and a very long time-out, to those people nearby who are willing to accept such items (fixed and long-term), perhaps the bar staff and regular patrons. This cog is then transmitted only to people nearby, and is not carried any great distance away.
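
A rough sketch of what the cog from the first example might carry; field names and values are illustrative, not a fixed design:

$cog = array(
    'feeling'   => 'happy',
    'place'     => 'temple bar',
    'range'     => 500,                               // metres
    'interests' => array('music', 'band', 'awesome'),
    'lifetime'  => 3600,                              // seconds (1 hour)
    'scope'     => 'broadcast',                       // or 'directed', 'fixed'
);

// A node would only pass a cog on while it is still useful.
function should_forward(array $cog, $metres_from_origin, $age_seconds) {
    return $age_seconds < $cog['lifetime']
        && $metres_from_origin <= $cog['range'];
}
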
Categories: Ideas, Thoughts, What Matt Says

Selected Pairs evaluation

November 23rd, 2011

Pádraig suggested that we use only those nodes that have communicated in the training phase as the message pairs for the testing phase. To this end, I implemented it as follows:

Given the entire edge list for the training period, take each edge and create a pair from the source and destination nodes. Do not create a pair if the pairing already exists (in either direction). At simulation startup for the test period, send messages between only those pairs, in both directions, so that messages are sent A->B and B->A. This reduces the number of messages sent, and it can be argued that only nodes that already know each other will want to communicate in future. This should give us a much better picture.
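
A minimal sketch of that pair-creation step, assuming the training edge list is an array of source/destination arrays (names are illustrative):

// Build the set of unique node pairs from the training edge list,
// treating A->B and B->A as the same pair.
function selected_pairs(array $edges) {
    $pairs = array();
    foreach ($edges as $edge) {
        list($src, $dst) = $edge;
        $key = min($src, $dst) . ':' . max($src, $dst); // canonical key
        $pairs[$key] = array($src, $dst);
    }
    return array_values($pairs);
}
// At simulation startup, messages are then sent in both
// directions (A->B and B->A) for each pair.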

Enron-Split-A

Enron Split - A results

Prophet and Hold&Wait

Hypertext2009

Hypertext2009-split results

Prophet and Hold&Wait

Studivz-split

Studivz-split results

Social Sensing Split

Social Sensing Split results

The undelivered-hops results look very strange, but these charts will change once they use ONLY the best run for a given metric, rather than the best of any run.

Enron Evaluation

November 17th, 2011

We took a limited period of the Enron dataset and split it into two parts, a training phase and a testing phase:

Activity in the Enron dataset, hourly, daily, weekly, showing possible periods that could be used for training/testing

In this case, the first period shown in the above plot was used.

For HGCE the community structure is shown below:

Community Structure created by HGCE on Enron-split-a

Results of routing are not good for HGCE:

Results for Enron-split-a dataset

Variation in the threshold parameter for delivery ratio on HGCE is shown below:

Benchmark results are shown below:

Here, Unlimited Flood achieves nearly 50% delivery ratio.

Categories: Datasets, experiments

Studivz Evaluation

November 14th, 2011

We explored a small portion of the Studivz dataset: around 4 months between Oct 2007 and Feb 2008 (illustrated below).

Studivz Weekly and Daily Activity, showing limited periods used, and message peaks (new year)

Because the dataset is so large, it was not practical to use all of the 28,000 nodes for testing, so we took a small number of seed nodes (10) and travelled a number of hops (2) out through their connected peers in the connected-time graph, and used only this set of nodes to run our simulation; in this instance, 385 nodes. We again used the MEAN, MEDIAN, and 80th and 20th percentiles of connected time ratios (described here) as thresholds on edge weights for community finding using HGCE, KCLIQUE and LinkClustering (Moses, Random and InfoMap are not thresholded).
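
A minimal sketch of that node-sampling step, assuming an adjacency-list graph keyed by node id (seeds and names are illustrative); the threshold values used follow below:

// Breadth-first expansion: collect every node within $hops hops of the seeds.
function sample_nodes(array $graph, array $seeds, $hops) {
    $selected = array_fill_keys($seeds, true);
    $frontier = $seeds;
    for ($i = 0; $i < $hops; $i++) {
        $next = array();
        foreach ($frontier as $node) {
            foreach ($graph[$node] as $peer) {
                if (!isset($selected[$peer])) {
                    $selected[$peer] = true;
                    $next[] = $peer;
                }
            }
        }
        $frontier = $next;
    }
    return array_keys($selected); // here: 385 nodes from 10 seeds, 2 hops
}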

Values used:

  • Mean: 0.0001171252
  • Median: 5.66346e-05
  • 80th Percentile: 0.0001579604
  • 20th Percentile: 1.1558e-06

For InfoMap, this resulted in the graph below:

InfoMap Clustering of the reduced node set in Studivz. All edges shown.

KCLIQUE performed quite poorly on this for the 80th percentile threshold, as there are some clear structures that it did not find.

KCLIQUE 80PC Threshold on Studivz Limited period/nodes

It performed slightly better with 20th percentile thresholding, but again a number of clear structures were missed.

KCLIQUE 20PC Threshold on Studivz Limited period/nodes

LinkClustering for 80pc is shown below:

LinkClustering studivz dataset - 80pc

HGCE finds some structure in the network, and successfully finds hierarchy:

HGCE Community Structure for Studivz. Each node represents a community.

Results of routing:

Studivz Dataset results of routing

For Prophet, UF and H&W the results are below:

Studivz Delivery Ratio for Unlimited Flood, Hold and Wait, and Prophet

It seems that even unlimited flood can only penetrate to 20% delivery ratio.

Categories: Datasets, experiments

Hypertext2009 Evaluation

November 14th, 2011

I finally finished running the experiments on the Hypertext2009 dataset. This used all available nodes, but the data was split into two parts. The dataset is around 3 days long, so I used day 1 to “train” the community finding algorithms, i.e. used the edge list from day 1 to create the communities. The testing phase then took place on days 2 and 3 (i.e. the communities from day 1 were used for routing in days 2 and 3).
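
A minimal sketch of that split, assuming contact events carry UNIX timestamps (names and the cut-off arithmetic are illustrative):

// Split timestamped contact events into a training phase (day 1)
// and a testing phase (the remaining days).
function split_dataset(array $events, $start_time, $train_days = 1) {
    $cutoff = $start_time + ($train_days * 86400); // 86400 seconds per day
    $training = array();
    $testing = array();
    foreach ($events as $event) {
        if ($event['time'] < $cutoff) {
            $training[] = $event;
        } else {
            $testing[] = $event;
        }
    }
    return array($training, $testing);
}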

In the training phase, I used 4 threshold values for HGCE, KCLIQUE and LinkClustering (Ahn et al.). InfoMap, Random and Moses do not have thresholding applied. These values relate to the MEAN, MEDIAN, 20th Percentile and 80th Percentile of the connected time ratio, i.e. ordering all edges in ascending order, pick the edge at the 20th percentile (and likewise the 80th) and use its value as the threshold (see http://mattstabeler.co.uk/phdblog/datasets/hypertext2009 for a plot).

Values used were: MEAN: 0.0035760146, MEDIAN: 0.0007309433, 80th PC: g, 20th PC: 0.0003654716

The visualisation below shows the clustering based on InfoMap, which is non-overlapping. Edges relate to connected time ratio, and are removed where this is lower than the 80th Percentile. Edge thickness indicates connected time, and node size indicates betweenness centrality.

Without Numbers: InfoMap Community Assignment of Hypertext2009 dataset, using 80th Percentile thresholding of edges

This visualisation shows clustering using the KCLIQUE algorithm; this is overlapping, so some nodes will have multiple community assignments.

KCLIQUE Community Assignment of Hypertext2009 dataset, using 80th Percentile thresholding of edges

When we run the routing algorithms BubbleRAP, BubbleH, Prophet (best in class), Unlimited Flood (best possible) and Hold and Wait (baseline), we get the following results:

Best results for each metric, for each community finding algorithm. BubbleRAP vs BubbleH

and for the rest:

Delivery Ratio for Unlimited Flood, Prophet and Hold and Wait on Hypertext 2009 - split

The community structure found by HGCE was very flat, and in fact contained no hierarchy at any threshold value apart from the 80th percentile threshold, where there is one sub-community.

Community hierarchy produced by HGCE for BubbleH

Categories: Datasets, experiments

Calculating Edge List Thresholds

November 3rd, 2011

I had an idea to use some standard way to pick the basic threshold parameters for datasets, and it struck me that all we are doing is finding the elbow in the distribution of connected times for each edge.

So, after a little bit of experimentation, I found that the 80/20 rule (Pareto principle) works well.

sort($values); // edge weights (connected time), sorted lowest to highest
$pareto = $values[(int) floor((count($values) / 100) * 80)];

The code above sorts the array of all edge weights (connected time) from lowest to highest, and picks out the value at the 80th percentile; i.e. out of 200 values, it would pick the value at position 160 as the threshold value.
I compared the results against the MEAN and the MEDIAN, and found that the 80th percentile worked well for Studivz and Hypertext2009.
So I propose that we use this mechanism to pick the threshold values for all datasets.
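
For reference, a short sketch computing all four candidate thresholds from the same sorted array (the percentile helper is illustrative):

// Value at the given percentile of an ascending-sorted array.
function percentile(array $values, $pc) {
    return $values[(int) floor((count($values) / 100) * $pc)];
}

sort($values); // edge weights (connected time), lowest to highest
$thresholds = array(
    'mean'   => array_sum($values) / count($values),
    'median' => percentile($values, 50),
    '80pc'   => percentile($values, 80),
    '20pc'   => percentile($values, 20),
);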