
Hypertext2009 Evaluation

November 14th, 2011

Finally finished running the experiments on the Hypertext2009 dataset. The run used all available nodes, but was split into two parts. The dataset is around 3 days long, so I used day 1 to “train” the community finding algorithms, i.e. the edge list from day 1 was used to create the communities. The testing phase then took place on days 2 and 3, i.e. the communities from day 1 were used for routing in days 2 and 3.
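As a rough illustration of the split, here is a minimal sketch in Python. The contact record layout `(node_a, node_b, start, end)`, the `split_by_day` helper and the decision to cut contacts that straddle the boundary are all assumptions made for the example, not the actual experiment code.

```python
from typing import List, Tuple

# Hypothetical contact record: (node_a, node_b, start_seconds, end_seconds),
# with times measured from the start of the trace.
Contact = Tuple[int, int, float, float]

SECONDS_PER_DAY = 24 * 60 * 60


def split_by_day(contacts: List[Contact], train_days: int = 1):
    """Split contacts into a training part (first `train_days` days) and a testing part."""
    cutoff = train_days * SECONDS_PER_DAY
    train, test = [], []
    for a, b, start, end in contacts:
        if end <= cutoff:
            train.append((a, b, start, end))
        elif start >= cutoff:
            test.append((a, b, start, end))
        else:
            # Assumption: a contact that straddles the boundary is cut in two.
            train.append((a, b, start, cutoff))
            test.append((a, b, cutoff, end))
    return train, test
```

The training part would then feed the community finding algorithms, and the testing part would drive the routing simulation.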

In the training phase, I used 4 threshold values for HGCE, KCLIQUE and LinkClustering (Ahn et al.). InfoMap, Random and Moses do not have thresholding applied. The values are the MEAN, MEDIAN, 20th Percentile and 80th Percentile of the connected time ratio, i.e. order all edges by connected time ratio in ascending order, pick the edges at the 20th and 80th percentiles, and use their values as the threshold values (see http://mattstabeler.co.uk/phdblog/datasets/hypertext2009 for plot).

Values used were: MEAN: 0.0035760146, MEDIAN: 0.0007309433, 80th PC: g, 20th PC: 0.0003654716
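The threshold selection can be sketched roughly as follows. This assumes a plain list of per-edge connected time ratios and a nearest-rank percentile; the `percentile` and `thresholds` helpers are hypothetical names for the example, and the exact percentile method used in the experiments may differ.

```python
import math
import statistics
from typing import Dict, List


def percentile(sorted_values: List[float], pc: int) -> float:
    """Nearest-rank percentile of a list already sorted in ascending order."""
    rank = max(1, math.ceil(pc * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def thresholds(ratios: List[float]) -> Dict[str, float]:
    """Compute the four candidate thresholds from per-edge connected time ratios."""
    ordered = sorted(ratios)
    return {
        "MEAN": statistics.mean(ordered),
        "MEDIAN": statistics.median(ordered),
        "20th PC": percentile(ordered, 20),
        "80th PC": percentile(ordered, 80),
    }
```

Each of the four values would then be tried in turn as the edge threshold for the algorithms that support thresholding.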

The visualization below shows the clustering based on InfoMap, which is non-overlapping. Edges are related to connected time ratio, and are removed where this is lower than the 80th Percentile. Edge thickness indicates connected time, and node size indicates betweenness centrality.
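The edge filtering behind the plot can be sketched like this. It assumes that connected time ratio means the total time a pair spends in contact divided by the length of the trace, and that contacts are `(node_a, node_b, start, end)` tuples; both the definition and the helper names are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def connected_time_ratio(
    contacts: List[Tuple[int, int, float, float]], trace_duration: float
) -> Dict[Edge, float]:
    """Total contact time per undirected edge, divided by the trace duration."""
    totals: Dict[Edge, float] = defaultdict(float)
    for a, b, start, end in contacts:
        edge = (min(a, b), max(a, b))  # treat (a, b) and (b, a) as the same edge
        totals[edge] += end - start
    return {edge: total / trace_duration for edge, total in totals.items()}


def filter_edges(weights: Dict[Edge, float], threshold: float) -> Dict[Edge, float]:
    """Drop every edge whose ratio is lower than the threshold."""
    return {edge: w for edge, w in weights.items() if w >= threshold}
```

The surviving edges are the ones that would be drawn, with thickness taken from the connected time.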

Without Numbers: InfoMap Community Assignment of Hypertext2009 dataset, using 80th Percentile thresholding of edges

This visualisation shows clustering using the KCLIQUE algorithm. This is overlapping, so some nodes will have multiple community assignments.

KCLIQUE Community Assignment of Hypertext2009 dataset, using 80th Percentile thresholding of edges

When we run the routing algorithms BubbleRAP, BubbleH, Prophet (best in class), Unlimited Flood (best possible) and Hold and Wait (baseline), we get the following results:

Best results for each metric, for each community finding algorithm. BubbleRAP vs BubbleH

and for the rest:

Delivery Ratio for Unlimited Flood, Prophet and Hold and Wait on Hypertext 2009 - split
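For reference, delivery ratio is taken here in its usual sense: the fraction of created messages that reach their destination. A minimal sketch of computing it per protocol follows; the `(protocol, message_id, was_delivered)` record layout and the `delivery_ratios` helper are hypothetical, not the simulator's actual output format.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def delivery_ratios(records: Iterable[Tuple[str, str, bool]]) -> Dict[str, float]:
    """Delivered messages divided by created messages, per protocol."""
    created: Counter = Counter()
    delivered: Counter = Counter()
    for protocol, _message_id, was_delivered in records:
        created[protocol] += 1
        if was_delivered:
            delivered[protocol] += 1
    return {protocol: delivered[protocol] / created[protocol] for protocol in created}
```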

The community structure found by HGCE was very flat: it found no hierarchical structure at any threshold value except the 80th percentile threshold, where there is one sub-community.

HGCE Community hierarchy produced by HGCE for BubbleH

Categories: Datasets, experiments