Pádraig had the idea to try to eliminate Delivery Ratio from our analysis, this way we could then concentrate specifically on Cost and Latency. He suggested that we take the normal dataset and extend it to three times the length, repeating the events in the correct time ordering. Then we could run the simulations 10 times each with a different start time in the original portion of the dataset, this would give each message a much better chance of being delivered, over three iterations of the dataset.

I implemented this by taking a slice of the MIT dataset, processing the events contained within it, and duplicating them, adding the length of the slice to the timestamp times the iteration (i.e. MIT-NOV, MIT-NOV+1 length, MIT-NOV+2 lengths). Any contact events that were active at the start or end time of the slice, were truncated to start/end at the start/end time. I verified the resulting dataset by counting the number of connection every day, and ensuring that there was a distinct repeating pattern. For each dataset, I split the duration between the start and end time of the original dataset time period into 10 equal chunks, and calculated the corresponding start time for each. For MIT and Cambridge this equates to roughly every three days, whereas in the InfoCom datasets there is a much smaller gap between start times (~14-20 hours).
start times
mit-nov 2004-11-10_00:00:00 - 2004-12-11_00:00:00
2004-11-10_00:00:00
2004-11-13_02:24:00
2004-11-16_04:48:00
2004-11-19_07:12:00
2004-11-22_09:36:00
2004-11-25_12:00:00
2004-11-28_14:24:00
2004-12-01_16:48:00
2004-12-04_19:12:00
2004-12-07_21:36:00
infocom-2005 2005-03-06_00:00:00 - 2005-03-12_00:00:00
2005-03-06_00:00:00
2005-03-06_14:24:00
2005-03-07_04:48:00
2005-03-07_19:12:00
2005-03-08_09:36:00
2005-03-09_00:00:00
2005-03-09_14:24:00
2005-03-10_04:48:00
2005-03-10_19:12:00
2005-03-11_09:36:00
infocom-2006 2006-04-24_00:00:00 - 2006-05-02_00:00:00
2006-04-24_00:00:00
2006-04-24_19:12:00
2006-04-25_14:24:00
2006-04-26_09:36:00
2006-04-27_04:48:00
2006-04-28_00:00:00
2006-04-28_19:12:00
2006-04-29_14:24:00
2006-04-30_09:36:00
2006-05-01_04:48:00
cambridge 2005-10-28_00:00:00 - 2005-12-23_00:00:00
2005-10-28_00:00:00
2005-11-02_13:30:00
2005-11-08_04:00:00
2005-11-13_18:30:00
2005-11-19_09:00:00
2005-11-24_23:30:00
2005-11-30_14:00:00
2005-12-06_04:30:00
2005-12-11_19:00:00
2005-12-17_09:30:00
Running Simulations
We decided to remove the overhead of running the CFAs with multiple parameters, by choosing a set of parameters for each, that had performed well in previous tests. The list below shows the ones we chose:
CFA Parameters
MIT-NOV
HGCE
Bubble
K A M E P S R ST MAP Z
3 1.3 0 0.25 0.2 4 100 0.7 0.5 0.1
BubbleH
K A M E P S R ST MAP Z
4 1.3 0 0.25 0.3 4 100 0.9 0.9 0.1
# Communities AverageMembers
# 21 15
2 3 4 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 23 24 25 27 29 30 31 32 34 35 36 37 38 39 41 42 43 45 46 47 48 50 51 53 54 55 57 58 59 60 61 62 63 65 66 67 68 69 70 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 94 95 97 98 99 100 101 102 103
6 8 14 16 18 30 41 46 53 59 61 63 78 79 86 88 91 95 101 102
6 13 17 29 30 47 86 88 95 101
10 25 62 65 74 83 102
3 25 30 62 74
4 23 30 45 57 91
9 30 42 101
19 27 35 37 39 81 89 102
2 41 51 87 88 95 102 103
15 41 43 51 67 69 87 88 95 100 103
19 27 41 66 85 98
15 43 46 58 61 67 69 78 79 83 85 90 95 97 100 103
2 7 11 15 57 61 67 69 78 79 85 87 88 90 95 97 100 101 103
6 7 8 10 11 14 15 16 17 21 30 31 34 35 38 41 43 46 55 58 61 69 75 78 79 80 83 85 86 88 90 91 94 95 97 101 102
8 10 15 16 17 30 38 41 46 53 63 73 80 83 85 95 97 99 101 102
10 14 18 25 35 46 65 83 86 88 90 102
10 17 25 35 53 72 85 88
10 25 53 72 74 78
27 30 50 95 97 99
4 12 19 20 23 24 27 30 32 36 37 39 41 45 48 50 54 59 60 68 70 76 77 81 82 84 89 98
19 27 37 39 41 68 81 98
KCLIQUE
BubbleH
K MCT
3 0.01
# k = 3
# minimum_connected_ratio = 0.01
# number_of_communities = 4
# community_sizes = [5, 3, 45, 24]
# average_size = 19.25
103 2 67 100 87
99 27 69
6 7 8 10 11 13 14 15 16 17 18 21 29 30 31 34 35 38 41 43 44 46 53 55 58 59 61 63 67 73 75 78 79 80 83 85 86 88 90 91 95 97 100 101 102
68 70 76 12 77 81 82 19 84 23 24 89 32 98 36 37 39 45 48 50 54 20 59 4
Bubble
K MCT
3 0.01
LinkClustering
BubbleH
ET
0
# D_max: 0.591406681046
# S_max: 0.596774193548
# Communities: 79
# Average size : 8
# Nodes : 95 (seen 635 times)
3 72
51 73
58 30 77 50 102 88 89 97 53 67 82 69 81 86 87 84 85 24 20 21 48 23 46 45 43 41 4 7 6 8 68 11 83 13 99 76 75 38 70 91 90 100 101 95 29 79 78 15 10 39 12 59 14 17 98 19 54 31 16 37 36 35 55 63 32 61 103
25 72
62 45
74 65
72 35
77 19 74 20
19 9 77
25 84
103 67 47
3 30
25 65
44 73
74 73
9 2
83 61 88 67 102 90 100 81 95 87 79 85 11 10 101 21 17 16 86 41 2 7 103
25 74
27 74
91 10 17 16 30 88 41 61 62 102 83 101 86 79
61 88 67 91 83 100 101 86 87 97 85 69 95 57 30 41 7 103 78 31 90
3 74
19 98 66 24
76 64
4 45 50 72
102 63 30 33 101
62 65
24 20 48 19 32 37 77 98 89 73 54 59
10 21 61 88 63 90 91 83 101 102 94 97 85 95 25 86 58 17 16 55 18 30 43 53 41 46 69 7 8 103 78
1 83 30
9 72
98 36 77 76 89 70 68 20 84 24 39 12 59 48 23 19 54 82 45 37 50 4 32
30 42 101
18 84
42 72
90 28 103
44 27
25 3
25 35
25 62
102 88 91 90 80 101 95 87 79 85 11 10 81 15 21 14 16 46 86 31 30 61 43 35 41 97 7 6 9 8 18
60 81 30
9 52
52 101 15 16 30 41
64 73
39 27 19 37 50 36 98 89 4 68
99 63 73 102 83 101 95 87 97 85 11 10 38 15 14 17 16 46 31 30 53 41 94 6 8
73 72
83 29 88 65 90 102 103 80 86 79 78 10 38 15 21 55 18 31 43 35 46 69 7
61 63 66 67 102 83 100 101 95 85 10 14 16 46 41 55 69 7 8 78 90
11 13 58 17 16 47 31 30 29 88 95 83 101 86
23 34
93 40
9 57
21 16 30 40 41 61 88 6 102 101 86 14
27 66
67 43 61 88 63 53 90 69 80 81 86 87 85 21 46 95 29 41 79 7 6 8 83 99 75 91 103 100 101 102 94 97 78 11 10 13 38 15 58 17 16 55 18 31 30 35 34 14
9 42
11 13 14 17 30 35 41 61 88 6 93 101 86 87 79
73 66
74 72
3 53
87 15 95 51 43 41 88 67 102 8 103 69 100
7 21 43 99 75 64 95 90 101 86 79 78 10 58 55 18 31 30 29 88 6 8
3 62
10 88 72 17 69 85 87 53 78
62 74
62 28
9 74
77 42
60 98 20 54 77 50
2 51
12 2
91 10 101 14 31 30 53 34 99 61 74 102 83 41 95 78
57 45 4 23
9 25
14 61 44 30 53 41 102 88 7 6 91 83 101 95 79
11 83 99 61 88 63 90 102 103 100 81 69 94 97 85 91 10 27 15 21 14 16 55 95 31 30 35 41 80 79 101 6 8
Bubble
ET
0.003
InfoMap
# InfoMap: Generated from {datasets/communities/EXTENDED_DATASETS/InfoMap/mit-nov/edge_list.dat.tree}
# Generated on {Fri, 27 May 2011 11:03:51 +0100}
# number_of_communities = 11
# avg size = 9
# nodes = 95
19 20 24 32 12 54 98 76 89 50 77 36 70 48 68 4 84 45 37 82 59 23 39 81 60
30 102 86 14 6 91 61 41 16 101 88 79 18 53 35 63 44 9 40 93 52 72 33 42
15 97 85 100 80 83 38 73 1
90 95 29 46 58 8 7 75 43 66 64 94
67 103 87 2 51 57 47
21 55 78 34
11 17 13
31 10
69 27 99
25 65 74 3
62 28
InfoCom2005
HGCE
Bubble
K A M E P S R ST MAP Z
3 1 0 0.25 0.2 1 100 0.5 0.5 0.1
BubbleH
K A M E P S R ST MAP Z
3 1 0 0.25 0.2 1 100 0.6 0.9 0.1
# Communities AverageMembers
# 2 27
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 32 33 34 35 36 37 38 39 40 41
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 32 33 34 35 36 37 38 39 40 41
KCLIQUE
BubbleH
K MCT
4 0.009
# k = 4
# minimum_connected_ratio = 0.009
# number_of_communities = 1
# community_sizes = [37]
# average_size = 37.0
1 2 3 4 5 6 7 8 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 32 33 34 35 37 38 39 40 41
Bubble
K MCT
2 0.11
LinkClustering
bubbleH
ET
0.008
# D_max: 0.398456726708
# S_max: 0.368421052632
# Communities: 15
# Average size : 7
# Nodes : 40 (seen 99 times)
9 30
21 41
12 23
11 25 26 35 15 41 13 14 24 10 39 27 20 21 22 16 33 32 37 29 40 34 1 19 3 5 4 7 6 8 17 18
38 36
10 39 15 22 33 37 36 34 1 13
24 27 39 2 14 35 34
24 25 13 15 9 5 37 29 34
18 12
2 23
11 39 27 32 30 37 40 35 5 4 28
26 22 23 33 8 6 18 34
9 12
39 38 40 22 19 32 35 4
bubble
ET
0.04
InfoMap
# InfoMap: Generated from {datasets/communities/EXTENDED_DATASETS/InfoMap/infocom-2005/edge_list.dat.tree}
# Generated on {Fri, 27 May 2011 11:03:59 +0100}
# number_of_communities = 3
# avg size = 14
# nodes = 41
40 32 39 1 27 22 15 35 29 19 24 17 4 25 8 13 30 10 9 26 14 36 5 34 11 6 16 38 21 3 23 28 41 2 12 31
33 20 37
7 18
InfoCom2006
HGCE
Bubble
K A M E P S R ST MAP Z
3 1 0 0.25 0.2 1 100 0.7 0.5 0.2
# Communities AverageMembers
# 2 49
22 23 24 25 26 27 28 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 84 85 86 87 88 89 90 91 92 93 94 95 96 98
22 23 24 25 26 27 28 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 84 85 86 87 88 89 90 91 92 93 94 95 96 98
BubbleH
K A M E P S R ST MAP Z
3 1 0 0.25 0.3 4 100 0.6 0.5 0.2
KCLIQUE
BubbleH
K MCT
3 0.02
# k = 3
# minimum_connected_ratio = 0.02
# number_of_communities = 1
# community_sizes = [58]
# average_size = 58.0
22 25 26 27 28 31 32 34 35 36 37 38 39 40 43 44 45 46 47 49 50 51 52 53 54 55 56 57 58 60 61 63 64 66 67 70 72 73 74 75 76 77 78 79 81 82 83 84 85 86 87 88 89 90 91 93 95 98
Bubble
K MCT
3 0.02
LinkClustering
BubbleH
ET
0.01
# D_max: 0.352182263797
# S_max: 0.407407407407
# Communities: 95
# Average size : 4
# Nodes : 78 (seen 387 times)
86 35
38 40 49 36 35
44 56
83 48
24 62 27
68 69 21
22 67
36 48
30 59
46 44
56 71
91 39
92 48
58 94
83 40
56 94 40
59 43
80 65
56 40 48
24 60 62 51 75 52 77 76 88 89
46 39 86
47 56 40
77 58 44 45 51 43 41 60 75 89 73 71 70 64 80 92 95 93 74
25 38 58 49 32 57 56 88 40 43 60 75 89 53 52 86
68 48
46 26 39
26 80
39 74 89 46 54 88 53 52
26 56
74 97
56 29
54 48
26 97
46 85
60 25 75 22 57 51 78
24 62
54 41
76 58 82 32 44 30 51 52
38 22 49 56 40
47 22
87 65 72
26 58 53 43
31 65
83 73 23
24 80
88 57 70 28 50 77 75 89 66 67 82 31 93 81 44 84 85
42 48
24 62 48
69 21
47 45 28 43 60 88 89 73 79 64 93 81 86 84 78
54 22
73 29
56 73 40 49
80 59
45 28
24 62 36
68 65
22 48
65 96
42 29
86 85
25 44
71 41
51 21
63 38 59
47 92
25 67
31 35 49
29 43
89 58 43 37 36 53 34
39 30
22 85
97 85
92 71
74 64 73 68 51 45 86 43 52
76 66 91 90 31 96 93 79 78
59 52
77 58 32 33 86 45 51 43
25 65
83 23
46 39 59
30 48
33 48
46 39
48 51 53 76 89 64 66 82 73 80 84 78
27 49 55 56 35 61 63 40 38
25 26 74 54 57 75 52
26 22
68 47
69 65
61 27 51 43
28 60 88 89 64 66 67 82 80 81 86 87 84 25 44 45 42 43 34 77 76 75 74 73 72 70 91 90 93 95 79 78 51 58 98 32 57 37 50 53 52 54 31
27 48
37 59 34
Bubble
ET
0.09
InfoMap
# InfoMap: Generated from {datasets/communities/EXTENDED_DATASETS/InfoMap/infocom-2006/edge_list.dat.tree}
# Generated on {Fri, 27 May 2011 11:04:18 +0100}
# number_of_communities = 5
# avg size = 16
# nodes = 78
78 93 76 66 60 74 64 82 77 81 45 72 84 88 28 43 89 90 44 53 51 67 79 87 37 70 75 86 91 56 34 52 73 48 58 40 95 41 32 80 71 50 31 54 26 57 85 47 92 98 36 24 30 33 59 42 62 96 65 23 29 83 94 97 21
38 35 49 63 55 27 61
25 22
39 46
68 69
Cambridge
HGCE
BubbleH
K A M E P S R ST MAP Z
5 1.6 0 0.25 0.2 4 100 0.6 0.5 0.1
# Communities AverageMembers
# 4 23
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
1 2 3 4 5 6 7 9 10 11 13 14 15 17 18 19 20 21 23 25 28 29 30 32 33 34
2 3 5 6 7 8 9 10 11 12 13 16 18 20 21 22 25 27 28 29 31 32 33 34 35 36
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 17 18 19 20 23 24 26 29 30 31 33 35
Bubble
K A M E P S R ST MAP Z
5 1 0 0.25 0.3 4 100 0.9 0.9 0.2
KCLIQUE
BubbleH
K MCT
2 0.02
# k = 2
# minimum_connected_ratio = 0.02
# number_of_communities = 3
# community_sizes = [2, 2, 11]
# average_size = 5.0
1 4
24 26
32 34 3 36 7 16 21 22 25 27 28
Bubble
K MCT
2 0.04
LinkClustering - bubbleH
ET
0.004
# D_max: 0.75851421306
# S_max: 0.5
# Communities: 12
# Average size : 5
# Nodes : 34 (seen 55 times)
11 25 27 21 22 16 32 28 36 34 3 12 7
9 5 19 15 33
2 34
26 35
3 6
13 15 6 19 20
9 13
10 20 14 17 23 19 18 30 1 33 2 5 4 6 15
2 35
24 26
11 21 22 29 7
Bubble
ET
0.003
InfoMap
# InfoMap: Generated from {datasets/communities/EXTENDED_DATASETS/InfoMap/cambridge/edge_list.dat.tree}
# Generated on {Fri, 27 May 2011 11:04:24 +0100}
# number_of_communities = 3
# avg size = 12
# nodes = 36
34 21 36 22 25 32 28 3 7 16 27 12 11 29 8
18 33 15 14 5 20 19 4 6 2 23 1 17 30 10 13 9 31
26 35 24
I originally made the decision to use seperate parameters for Bubble and BubbleH, but in hindsight this was not a good idea, as they cannot easily be compared.
When we discussed the results, we realised the KCLIQUE seems to perform well for latency with BubbleH, despite not having much hierarchical structure. Pádraig suggested I investigate this further, to find out what is happening. In the mean time I have re-run the simulations, using the best parameters for BubbleH – to level the playing field in terms of parameters. As is obvious from the above community structures, only HGCE produces communities with one all encompassing community, and in the case of KCLIQUE for both InfoCom datasets, it produces only one community – this seems odd, as we would expect the best performance from a more complicated structure.
This resulted in the following community structure.
Community Structures
MIT-NOV
HGCE
comm_count avg_member_count min_nodes max_nodes
21 15.80952381 4 87
KCLIQUE
comm_count avg_member_count min_nodes max_nodes
4 20.25 4 46
InfoMap
comm_count avg_member_count min_nodes max_nodes
11 8.636363636 2 25
LinkClustering
comm_count avg_member_count min_nodes max_nodes
78 8.141025641 2 67
INFOCOM-2005
HGCE
comm_count avg_member_count min_nodes max_nodes
2 40 40 40
KCLIQUE
comm_count avg_member_count min_nodes max_nodes
1 38 38 38
InfoMap
comm_count avg_member_count min_nodes max_nodes
3 13.66666667 2 36
LinkClustering
comm_count avg_member_count min_nodes max_nodes
14 7.071428571 2 32
INFOCOM-2006
HGCE
comm_count avg_member_count min_nodes max_nodes
2 73 73 73
KCLIQUE
comm_count avg_member_count min_nodes max_nodes
1 59 59 59
InfoMap
comm_count avg_member_count min_nodes max_nodes
5 15.6 2 65
LinkClustering
comm_count avg_member_count min_nodes max_nodes
94 4.117021277 2 43
CAMBRIDGE
HGCE
comm_count avg_member_count min_nodes max_nodes
4 28.75 23 36
KCLIQUE
comm_count avg_member_count min_nodes max_nodes
3 6 3 12
InfoMap
comm_count avg_member_count min_nodes max_nodes
3 12 3 18
LinkClustering
comm_count avg_member_count min_nodes max_nodes
11 5 2 15

Delivery Ratio, Cost, Latency and Delivered Hops for each CFA and for each Dataset, for BubbleRAP and BubbleH. (OLD PLOT - SEE BELOW FOR NEW PLOT)
In the results above, there seems to be something odd happening with LinkClustering for BUBBLERap, for Cost and Hops in MIT-NOV and InfoCom2005 it has much higher values than for the other CFAs. Also, there appears to be something odd happening with Latency in the Cambridge dataset, with a maximum of ~1000 hours latency which is about 40 days. The MIT dataset is roughly the same length, but doesn’t get anywhere near the same latency.
The two plots below show Delivery Ratio and Latency for BubbleH, HGCE for the Cambridge dataset.

Delivery Ratio for HGCE-BubbleH, on the Cambridge dataset

Latency for HGCE-BubbleH, on the Cambridge dataset
These seem to show that at the points where the new parts of the dataset are joined, there is a huge jump in deliveries of messages. It also seems that the datasets are truncated at the end. I will double check the start/end times, but this could be due to the original dataset having no events after a certain timestamp, and therefore, in the last section, the simulation stops when there are no events left to process. The huge spike in messages delivered explains the very high and very low extremes – as messages that were sent in the last iteration of message sending, get delivered very quickly. It seems odd however, that the same is not true for the first iteration. It could be, that because the last set have started to be distributed, maybe only one hop away, these nodes are already in a much better position to deliver the nodes at the start of the next section. This could be due to higher activity in the first hours of the dataset collection, perhaps even due to all the nodes being switched on in the same place at the same time. If this is the case, then we need to start the dataset slice slightly later, to avoid this. The activity traces for each dataset is shown below:

MIT-Nov Daily Connections

Cambridge Daily Connections

Infocom2005 Hourly Connections

Infocom2006 hourly connections
From these we can see that MIT has a good distribution of connectivy across the dataset, but the other three have only short spikes of activity. This in itself does not make a difference to the comparison of resulst for each datasets, as they are all relative, but it does go some way to explain the very high latency figures for Cambridge.
The issues with KCLIQUE
I looked back at the data for KCLIQUE in the NGMAST simulations, where there were multiple parameters to KCLIQUE, resulting in varying degrees of community structure, for all datasets, the delivery ratio in KCLIQUE seems to increase with member count, whereas community count peaks before delivery ratio really gets going.

Delivery Ratio, Cost, Community Count and Average Member count.
This makes sense, because for BubbleH, the more we know about members of a the communities, the more messages we can direct using local ranking. And, because we haven’t included an overall community for KCLIQUE (or InfoMap*, or LinkClustering) we have no way to deliver messages to nodes not in a community, until a node holding a message comes into contact with the destination node. I’m not sure if we designed it this way or whether this is just a bug.
The next step is to fix this issue, by either
- adjusting the BubbleH algorithm to use global rank to pass messages between nodes at the highest level
- Force a global community in the other community finding algorithms (this will have the same effect as 1)
* InfoMap includes every node in the communities list, but does not include a global parent community
Update:
I have now run all of the datasets with a forced global parent community, this seems to have evened the playing field somewhat:

Results where every CFA is forced to produce a global parent community including all nodes. (NOTE the change in axis labels from previous run)
The interesting thing is that there is very little difference in delivery ratio and cost using BUBBLERap.