Archive

Archive for May, 2011

Extending Datasets

May 31st, 2011 No comments

Pádraig had the idea to try to eliminate Delivery Ratio from our analysis, this way we could then concentrate specifically on Cost and Latency. He suggested that we take the normal dataset and extend it to three times the length, repeating the events in the correct time ordering. Then we could run the simulations 10 times each with a different start time in the original  portion of the dataset, this would give each message a much better chance of being delivered, over three iterations of the dataset.

extendeddatasets

I implemented this by taking a slice of the MIT dataset, processing the events contained within it, and duplicating them, adding the length of the slice to the timestamp times the iteration (i.e. MIT-NOV, MIT-NOV+1 length, MIT-NOV+2 lengths). Any contact events that were active at the start or end time of the slice, were truncated to start/end at the start/end time. I verified the resulting dataset by counting the number of connection every day, and ensuring that there was a distinct repeating pattern. For each dataset, I split the duration between the start and end time of the original dataset time period into 10 equal chunks, and calculated the corresponding start time for each. For MIT and Cambridge this equates to roughly every three days, whereas in the InfoCom datasets there is  a much smaller gap between start times (~14-20 hours).

start times

mit-nov 2004-11-10_00:00:00 - 2004-12-11_00:00:00
2004-11-10_00:00:00
2004-11-13_02:24:00
2004-11-16_04:48:00
2004-11-19_07:12:00
2004-11-22_09:36:00
2004-11-25_12:00:00
2004-11-28_14:24:00
2004-12-01_16:48:00
2004-12-04_19:12:00
2004-12-07_21:36:00
 
infocom-2005 2005-03-06_00:00:00 - 2005-03-12_00:00:00
2005-03-06_00:00:00
2005-03-06_14:24:00
2005-03-07_04:48:00
2005-03-07_19:12:00
2005-03-08_09:36:00
2005-03-09_00:00:00
2005-03-09_14:24:00
2005-03-10_04:48:00
2005-03-10_19:12:00
2005-03-11_09:36:00
 
infocom-2006 2006-04-24_00:00:00 - 2006-05-02_00:00:00
2006-04-24_00:00:00
2006-04-24_19:12:00
2006-04-25_14:24:00
2006-04-26_09:36:00
2006-04-27_04:48:00
2006-04-28_00:00:00
2006-04-28_19:12:00
2006-04-29_14:24:00
2006-04-30_09:36:00
2006-05-01_04:48:00
 
cambridge 2005-10-28_00:00:00 - 2005-12-23_00:00:00
2005-10-28_00:00:00
2005-11-02_13:30:00
2005-11-08_04:00:00
2005-11-13_18:30:00
2005-11-19_09:00:00
2005-11-24_23:30:00
2005-11-30_14:00:00
2005-12-06_04:30:00
2005-12-11_19:00:00
2005-12-17_09:30:00

Running Simulations

We decided to remove the overhead of running the CFAs with multiple parameters, by choosing a set of parameters for each, that had performed well in previous tests. The list below shows the ones we chose:

CFA Parameters

MIT-NOV
	HGCE
	  Bubble
	  K	A	M	E	P	S	R	ST	MAP	Z
	  3	1.3	0	0.25	0.2	4	100	0.7	0.5	0.1
 
	  BubbleH
	  K	A	M	E	P	S	R	ST	MAP	Z
	  4	1.3	0	0.25	0.3	4	100	0.9	0.9	0.1
			# Communities AverageMembers
			# 21 15
			2 3 4 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 23 24 25 27 29 30 31 32 34 35 36 37 38 39 41 42 43 45 46 47 48 50 51 53 54 55 57 58 59 60 61 62 63 65 66 67 68 69 70 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 94 95 97 98 99 100 101 102 103
			6 8 14 16 18 30 41 46 53 59 61 63 78 79 86 88 91 95 101 102
			6 13 17 29 30 47 86 88 95 101
			10 25 62 65 74 83 102
			3 25 30 62 74
			4 23 30 45 57 91
			9 30 42 101
			19 27 35 37 39 81 89 102
			2 41 51 87 88 95 102 103
			15 41 43 51 67 69 87 88 95 100 103
			19 27 41 66 85 98
			15 43 46 58 61 67 69 78 79 83 85 90 95 97 100 103
			2 7 11 15 57 61 67 69 78 79 85 87 88 90 95 97 100 101 103
			6 7 8 10 11 14 15 16 17 21 30 31 34 35 38 41 43 46 55 58 61 69 75 78 79 80 83 85 86 88 90 91 94 95 97 101 102
			8 10 15 16 17 30 38 41 46 53 63 73 80 83 85 95 97 99 101 102
			10 14 18 25 35 46 65 83 86 88 90 102
			10 17 25 35 53 72 85 88
			10 25 53 72 74 78
			27 30 50 95 97 99
			4 12 19 20 23 24 27 30 32 36 37 39 41 45 48 50 54 59 60 68 70 76 77 81 82 84 89 98
			19 27 37 39 41 68 81 98
 
  KCLIQUE
	  BubbleH
	  K	MCT
	  3	0.01
			# k = 3
			# minimum_connected_ratio = 0.01
			# number_of_communities = 4
			# community_sizes = [5, 3, 45, 24]
			# average_size = 19.25
			103 2 67 100 87
			99 27 69
			6 7 8 10 11 13 14 15 16 17 18 21 29 30 31 34 35 38 41 43 44 46 53 55 58 59 61 63 67 73 75 78 79 80 83 85 86 88 90 91 95 97 100 101 102
			68 70 76 12 77 81 82 19 84 23 24 89 32 98 36 37 39 45 48 50 54 20 59 4 
 
	  Bubble
	  K	MCT
	  3		0.01	
 
  LinkClustering
	  BubbleH
	  ET
	  0
			# D_max: 0.591406681046
			# S_max: 0.596774193548
			# Communities: 79
			# Average size : 8
			# Nodes : 95 (seen 635 times)
			3 72
			51 73
			58 30 77 50 102 88 89 97 53 67 82 69 81 86 87 84 85 24 20 21 48 23 46 45 43 41 4 7 6 8 68 11 83 13 99 76 75 38 70 91 90 100 101 95 29 79 78 15 10 39 12 59 14 17 98 19 54 31 16 37 36 35 55 63 32 61 103
			25 72
			62 45
			74 65
			72 35
			77 19 74 20
			19 9 77
			25 84
			103 67 47
			3 30
			25 65
			44 73
			74 73
			9 2
			83 61 88 67 102 90 100 81 95 87 79 85 11 10 101 21 17 16 86 41 2 7 103
			25 74
			27 74
			91 10 17 16 30 88 41 61 62 102 83 101 86 79
			61 88 67 91 83 100 101 86 87 97 85 69 95 57 30 41 7 103 78 31 90
			3 74
			19 98 66 24
			76 64
			4 45 50 72
			102 63 30 33 101
			62 65
			24 20 48 19 32 37 77 98 89 73 54 59
			10 21 61 88 63 90 91 83 101 102 94 97 85 95 25 86 58 17 16 55 18 30 43 53 41 46 69 7 8 103 78
			1 83 30
			9 72
			98 36 77 76 89 70 68 20 84 24 39 12 59 48 23 19 54 82 45 37 50 4 32
			30 42 101
			18 84
			42 72
			90 28 103
			44 27
			25 3
			25 35
			25 62
			102 88 91 90 80 101 95 87 79 85 11 10 81 15 21 14 16 46 86 31 30 61 43 35 41 97 7 6 9 8 18
			60 81 30
			9 52
			52 101 15 16 30 41
			64 73
			39 27 19 37 50 36 98 89 4 68
			99 63 73 102 83 101 95 87 97 85 11 10 38 15 14 17 16 46 31 30 53 41 94 6 8
			73 72
			83 29 88 65 90 102 103 80 86 79 78 10 38 15 21 55 18 31 43 35 46 69 7
			61 63 66 67 102 83 100 101 95 85 10 14 16 46 41 55 69 7 8 78 90
			11 13 58 17 16 47 31 30 29 88 95 83 101 86
			23 34
			93 40
			9 57
			21 16 30 40 41 61 88 6 102 101 86 14
			27 66
			67 43 61 88 63 53 90 69 80 81 86 87 85 21 46 95 29 41 79 7 6 8 83 99 75 91 103 100 101 102 94 97 78 11 10 13 38 15 58 17 16 55 18 31 30 35 34 14
			9 42
			11 13 14 17 30 35 41 61 88 6 93 101 86 87 79
			73 66
			74 72
			3 53
			87 15 95 51 43 41 88 67 102 8 103 69 100
			7 21 43 99 75 64 95 90 101 86 79 78 10 58 55 18 31 30 29 88 6 8
			3 62
			10 88 72 17 69 85 87 53 78
			62 74
			62 28
			9 74
			77 42
			60 98 20 54 77 50
			2 51
			12 2
			91 10 101 14 31 30 53 34 99 61 74 102 83 41 95 78
			57 45 4 23
			9 25
			14 61 44 30 53 41 102 88 7 6 91 83 101 95 79
			11 83 99 61 88 63 90 102 103 100 81 69 94 97 85 91 10 27 15 21 14 16 55 95 31 30 35 41 80 79 101 6 8
 
	  Bubble
	  ET
		0.003
 
	InfoMap
			# InfoMap: Generated from {datasets/communities/EXTENDED_DATASETS/InfoMap/mit-nov/edge_list.dat.tree}
			# Generated on {Fri, 27 May 2011 11:03:51 +0100}
			# number_of_communities = 11
			# avg size = 9
			# nodes = 95
			19 20 24 32 12 54 98 76 89 50 77 36 70 48 68 4 84 45 37 82 59 23 39 81 60
			30 102 86 14 6 91 61 41 16 101 88 79 18 53 35 63 44 9 40 93 52 72 33 42
			15 97 85 100 80 83 38 73 1
			90 95 29 46 58 8 7 75 43 66 64 94
			67 103 87 2 51 57 47
			21 55 78 34
			11 17 13
			31 10
			69 27 99
			25 65 74 3
			62 28
 
InfoCom2005
  HGCE
	  Bubble
	  K	A	M	E	P	S	R	ST	MAP	Z
	  3	1	0	0.25	0.2	1	100	0.5	0.5	0.1
	  BubbleH
	  K	A	M	E	P	S	R	ST	MAP	Z
	  3	1	0	0.25	0.2	1	100	0.6	0.9	0.1
 
			# Communities AverageMembers
			# 2 27
			1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 32 33 34 35 36 37 38 39 40 41
			1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 32 33 34 35 36 37 38 39 40 41
 
  KCLIQUE
	  BubbleH
	  K	MCT
	  4	0.009
 
			# k = 4
			# minimum_connected_ratio = 0.009
			# number_of_communities = 1
			# community_sizes = [37]
			# average_size = 37.0
			1 2 3 4 5 6 7 8 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 32 33 34 35 37 38 39 40 41 
 
	  Bubble
	  K	MCT
	  2	0.11
 
  LinkClustering
	  bubbleH
	  ET
	  0.008
			# D_max: 0.398456726708
			# S_max: 0.368421052632
			# Communities: 15
			# Average size : 7
			# Nodes : 40 (seen 99 times)
			9 30
			21 41
			12 23
			11 25 26 35 15 41 13 14 24 10 39 27 20 21 22 16 33 32 37 29 40 34 1 19 3 5 4 7 6 8 17 18
			38 36
			10 39 15 22 33 37 36 34 1 13
			24 27 39 2 14 35 34
			24 25 13 15 9 5 37 29 34
			18 12
			2 23
			11 39 27 32 30 37 40 35 5 4 28
			26 22 23 33 8 6 18 34
			9 12
			39 38 40 22 19 32 35 4
 
	  bubble
	  ET
	  0.04
 
	InfoMap
			# InfoMap: Generated from {datasets/communities/EXTENDED_DATASETS/InfoMap/infocom-2005/edge_list.dat.tree}
			# Generated on {Fri, 27 May 2011 11:03:59 +0100}
			# number_of_communities = 3
			# avg size = 14
			# nodes = 41
			40 32 39 1 27 22 15 35 29 19 24 17 4 25 8 13 30 10 9 26 14 36 5 34 11 6 16 38 21 3 23 28 41 2 12 31
			33 20 37
			7 18
 
InfoCom2006
  HGCE
	  Bubble
	  K	A	M	E	P	S	R	ST	MAP	Z
	  3	1	0	0.25	0.2	1	100	0.7	0.5	0.2
			# Communities AverageMembers
			# 2 49
			22 23 24 25 26 27 28 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 84 85 86 87 88 89 90 91 92 93 94 95 96 98
			22 23 24 25 26 27 28 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 84 85 86 87 88 89 90 91 92 93 94 95 96 98
 
	  BubbleH
	  K	A	M	E	P	S	R	ST	MAP	Z
	  3	1	0	0.25	0.3	4	100	0.6	0.5	0.2
 
  KCLIQUE
	  BubbleH
	  K	MCT
	  3	0.02
 
			# k = 3
			# minimum_connected_ratio = 0.02
			# number_of_communities = 1
			# community_sizes = [58]
			# average_size = 58.0
			22 25 26 27 28 31 32 34 35 36 37 38 39 40 43 44 45 46 47 49 50 51 52 53 54 55 56 57 58 60 61 63 64 66 67 70 72 73 74 75 76 77 78 79 81 82 83 84 85 86 87 88 89 90 91 93 95 98 
 
	  Bubble
	  K	MCT
	  3	0.02
 
  LinkClustering
	  BubbleH
	  ET
	  0.01
			# D_max: 0.352182263797
			# S_max: 0.407407407407
			# Communities: 95
			# Average size : 4
			# Nodes : 78 (seen 387 times)
			86 35
			38 40 49 36 35
			44 56
			83 48
			24 62 27
			68 69 21
			22 67
			36 48
			30 59
			46 44
			56 71
			91 39
			92 48
			58 94
			83 40
			56 94 40
			59 43
			80 65
			56 40 48
			24 60 62 51 75 52 77 76 88 89
			46 39 86
			47 56 40
			77 58 44 45 51 43 41 60 75 89 73 71 70 64 80 92 95 93 74
			25 38 58 49 32 57 56 88 40 43 60 75 89 53 52 86
			68 48
			46 26 39
			26 80
			39 74 89 46 54 88 53 52
			26 56
			74 97
			56 29
			54 48
			26 97
			46 85
			60 25 75 22 57 51 78
			24 62
			54 41
			76 58 82 32 44 30 51 52
			38 22 49 56 40
			47 22
			87 65 72
			26 58 53 43
			31 65
			83 73 23
			24 80
			88 57 70 28 50 77 75 89 66 67 82 31 93 81 44 84 85
			42 48
			24 62 48
			69 21
			47 45 28 43 60 88 89 73 79 64 93 81 86 84 78
			54 22
			73 29
			56 73 40 49
			80 59
			45 28
			24 62 36
			68 65
			22 48
			65 96
			42 29
			86 85
			25 44
			71 41
			51 21
			63 38 59
			47 92
			25 67
			31 35 49
			29 43
			89 58 43 37 36 53 34
			39 30
			22 85
			97 85
			92 71
			74 64 73 68 51 45 86 43 52
			76 66 91 90 31 96 93 79 78
			59 52
			77 58 32 33 86 45 51 43
			25 65
			83 23
			46 39 59
			30 48
			33 48
			46 39
			48 51 53 76 89 64 66 82 73 80 84 78
			27 49 55 56 35 61 63 40 38
			25 26 74 54 57 75 52
			26 22
			68 47
			69 65
			61 27 51 43
			28 60 88 89 64 66 67 82 80 81 86 87 84 25 44 45 42 43 34 77 76 75 74 73 72 70 91 90 93 95 79 78 51 58 98 32 57 37 50 53 52 54 31
			27 48
			37 59 34
 
	  Bubble
	  ET
	  0.09  
 
	InfoMap
			# InfoMap: Generated from {datasets/communities/EXTENDED_DATASETS/InfoMap/infocom-2006/edge_list.dat.tree}
			# Generated on {Fri, 27 May 2011 11:04:18 +0100}
			# number_of_communities = 5
			# avg size = 16
			# nodes = 78
			78 93 76 66 60 74 64 82 77 81 45 72 84 88 28 43 89 90 44 53 51 67 79 87 37 70 75 86 91 56 34 52 73 48 58 40 95 41 32 80 71 50 31 54 26 57 85 47 92 98 36 24 30 33 59 42 62 96 65 23 29 83 94 97 21
			38 35 49 63 55 27 61
			25 22
			39 46
			68 69
 
Cambridge
  HGCE
	  BubbleH
	  K	A	M	E		P	S	R	ST	MAP	Z
	  5	1.6	0	0.25	0.2	4	100	0.6	0.5	0.1
			# Communities AverageMembers
			# 4 23
			1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
			1 2 3 4 5 6 7 9 10 11 13 14 15 17 18 19 20 21 23 25 28 29 30 32 33 34
			2 3 5 6 7 8 9 10 11 12 13 16 18 20 21 22 25 27 28 29 31 32 33 34 35 36
			1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 17 18 19 20 23 24 26 29 30 31 33 35
 
	  Bubble
	  K	A	M	E	P	S	R	ST	MAP	Z
	  5	1	0	0.25	0.3	4	100	0.9	0.9	0.2
 
  KCLIQUE
	  BubbleH
	  K	MCT
	  2	0.02
			# k = 2
			# minimum_connected_ratio = 0.02
			# number_of_communities = 3
			# community_sizes = [2, 2, 11]
			# average_size = 5.0
			1 4
			24 26
			32 34 3 36 7 16 21 22 25 27 28 
 
	  Bubble
	  K	MCT
	  2	0.04
 
  LinkClustering - bubbleH
	  ET
	  0.004
			# D_max: 0.75851421306
			# S_max: 0.5
			# Communities: 12
			# Average size : 5
			# Nodes : 34 (seen 55 times)
			11 25 27 21 22 16 32 28 36 34 3 12 7
			9 5 19 15 33
			2 34
			26 35
			3 6
			13 15 6 19 20
			9 13
			10 20 14 17 23 19 18 30 1 33 2 5 4 6 15
			2 35
			24 26
			11 21 22 29 7
 
	  Bubble
	  ET
	  0.003
 
	InfoMap
			# InfoMap: Generated from {datasets/communities/EXTENDED_DATASETS/InfoMap/cambridge/edge_list.dat.tree}
			# Generated on {Fri, 27 May 2011 11:04:24 +0100}
			# number_of_communities = 3
			# avg size = 12
			# nodes = 36
			34 21 36 22 25 32 28 3 7 16 27 12 11 29 8
			18 33 15 14 5 20 19 4 6 2 23 1 17 30 10 13 9 31
			26 35 24

I originally made the decision to use seperate parameters for Bubble and BubbleH, but in hindsight this was not a good idea, as they cannot easily be compared.

When we discussed the results, we realised the KCLIQUE seems to perform well for latency with BubbleH, despite not having much hierarchical structure. Pádraig suggested I investigate this further, to find out what is happening. In the mean time I have re-run the simulations, using the best parameters for BubbleH – to level the playing field in terms of parameters. As is obvious from the above community structures, only HGCE produces communities with one all encompassing community, and in the case of KCLIQUE for both InfoCom datasets, it produces only one community – this seems odd, as we would expect the best performance from a more complicated structure.

This resulted in the following community structure.

Community Structures

MIT-NOV
  HGCE
    comm_count	avg_member_count	min_nodes	max_nodes
    21	        15.80952381     	4	        87
  KCLIQUE
    comm_count	avg_member_count	min_nodes	max_nodes
    4	          20.25	            4	        46
 
  InfoMap
    comm_count	avg_member_count	min_nodes	max_nodes
    11          8.636363636     	2	        25
 
  LinkClustering
    comm_count	avg_member_count	min_nodes	max_nodes
    78	          8.141025641     2       	67
 
INFOCOM-2005
  HGCE
    comm_count	avg_member_count	min_nodes	max_nodes
    2	          40     	          40	      40
 
  KCLIQUE
    comm_count	avg_member_count	min_nodes	max_nodes
    1	          38     	          38	      38
 
  InfoMap
    comm_count	avg_member_count	min_nodes	max_nodes
    3	          13.66666667     	2	        36
 
  LinkClustering
    comm_count	avg_member_count	min_nodes	max_nodes
    14          7.071428571     	2	        32
 
INFOCOM-2006
  HGCE
    comm_count	avg_member_count	min_nodes	max_nodes
    2	          73     	          73	      73
  KCLIQUE
    comm_count	avg_member_count	min_nodes	max_nodes
    1	          59                59       	59
  InfoMap
    comm_count	avg_member_count	min_nodes	max_nodes
    5	          15.6            	2       	65
  LinkClustering
    comm_count	avg_member_count	min_nodes	max_nodes
    94          4.117021277     	2	        43
 
CAMBRIDGE
  HGCE
    comm_count	avg_member_count	min_nodes	max_nodes
    4	          28.75     	      23	      36
 
  KCLIQUE
    comm_count	avg_member_count	min_nodes	max_nodes
    3           6     	          3       	12
 
  InfoMap
    comm_count	avg_member_count	min_nodes	max_nodes
    3           12                3         18
 
  LinkClustering
    comm_count	avg_member_count	min_nodes	max_nodes
    11          5	                2	        15

Delivery Ratio, Cost, Latency and Delivered Hops for each CFA and for each Dataset, for BubbleRAP and BubbleH.

Delivery Ratio, Cost, Latency and Delivered Hops for each CFA and for each Dataset, for BubbleRAP and BubbleH. (OLD PLOT - SEE BELOW FOR NEW PLOT)

In the results above, there seems to be something odd happening with LinkClustering for BUBBLERap, for Cost and Hops in MIT-NOV and InfoCom2005 it has much higher values than for the other CFAs. Also, there appears to be something odd happening with Latency in the Cambridge dataset, with a maximum of ~1000 hours latency which is about 40 days. The MIT dataset is roughly the same length, but doesn’t get anywhere near the same latency.

The two plots below show Delivery Ratio and Latency for BubbleH, HGCE for the Cambridge dataset.

Delivery Ratio for HGCE-BubbleH, on the Cambridge dataset

Delivery Ratio for HGCE-BubbleH, on the Cambridge dataset

Latency for HGCE-BubbleH, on the Cambridge dataset

Latency for HGCE-BubbleH, on the Cambridge dataset

These seem to show that at the points where the new parts of the dataset are joined, there is a huge jump in deliveries of messages. It also seems that the datasets are truncated at the end. I will double check the start/end times, but this could be due to the original dataset having no events after a certain timestamp, and therefore, in the last section, the simulation stops when there are no events left to process. The huge spike in messages delivered explains the very high and very low extremes – as messages that were sent in the last iteration of message sending, get delivered very quickly. It seems odd however, that the same is not true for the first iteration. It could be, that because the last set have started to be distributed, maybe only one hop away, these nodes are already in a much better position to deliver the nodes at the start of the next section. This could be due to higher activity in the first hours of the dataset collection, perhaps even due to all the nodes being switched on in the same place at the same time. If this is the case, then we need to start the dataset slice slightly later, to avoid this. The activity traces for each dataset is shown below:

MIT-Nov Daily Connections

MIT-Nov Daily Connections

Cambridge Average Daily Connections

Cambridge Daily Connections

Infocom2005 Hourly Connections

Infocom2005 Hourly Connections

Infocom2006 hourly connections

Infocom2006 hourly connections

From these we can see that MIT has a good distribution of connectivy across the dataset, but the other three have only short spikes of activity. This in itself does not make a difference to the comparison of resulst for each datasets, as they are all relative, but it does go some way to explain the very high latency figures for Cambridge.

The issues with KCLIQUE

I looked back at the data for KCLIQUE in the NGMAST simulations, where there were multiple parameters to KCLIQUE, resulting in varying degrees of community structure, for all datasets, the delivery ratio in KCLIQUE seems to increase with member count, whereas community count peaks before delivery ratio really gets going.

commsvsstats-combined

Delivery Ratio, Cost, Community Count and Average Member count.

This makes sense, because for BubbleH, the more we know about members of a the communities, the more messages we can direct using local ranking. And, because we haven’t included an overall community for KCLIQUE (or InfoMap*, or LinkClustering) we have no way to deliver messages to nodes not in a community, until a node holding a message comes into contact with the destination node. I’m not sure if we designed it this way or whether this is just a bug.

The next step is to fix this issue, by either

  1. adjusting the BubbleH algorithm to use global rank to pass messages between nodes at the highest level
  2. Force a global community in the other community finding algorithms (this will have the same effect as 1)

* InfoMap includes every node in the communities list, but does not include a global parent community

Update:

I have now run all of the datasets with a forced global parent community, this seems to have evened the playing field somewhat:

matrixes-global-parent

Results where every CFA is forced to produce a global parent community including all nodes. (NOTE the change in axis labels from previous run)

The interesting thing is that there is very little difference in delivery ratio and cost using BUBBLERap.

Results for NGMAST11/NEMO

May 23rd, 2011 No comments

We have decided not to submit to NGMAST11 conference as the results we have were not really ready, also I found an error in the config setup for KCLIQUE BubbleH. I have therefore run all of the simulations from scratch, just to make sure it is all correct. In this re-run, I ran the HGCE algorithms far too many times for each dataset, so there is a very large number of results for the HGCE algorihtm (~700 for each dataset). We plan to submit to the Finding NEMO workshop instead.

To visualise the results so far, we have plotted the best  (see note below) results for BubbleRAP and BubbleH:

matrix

BUBBLERap and BubbleH compared for Delivery Ratio, Cost and Latency

To determine the ‘best’ run for algorithms with multiple parameters, we first filtered the results to the best 10 for latency, then ordered by delivery ratio followed by cost, this should mean that a good delivery ratio is still shown for good delivery latency, but avoids results where a abnormally low delivery ratio is chosen with a very poor delivery ratio.

Initial results – HGCE, KCLIQUE, InfoMap and LinkClustering

May 7th, 2011 No comments

We ran each community finding algorithm over the datasets: MIT-NOV, Cambridge, InfoCom 2005, and InfoCom 2006. We used bubbleRAP, BubbleH, Unlimited flood, hold & wait, and Prophet to measure the effect of the CFA on the dataset. The results below are grouped by dataset. Only the best results for BubbleH/BubbleRAP KCLIQUE and HGCE are shown. Note: The local ranking used for InfoMap is based on the rank given by the InfoMap algorithm, NOT centrality as with BubbleRAP and BubbleH.

MIT-NOV

MIT-NOV

CAMBRIDGE

CAMBRIDGE

IncoCom 2005

IncoCom 2005

InfoCom 2005

InfoCom 2005

My next plan is to link statistics about community structure (number of communities, average members etc.) to the results.

Also, I want to find out if there is an existing measure of overlappiness? Does the amount of overlap in communities affect the results in any way? Also, what about the extent to which there is ‘nesting’ of communities, does this make a difference?

I also want to visualise the communities produced in some way, so that it is clearer what the network structure for the best results is based on.

This excel file this excel file contains the combined results of each CFA for all 4 datasets, so we can compare the effect of community structure (number of communities, average member size) directly with results for each simulation run.