
Next steps

In the last few days I have been trying out new CFAs (community finding algorithms) for use in the next paper, for NGMAST11. The idea behind the paper is to explore different CFAs and datasets and see what happens. To do this, I have decided first to explore what the different CFAs do on the MIT-NOV dataset for both BubbleRAP and BubbleH, i.e. to use the communities found by each CFA to drive both DTN algorithms. This gives a fair comparison. Then I will run the same tests on different datasets. I envisage problems for some CFAs, as some of the datasets we intend to use are small and may not yield good community results. We will therefore explore thresholding of edges before finding communities (depending on whether the CFA can accept weighted edges or not).
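The edge thresholding mentioned above could look something like the following. This is only a minimal sketch: it assumes the contact graph is held as a dict mapping node pairs to a weight such as total connected time, and the function name `threshold_edges`, the example weights and the cut-off value are all hypothetical rather than taken from ContactSim or the datasets.

```python
def threshold_edges(weighted_edges, min_weight):
    """Keep only edges whose weight is at least min_weight.

    weighted_edges: dict mapping (node_a, node_b) -> weight
                    (assumed here to be total connected time).
    Returns a sorted list of unweighted (node_a, node_b) pairs, which is
    the kind of input a CFA without weighted-edge support would need.
    """
    return sorted(edge for edge, w in weighted_edges.items() if w >= min_weight)


# Hypothetical contact graph; the weights are illustrative only.
edges = {(1, 2): 3600.0, (1, 3): 40.0, (2, 3): 900.0, (3, 4): 5.0}
kept = threshold_edges(edges, min_weight=600.0)
print(kept)
```

For a CFA that does accept weighted edges, the same filter could keep the weights instead of discarding them. The right cut-off would have to be chosen per dataset.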

The following tables sum up the tasks involved, and will be completed as I get the work done:

Work to do

Dataset  CFA             BubbleRAP  BubbleH
MIT      HGCE            Done       Done
MIT      KCLIQUE         Done       Done
MIT      LinkClustering  Done       Done
MIT      InfoMap         Done       Done

Some notes so far:

The undirected, weighted version of InfoMap gives its own ranking for nodes within communities, which can be used in BubbleRAP and BubbleH, so the two versions could be compared. The hierarchical version also does this, but only for the finest-level community.
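As a rough illustration of how a per-community ranking like this could be turned into something a forwarding decision can look up, here is a minimal sketch. It assumes the ranking is available as a score per node within each community; the data layout, the function name `local_ranks` and the scores are hypothetical, and this is not InfoMap's actual output format.

```python
def local_ranks(scores_by_community):
    """Map each community id to {node: rank}, where rank 1 is the highest score.

    scores_by_community: dict mapping community id -> {node: score}.
    Ties are broken by node id so the result is deterministic.
    """
    ranks = {}
    for cid, scores in scores_by_community.items():
        ordered = sorted(scores, key=lambda node: (-scores[node], node))
        ranks[cid] = {node: i + 1 for i, node in enumerate(ordered)}
    return ranks


# Hypothetical scores for two communities.
scores = {
    "c1": {"a": 0.5, "b": 0.3, "c": 0.2},
    "c2": {"d": 0.1, "e": 0.9},
}
ranks = local_ranks(scores)
print(ranks["c1"], ranks["c2"])
```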

It might be worth writing a new community dataset loader for ContactSim/LocationSim so that it can take in and calculate the hierarchy. For example, at the moment each community is defined by a list of nodes, but if we prefix this list with a parent id, it will implicitly specify the hierarchy. This may be time consuming, but worthwhile down the line, as we can do away with HierarchyOracle and its associated classes and work directly on communities.
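The parent-id idea above could be parsed roughly as follows. This is a sketch under stated assumptions, not the ContactSim loader: it assumes one community per line in the form `<parent_id> <node> <node> ...`, that a community's own id is its 0-based position among the non-blank lines, and that a parent id of -1 marks a top-level community. None of these conventions come from the existing file format.

```python
def load_communities(lines):
    """Parse lines of the form '<parent_id> <node> <node> ...'.

    Assumptions (hypothetical, for illustration only):
      - a community's id is its 0-based index among the non-blank lines;
      - a parent id of -1 marks a top-level community.
    Returns (members, children): members[i] is the set of nodes in
    community i, and children[p] lists the ids of p's sub-communities.
    """
    rows = [line.split() for line in lines if line.strip()]
    members, children = {}, {}
    for cid, fields in enumerate(rows):
        parent, nodes = int(fields[0]), fields[1:]
        members[cid] = set(nodes)
        children.setdefault(cid, [])
        if parent != -1:
            children.setdefault(parent, []).append(cid)
    return members, children


# Hypothetical example: one top-level community with two sub-communities.
example = ["-1 a b c d", "0 a b", "0 c d"]
members, children = load_communities(example)
print(children)
```

With the hierarchy held in `children`, walking up or down the tree needs no separate oracle class, which is the simplification hoped for above.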

Results so far:

Community finding

InfoMap, which partitions the graph, is easy to visualize:


InfoMap clustering of MIT-NOV dataset - Colour of nodes indicates clustering (edges are coloured by connecting node and weighted by connected time)

The others are harder, however, as I could think of no easy way of visually indicating all the communities a node belongs to. The following shows LinkClustering.

LinkClustering communities on MIT-NOV where the size and colour of the node indicate the number of communities it belongs to
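The per-node value used for size and colour in this figure is just a count of community memberships. A minimal sketch of that count, assuming the communities are available as lists of node ids (the function name and the example communities are hypothetical):

```python
from collections import Counter


def membership_counts(communities):
    """Count how many communities each node belongs to."""
    counts = Counter()
    for community in communities:
        # set() guards against a node id being listed twice in one community
        counts.update(set(community))
    return counts


# Hypothetical overlapping communities.
communities = [["a", "b", "c"], ["b", "c", "d"], ["c", "e"]]
counts = membership_counts(communities)
print(counts["c"], counts["b"], counts["a"])
```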


The communities found by HGCE are multiplied across the many parameter settings of the algorithm, so they are much harder to visualise in terms of nodes and edges. However, the plot below shows the effect on community size and number of communities of changing the parameters to HGCE. The graph is ordered by MAP, ST and average nodes per community. Details about the parameters to HGCE are summarised here.

MIT NOV, HGCE multiple parameters
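The summary behind a plot like this can be sketched as below. It assumes each HGCE run is recorded with its MAP and ST parameter values and the list of communities it produced. The record layout, the function name `summarise_runs` and the numbers are hypothetical, and MAP and ST are treated as opaque numeric parameters.

```python
def summarise_runs(runs):
    """Return one (MAP, ST, avg_nodes_per_community, n_communities) row per run.

    Rows are sorted by MAP, then ST, then average nodes per community.
    """
    rows = []
    for run in runs:
        sizes = [len(community) for community in run["communities"]]
        avg = sum(sizes) / len(sizes) if sizes else 0.0
        rows.append((run["MAP"], run["ST"], avg, len(sizes)))
    return sorted(rows)


# Hypothetical runs; parameter values and communities are illustrative only.
runs = [
    {"MAP": 0.9, "ST": 0.7, "communities": [["a", "b"], ["c", "d", "e", "f"]]},
    {"MAP": 0.5, "ST": 0.7, "communities": [["a", "b", "c"]]},
]
rows = summarise_runs(runs)
print(rows)
```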


BubbleH and BubbleRAP
