Archive

Archive for June, 2011

The problem parents

June 22nd, 2011 No comments

After we submitted the paper to FindingNEMO2011, I started to work on the Social Sensing dataset, and Pádriag’s suggestion to create a Random community finding algorithm. Initial tests showed a peculiar phenomenon: All runs of BUBBLERap and for KCLIQUE with multiple parameters were coming up with exactly the same results. Now, this really shouldn’t happen, so I looked further.

Unfortunately, or fortunately, depending on how you look at it, it is not a bug in the code. It was because of a decision we made a long while back, to force each community finding algorithm to output a top level community including all nodes. This works well for BubbleH, which uses targeted community routing. But, and in hindsight, this is obvious, BUBBLERap simply looks as community membership, and so, when it meets another node, they are ALWAYS in the same community, and therefore it routes using ONLY local ranking. This has the effect of making our implementation simply centrality based routing.

I spent the last couple of days re-engineering the configuration, so that now, we can get results for communites with a global parent, and without a global parent. Shown below are the results from the FindingNEMO11 paper, and below that the updated results. Where we have used no-global-parent for BUBBLERap and a global parent for BubbleH.

Results used in FindingNEMO11 paper

Results used in FindingNEMO11 paper

Results for ALL CFAs and Datasets, using Global Parents only for BubbleH

Results for ALL CFAs and Datasets, using Global Parents only for BubbleH

The new results shows a slightly different picture, in that the improvement in delivery ratio and latency are less pronounced. One interesting thing is that for InfoMap, which partitions the graph, the results are exactly the same. This is because there is no ovelap, which both algorithms to work in the same way, i.e. BUBBLERap uses global centrality to route between nodes of different communities, and BubbleH uses the local centrality in the GLOBAL parent to route to nodes not in smaller communities. This means that BubbleH is actually designed solely for overlapping community structure. With regards to the Heirarchichal clustering of HGCE, we see that it performs poorly for Cambridge, but roughly equally for the others.  LinkClustering for BUBBLERap in InfoCom2005 and MIT-NOV performs very poorly on Delivered Hops. This could be down it creating a large number of communities.

ngpvsgp-bubble-mit-nov-linkclustering

Global versus No Global parent for LinkClustering BUBBLERap on MIT November,

It seems the difference is very pronounced in LinkClustering, but it is not a problem with the results. The differences between the results for HGCE is the result of randomness (pertubations) for each run.  This may result in withdrawing the paper from SMW11. Which is very unfortunate.

My next step is to finish getting the Social Sensing dataset prepared, (by testing muliple CFA parameters against it) and add it to the matrix plot. Then, I will finish doing the Random runs against all of the datasets, to see if the CFAs have any real effect on performance, or whether it is insignificant. (!)

Finally, if things go to plan, I will continue working on the twitter based dataset, and perhaps the new sensor dataset from St Andrews.

Categories: Uncategorized

Social Sensing Dataset

June 15th, 2011 1 comment

I finally got around to extracting out contact events from the social sensing study. I have made some notes on the Social Sensing Dataset page. The main images are shown below:

Social Sensing Dataset activity, 15mins, hourly, daily, 3 daily, and weekly. Lines showing the most active period

Social Sensing Dataset activity, 15mins, hourly, daily, 3 daily, and weekly. Lines showing the most active period

It seems most active from Oct 2008 onwards, and tails off, with the end of Nov 2008 being a good cut-off point. This means there is over a year of contact data.

Force directed connected time graph for SocialSensing dataset, showing mapping of users to nodes, with low weight edges removed. NOTE: Do not share this image - remove names/btmacs first

Force directed connected time graph for SocialSensing dataset, showing mapping of users to nodes, with low weight edges removed. NOTE: Do not share this image - remove names/btmacs first

The connected time graph (i.e. edges weighted by the total length of time connected with the target node) gives a really good sense of the social structure. As I personally know most of these people, (and I am on it myself) it is easy for me to see the relevance of the layout.

I haven’t written code to lookup locations based on cell towers/wifi points yet, but it might be interesting in the future if we end up using location.

Also, I need to work on getting BTMAC addresses for the three other people. The problem is that I know that at least one of these people changed phones during the study – hence why they are removed.

Extracting contact networks from less obvious datasets.

June 14th, 2011 No comments

There are three (or more) possible routes from this point, either we delve deeper into the merits of using community structure in routing, and try to categorise different networks based on the performance differences of routing in these diverse contact networks. Or we concentrate on the mechanics of making the existing system fully distributed, and therefore having the possibility of real-world deployment. Or we could go down the route of adding a new aspect to the mix, and finding out what location does when put into the mix. However, there is a problem with all of these, and that is that we don’t have the datasets to measure any of them.

Pádraig and I recently discussed relaxing the specification on datasets, to give us larger and more robust data to deal with.

Conrad has a dataset based on wall posts on a social networking site, which we could use as an analogue to a human contact network: A contact event occurs when a wall post is made by one person to another, the ‘duration’ of the contact is based on the length of the wall post.

The Enron dataset has been used in the literature (Kleinberg vector clock paper) before, and we could adapt this in the same way, using the emails to build a network of contacts in a similar way – with message length (if available) as a indicated of contact duration.

Pádraig also mentioned a blogging network, from which we could extract contacts from links, or trackbacks/pingbacks etc.(?)

Finally, the Sentiment project allready collects a large number of tweets, along with location information about them, talking to Anthony Brew, it looks like it may be limited, but I think it is worth a look over the data to see if there is enough coverage to build a reliable network from directed messages (@someone). The problem is that we will only have a percentage of the tweets from the twitter API (apparently). But we could override this by making use of whatever (rate limited) we can get from twitter about an interesting group of specific people, rather than just looking at everyone.

Categories: Datasets