
Extracting contact networks from less obvious datasets.

There are three (or more) possible routes from this point. We could delve deeper into the merits of using community structure in routing, and try to categorise different networks based on how routing performance differs across these diverse contact networks. Or we could concentrate on the mechanics of making the existing system fully distributed, which would open up the possibility of real-world deployment. Or we could add a new aspect and find out what effect location has when it is put into the mix. However, all of these share one problem: we don’t have the datasets to measure any of them.

Pádraig and I recently discussed relaxing our requirements for datasets, which would give us larger and more robust data to work with.

Conrad has a dataset based on wall posts on a social networking site, which we could use as an analogue of a human contact network: a contact event occurs when one person makes a wall post to another, and the ‘duration’ of the contact is based on the length of the wall post.
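As a rough sketch of that mapping, assuming each wall post is available as an (author, recipient, timestamp, text) record, the code below turns posts into contact events ordered by time. The `WallPost` and `Contact` types, the `wall_posts_to_contacts` function and the one-second-per-character scaling are all hypothetical choices for illustration, not part of Conrad's dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WallPost:
    author: str
    recipient: str
    timestamp: float  # seconds since the epoch (assumed format)
    text: str


@dataclass(frozen=True)
class Contact:
    source: str
    target: str
    start: float
    duration: float  # seconds, derived from the post length


def wall_posts_to_contacts(posts, seconds_per_char=1.0):
    """Map each wall post to one contact event, ordered by start time.

    seconds_per_char is a hypothetical scaling from post length to duration.
    """
    contacts = []
    for post in posts:
        # A post on one's own wall does not connect two different people.
        if post.author == post.recipient:
            continue
        duration = len(post.text) * seconds_per_char
        contacts.append(Contact(post.author, post.recipient, post.timestamp, duration))
    return sorted(contacts, key=lambda c: c.start)
```

The linear scaling is the simplest reading of "duration based on length"; a cap or a logarithmic scale might suit very long posts better.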

The Enron dataset has been used in the literature before (the Kleinberg vector clock paper), and we could adapt it in the same way as the wall posts, using the emails to build a network of contacts, with message length (if available) as an indicator of contact duration.
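A minimal sketch of the same idea for email, using only Python's standard `email` package and assuming each Enron message is available as raw RFC 2822 text with a `Date` header and a plain (non-multipart) body. Every To/Cc recipient other than the sender becomes one contact. The `email_to_contacts` function and the characters-to-seconds scaling are hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime


def email_to_contacts(raw, seconds_per_char=1.0):
    """Turn one raw email into (sender, recipient, start, duration) tuples."""
    msg = message_from_string(raw)
    senders = [addr.lower() for _, addr in getaddresses(msg.get_all("From", [])) if addr]
    if not senders:
        return []
    sender = senders[0]
    # Each distinct To/Cc address (excluding the sender) is one contact.
    recipients = {
        addr.lower()
        for _, addr in getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
        if addr and addr.lower() != sender
    }
    # Assumes a well-formed Date header; start is seconds since the epoch.
    start = parsedate_to_datetime(msg["Date"]).timestamp()
    # Message length stands in for contact duration (plain bodies only).
    body = "" if msg.is_multipart() else msg.get_payload()
    duration = len(body) * seconds_per_char
    return [(sender, r, start, duration) for r in sorted(recipients)]
```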

Pádraig also mentioned a blogging network, from which we could extract contacts from links, or possibly from trackbacks and pingbacks.
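For the blogging network, one possible approach (an assumption, not something we have settled on) treats each hyperlink from one blog to another as a directed contact, identifying blogs by hostname. The sketch below only looks at `<a href>` links in a post's HTML; trackbacks and pingbacks would need a separate source. `blog_post_contacts` and the `known_hosts` set are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a chunk of HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def blog_post_contacts(source_host, html, known_hosts):
    """Return sorted (source, target) pairs for links to other known blogs."""
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    targets = set()
    for href in parser.hrefs:
        # Relative links have no hostname, so they are ignored.
        host = urlparse(href).hostname or ""
        if host in known_hosts and host != source_host:
            targets.add(host)
    return sorted((source_host, t) for t in targets)
```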

Finally, the Sentiment project already collects a large number of tweets, along with location information about them. From talking to Anthony Brew, it looks like the coverage may be limited, but I think it is worth looking over the data to see whether there is enough to build a reliable network from directed messages (@someone). The problem is that (apparently) we will only get a percentage of the tweets from the Twitter API. But we could work around this by collecting whatever (rate-limited) data we can get from Twitter about an interesting group of specific people, rather than just looking at everyone.
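A sketch of building the directed @mention network, assuming tweets arrive as (author, text) pairs and that usernames are 1 to 15 letters, digits or underscores. The optional `group` argument (lowercase usernames) restricts the network to a chosen set of people, in the spirit of focusing on an interesting group rather than everyone. `mention_edges` is a hypothetical helper, not part of the Sentiment project or any Twitter library.

```python
import re
from collections import Counter

# An @ not preceded by a username character (so emails are skipped),
# followed by a 1-15 character username (assumed format).
MENTION = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})(?![A-Za-z0-9_])")


def mention_edges(tweets, group=None):
    """Count directed author -> mentioned-user edges across tweets."""
    edges = Counter()
    for author, text in tweets:
        author = author.lower()
        # A set, so each user is counted at most once per tweet.
        for target in {m.lower() for m in MENTION.findall(text)}:
            if target == author:
                continue  # self-mentions are not contacts
            if group is not None and not (author in group and target in group):
                continue
            edges[(author, target)] += 1
    return edges
```

Counting each mention once per tweet keeps a repeated name in one message from inflating the edge weight; counting every occurrence would be an equally defensible choice.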

Categories: Datasets