Archive for June, 2012

Final Chapter – Next Steps

June 22nd, 2012

Met with Pádraig to discuss next steps.

We decided to go along the route of trying to classify CFAs based on synthetic datasets. We realised that there are no synthetic datasets out there that model both contacts and clustering.

We decided to be clever with the Studivz dataset: we will run the Moses CFA over it, pick a random community, and then build out the network based on connected communities.

The idea being that we can decide how many nodes we want, and keep going until we have enough.


I implemented this. The script takes an edge list and the community allocation produced by Moses (see file in SVN).

The script picks a random community C0. Then, while the total number of nodes is less than the maximum, it picks the community C1 connected to C0 that has the largest number of links to C0 and adds it. It then picks one of the chosen communities at random and repeats the picking process from there, until the maximum number of nodes has been reached or exceeded.
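The expansion loop above could be sketched roughly as follows. This is only a sketch, not the actual SVN script: the function and variable names are my own, and it assumes each node belongs to exactly one community and that the pivot is re-chosen at random from the communities picked so far.

```python
import random
from collections import defaultdict

def grow_subnetwork(edges, communities, max_nodes, rng=random):
    """Greedy subnetwork builder: start from a random community and keep
    adding the neighbouring community with the most inter-community links.
    Rough sketch only; assumes one community per node."""
    node_comm = {n: c for c, members in communities.items() for n in members}
    # links[c1][c2] = number of edges between communities c1 and c2
    links = defaultdict(lambda: defaultdict(int))
    for u, v in edges:
        cu, cv = node_comm[u], node_comm[v]
        if cu != cv:
            links[cu][cv] += 1
            links[cv][cu] += 1

    chosen = [rng.choice(list(communities))]      # the random seed community C0
    nodes = set(communities[chosen[0]])
    while len(nodes) < max_nodes:
        pivot = rng.choice(chosen)                # pick a chosen community at random
        frontier = {c: w for c, w in links[pivot].items() if c not in chosen}
        if not frontier:
            # this pivot has no unchosen neighbours; give up if none do
            if not any(c not in chosen for p in chosen for c in links[p]):
                break
            continue
        best = max(frontier, key=frontier.get)    # most links to the pivot
        chosen.append(best)
        nodes.update(communities[best])
    return chosen, nodes
```

The poorly-connected variant described later would simply swap `max` for `min` when selecting from the frontier.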

Initially, I did this four times, with a limit of 200 nodes, and produced the following graphs, coloured by modularity class and sized by betweenness centrality (sized linearly between 5 and 100 in Gephi).





But then I got to thinking: what are we actually trying to test here? It’s a bit vague to just see what happens in these datasets. We could be testing a number of things:

  • Which CFA works best for well-connected communities
  • Which CFA works best for poorly connected communities
  • Are we testing a particular aspect of a network?
    • Density?
    • Average community size
    • Average network size

So I also decided to generate a set of nodes in poorly connected datasets, adapting the algorithm to pick the communities with the lowest number of links, resulting in the four graphs below:





Also, perhaps we should be considering weighted links: for example, the communities with the most connected time vs. the least connected time?

Each of these subsets should be compared in terms of other network metrics to see if there is an effect on CFA performance.
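For instance, the density and average community size of each subset could be computed along these lines. A minimal sketch using the standard definitions; the function name and the returned fields are my own, not the project’s code:

```python
def subset_metrics(edges, communities):
    """Simple metrics for comparing dataset subsets: node/edge counts,
    density of the (undirected, simple) graph, and average community size."""
    nodes = {n for edge in edges for n in edge}
    n = len(nodes)
    # density of an undirected simple graph: 2|E| / (|V| * (|V| - 1))
    density = (2.0 * len(edges)) / (n * (n - 1)) if n > 1 else 0.0
    avg_community = sum(len(m) for m in communities.values()) / len(communities)
    return {"nodes": n, "edges": len(edges),
            "density": density, "avg_community_size": avg_community}
```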

awesome command to do this:

EXPERIMENT_GROUP=FINAL_EVALUATION
DATASETS=studivz-three-200-A,studivz-three-200-B,studivz-three-200-C,studivz-three-200-D,studivz-three-200-A-MIN,studivz-three-200-B-MIN,studivz-three-200-C-MIN,studivz-three-200-D-MIN,mit-nov,mit-oct,cambridge,social-sensing,hypertext2009,infocom-2005,infocom-2006
for DATASET in $(echo ${DATASETS} | tr "," "\n"); do
  php -f scripts/stats/DatasetCommunityLinkCountStats.php \
    OUTPUT/${EXPERIMENT_GROUP}/edgelist-graphs/${DATASET}/edge_list.dat \
    OUTPUT/${EXPERIMENT_GROUP}/communities/${DATASET}/Moses/no-global-parent/edge_list.dat.communities.dat \
    CSV ${DATASET} \
    > OUTPUT/${EXPERIMENT_GROUP}/data/${DATASET}-DatasetCommLinkStats.txt
done

Dynamic Dataset Visualisation for DTN Simulations

June 13th, 2012

I was thinking about a way to visualise the flow of messages around a DTN network during simulation runs.

At the moment, we (PC and I) visualise the contact networks we are working with using the aggregate graph of the period of interest: nodes are sized based on betweenness centrality, and edge thickness represents the edge’s weight, based on the total connected time or the connected-time ratio between nodes. This network is laid out using some force-directed layout, usually the Gephi software’s Force Atlas (1 or 2) or OpenOrd layouts, together with a combination of techniques such as non-overlap of nodes and expansion/contraction. We sometimes remove low-weight edges for clarity. Often, we also colour the nodes based on their community assignment.
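The connected-time-ratio weights for that aggregate graph could be derived from a contact list roughly like this. A sketch only: the `(a, b, start, end)` contact tuple format and the function name are assumptions, not our actual tooling:

```python
from collections import defaultdict

def aggregate_weights(contacts, period_start, period_end):
    """Aggregate-graph edge weights over a period of interest:
    weight = fraction of the period during which a pair was in contact."""
    duration = period_end - period_start
    weights = defaultdict(float)
    for a, b, start, end in contacts:
        # clip each contact to the period of interest
        s, e = max(start, period_start), min(end, period_end)
        if e > s:
            weights[frozenset((a, b))] += (e - s) / duration
    return dict(weights)
```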

This works well, and with a bit of manipulation, results in nice enough looking graphs.

I would like to have a nice way to visualise the flow of messages around this network for a given routing algorithm. I imagine a visualisation much like the above, but with some adaptations that allow other information to be displayed:

  • The temporal connections of the edges – at some points in time, edges do not exist, because the nodes are not in contact.
  • The data transmitted across edges – as time goes on, edges may be used to transmit messages.
  • The data held by nodes at a given point in time – the buffers of nodes fill and empty over time as data is transmitted.

I would like to see a visualisation where the nodes are initially laid out in some appropriate manner (e.g. force-directed, based on the aggregate graph), so that nodes can be tracked over time if they happen to move (based on some criteria). The nodes are sized based on betweenness centrality (or some other ranking/partition), along with some measure of their data storage, for example a dotted border, where each dot represents a message. Perhaps a shadowed version records the accumulated count of all messages the node has seen so far.

In this visualisation, edges are only shown when they are active, and fade (with some function) as they become inactive (perhaps proportional to their connection time). Perhaps they are still shown using their aggregate-graph weight as a thickness, or perhaps this is the accumulated weight (perhaps showing the total aggregate as a shadow). Edges are decorated with information, such as dashes, each dash representing a message transmitted along the path. Each dash is coloured based on how recently it was transmitted, perhaps in a binary fashion, where a recent (tunable) message is coloured, say, red, and non-recent messages fade to blue as they reach a time limit. (Do we keep edges greyed out, but visible, when they are inactive?) Dashes are located at the end of the edge where the sending node is, to indicate the direction of the transmission; perhaps this series of dashes terminates with an arrow.

When edges become active, they are emphasised in some way, perhaps flashing or briefly pulsing. An alternative orthogonal emphasis is shown when a message is transmitted, so that it is obvious when an edge is activated/de-activated, when a message is transmitted, or when both occur at the same time (i.e. they are easily distinguishable).

In the case of both nodes and edges, dashes/dots represent messages and should be displayed at a uniform size; where the number of dashes exceeds the length of the edge, or the circumference of the circle, they flow outwards from the line so that they appear stacked. In cases where the number of dashes exceeds the limits of the edge more than, say, 3 times, dashes are concatenated into an alternative symbol (e.g. a block) representing a larger number of messages, e.g. 10. Alternatively, a stacked buffer-style viz. can be displayed beside the edge/node.

Dynamic interactions:

When a given dash (representing a message) is selected, the path it has taken through the network is highlighted. Perhaps a side panel of messages, nodes and edges is shown, with sortable numerical statistics, so that important items are easily identifiable; these can be highlighted on the main viz. when selected (e.g. hovered over).

When the pointer is drawn across an edge, node, or series of edges, different modes cause different effects:

  • All messages, all paths – the paths of any messages traversing the edges are highlighted
    • Only most recent N messages across edge
    • Only messages within time period
  • All messages transmitted/received by this node
    • All messages originating/terminating at this node
    • Recent/N/Time period of Messages
The layout parameters can be tweaked during visualisation, perhaps switching between layouts based on the aggregate graph, accumulative graph and instantaneous graph.

Something enabling start/stop/pause/rewind/fast-forward/fast-reverse/repeat-period should also be implemented. It is also quite important to be able to drill down into the viz. to see interactions in small sections.

Data driving the viz:

The source data for the viz. could come in the form of three pieces of information:

  1. The list of aggregate edges between nodes (including weightings), and (maybe) a list of nodes – e.g. in some graph format which allows metadata for nodes and edges
  2. The list of contacts between nodes – start time, end time, node id, node id
  3. The list of movements of messages between nodes – message id, source node, target node, transmission start time, transmission end time (assuming there may be some duration of transmission) – maybe also including a list of all messages with source and destination (for message initialisation, a start time might be included).

It would be up to the viz. software to calculate the aggregate graphs, order the contacts, and arrange for messages to be transmitted at the correct times.
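As a sketch, the three inputs above could map onto record types like these. The field names and types are my guesses at a convenient shape, not a fixed format:

```python
from dataclasses import dataclass

@dataclass
class AggregateEdge:
    """Input 1: an aggregate edge between two nodes, with its weight."""
    node_a: str
    node_b: str
    weight: float

@dataclass
class Contact:
    """Input 2: a contact between two nodes (start time, end time)."""
    node_a: str
    node_b: str
    start: float
    end: float

@dataclass
class Transfer:
    """Input 3: one message moving between two nodes over some duration."""
    message_id: str
    source: str
    target: str
    start: float
    end: float
```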




WWIC2012 – Santorini, Greece

June 12th, 2012

The conference went well, and I managed to speak to a few interesting people. Eiko Yoneki (an author of the BUBBLE Rap paper) was there, and we had a brief chat; she was interested in our work. She would be a perfect candidate for an external examiner, as her work is very similar and she is an expert in the field. Plus she’s not far away – Cambridge University.

The presentation seemed to go well: there were plenty of people in the room, it was well received, and I got a few questions. It was definitely a good place to submit this work. The session chair in my session was Scott Burleigh, from NASA, who was one of the authors of RFC 5050 (the DTN Bundle Protocol specification), and he seemed to be quite important in the field.

There were a number of talks about much lower-level stuff, but all very interesting. I think I need to read up a bit on the DTN Bundle Specification and the DTN Routing Specification (based on PRoPHET, I think), and include a small section on these in the thesis background.

I was also chatting to Bogdan Ghita from Plymouth University, who suggested that I could apply for a research/developer position in DTN security, but he needs someone to start in July. I don’t think I’ll be done by then – plus I’m not sure it’s my kind of thing.

Another interesting guy was Jörg Ott, who gave a keynote(?) talk reviewing the general DTN area, “DTN – is it for humans?”, in which he basically answered yes, but argued that there needs to be thought about the real-world applications of DTN-based systems.
He gave an interesting simulation scenario that he had done: a market-trader auction system, where the vendor advertises the prices of items and local users bid/barter the prices, expanding the trader’s reach. His idea worked in the same way as the cogs idea I wrote about a while back – the main emphasis was determining how far the data needed to travel from the physical location, i.e. in what location context it should exist; when no-one is there, the data is deleted/lost, as it no longer has context.

Very few of the talks included real-world datasets, and it seemed to be acceptable to use synthetic datasets. However, there were no papers that were very similar to our work.

Also, Santorini was pretty awesome 🙂 definitely a place for couples/families though!