
Archive for the ‘What i’ve been writing’ Category

BubbleH and SMW11

April 19th, 2011

Our paper for SMW11 was accepted, and now we will be able to include the results from the bug-fixed version of BubbleH, which performs very well. An addition that Pádraig suggested was to ignore level information in the hierarchy and simply use community size. This has the benefit of making the algorithm simpler while still incorporating hierarchical structure. I ran the two versions side by side and found that they perform almost identically, with ignoring levels slightly better in some cases!
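To make the two variants concrete, here is a minimal sketch of the selection step as I think of it, in Python; the names and structures (Community, the tie-breaking rules) are mine for illustration, not the simulator's actual code.

# Hypothetical sketch of the BubbleH community-selection step, not the
# simulator's actual code: pick the "best" community shared by a candidate
# carrier and the destination, with and without level information.

from dataclasses import dataclass

@dataclass(frozen=True)
class Community:
    cid: int
    level: int            # depth in the H-GCE hierarchy (root = 0)
    members: frozenset

def shared(node, dest, communities):
    """All communities containing both the candidate node and the destination."""
    return [c for c in communities if node in c.members and dest in c.members]

def best_shared_using_levels(node, dest, communities):
    # Original variant: prefer the deepest shared community,
    # breaking ties by smaller size.
    cands = shared(node, dest, communities)
    return max(cands, key=lambda c: (c.level, -len(c.members)), default=None)

def best_shared_ignoring_levels(node, dest, communities):
    # Pádraig's variant: ignore depth and just take the smallest
    # shared community.
    cands = shared(node, dest, communities)
    return min(cands, key=lambda c: len(c.members), default=None)

Because H-GCE communities are nested, the smallest shared community is usually also the deepest one, which would explain why the two variants come out nearly identical.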

MIT-NOV Dataset, with levels included and not included, for multiple parameters to H-GCE

This plot is very hard to read, but it is possible to see the similarities at a broad level. The best performing run used H-GCE parameters K=3, E=0.15, ST=0.9, MAP=0.9 and Z=2. Its structure is relatively flat, with a large number of communities:


0 (90)
|- 1 (5)
|- 2 (5)
|- 3 (8)
|- 4 (8)
|- 5 (8)
|- 6 (4)
|- 7 (28)
|- 8 (27)
|- 9 (22)
|- 10 (17)
|- 11 (13)
|- 12 (20)
|- 13 (16)
|- 14 (26)
|  |- 15 (12)
|- 16 (7)
|- 17 (4)
|- 18 (4)
|- 19 (3)
|- 20 (8)
|- 21 (8)
|- 22 (3)
|- 23 (13)
|- 24 (8)
|- 25 (27)

This gives us the following graphic for the SMW11 paper.

MIT-NOV Dataset, for multiple parameters to H-GCE, compared to Bubble, PROPHET and Flood

Latency is shown below; it seems to follow the same trend as the previous version, but with BubbleH actually beating everything except Flood at the end.

Latency in MIT-NOV dataset for Bubble, BubbleH, PROPHET and Flood.

This is a positive result, but whilst doing some work towards the next paper, for NGMAST11, I realised that we should also be doing runs for multiple parameters to K-Clique; for this paper, though, we probably don't need to worry so much. Also, for reference, the average value for BubbleH is included in the plot below.

MIT-NOV Dataset, best BubbleH run compared to Bubble, PROPHET and Flood, with the BubbleH average included

We need to consider whether to evaluate multiple parameters to BubbleRAP, and see whether this affects that algorithm too; a sketch of such a parameter sweep is below. We also need to consider whether the hierarchy is really making things better, or whether it is simply the sheer number of communities, because the best performing run has a quite flat structure.
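Since BubbleRAP's communities come from k-clique percolation, sweeping the k parameter is cheap to script. A rough sketch: the k_clique_communities call is networkx's own, but the toy contact graph is just a stand-in for the real one.

# Rough sketch: sweep the k parameter of k-clique community detection
# and watch how the number and sizes of communities change.
import networkx as nx
from networkx.algorithms.community import k_clique_communities

# Toy stand-in for the real contact graph: two overlapping cliques.
G = nx.Graph()
G.add_edges_from(nx.complete_graph(range(0, 5)).edges())
G.add_edges_from(nx.complete_graph(range(3, 9)).edges())

for k in range(3, 7):
    comms = list(k_clique_communities(G, k))
    sizes = sorted(len(c) for c in comms)
    print(f"k={k}: {len(comms)} communities, sizes={sizes}")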

SMW11 Paper submitted

March 29th, 2011

We have recently been working on a paper directed towards the Social Mobile Web 2011 workshop in Barcelona, Spain. It is a 4-page paper, really a position paper, with some preliminary results:

Initial SMW2011 Submission

The main focus of the paper was to demonstrate that considering hierarchy in the routing mechanism would improve the results compared to BubbleRAP, as the extra knowledge would give the algorithm an advantage. We called our algorithm BubbleH (details) and it used Hierarchical Greedy Clique Expansion (H-GCE) to discover communities.
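For context, here is my loose paraphrase of the BubbleRAP forwarding rule that BubbleH builds on; the data structures and names are mine, not Hui et al.'s code.

# Loose paraphrase of the BubbleRAP forwarding decision, which BubbleH
# extends with hierarchical community knowledge.

def bubblerap_should_forward(carrier, candidate, dest,
                             community, global_rank, local_rank):
    """community[n] -> set of n's community ids; ranks map node -> float."""
    if dest in community and community[candidate] & community[dest]:
        if community[carrier] & community[dest]:
            # Both carrier and candidate share a community with the
            # destination: "bubble down" on local centrality.
            return local_rank[candidate] > local_rank[carrier]
        # Candidate is inside the destination's community, carrier is not.
        return True
    # Otherwise "bubble up" on global centrality.
    return global_rank[candidate] > global_rank[carrier]

My understanding is that BubbleH's change is mainly in where community comes from: instead of flat communities, it uses the nested H-GCE communities and picks the smallest (or deepest) one shared with the destination.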

About Datasets

September 14th, 2010

I have spent a couple of days looking at the quality of the datasets I have parsed (Cabspotting and Geolife) and have had a couple of problems. When I imported the data, I made the decision to use the MySQL datetime format for timestamps. This resulted in very poor performance when querying the database (maybe). So, I decided to convert the timestamps to integers representing Unix time.

ALTER TABLE `entity_locations` ADD `unixtime` INT NULL;

Query OK, 25075904 rows affected (7 min 40.11 sec)

The conversion was a simple case of creating a new column, then updating the new column with converted timestamps:

UPDATE entity_locations SET unixtime = UNIX_TIMESTAMP(timestamp);

Query OK, 25075904 rows affected (11 min 13.38 sec)

Then adding an index (I should have added this when adding the extra column, though it may not have made much difference):

ALTER TABLE `entity_locations` ADD INDEX ( `unixtime` );

Query OK, 25075904 rows affected (15 min 21.79 sec)

All seemed well and fine; however, I had noticed while testing a sample dataset that the MySQL datetime field equated to 23 seconds earlier than the new unixtime field (as parsed by PHP). The amended dataset maintained this difference, meaning it was not a compounding error. This will not be a major problem, unless comparing to another dataset at a small granularity.
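One quick way to check whether the offset really is a constant 23 seconds is to ask MySQL for the per-row difference directly. A sketch using mysql-connector-python; the connection details are placeholders:

# Sanity check: is the datetime-vs-unixtime offset constant across rows?
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="user",
                               password="secret", database="datasets")
cur = conn.cursor()
cur.execute("""
    SELECT UNIX_TIMESTAMP(timestamp) - unixtime AS diff, COUNT(*)
    FROM entity_locations
    GROUP BY diff
""")
for diff, count in cur.fetchall():
    print(f"offset {diff}s: {count} rows")
conn.close()

If this comes back all zeros, the 23 seconds would be appearing on the PHP side when the datetime is parsed, which at least localises the problem.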

One problem I did notice, however: in the Cabspotting dataset, the timestamp column had 'on update current timestamp' set, meaning that when the new unixtime column was updated, the timestamp column was given a new timestamp value. I rectified this by removing the trigger and setting the timestamp based on the unixtime (not accounting for the 23 seconds).

Another issue I noticed with some of the data was that the timestamp was set to the time of insert for some of the Geolife data; there were also some records which were zeroed, and some which were in the future. This may mean the need for a full re-import. Luckily I have kept all of the data and parsing scripts.
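Before committing to a full re-import it is worth sizing the damage. A sketch along the same lines as above (placeholder connection details again), counting the obviously bad rows:

# Count obviously bad rows: zeroed timestamps and timestamps in the future.
# Insert-time timestamps can't be spotted this way; those need checking
# against the raw files.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="user",
                               password="secret", database="datasets")
cur = conn.cursor()
cur.execute("""
    SELECT SUM(unixtime = 0), SUM(unixtime > UNIX_TIMESTAMP()), COUNT(*)
    FROM entity_locations
""")
zeroed, future, total = cur.fetchone()
print(f"{zeroed} zeroed, {future} in the future, out of {total} rows")
conn.close()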

I still need to do some thorough analysis of the datasets in other ways, as mentioned before. I also want to convert the social sensing type data into the same format as these two datasets. There is also the CenceMe dataset to consider; it may not be suitable for parsing, as the location data could be very sparse. This of course might be useful for comparing to denser datasets.

Geolife dataset

Initial analysis of the datasets to find the concentration of readings shows that there are a handful of users who collect data from around April/May 2008; then from October 2008 another set of users contribute data in earnest, with fewer of the original set of users contributing. Readings tail off from April 2009, with very few users contributing after the beginning of August 2009.

Cabspotting

A good amount of data is recorded for most users between 17 May 2008 and 09 June 2008, with only a small number of exceptions.

ODCSSS 2009

June 16th, 2009

I have taken on mentorship of an ODCSSS project which we have dubbed CitySense. My student, John Paul Meaney, is currently working on plugging movement models into TOSSIM and TinyOS, and also on implementing simple DTN protocols. We have chosen TinyOS and TOSSIM so that we are able to easily test this in the real world, to compare real data with simulation data.

City wide environment sensing overview

October 15th, 2008

city-wide-environmental-sensing-overview

The document I drew up after the supervisor meeting, to give a general overview of the idea.

Supervisor Meeting 02 Oct 2008

October 2nd, 2008

Met with Paddy and discussed what we need to be getting on with – we both seem to have been thinking about different things. I seem to have gone down a route of something to do with mobile phones, but it needs to be linked to what we are getting at. I said that I don't really know what wearable computing really means, as I haven't seen it in the flesh!

Really need to get into the idea of what infrastructure is needed to make wearable systems work – think of it more as a network of wearable sensors.

What issues are there when we are thinking about transferring data amongst nodes – what if, instead of being centrally managed, they could freely communicate with each other?

Discussed the idea of city-wide environmental sensing – a mixture of fixed and mobile nodes tagging and recording data, offloading to nodes with big pipes – how does this work, and how do we implement it?

Also need to think about issues such as transient and persistent data – what is kept, what is shared and removed, what is private, what is encrypted and what is not?

Example application: city-wide environmental monitoring – fixed nodes talk to each other and share data; they also talk to mobile nodes to get data – an application to monitor an area.

Nodes about the body can communicate with nodes on other bodies, handing off data – gossiping – every piece of data has a timestamp, and is tagged so that it can be synchronised at a later date (a toy sketch of this is below).
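Just to pin down what "timestamped and tagged for later synchronisation" might mean, a toy Python sketch; entirely hypothetical, nothing has been implemented:

# Toy gossip store: every reading is keyed by (tag, timestamp, origin),
# so two nodes that meet can merge stores without loss or duplication.
class GossipStore:
    def __init__(self):
        self.data = {}

    def record(self, tag, timestamp, origin, value):
        self.data[(tag, timestamp, origin)] = value

    def merge_from(self, other):
        """Called when two nodes meet: take anything we haven't seen."""
        for key, value in other.data.items():
            self.data.setdefault(key, value)

a, b = GossipStore(), GossipStore()
a.record("temp", 1223942400, "node-a", 18.5)
b.record("co2", 1223942460, "node-b", 412)
a.merge_from(b)  # a now holds both readings
b.merge_from(a)  # and so does b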

Also talked about developing some criteria for what we say is good for wearable computing – i.e. p2p vs managed nodes – then comparing previous wearable implementations against our idea of what's best, and seeing if they succeeded.

Hundreds of Shimmers downstairs in CLARITY to have a play with.

Speak to Julie Doyle about the user study – work out what we are trying to get out of it, and keep it in the back of my head.

Paddy suggested this was a turning point – are we interested in the hard computer science part? Or more the HCI part?

Overall we really need to get down to the basics:

What will be good about P2P vs a central node?

What is persistent, what is transient?

Matt Walsh – they have the OS with low-level access to data – what do we need to do to get something up and running?

Ideas about context

July 29th, 2008

I've been reading lots of things recently about wearable computing, but previously I was reading around context and situation awareness, and had a number of thoughts based on the ideas I read. A lot of my more recent reading has also referenced some of this material, so it seems it may be useful, and I have written a short document about how I understand it at the moment.

Ideas About Context

Sensor PSO

November 21st, 2007

I have finished a paper on Sensor PSO, which is for one of my structured PhD modules: Natural Computing. Details can be found here; attached is the PDF version.

Optimum Coverage of the Autonomous Sensor Swarm

Current Writings

October 5th, 2007

Not a lot, I'm afraid :S