### Archive

Archive for September, 2010

## Meeting with Pádraig and Davide 30 Sep 2010

Discussed what we should be getting on with:

I will spend some time organising the data from the Social Sensing Study (SSS) and planning an appropriate structure that can be used by any dataset. The idea is to have one table of contact information (e.g. node1 met node2 at this time), which is assumed to mean that the entities could communicate at that time, and another table recording the place of individual entities over time (e.g. node1 was at location A from T to T’). The place is determined as some labelled k-clustering of raw readings.
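As a rough illustration of the two-table idea, here is a minimal Python sketch. The class and field names (`Contact`, `Visit`, `start`, `end`) and the two lookup helpers are my own placeholders rather than an agreed schema, and it assumes contacts are stored as intervals of unix time.

```python
from dataclasses import dataclass

# Hypothetical record types for the two tables; all names are placeholders.

@dataclass(frozen=True)
class Contact:
    node_a: str
    node_b: str
    start: int  # unix time at which the contact begins
    end: int    # unix time at which the contact ends (exclusive)

@dataclass(frozen=True)
class Visit:
    node: str
    place: str  # label of the cluster the raw readings fell into
    start: int
    end: int

def in_contact(contacts, a, b, t):
    """True if a and b could communicate at time t, in either direction."""
    for c in contacts:
        if {c.node_a, c.node_b} == {a, b} and c.start <= t < c.end:
            return True
    return False

def place_at(visits, node, t):
    """Return the labelled place the node occupied at time t, or None."""
    for v in visits:
        if v.node == node and v.start <= t < v.end:
            return v.place
    return None
```

Any dataset that can be expressed as lists of these two record types could then be fed to the same analysis code.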

Some policy decisions will have to be made which determine how data is extracted. These need to be recorded so we can discuss whether they are valid. For example: can entities communicate only when the raw readings show that they both detected each other, or is one-way detection sufficient? And how do we determine the best value of k for clustering?
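The detection policy could be made explicit as a switch in the extraction code, so that the choice is recorded alongside the data. The sketch below assumes raw readings arrive as `(observer, observed, timestep)` tuples; that input format and the name `contact_pairs` are assumptions for illustration.

```python
def contact_pairs(sightings, require_mutual):
    """Turn raw sightings into undirected contacts under a chosen policy.

    sightings: iterable of (observer, observed, timestep) tuples (assumed format).
    require_mutual: if True, keep a contact only when both nodes saw each other
    in the same timestep; if False, one-way detection is sufficient.
    Returns a set of (frozenset({a, b}), timestep).
    """
    seen = set(sightings)
    pairs = set()
    for observer, observed, t in seen:
        if observer == observed:
            continue  # ignore self-sightings
        if require_mutual and (observed, observer, t) not in seen:
            continue
        pairs.add((frozenset((observer, observed)), t))
    return pairs
```

Running both policies over the same readings and comparing the resulting contact counts would give us something concrete to discuss.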

I will also start to think about how to build a simple simulator that will allow us to run test cases (e.g. send a message at every timestep to every other node to test the limit of the network) for multiple routing algorithms (ideally simultaneously, e.g. BUBBLE and Graham’s algorithm).

The idea is also that datasets which conform to this structure can be tested without having to change code. By the next meeting I will have come up with some ideas, defined some policies, and will also bring some example data along to inform our choices.
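One possible shape for such a simulator is a loop that replays the contact list in time order and asks each routing policy whether to hand a copy of a message to the peer. The sketch below is only a starting point: the function name `simulate`, the tuple layouts and the two toy policies (flooding and direct delivery) are assumptions, and neither BUBBLE nor Graham’s algorithm is implemented here.

```python
def simulate(contacts, messages, policies):
    """Replay contacts once per policy and count delivered messages.

    contacts: list of (time, a, b) tuples (assumed format).
    messages: list of (created, src, dst) tuples.
    policies: dict of name -> function(carrier, peer, dst) returning bool.
    Returns a dict of policy name -> number of messages delivered.
    """
    results = {}
    for name, should_forward in policies.items():
        # holders[i] is the set of nodes currently carrying message i
        holders = [{src} for _, src, _ in messages]
        delivered = set()
        for t, a, b in sorted(contacts):
            for i, (created, _src, dst) in enumerate(messages):
                if i in delivered or t < created:
                    continue
                for carrier, peer in ((a, b), (b, a)):
                    if carrier in holders[i] and peer not in holders[i]:
                        if peer == dst:
                            delivered.add(i)
                            holders[i].add(peer)
                        elif should_forward(carrier, peer, dst):
                            holders[i].add(peer)
        results[name] = len(delivered)
    return results

# Toy policies used only to exercise the loop
def flood(carrier, peer, dst):
    return True   # copy to everyone met

def direct(carrier, peer, dst):
    return False  # only hand over to the destination itself
```

Because every policy sees the same contacts and the same messages, their delivery counts can be compared directly.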

Categories: Uncategorized

## Whiteboard discussion with Davide 29 Oct 2010

Had a discussion with Davide about what we had decided would be the best algorithm to use as the baseline scenario for routing.

We think the best approach is to use either Graham’s periodic degree routing approach or the BUBBLE protocol. We sketched out Graham’s algorithm and realised that it is in fact very similar to the BUBBLE protocol, except that BUBBLE does not adapt to a changing period in which to calculate degree, and Graham’s protocol does not use communities.

We also had some ideas about detecting places as higher level concepts of where a node is, rather than raw location readings (e.g. GPS).

We also talked briefly about how we would use the popularity of places, in combination with community knowledge, as the basis for a routing algorithm. I will write this up separately.

Categories: Uncategorized

## Kick off Meeting with Padraig and Davide 21 Sep 2010

Had a first meeting with Padraig, where I presented a few slides (Presentation to Padraig Sep 2010) about my area of interest and how I got to it etc.

Padraig initially suggested that location is not important, but later decided that in fact, it is useful when the network structure is not fully known.

He suggested that I take a baseline scenario, using a reliable dataset (Social Sensing), and use it as a straw man to test further refinements on. E.g. take the degree approach (Graham & Davide), train it over 1 month, then have a test period where 50 messages are sent across the network. At the same time, have a firm idea about the next-stage refinements, i.e. using location to improve routing (a location-based routing algorithm). This will then drive the next round of improvements.

He also suggested that the application of the idea is not so important, and whilst I will deal with that in the thesis, it will not figure greatly.

He also talked about community finding in the network, and how we could use that for routing.

I agreed to send on Graham’s paper, and perhaps the BUBBLE Rap paper.

We will meet again next week.

Categories: Uncategorized

I have spent a couple of days looking at the quality of the datasets I have parsed (Cabspotting and Geolife) and have had a couple of problems. When I imported the data, I made the decision to use the MySQL datetime format for timestamps. This seemed to result in very poor performance when querying the database. So I decided to convert the timestamps to integers representing unix time.

ALTER TABLE entity_locations ADD unixtime INT NULL;

Query OK, 25075904 rows affected (7 min 40.11 sec)

The conversion was a simple case of creating a new column, then updating the new column with converted timestamps:

UPDATE entity_locations SET unixtime = UNIX_TIMESTAMP(timestamp);

Query OK, 25075904 rows affected (11 min 13.38 sec)

ALTER TABLE entity_locations ADD INDEX ( unixtime );

Query OK, 25075904 rows affected (15 min 21.79 sec)

All seemed well; however, while testing a sample dataset I noticed that the MySQL datetime field equated to a time 23 seconds earlier than the new unixtime field (as parsed by PHP). The amended dataset maintained this difference, meaning it was not a compound error. This will not be a major problem unless comparing to another dataset at a small granularity.

One problem I did notice, however, was that in the Cabspotting dataset the timestamp column had ‘on update current timestamp’ set, meaning that when the new unixtime column was updated, the timestamp column was given a new timestamp value. I rectified this by removing that column setting and restoring the timestamp from the unixtime (not accounting for the 23 seconds).

Another issue I noticed was that, for some of the Geolife data, the timestamp was set to the time of insert; some records were also zeroed, and some were in the future. This may mean a full re-import is needed. Luckily I have kept all of the data and parsing scripts.

I still need to do some thorough analysis of the datasets in other ways, as mentioned before. I also want to convert the Social Sensing type data into the same format as these two datasets. There is also the CenceMe dataset to consider; it may not be suitable for parsing, as the location data could be very sparse. This of course might be useful for comparing to denser datasets.
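Before deciding on a full re-import, the suspect rows could be counted with a small check like the one below. It is a sketch only: the function name, the category labels and the idea of passing in the known import window (`import_start`, `import_end`) are assumptions, not part of the existing parsing scripts.

```python
def classify_timestamp(unixtime, import_start, import_end, now):
    """Label a reading's timestamp as 'zeroed', 'future', 'insert-time' or 'ok'.

    import_start / import_end: unix times bounding when the import ran (assumed
    to be known), used to spot rows stamped with the time of insert.
    now: current unix time, passed in so the check is repeatable.
    """
    if unixtime is None or unixtime <= 0:
        return "zeroed"
    if unixtime > now:
        return "future"
    if import_start <= unixtime <= import_end:
        return "insert-time"
    return "ok"
```

Tallying these labels per user would show whether the bad rows are concentrated in a few files or spread across the dataset.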

Geolife dataset

Initial analysis of the dataset to find the concentration of readings shows that a handful of users collect data from around April/May 2008; then, from October 2008, another set of users contribute data in earnest, with fewer contributing from the original set of users. Readings tail off from April 2009, with very few users contributing after the beginning of August 2009.

Cabspotting

The data shows that a good amount of data is recorded for most users between 17 May 2008 and 09 Jun 2008, with only a small number of exceptions.

## Discussion with Davide 10 Sep 2010

Had a brainstorm with Davide about what we should analyse. We decided to use the Geolife dataset to test out some ideas:

Firstly I will do a meta-analysis of the data, to see which nodes we do and do not have data for in a given time period, counting the number of readings per day over the dataset period.

We want to identify the colocation between nodes, which we calculate as:
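The per-day count could be done along these lines. This is a sketch that assumes readings are available as `(node, unixtime)` pairs and that days are taken in UTC; the function name is a placeholder.

```python
from collections import Counter
from datetime import datetime, timezone

def readings_per_day(readings):
    """Count readings per node per calendar day (UTC).

    readings: iterable of (node, unixtime) pairs (assumed format).
    Returns a Counter keyed by (node, date).
    """
    counts = Counter()
    for node, unixtime in readings:
        day = datetime.fromtimestamp(unixtime, tz=timezone.utc).date()
        counts[(node, day)] += 1
    return counts
```

Days that never appear as a key for a node are the days for which we have no data from that node.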

$C_{AB}(t) = 1 - \theta(|x_A(t)-x_B(t)|-\lambda)$

which means: if A and B are within distance λ of each other at time t (where $x_A(t)$ and $x_B(t)$ are the locations of A and B, and θ is the unit step function), $C_{AB}(t)$ is 1, else it is 0.

$C_{AB} = \frac{1}{T} \sum_{t=1}^{T} C_{AB}(t)$

This gives us the average over all time periods, showing the co-locatedness of A and B (the sum of the co-location of A and B over all time periods, divided by the number of time periods T).

We use this to construct a graph, where A and B are connected using a weighted edge labelled with $C_{AB}$. This network is the structure of co-locatedness, and is exactly the same as a proximity network; it does not care where you meet, only that you do meet.
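The two formulas translate fairly directly into code. The sketch below makes several assumptions that we have not settled: positions are planar `(x, y)` coordinates in the same units as λ (raw GPS readings would need a proper geographic distance), a time period in which either node has no reading counts as not co-located, and exactly λ apart counts as not co-located.

```python
import math

def colocated(pos_a, pos_b, lam):
    """C_AB(t): 1 if the two positions are closer than lam, else 0."""
    return 1 if math.dist(pos_a, pos_b) < lam else 0

def mean_colocation(track_a, track_b, timesteps, lam):
    """C_AB: the average of C_AB(t) over the given time periods.

    track_a, track_b: dicts mapping a time period to an (x, y) position.
    timesteps: the T time periods to average over.
    """
    total = 0
    for t in timesteps:
        if t in track_a and t in track_b:
            total += colocated(track_a[t], track_b[t], lam)
    return total / len(timesteps)
```

For example, two nodes that are within λ of each other in 2 out of 4 time periods get a weight of 0.5.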

We calculate this in 1 month(?) blocks to see how the graph evolves over time.
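Splitting the graph into monthly blocks could look something like this. It is a sketch under assumptions: the input is a list of `(unixtime, a, b)` events, one per time period in which a and b were co-located, with duplicates already removed; months are calendar months in UTC; and `period_seconds` is the length of one time period.

```python
import calendar
from collections import defaultdict
from datetime import datetime, timezone

def monthly_colocation_graphs(events, period_seconds):
    """Build one weighted co-location graph per calendar month.

    events: iterable of (unixtime, a, b), one per co-located time period.
    Returns {(year, month): {frozenset({a, b}): weight}}, where the weight is
    the number of co-located periods divided by the periods in that month.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for unixtime, a, b in events:
        when = datetime.fromtimestamp(unixtime, tz=timezone.utc)
        counts[(when.year, when.month)][frozenset((a, b))] += 1

    graphs = {}
    for (year, month), edges in counts.items():
        days = calendar.monthrange(year, month)[1]
        periods = days * 86400 // period_seconds
        graphs[(year, month)] = {pair: n / periods for pair, n in edges.items()}
    return graphs
```

Comparing the edge weights for the same pair across consecutive months would show how the graph evolves.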

We are also interested to find out if nodes are influential over locations (e.g. if particular nodes visit a location, which then becomes popular).

We plot the popularity of x over time, where the popularity is the number of people that visit a location in a day, and where the set X is derived from all known locations. We hope to find a pattern where the graph tails off or increases over time. Flat graphs are uninteresting.
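Once locations have been derived, the popularity series could be computed roughly as follows. The sketch assumes visits are available as `(node, place, unixtime)` tuples, counts each person once per location per day, and uses UTC days; the function name is a placeholder.

```python
from collections import defaultdict
from datetime import datetime, timezone

def daily_popularity(visits):
    """Number of distinct nodes visiting each place on each day (UTC).

    visits: iterable of (node, place, unixtime) tuples (assumed format).
    Returns {place: {date: number_of_distinct_visitors}}.
    """
    visitors = defaultdict(set)
    for node, place, unixtime in visits:
        day = datetime.fromtimestamp(unixtime, tz=timezone.utc).date()
        visitors[(place, day)].add(node)

    popularity = defaultdict(dict)
    for (place, day), nodes in visitors.items():
        popularity[place][day] = len(nodes)
    return popularity
```

Plotting each place's values in date order gives the curves we want to inspect for rising or falling trends.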

For this I will need to calculate locations from the data.

Categories: Uncategorized

## Weekly update 10 Sep 2010

Davide and I have been conducting a quick literature search for interesting papers relating to location in delay tolerant networking, and interesting goals to do with knowledge and prediction of location. The first round led us to four main areas of interest:

• Recommendation systems
• Routing in Delay Tolerant Networks (obviously)
• Peer-to-peer over proximity networks (e.g. BitTorrent over DTN)
• ????

I am in the process of reading a few papers of interest, and Davide is working away at finding other information out. We plan to meet this afternoon to discuss our progress.

I have also parsed and assimilated the GeoLife and Cabspotting datasets, and plan to do the same for the CenceMe dataset. When we get the Social Sensing project dataset, I will do the same with that. I have tried to use a consistent format for the data, so that when it comes to it, it will be possible to analyse it all in the same way. The simple format is based on the concept that networks are formed of entities, which also have locations that they visit over time.

I had already written, and have since updated to reflect this format, a playback visualisation of entity movements. This is really just a way of seeing what the data looks like over time.

Funding update:

I have finally managed to sort out the details of my funding, and have calculated that I have enough funds to cover a stipend for the next 11 months (I have already paid half fees for this year):

€11,976.42 remaining IRCSET Scholarship fund

+ €2,621.25 credit on UCD account, which will be transferred to my IRCSET fund by 18 September 2010

= €14,597.67; €14,597.67 / €1,333.50 ≈ 10.95 installments.

However, the only problem is that I will not be paid this month, so I plan to ask payroll to produce an advance cheque for this month (fingers crossed).

Categories: Uncategorized