Archive

Archive for the ‘projects’ Category

Location and Persistence Based Routing (LPBR) thoughts

December 16th, 2010 No comments

I think that once we have some robust way of relying upon what a location is, we can start to be a little bit more sophisticated with the routing protocol. i.e. making real-time decisions about routing based on the current history, rather than the broad ranking.

For example, extending the idea in PBR which uses duration of contact, and including the idea of Prophet and various others, which use the time since the last meeting with a node, we can use these for locations, and contacts;

Node A keeps a record of contacts and locations, and interrogates
            other nodes on encounter.
On encountering another node B, pass the message for C:
     if B spends more time than A, at any location that node C does
     or B has visited the destination nodes most popular location(s)
           more recently than A
     or B has seen the destination node more recently than A
          ?(and it has a regular pattern (periodicity) of seeing the destination
           node which can be predicted, and means that B will see C
           sooner than A)?
      or B has spent more time than A, with node C in the past

This effectively pushes the message to to the places that the destination node often visits, and to the nodes that see the destination more often.

In a large network, with many very weakly connected communities, it would be difficult to penetrate the destination communities with messages, however, an enhancement to get messages sent widely is to use some other broadcasting mechanism first, such as spray and wait, where there are a limited copies of a message, of which half are shared at every contact until each node has one copy.

At this point the Spray and wait protocol would hold onto the message waiting to see the destination. However, if we then start LPBR routing from this point, we could find that one or some of those messages are with nodes in the destination community, and therefore a greater likelihood of getting the message to the destination.

Categories: Ideas, projects

Understanding ContactSim

November 3rd, 2010 No comments

In the last post, I mentioned Graham’s simulator, ContactSim, which he descibes as follows:

ContactSim is a discrete-time event simulator which is specifically designed to facilitate the playback of contact traces. It is flexible, allows simple addition of new datasets and swapping of existing datasets for the underlying contact patterns, and has a powerful XML configuration language.

I have managed to have a look through it, and with the help of Graham’s appendices notes and emails, I have managed to get it working, and have started to understand how it works. Graham mentioned that he has some ideas about how I might be able to adapt it to also consider location, and automatically run the kind of simulations we have been thinking about.

Grahams Appedix A lists in more detail how the simulator works. He suggests adding a LocationUpdate event, which informs nodes (aka processes in ContactSim) of any changes in their location, which can be used by the routing algorithms.

As a test, I formatted the co-location data that we generated from the SocialSensing study into the CSV format used in the simulator (I need to get a bit more information about the correct way to do this for Graham, but for the time being I copied the formatting of the MIT Reality Mining data format used by Graham), and ran two algorithms over it, Persistance Based Routing  and Prophet.  Whilst I am still trying to work out what the results mean(!) I produced the two plots below:

delivery_ratio

This plot simply shows the delivery ratio of the PBR prototcol using the MIT dataset with 103 users, vs the Social Sensing Study data with 26 users. It covers different time periods (both of 1  month), but clips some data at the end of the MIT dataset. However, it does show that the datasets are in some way comparable.

I am continuing to investigate the software, and will speak with Graham (who is in Canada until Christmas) about how best to adapt the software for our needs. It would seem wasteful not to adopt this software which does such a similar thing, without first testing its feasability. Another simulator mentioned by Graham is the ONE simulator, which I have not yet looked at.

What I am also planning to do, is to find out whether the MIT Dataset contains some notion of location, even if that is just cell tower data, that way, it would make it easier to compare any future location based DTN algorithm that we come up with, to Graham’s work, and any other work that uses this dataset.

Data Analysis – Co-location using WiFi access points

October 19th, 2010 No comments

In the last post, I described a method for deciding on co-location based on spotting of common WiFi points, which co-located users if they saw the same WiFi access point, at roughly the same time. We refined this method to discriminate when the approximate distance, based on signal reading (which could be signal strength, or signal quality). Pseudo code for this algorithm:

  • For each consecutive timestep window, between start and end.
    • Retrieve all readings within that window for all users.
      • Foreach user
        • pick the strongest signal from the list of readings for each access point (as there may be multiple)
        • rank reading in order if signal strength
        • discard the readings for access points where:
          • Signal Strength divided by Strongest Signal Strength is less than Alpha
        • Foreach other user
          • perform signal processing as above
          • if the intersection of the remaining list of access points for each user is not null, consider the two to be co-located

The variables in this algorithm are the start and end time, the size of the window and the alpha value.

There were a few factors to consider before we could implement this, firstly, it was important to find out what the values for signal strength actually meant;  the data ranged between values of 11 to 110.  I used an N95 phone, with the Campaignr software installed, to record a number of reading at varying distances from known access points. I instructed the software to upload to a script that simply recorded the data in a text file.  This showed that the ‘signal_strength’ value increases as the phone moves away from the access point. Another factor to consider, was the range of values that were reported in the database. Davide suggested plotting the values of all signal strengths for all readings, over the the readings that would be selected using the alpha value selection. The figure below shows the data based on an alpha value of 0, for all readings,  0.8, and 1.0 (for only the strongest readings).

The second figure shows all readings, split into individuals.

The next step is to decide what alpha value to use, and store the co-locations into the database. To this end, I have written a script to parse all 3.6 million records, calculate the alpha value (Strongest signal strength over signal strength) and write the value back into the database. This will allow us to quickly see the effect of using different alpha values for decision making, without having to repeat the calculations every time.  It will also allow us to plot the effect of different alpha values against each other very quickly.

Quick meeting with Paddy 6th May 2010

May 6th, 2010 No comments

Met with paddy in his office and discussed updates so far.

Paddy wants to see something tangible by next week’s update – i.e. a worked example of how vector clocks will work in terms of location.

Also emphasised that I should not get sidetracked! (of course!)

Suggested storing temporal  information (parallel vector clocks?) – e.g. a from to, so that we can say things like does this time overlap this one, is it contained withing etc. etc.

Also thought about how to define location – the bounding of the location  gives an experimentation thing – change the grid size – whats the computation impact of the size of the grid – and what the relevance e.g. too big and it makes realistic.

Construct vector stamps – 5 separate path across these locations, two or three allow drop messages – run through and pick various vector clocks at various times, and show them. Then start generalising them.

From this we can draw general things about e.g.: decisions made, what information is stored, what we put in vector clocks, what operators we need.

Then run a simulation and see if generalisation works. Then we can see if some things fall down, and come back and change things.

Should stick with ideas about location, not proximity yet.

Using this it is then possible to write this concisely.

Actions:

  • Generate a worked example/scenario
    • show examples of vector clocks at times
    • show the movements over time
  • Don’t get sidetracked!

Presentation 10th Feb 2010

February 11th, 2010 No comments

Gave presentation to Paddy, Davide, Neil Cowzer and Fergal Reid (clique) about my quick and dirty analysis of the dataset that I have collected allready.

Slides

General concensus was that there was not really enough users, and so there were some suggestions about other datasets that might be found -persuade a mobile phone company to give data about user movements. Mine flickr/twitter for geo-tagged photo’s/tweets, and try to determine groups of people based on similar locations.

Also suggested that the GMA is good for visualising data, not greatly interesting, PH is interesting as is SPD. BD is something that is useful as an application to gather data, but would need a very large engineering effort.

Paddy suggested that if we could make the data collection process very easy, then we could throw it out to the student population to start collecting data. Fergal said that in J2ME it would be very difficult, but by sticking to C++ it might work (for Nokia phones).

Also talked about getting ground truth for data, Fergal Suggested collecting accellorometer data too (so if someone asked – how did you verify GPS trace, one can say that we correlated it with the accelorometer data). I also suggested tagging locations.

Determined the following actions:

  • Look for access to datasets with good location – 1 week
    • WaveLAN Dataset
    • HeaNET – chase paddy – Eduroam
    • Mine location data from Flickr
  • Look at applying analysis to these datasets – specifically
    • Periodicity Hunting
    • Spatial Dependance on the Degree
  • See if we can construct overlay over these networks
    • e.g. drop nodes
      • Popular locations
      • popular people
      • Other?
      • Vector clocks might be the way to do it
  • Read up about Vector Clocks as suggested in the paper by Klineberg, Watts and ???? at  KDDOA
  • Speak to Graham about whether I can easily integrate this data into his code, if so – do it, otherwise think about implementing it seperately(robustly!)

Also planned to meet Paddy again next week to go over these things, and try to hammer out a better plan. Then meet with these people again in three weeks to show where I have go to.

Davide also talked about churn in proximity patterns – might be worth thinking about what this means (example was then a person regularly sees other people, and after a while, one of those people drops off the radar – what does this mean)

Paddy said that in his mind, the long goal is to be able to forward plan using the knowledge of data that has passed (prediction).

Discussion with Davide about plots etc 4th Feb 2010

February 4th, 2010 No comments

Three types of data analysis:

General Mobility Analysis

We calculate the distance between locations at the start of every time period, (e.g. 1 hour) and plot the number of time that particular distance is travelled (to some granularity) over some time period (1 week maybe)

Periodicity Hunting

We measure the time spent at a location, and count the number of times in a bounded time period (say a week), using the same timescale as above to bracket readings.

(people visit common locations frequenly, or the visit some locations for a long period of time. – also think about the case that lots of people visit a common location infrequently/frequently).

Statial Dependance of the Degree

We count the number of devices seen in a given time period (same as above – e.g. 1 hour) and the location

Buddy Discovery

We count the duration of the contacts between pairs (the user and the devices he can see) and also the location of the contacts, and try to see which devices are seen most often, and then try to see which devices are seen at multiple locations. (using the same time period as above – 1 hour slots over a week)

Categories: discussions, Ideas, projects

Discussion with Paddy and Davide 2nd Feb 2010

February 2nd, 2010 No comments

Met with Paddy and Davide and discussed what we have been doing.

  • Actions from last meeting:
  • Said that I had been collecting data which seems to have good location information.
  • Had spoken with prag etc. but not really very useful
  • Davide has come up with some great questions for analysis of data
  • The only thing I hadn’t done was arrange a presentation for findings so far.

Paddy was happy with the progress so far, and after we discussed a number of things, we came to the following action points:

  1. Do a quick and dirty analysis of data
    1. Mobility analysis
    2. Periodicity
    3. Buddys
    4. Spatial degree
    5. Situation detection e.g. what does periodiciy mean?
  2. This is so that we can ask:
    • Do we have the data we need already?
    • What are the limitations of the data?
    • Are there other questions we need to ask?
  3. Plan a presentation for next wednesday morning (more of a brainstorm) to develop the ideas further, and really try to hammer down the larger plan

Paddy also suggested that we think about putting a paper into ubicomp (deadline 13th March) about our analysis of this data, but put a spin on it, e.g. what does periodicity mean? Can we predict events based on this? – Can we infer some useful context, based simply on the structure of the data, without the need for advanced techniques ( – i call this Urban Guerilla Sensing).

We suggested that we might be able to do two applications based on one of buddy finding analysis part (see mobile_agents and PhD the Story) the first, Paddy dubbed F3 (Facebook Friend Finder) where we encourage people to collect data for us, in return for detecting the presence of other facebook users, and suggesting friends based on frequency of co-location. The second was a similar application, but for regular visitors to research seminars.

I mentioned my vision on the next three points of reference, the first being a paper about the collection and analysis of this dataset, the second being another work which tied this into an simulator for the dataset, which synthesises this data in to a generic set, which can be used to test MANETs etc. The final thing (I didn’t get this far) being the final writeup of my PhD which brings all of these ideas together.

Paddy likes this, and suggested the idea of Pattern Language (used to desrcribe patterns in software engineering) which had recently been applied to Ubicomp environments to describe patterns in situations, Paddy thought that this might be particularly relevent to this, and that he would like to see some language of description emerge from our analysis. This sounds like a great idea. 🙂

Finally, Paddy spoke anbo

Meeting with Davide 14 Dec 2009

December 14th, 2009 No comments

Had a meeting with Davide to discuss current research,

we talked about what I had discussed with Paddy, and Davide seemed to think this is effectively the same as the E-DTN position paper, and that we should pursue these ideas further.

We decided that in lieu of getting access to largert datasets, we should look at the ones we have allready.  I suggested that we use the Tom dataset that was collected over a week or so when Tom used the N95 I programmed – as a start this might help us to see what the data looks like. He suggested the following tasks:

  • Contact Eiko Yoneki to see if she has some datasets that we could use
    • UPDATE: She didn’t have any datasets and she is eager to get some herself…
  • Look at Grahams simulator, understand it, and extend it to include the ability for location data to be incorporated
  • Generate two graphs from the Tom data (which he called the TomStalker dataset):
    • Latitude vs Longitude vs Number devices seen (3d)
    • Frequency of device spotting vs time
  • Look into taking a statistics course at ucd, to learn the techniques of statisical analysis
  • Include him in correspondance about this

Meeting with Prag 9 Dec 2009

December 10th, 2009 No comments

Had a brief meeting with Prag Sharma, who described to me the sort of things that the clique group were doing.

I explained to him that I was interested in ways of analysing social networks in terms of movement patterns. He mentioned a few datasets that clique has some access to: Conrad and Fergul have access handset data from 6 million Nodes from a telephone network – IDIRO, but he did not know what the data included. Another dataset was NORON data, which is to do with financial fraud and included banking transaction data.

He suggested that a good person to talk to was Derek Green, who he thought was doing  similar work.

We saw Derek in is cube, and it seems he is looking more at social clustering, but we thought there might be some interesting overlap, so I will send him my position paper, and he will send me his recent presentation.

Prag suggested I check out the clique website.

Meeting with Paddy 7th Dec 2009

December 8th, 2009 No comments

Had a meeting with Paddy (Audio here)

Actions:

  • project/ experiment -find some location data extract and parse it,
    • Have to have a set of questions we are going to ask of the data
    • Looking for structures in the data – what are they?
  • extract soem temporal and strcutural patterns
  • Speak to prag (network clique) to find techniques on how to extract patterns from these networks.
  • Email Simon, Aaron and Adrian, to see if they know about, or have dealt with any large datasets which include accurate location information.
  • In 6 weeks present to a group of people about initial findings, and ideas for further research. but beforehand, keep in sync with Paddy, and make sure I don’t go off track. Keep a record in a wave about progress.
  • Also, investigate movement patterns.

Justifications for this:

Burst Situations – emergency situations – however movement patterns are different in this case.

Focus in on simple pragmatics – I want to be able to send free messages.

Pick social everyday applications that justify it – (parasitic networking?).

Need to focus on detailed stuff – get numbers,

Focus on movement patterns – what recurring patterns are in there – need to data mine it all.

How do we collect data, sufficiently large, and in constrained sets.

Padraig Cunningham group doing network analysis – We can use Paddy’s Social network to get help with things like statistics etc.

Extracitng movement patterns from everyday life – augemtnting with context information.

Set up experments

do blind first  – capture everything from individula, apply analysis just to movement informations plus simulation, then you take in to accoutn context information and see if it improves anything. Theres nothing to say that context is useful all the time. Perhaps we find it reduces the number of hops, this reduces failure rate.

Chapter 1 is the broad view – DTNs Human Movent – underlying model of data transmission. my hypothersis is that human moevemtn patterns will provide an underlying model for transmission of data on ad-hoc networks.

Next actions:

Think up and construct an experiment and do it – short term 3-4 week expt whewre we actually get somehting out of it – needs to be a small constrianed expriment, that takes into account movement and context.

Have to have a set of questions we are going to ask of the data:

Looking for structures in th data what are they?

couple of conversations: Prag Sharma – good talking point – i’ve got this data, I want to use thi

try to characterise the patterns – they might be structural patterns, or temporal –