
## Data Analysis – Co-location using WiFi access points

In the last post, I described a method for deciding on co-location based on sightings of common WiFi access points, which treated users as co-located if they saw the same access point at roughly the same time. We have since refined this method to discriminate by approximate distance, based on the signal reading (which could be signal strength or signal quality). Pseudo code for this algorithm:

• For each consecutive timestep window, between start and end:
    • Retrieve all readings within that window for all users.
    • For each user:
        • pick the strongest signal from the list of readings for each access point (as there may be multiple)
        • rank the readings in order of signal strength
        • keep only the readings whose signal strength, relative to the strongest signal strength, meets the Alpha threshold
        • For each other user:
            • perform signal processing as above
            • if the intersection of the remaining lists of access points for the two users is not empty, consider the two to be co-located

The variables in this algorithm are the start and end time, the size of the window and the alpha value.
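The pseudo code above can be sketched in Python roughly as follows. This is a minimal illustration rather than the script actually used: the tuple layout of the readings, the function names, and the integer timestamps are all assumptions. It also assumes that a lower signal_strength value means a stronger signal, and that a reading is kept when strongest / signal_strength is at least alpha, so that an alpha of 0 keeps everything and an alpha of 1.0 keeps only the strongest access point.

```python
from collections import defaultdict
from itertools import combinations


def strong_access_points(readings, alpha):
    """Return the access points one user keeps after alpha filtering.

    `readings` is a list of (access_point, signal_strength) pairs.
    Assumption: a lower signal_strength value means a stronger signal,
    and all values are positive.
    """
    # Keep only the strongest (lowest) reading per access point.
    best = {}
    for ap, strength in readings:
        if ap not in best or strength < best[ap]:
            best[ap] = strength
    if not best:
        return set()
    # Ranking is only needed to find the strongest reading, so min() is enough.
    strongest = min(best.values())
    # The ratio is 1.0 for the strongest access point and smaller for the rest.
    return {ap for ap, s in best.items() if strongest / s >= alpha}


def colocations(readings, start, end, window, alpha):
    """Yield (window_start, user_a, user_b) for each co-located pair.

    `readings` is an iterable of
    (user, timestamp, access_point, signal_strength) tuples.
    """
    # Group readings by window, then by user.
    by_window = defaultdict(lambda: defaultdict(list))
    for user, ts, ap, strength in readings:
        if start <= ts < end:
            w = start + ((ts - start) // window) * window
            by_window[w][user].append((ap, strength))
    for w in sorted(by_window):
        kept = {u: strong_access_points(r, alpha)
                for u, r in by_window[w].items()}
        # Two users are co-located if their filtered sets share an access point.
        for a, b in combinations(sorted(kept), 2):
            if kept[a] & kept[b]:
                yield (w, a, b)
```

Each unordered pair of users is compared once per window, which is equivalent to the nested "for each other user" loop but avoids reporting the same pair twice.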

There were a few factors to consider before we could implement this. First, it was important to find out what the values for signal strength actually meant; the data ranged from 11 to 110. I used an N95 phone, with the Campaignr software installed, to record a number of readings at varying distances from known access points. I instructed the software to upload to a script that simply recorded the data in a text file. This showed that the ‘signal_strength’ value increases as the phone moves away from the access point. Another factor to consider was the range of values reported in the database. Davide suggested plotting the values of all signal strengths for all readings, over the readings that would be selected using the alpha value selection. The figure below shows the data based on alpha values of 0 (all readings), 0.8, and 1.0 (only the strongest readings).

The second figure shows all readings, split into individuals.

The next step is to decide what alpha value to use, and to store the co-locations in the database. To this end, I have written a script to parse all 3.6 million records, calculate the alpha value for each one (strongest signal strength over signal strength) and write the value back into the database. This will allow us to quickly see the effect of using different alpha values for decision making, without having to repeat the calculations every time. It will also allow us to plot the effect of different alpha values against each other very quickly.
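As a rough idea of what such a write-back step could look like, here is a small sketch using SQLite. The table and column names (readings, user_id, window_start, ratio) are hypothetical and not the real schema, and it assumes the strongest reading is taken per user and per window, with lower signal_strength values meaning stronger signals.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE readings (
    user_id TEXT,
    window_start INTEGER,
    ap TEXT,
    signal_strength INTEGER,
    ratio REAL
)
"""


def store_ratios(conn):
    """Fill readings.ratio with strongest / signal_strength."""
    # Assumption: the strongest reading in a group is the minimum value.
    strongest = conn.execute(
        "SELECT user_id, window_start, MIN(signal_strength) "
        "FROM readings GROUP BY user_id, window_start"
    ).fetchall()
    for user_id, window_start, best in strongest:
        # Multiplying by 1.0 forces real (non-integer) division in SQLite.
        conn.execute(
            "UPDATE readings SET ratio = ? * 1.0 / signal_strength "
            "WHERE user_id = ? AND window_start = ?",
            (best, user_id, window_start),
        )
    conn.commit()


def count_kept(conn, alpha):
    """Count the readings that would survive a given alpha threshold."""
    row = conn.execute(
        "SELECT COUNT(*) FROM readings WHERE ratio >= ?", (alpha,)
    ).fetchone()
    return row[0]
```

With the ratio stored once, trying a new alpha value becomes a single query such as count_kept(conn, 0.8) instead of a full pass over the raw readings.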
