Archive for the ‘Uncategorized’ Category

Experiment Logs

October 26th, 2011

I have been filling my brain up with experiments recently, and it occurred to me that I very often lose track of what I have been doing and why. So I have started to log what I am doing on the Experiment Logs page.

The idea is that for every chunk of experimentation, I write down what the goals were, what the setup was, and what the result was. It has a fairly loose structure, so pertinent everyday stuff will probably appear there too.

Categories: Uncategorized

Weird Simulator Bug… :(

October 26th, 2011

The problem is a strange one. I have specified a dataset based on the Studivz German university social network. I selected a time period in the dataset config and called this dataset studivz-3month-5_2, which refers to the period of the dataset I’m interested in (a 3-month chunk) and to the group of nodes taken from the whole lot. The dataset is huge (28k nodes) and the simulator doesn’t cope well with it, so I picked 5 seed nodes, discovered the network 2 hops deep, and used all of the nodes found as the node set. In the config I did this by specifying the nodes explicitly in the node-list property:
<property name="node-list" value="5967,2117,780,1828,2274, …. " />
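The seed-and-expand selection described above can be sketched as a breadth-first search that stops two hops from the seeds. This is a minimal sketch assuming the contacts are available as an undirected edge list of node IDs; `discover_nodes` and `node_list_property` are hypothetical helpers, not part of LocationSim, and only show how such a node-list value could be produced.

```python
from collections import deque

def discover_nodes(edges, seeds, max_hops=2):
    """Return every node within max_hops of any seed (breadth-first search)."""
    adjacency = {}
    for a, b in edges:  # treat contacts as undirected
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return sorted(depth)

def node_list_property(nodes):
    """Format the selected nodes as a node-list config property."""
    value = ",".join(str(n) for n in nodes)
    return f'<property name="node-list" value="{value}" />'
```

With the edges `[(1, 2), (2, 3), (3, 4)]` and seed `1`, node 4 is three hops away and is left out, so the property lists only nodes 1, 2 and 3.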

The problem happens with community ranking: when the simulator runs HuiRanking over the set of nodes in a community, for this dataset only, it seems to include all of the nodes in the list (as with global.dat) rather than just the nodes in that community.

Yet the community file it uses lists the correct nodes (i.e. different nodes, and different numbers of nodes, in each community).

The weirdest thing is that it ONLY does this for this particular dataset, not for any others (e.g. MIT, Enron, Social-Sensing, InfoCom). None of the source code has changed, only the config files, but I can’t work out what has gone wrong.
I added some debugging code to /LocationSim/src/ie/ucd/argfrot/simulate/bubblerap/LocalHuiResultTask.java that prints out the size of the Simulation context variable (as passed to LocalHuiResultTask). For this particular dataset, it confirms that the Simulation contains all of the nodes specified in the config. It appears that instead of having only the nodes for that community, the Simulation object includes all of the nodes by mistake.
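For comparison, here is a minimal sketch of what the per-community ranking step should see. It uses in-community degree as a simple stand-in for the Hui betweenness-based ranking the simulator actually computes, and `local_ranks` and `check_ranking_scope` are hypothetical helpers, not LocationSim code. The scope check is the kind of guard that would flag a Simulation object carrying every node instead of one community.

```python
def local_ranks(edges, community):
    """Rank community members by degree inside the community only.

    Degree is a stand-in for the real ranking; the point is that only
    community members are ranked and edges leaving the community are ignored.
    """
    members = set(community)
    degree = {node: 0 for node in members}
    for a, b in edges:
        if a in members and b in members:
            degree[a] += 1
            degree[b] += 1
    return degree

def check_ranking_scope(ranked_nodes, community):
    """Raise if the ranking covers nodes outside the community."""
    extra = set(ranked_nodes) - set(community)
    if extra:
        raise ValueError(f"{len(extra)} nodes outside the community were ranked")
```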

What I don’t understand is why it works for other datasets (with smaller numbers of nodes) but not for this particular one. I can only think that it’s something to do with my configuration. The config files I have tested with are:
xml/UNIFIED_EXTENDED/InfoMap-make-communities.xml (which uses xml/UNIFIED_EXTENDED/datasets/All-Datasets.xml)
xml/UNIFIED_EXTENDED/InfoMap-centrality.xml (which uses xml/UNIFIED_EXTENDED/datasets/All-Datasets.xml and xml/UNIFIED_EXTENDED/datasets/All-Communities.xml)

Commands I was running:

java -jar dtnsim.jar xml/UNIFIED_EXTENDED/InfoMap-make-communities.xml 1 DATASETS=studivz-3month-5_2 PARAM_SET=default EXPERIMENT_GROUP=BUGFINDING
followed by
java -jar dtnsim.jar xml/UNIFIED_EXTENDED/InfoMap-centrality.xml 1 DATASETS=studivz-3month-5_2 PARAM_SET=default EXPERIMENT_GROUP=BUGFINDING

These commands generate community files in
datasets/communities/BUGFINDING/InfoMap/studivz-3month-5_2/global-parent
and
datasets/communities/BUGFINDING/InfoMap/studivz-3month-5_2/no-global-parent

(e.g. the file is named edge_list.dat.communities.dat)

My next thought is to check whether there is an issue with carriage returns in the communities.dat files since I moved them to the new SVN, but this seems unlikely…
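A quick way to rule the carriage-return theory in or out is to count line endings in the raw bytes of each communities file. This is a sketch with hypothetical helper names, assuming the files are small enough to read into memory.

```python
from pathlib import Path

def count_line_endings(data: bytes) -> dict:
    """Count Windows (CRLF) and old-Mac (bare CR) line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    return {"crlf": crlf, "bare_cr": data.count(b"\r") - crlf}

def to_unix_line_endings(data: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

def check_file(path, fix=False):
    """Report line endings in a communities file; optionally rewrite it with LF only."""
    p = Path(path)
    data = p.read_bytes()
    report = count_line_endings(data)
    if fix and (report["crlf"] or report["bare_cr"]):
        p.write_bytes(to_unix_line_endings(data))
    return report
```

Reading bytes rather than text matters here: opening the file in Python's default text mode would silently translate the line endings and hide the very thing being checked.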

Update

Fixed… a very simple omission in the config files for community dataset loading:

post-holiday meeting

August 25th, 2011

Main objectives:

Test the hypothesis that a deep hierarchical structure performs better than a flat structure for BubbleH (and compare to BUBBLE Rap).

Simplify the parameterisation of each community finding algorithm (I have already reduced this down to a range of threshold values used in each of the CFAs)

Work on analysing the Enron dataset as above

Try to incorporate the dataset from the paper (On the dynamics of human proximity for data diffusion in ad-hoc networks), and chase up the email if there is no response in a few days

Update Pádraig with results

pre-holiday update

July 26th, 2011

Met with Pádraig to catch up on where we are and what I should be getting on with. We talked about the previous post.

I said that I am going through MIT-NOV and Cambridge, picking out the best parameters (or a range of them?) for each CFA.

I need to pick the best hierarchy based on the outputs. We want to be able to compare flat structures to deep hierarchical structures, to see whether hierarchical clustering really does improve results, or whether it is just the overlap.

We need to finish testing/exploring Enron. Pádraig suggests we cluster Studivz and pick out a sub-tree to get a manageable number of nodes, perhaps picking different sub-trees to make different datasets.
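Picking a sub-tree out of a hierarchical clustering amounts to a simple tree walk. This sketch assumes the hierarchy is available as a map from each community to its child communities, plus the nodes assigned directly to each community; that representation and `subtree_nodes` are assumptions for illustration, not the output format of any particular CFA.

```python
def subtree_nodes(children, members, root):
    """Collect every node in community `root` and all of its descendants.

    children: community id -> list of child community ids
    members:  community id -> nodes assigned directly to that community
    """
    nodes, stack = set(), [root]
    while stack:
        community = stack.pop()
        nodes.update(members.get(community, ()))
        stack.extend(children.get(community, ()))
    return nodes
```

Each chosen sub-tree root would then yield one candidate dataset, and its node set could be written out as a node-list in the same way as the Studivz selection.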

The plan would be to submit the Enron analysis to one of the NIPS workshops (Christmas in Spain) to get some feedback, which might give us a line on the thesis, and then to write up the thesis in the new year.

We mentioned the ranking of community structures again.

We will meet again when I get back to talk about it in more detail.

The problem parents

June 22nd, 2011

After we submitted the paper to FindingNEMO2011, I started to work on the Social Sensing dataset and on Pádraig’s suggestion to create a Random community finding algorithm. Initial tests showed a peculiar phenomenon: all runs of BUBBLERap, including those for KCLIQUE with multiple parameters, were coming up with exactly the same results. This really shouldn’t happen, so I looked further.

Unfortunately, or fortunately, depending on how you look at it, it is not a bug in the code. It was caused by a decision we made a long while back to force each community finding algorithm to output a top-level community including all nodes. This works well for BubbleH, which uses targeted community routing. But, and in hindsight this is obvious, BUBBLERap simply looks at community membership, so when a node meets another node, they are ALWAYS in the same community, and therefore it routes using ONLY local ranking. This has the effect of turning our implementation into simple centrality-based routing.
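The effect of the global parent shows up clearly in a simplified version of the BUBBLE Rap forwarding rule: climb the global ranking until the message reaches a node that shares a community with the destination, then climb the local ranking within that community. This sketch collapses each node's local rank to a single value and uses hypothetical function names; it is not the simulator's implementation. Adding a community that contains every node makes `shares_community` true for every pair, so only the local-ranking branch is ever taken. `without_global_parent` shows the corresponding no-global-parent variant.

```python
def shares_community(a, b, communities):
    """True if nodes a and b appear together in at least one community."""
    return any(a in c and b in c for c in communities)

def bubble_rap_forward(carrier, encounter, destination, communities,
                       local_rank, global_rank):
    """Simplified BUBBLE Rap rule: should `carrier` pass the message to `encounter`?"""
    if shares_community(carrier, destination, communities):
        # Already in the destination's community: climb the LOCAL ranking,
        # but only towards nodes that also share a community with the destination.
        return (shares_community(encounter, destination, communities)
                and local_rank[encounter] > local_rank[carrier])
    # Not there yet: hand over to a member of the destination's community,
    # or else climb the GLOBAL ranking.
    return (shares_community(encounter, destination, communities)
            or global_rank[encounter] > global_rank[carrier])

def without_global_parent(communities, all_nodes):
    """Drop the forced top-level community that contains every node."""
    everyone = set(all_nodes)
    return [c for c in communities if set(c) != everyone]
```

With communities `{1, 2}` and `{3, 4}`, node 1 carrying a message for node 4 would not hand it to node 2 (lower global rank, wrong community). Add the global parent `{1, 2, 3, 4}` and the same encounter is decided purely on local rank, which is exactly the collapse to centrality-based routing described above.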

I spent the last couple of days re-engineering the configuration so that we can now get results for communities both with and without a global parent. Shown below are the results from the FindingNEMO11 paper and, below that, the updated results, where we have used no-global-parent for BUBBLERap and a global parent for BubbleH.

Results used in FindingNEMO11 paper

Results for ALL CFAs and Datasets, using Global Parents only for BubbleH

The new results show a slightly different picture, in that the improvements in delivery ratio and latency are less pronounced. One interesting thing is that for InfoMap, which partitions the graph, the results are exactly the same. This is because there is no overlap, which causes both algorithms to work in the same way: BUBBLERap uses global centrality to route between nodes of different communities, and BubbleH uses the local centrality in the GLOBAL parent to route towards nodes that are not in any of its smaller communities. This means that BubbleH is actually designed solely for overlapping community structure. With regard to the hierarchical clustering of HGCE, we see that it performs poorly for Cambridge, but roughly equally for the others. LinkClustering for BUBBLERap in InfoCom2005 and MIT-NOV performs very poorly on Delivered Hops. This could be down to it creating a large number of communities.
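Whether a CFA's output overlaps at all can be checked directly from its community lists; with no overlap, as with InfoMap's partition, the two algorithms end up making the same routing decisions. A minimal sketch with hypothetical helper names; the communities should be passed without the global parent, since that community overlaps everything by construction.

```python
from collections import Counter

def membership_counts(communities):
    """How many communities each node belongs to."""
    return Counter(node for community in communities for node in set(community))

def overlapping_nodes(communities):
    """Nodes that belong to more than one community."""
    return sorted(n for n, k in membership_counts(communities).items() if k > 1)

def is_partition(communities, all_nodes):
    """True if every node is in exactly one community."""
    counts = membership_counts(communities)
    return set(counts) == set(all_nodes) and all(k == 1 for k in counts.values())
```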

Global versus no global parent for LinkClustering BUBBLERap on MIT November

The difference is very pronounced for LinkClustering, but this is not a problem with the results. The differences between the results for HGCE are the result of randomness (perturbations) in each run. This may result in withdrawing the paper from SMW11, which is very unfortunate.

My next step is to finish preparing the Social Sensing dataset (by testing multiple CFA parameters against it) and add it to the matrix plot. Then I will finish the Random runs against all of the datasets, to see whether the CFAs have any real effect on performance, or whether it is insignificant (!).

Finally, if things go to plan, I will continue working on the Twitter-based dataset, and perhaps the new sensor dataset from St Andrews.
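One possible shape for a Random community finding baseline is to keep the number and sizes of communities from a real CFA's output but draw the members at random. This is an assumption about how such a baseline could work, not a description of the Random algorithm actually used; `random_communities` is hypothetical. Each community is sampled independently, so random overlap can occur.

```python
import random

def random_communities(sizes, nodes, seed=None):
    """Random baseline: communities with the given sizes, members drawn uniformly.

    Each community is sampled independently, so communities may overlap.
    A fixed seed makes a run reproducible.
    """
    rng = random.Random(seed)
    pool = list(nodes)
    return [set(rng.sample(pool, size)) for size in sizes]
```

If routing results with these communities are close to those from a real CFA, that would suggest the community structure is not what drives performance.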

Plotting all parameters to GCE-H in BubbleH simulation

March 16th, 2011

The following plot was generated by automated processes within LocationSim. Using its powerful configuration options, it was possible to run all community ranking processes (to generate local/global ranks) and BubbleH simulations in around 20 minutes (after spending a number of hours setting up the configuration!).

GCEH all parameters, simulated for BubbleH

During the run, the system used only 4 cores (out of 24) and 6.1GB of memory (out of 124GB), so it should not have affected other users considerably (hopefully!).

top command at runtime

The process is detailed on How to run LocationSim for multiple parameters of GCEH.

Meeting with Pádraig 3 Feb 2011

February 7th, 2011

We discussed the results so far. I feel that the dataset does not include enough data to give reliable information about location. We discussed other datasets, perhaps using cabspotting and geolife.

Pádraig reiterated his skepticism about using location: all the information needed is in the contact information, and therefore location data is redundant. We suggested the idea of testing LBR on the cabspotting dataset over short periods of time; because cabs are highly mobile, they come into contact with each other more often.

Pádraig suggested that we could start looking more at communities, continuing on from Bubble RAP, so that we could then use synthetic networks. It would mean a bit of a change of track, however.

I said I would try to get Bubble Rap running in the simulator, then try to run it using MOSES or GCE to determine the communities, rather than the default method. GCE has a hierarchical clustering mechanism, which would suit Bubble Rap.

I also said I would try to use the Social Sensing data to see how it performs.

Pádraig suggested again that we could follow on from what Pan Hui did by testing highly overlapping communities, using some of the group’s community finding algorithms. This would fit better with the group’s work, and therefore would be a bit easier to link in.

With Bubble added, the results for MIT-OCT are shown below. Note that the communities used for this run were pre-generated by Graham; I haven’t yet found out how he generated them.

Results with Bubble included

Bubble routing seems to calculate communities using k-clique clustering and weighted network analysis.

Last-minute update: I generated data for use in the simulator based on MOSES communities (based on contacts between nodes). Instead of using Hui Betweenness for ranking, I simply used the number of communities a node is a member of as the rank value, just for initial testing (of the script output). It yielded a better result than regular Bubble, with the caveat that the Bubble communities are based on a start date 1 week into the period, and I need to double-check that the MOSES one also does this.

Regular Bubble routing vs Bubble MOSES Communities with Community Rank
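The interim rank used in this test is simple to compute from the community output. A sketch, assuming each community is a collection of node IDs; it stands in for Hui Betweenness only for testing, as noted above, and the helper names are hypothetical.

```python
from collections import Counter

def community_count_rank(communities):
    """Rank value for each node = the number of communities it is a member of."""
    return Counter(node for community in communities for node in set(community))

def nodes_by_rank(communities):
    """Nodes ordered from highest to lowest rank (ties broken by node id)."""
    rank = community_count_rank(communities)
    return sorted(rank, key=lambda n: (-rank[n], n))
```

Because `Counter` returns 0 for missing keys, a node that appears in no community gets rank 0 without any special handling.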

Meeting

December 16th, 2010

Met with Pádraig and Davide. We need to get out of the data cleansing phase and into some more interesting areas!

I need to generate the linked cell towers graph based on contacts, not just message passing (as this was what should have been done in the first place!), and use the last known cell tower (within a time limit) as the current cell tower if none has been seen. This will hopefully mean that we have cell towers reported for all contacts, and with any luck a larger proportion of contact events will report the same tower.

Using the graph, link the main connected towers into the ranking algorithm.
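The "last known tower within a time limit" rule could look like the following sketch. It assumes each node's tower sightings are held as a time-sorted list of (timestamp, tower_id) pairs with timestamps in seconds; `tower_at` and `max_age` are hypothetical names, and the actual time limit is still to be chosen.

```python
import bisect

def tower_at(sightings, t, max_age):
    """Cell tower a node was using at time t.

    sightings: time-sorted list of (timestamp, tower_id) for one node.
    Returns the most recent tower seen at or before t, provided it was seen
    no more than max_age seconds earlier; otherwise None.
    """
    times = [ts for ts, _ in sightings]  # rebuilt per call; cache it for large logs
    i = bisect.bisect_right(times, t)
    if i == 0:
        return None  # nothing seen yet
    ts, tower = sightings[i - 1]
    return tower if t - ts <= max_age else None
```

A contact would then count as co-located when `tower_at` gives the same non-None tower for both nodes at the contact time.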

LBR Location Analysis – MIT Dataset (Oct)

December 8th, 2010

I wanted to see what the real picture was when using cell towers for location, so for a run of LBR on the MIT Dataset for October, I collected details about every message hop. I compared the timestamps for the hops to the cell tower information in the database, to find out which cell tower each node involved in the message hop was recorded as using.

Colocated: (same cell) 2247

Not-Colocated: (different cells) 15688

One sided: (only one sees cell) 13452

No data: (no cells seen) 3211

Count: (total number of hops evaluated) 31387

These results are slightly concerning: roughly half of the hops between nodes happened when the nodes reported different locations (cell tower IDs), 10% of hops did not record any cell tower information at all, and in about 43% of hops only one node reported a location. This seems to demonstrate that we cannot rely on cell tower information to give us an accurate idea of the co-location of people. However, it might still prove useful to use cell tower location for generating general mobility statistics, which could be used as routing metrics. Davide mentioned he had some ideas about these sorts of statistics.
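The hop classification and the percentages can be reproduced with a short script. A sketch, assuming each hop has already been reduced to the pair of tower IDs recorded for its two nodes (None where no tower was seen); the function names are hypothetical, and the counts are the ones reported in this post.

```python
from collections import Counter

def classify_hop(tower_a, tower_b):
    """Classify a hop by the towers recorded for its two nodes (None = no record)."""
    if tower_a is None and tower_b is None:
        return "no data"
    if tower_a is None or tower_b is None:
        return "one sided"
    return "colocated" if tower_a == tower_b else "not colocated"

def tally(hops):
    """Count categories over an iterable of (tower_a, tower_b) pairs."""
    return Counter(classify_hop(a, b) for a, b in hops)

def percentages(counts, total):
    """Each category as a percentage of `total`, to one decimal place."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

# Counts reported in this post.
reported = {"colocated": 2247, "not colocated": 15688,
            "one sided": 13452, "no data": 3211}
```

`percentages(reported, 31387)` gives 7.2, 50.0, 42.9 and 10.2 percent. The four category counts sum to 34,598 rather than the reported total of 31,387, and the post does not say which total is intended; against 34,598 the shares are about 6.5, 45.3, 38.9 and 9.3 percent.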

Community Clustering part 1

December 1st, 2010

I have visualised the graph of connections in the MIT Dataset for the month of October, split into 5 parts: weeks 1 to 4, and the whole month. Each week is a new set of edges, built from the end of the previous week, so each weekly graph shows only the edges formed in that week.

The list of edges and matrix data is in the zip file here: mit-graph.
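A sketch of the weekly split, assuming the contacts are available as (timestamp, node, node) tuples and that window k covers a fixed length of time starting k windows after a chosen start date. `edge_sets` is a hypothetical helper, not the script that produced these graphs. The whole-month graph is built the same way with a single window covering the month.

```python
from datetime import datetime, timedelta

def edge_sets(contacts, start, window=timedelta(weeks=1), count=4):
    """Split timestamped contacts into `count` consecutive windows of undirected edges.

    contacts: iterable of (timestamp, node_a, node_b) with datetime timestamps.
    Window k covers [start + k*window, start + (k+1)*window); contacts
    outside all windows are ignored.
    """
    sets = [set() for _ in range(count)]
    for ts, a, b in contacts:
        k = (ts - start) // window  # timedelta // timedelta gives an int
        if 0 <= k < count:
            sets[k].add(frozenset((a, b)))  # store edges undirected
    return sets
```

For example, `edge_sets(contacts, datetime(2004, 10, 1))` gives the four weekly edge sets, and `edge_sets(contacts, datetime(2004, 10, 1), timedelta(days=31), 1)[0]` gives the whole month; the start date here is illustrative only.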
