Experiment Logs

October 26th, 2011

I have been filling my brain up with experiments recently, and it occurred to me that I very often lose track of what I have been doing and why. So I have started to log what I am doing on the Experiment Logs page.

The idea is that for every chunk of experiment I write down what the goals are, what the setup was, and what the result was. It is in a fairly loose structure, so pertinent everyday stuff will probably appear there too.

Categories: Uncategorized

Weird Simulator Bug… :(

October 26th, 2011

Anyway, the problem is a strange one. I have specified a dataset (based on the Studivz German university social network): I selected a time period in the dataset config and called this dataset studivz-3month-5_2, which refers to the period of the dataset I'm interested in (a 3-month chunk) and the group of nodes taken out of the whole set. The full dataset is huge (28k nodes) and the simulator doesn't cope well with it, so I picked 5 seed nodes, discovered the network 2 hops deep, and used all of those nodes as the set of nodes to use. In the config I have done this by specifying the nodes explicitly in the node-list property:
<property name="node-list" value="5967,2117,780,1828,2274, …. " />

The problem happens with community ranking: when the simulator runs the HuiRanking over the set of nodes in a community, for this dataset only, it seems to include all of the nodes in the list (as with global.dat), rather than just the nodes for that community.

However, the community file it uses lists the correct nodes (i.e. different (numbers of) nodes in each community).

However, the weirdest thing is that it ONLY does this for this particular dataset, and not for any others (e.g. MIT, Enron, Social-Sensing, InfoCom etc.). None of the source code has changed, just the config files, but I can't work out what has gone wrong.
I added some debugging code to /LocationSim/src/ie/ucd/argfrot/simulate/bubblerap/LocalHuiResultTask.java which prints out the size of the Simulation context variable (as passed to LocalHuiResultTask), and for this particular dataset it confirms that it contains all of the nodes specified in the config. It appears that instead of the Simulation object holding only the nodes for that community, it is including all of the nodes by mistake.
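For reference, the check is roughly along these lines (a minimal sketch only; the node collections are passed in explicitly because I don't want to guess at the real LocationSim accessors here):

// Hypothetical sketch of the debug check added to LocalHuiResultTask: compare the
// number of nodes the community file says should be ranked against the number of
// nodes actually present in the Simulation context passed to the task.
public class DebugCheck {
    public static void logContextSize(java.util.Collection<Integer> communityNodes,
                                      java.util.Collection<Integer> simulationNodes) {
        System.out.println("community size = " + communityNodes.size());
        System.out.println("simulation context size = " + simulationNodes.size());
        if (simulationNodes.size() != communityNodes.size()) {
            System.err.println("WARNING: ranking is running over the whole node set, not just this community");
        }
    }
}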

What I don't understand is why it works for other datasets (with a smaller number of nodes) but not for this particular one… I can only think that it's something to do with my configuration. The config files I have tested with are:
xml/UNIFIED_EXTENDED/InfoMap-make-communities.xml (which uses xml/UNIFIED_EXTENDED/datasets/All-Datasets.xml)
xml/UNIFIED_EXTENDED/InfoMap-centrality.xml (which uses xml/UNIFIED_EXTENDED/datasets/All-Datasets.xml and xml/UNIFIED_EXTENDED/datasets/All-Communities.xml)

Commands I was running:

java -jar dtnsim.jar xml/UNIFIED_EXTENDED/InfoMap-make-communities.xml 1 DATASETS=studivz-3month-5_2 PARAM_SET=default EXPERIMENT_GROUP=BUGFINDING
followed by
java -jar dtnsim.jar xml/UNIFIED_EXTENDED/InfoMap-centrality.xml 1 DATASETS=studivz-3month-5_2 PARAM_SET=default EXPERIMENT_GROUP=BUGFINDING

generates a community file in
datasets/communities/BUGFINDING/InfoMap/studivz-3month-5_2/global-parent
and
datasets/communities/BUGFINDING/InfoMap/studivz-3month-5_2/no-global-parent

(e.g. file is named: edge_list.dat.communities.dat)

My next thought is to check whether there is an issue with carriage returns in the communities.dat files since I moved everything to the new SVN – but this seems unlikely…

Update

Fixed… a very simple omission in the config files for community dataset loading.

Categories: Uncategorized

Next

October 13th, 2011

Next plan is to:

  • do a thorough search for paper targets
  • use a section of the dataset (1 to 3 months)
  • incorporate moses and see what happens
  • train the CFA on the first month
  • run algorithms on the last part
  • run multiple random sub-graphs and take the averages etc.

Future

  • aim to have this done by November
  • then get the paper sorted ready for December
  • then work on next section involving Vector Clocks for estimating network properties used for routing
  • then finish the frikking thesis

Steps

Also – explore studivz dataset with KCLIQUE

  • use Mean, Median and 80th Percentile – finish KCLIQUE Studivz 4 2 0 0
  1. incorporate Moses algorithm
    • visualise?
    • check bubbleH moses studivz 4 2 0 0
  2. Pick a period of activity in the dataset
    • test runs
    • 1st Oct 2006 to 1st Feb 2007?
    • pick a new set of sub graphs based on this period?
  3. generate a graph based on the first 1/3rd
  4. run the algorithm from start to finish (but start the flooding 1/3rd in)
  5. run multiple times with different random node configs (e.g. 4,2,0,0 x 10) and get the average results of all
Categories: To Do

Studivz

October 6th, 2011

I took some time to explore the Studivz wall posts dataset, to see whether it would be useful to use.

The first step was to extract a sub-set of the data, as the entire dataset is a little large to run in LocationSim (some of the CFAs can't quite handle 28k nodes); scaling up to this many nodes is a work in progress (it might involve writing something more lightweight to do some of the raw processing).

The approach I have taken so far is to pick N nodes randomly and include their immediate neighbours; I do this L more times to get the nodes up to a depth of L hops away from the seeds. Using 10 random nodes with a depth of 2 yields a network of around 3049 nodes (~10% of all nodes). When reduced to 5 seed nodes, we get ~1000 nodes (~4%). Going the other way, 100 seed nodes with a depth of 1 gives 14571 nodes, covering ~50% of the network. These figures change depending on which nodes are selected at random initially. Two other parameters affect the results: the first is a threshold, where nodes with a connected time less than this value are not included; the second is the value used to seed the random number generator (if 0, a seed is chosen automatically).
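For the record, the sampling is essentially an L-step neighbourhood expansion from N random seeds. A minimal sketch (assuming the contact graph is available as an adjacency map with the connected-time threshold already applied when building it; none of this is the simulator's actual code):

import java.util.*;

// Sketch of the seed-and-expand sampling: pick N seeds at random, then L times add
// every neighbour of the current node set. A seed value of 0 means "choose the RNG
// seed automatically", as described above. The adjacency map is an assumed input.
public class SubgraphSampler {
    public static Set<Integer> sample(Map<Integer, Set<Integer>> graph, int n, int l, long seed) {
        Random rand = (seed == 0) ? new Random() : new Random(seed);
        List<Integer> allNodes = new ArrayList<>(graph.keySet());
        Set<Integer> selected = new HashSet<>();
        while (selected.size() < n) {                       // N random seed nodes
            selected.add(allNodes.get(rand.nextInt(allNodes.size())));
        }
        for (int hop = 0; hop < l; hop++) {                 // expand L hops outwards
            Set<Integer> frontier = new HashSet<>();
            for (int node : selected) {
                frontier.addAll(graph.getOrDefault(node, Collections.emptySet()));
            }
            selected.addAll(frontier);
        }
        return selected;                                    // becomes the node-list in the config
    }
}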

In the end I settled on the three parameter sets in the table below. Note that the number of nodes in the final set is highly sensitive to the initially chosen seed nodes, so these counts are quite variable.

Studivz Random Node Choices

N L # Nodes
3 2 213
4 2 914
10 2 3049


Interestingly, despite the source (seed) nodes being picked at random, the entire graph is connected in all configurations. The graphic below shows the connected_time graph and InfoMap clustering for N=3, L=2.

InfoMap clustering of Studivz dataset, where N=3 and L=2

This is a promising start: there are distinct clusters of nodes, which we expected, as this is the concatenation of three egocentric networks, but there are also connections between the egocentric networks, meaning there is a route to every other node. However, we can't tell from this graph how often these contacts occur.

Looking at the whole dataset, we can get an idea of how active it is over time by measuring the number of connections in a given time period; below are the weekly connection counts for the entire dataset.

Weekly number of connections in the Studivz dataset

It shows that this social network seems to have become increasingly popular over time, with a peak of just over 10,000 wall posts made in January 2007. If we were to pick a period to concentrate on, it should probably be from October 2006 onwards.
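Producing a plot like this is just a matter of bucketing contact events into 7-day bins; a minimal sketch (assuming the events are available as epoch-second timestamps, which may not match how the simulator stores them):

import java.util.*;

// Sketch: count contact events per 7-day bin, keyed by week index from the earliest event.
public class WeeklyActivity {
    public static SortedMap<Long, Integer> weeklyCounts(List<Long> eventTimesSeconds) {
        long start = Collections.min(eventTimesSeconds);
        long week = 7L * 24 * 60 * 60;                      // seconds in a week
        SortedMap<Long, Integer> counts = new TreeMap<>();
        for (long t : eventTimesSeconds) {
            counts.merge((t - start) / week, 1, Integer::sum);
        }
        return counts;
    }
}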

Studivz N=3, L=2

Initial results for each metric are shown below:

Delivery Ratio for BubbleH vs BubbleRAP for Studivz 3 2 0 0

Cost for BubbleH vs BubbleRAP for Studivz 3 2 0 0

Latency for BubbleH vs BubbleRAP for Studivz 3 2 0 0

Delivery ratio is very poor for all runs. To see what the maximum possible delivery ratio is, we can look at the results for flooding the network, below:

Delivery Ratio plot of Unlimited Flood on Studivz 3 2 0 0

This achieves a delivery ratio of roughly 65 percent, so we have a bit of work to do to be able to match this!

Studivz 4 2 0 0

When we add another node to the initial seed set, we get a step up in the total number of nodes (914 to be exact); this is currently running through the simulator.

Studivz 4 2 0 0

UPDATE:

Below is the weekly activity for the 914-node subset (4,2,0,0):

Weekly activity in STUDIVZ 4,2,0,0

The results on the larger dataset are shown below. These runs took considerably longer and highlighted a couple of minor bugs in the simulator (files not being closed properly, which meant "file not found" and "too many open files" errors kept occurring).

Delivery Ratio, Cost, Latency, Average Delivered Hops and Average Undelivered Hops for STUDIVZ with 4 seed nodes and a depth of 2.

We see here that BubbleH does well in terms of delivery ratio compared to BubbleRAP. LinkClustering, which created a huge number of communities, does particularly well (at ~30% for both BubbleRAP and BubbleH), which adds weight to the idea that a large number of communities does well; in fact (in this case only, where there is only one set of parameters) we see that the average cost is roughly the same as with the other CFAs. BubbleH also performs well in terms of cost. Latency is very high for all CFAs as the dataset covers a very long period.

Unlimited Flood and Prophet on STUDIVZ 4 2 0 0

However, we see from the Unlimited Flood run that we have a way to go to match the best possible delivery ratio: at around 90%, it beats BubbleH hands down. Some consolation, though: the advanced Prophet algorithm also only gets around 52% delivery ratio.

Categories: Datasets, experiments

Update 27 Sep 2011

September 27th, 2011

In our last meeting I took down these actions:

  • Resurrect the NGMAST paper and use Enron + MIT-NOV – argue that MIT is not big enough to make assumptions

I have started another paper in the Clique SVN here: https://erdos.ucd.ie/svn/clique/papers/BubbleH – at the moment it's just the same as the NGMAST paper.

  • Write Thesis outline and get started on background chapters – message dissemination, and social networks side (from milgram onwards etc)

I created a thesis document in the repository too: https://erdos.ucd.ie/svn/clique/papers/MattStabelerThesis – I have started to compile some notes into background chapters, based on my transfer report. I have also started a rough outline of chapters. The latest version in PDF is here.

  • Speak to Conrad about finding a non-overlapping Hierarchical CFA

Since the feedback at the CLIQUE mini-workshop, I asked Fergal, Conrad and Aaron again about community finding algorithms. They mentioned the Blondel one and the InfoMap one again.

  • Get a decent sub-set of STUDIVZ dataset

To get a sub-set of nodes in the STUDIVZ dataset, I started by picking N nodes randomly and including their immediate neighbours; I do this L more times to get the nodes up to a depth of L hops away from the seeds. Using 10 random nodes with a depth of 2 yields a network of around 3500 nodes (12% of all nodes). When reduced to 5 seed nodes, we get ~1000 nodes (~4%). Going the other way, 100 seed nodes with a depth of 1 gives 14571 nodes, covering ~50% of the network. These figures change depending on which nodes are selected at random initially.

Currently, I'm testing the setup with 5 seed nodes and 2 levels of network, in the hope that there will be some overlap.

Conrad suggests that we take them non-randomly – by first reducing our set of nodes to those with high activity (either number of posts, or total length of posts), then using the network L hops from the remaining nodes.
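A sketch of that non-random variant might look like the following (the per-node activity map, the topK cut-off and the method names are illustrative assumptions, not existing code):

import java.util.*;
import java.util.stream.Collectors;

// Sketch of Conrad's suggestion: keep only the topK highest-activity nodes as seeds
// (activity = number of posts or total post length), then expand L hops as before.
public class ActivitySeededSample {
    public static Set<Integer> sample(Map<Integer, Set<Integer>> graph,
                                      Map<Integer, Integer> activity, int topK, int l) {
        Set<Integer> selected = activity.entrySet().stream()
                .sorted(Map.Entry.<Integer, Integer>comparingByValue().reversed())
                .limit(topK)
                .map(Map.Entry::getKey)
                .collect(Collectors.toCollection(HashSet::new));
        for (int hop = 0; hop < l; hop++) {
            Set<Integer> frontier = new HashSet<>();
            for (int node : selected) {
                frontier.addAll(graph.getOrDefault(node, Collections.emptySet()));
            }
            selected.addAll(frontier);
        }
        return selected;
    }
}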

Enron Dataset Analysis

September 2nd, 2011

Comparing BubbleH and BubbleRAP for the Enron dataset produced the following results. The plot shows the results for the best run of each parameter, where best is determined, in this case, by delivery ratio; i.e. I selected the run with the highest delivery ratio for each CFA and each routing type, and used these as the basis for the plot. This was done automatically, and this initial version does not take into account secondary ordering – meaning that when two runs have identical best delivery ratios, the first to be encountered is picked, ignoring secondary data such as cost or latency.
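A small sketch of what the selection could look like once secondary ordering is added (ties on delivery ratio broken by cost, then latency); the Run fields and the CFA/router key are illustrative, not the simulator's actual result format:

import java.util.*;

// Sketch: pick the best run per CFA/routing combination by delivery ratio (descending),
// breaking ties on cost and then latency (both ascending).
public class BestRunPicker {
    static class Run {
        String cfa, router;
        double deliveryRatio, cost, latency;
        Run(String cfa, String router, double dr, double cost, double latency) {
            this.cfa = cfa; this.router = router;
            this.deliveryRatio = dr; this.cost = cost; this.latency = latency;
        }
    }

    public static Map<String, Run> bestPerCombination(List<Run> runs) {
        Comparator<Run> better = Comparator
                .comparingDouble((Run r) -> r.deliveryRatio).reversed()
                .thenComparingDouble(r -> r.cost)            // secondary: lower cost wins
                .thenComparingDouble(r -> r.latency);        // tertiary: lower latency wins
        Map<String, Run> best = new HashMap<>();
        for (Run r : runs) {
            best.merge(r.cfa + "/" + r.router, r, (a, b) -> better.compare(a, b) <= 0 ? a : b);
        }
        return best;
    }
}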

Delivery Ratio, Cost, Latency and Delivered Hops for InfoMap, HGCE, LinkClustering and KCLIQUE, for both BubbleRAP and BubbleH

We can see that in terms of delivery ratio, BubbleH outperforms BubbleRAP when there is overlap (LinkClustering, HGCE), and it performs even better when there is hierarchical data (HGCE). When there is little or no overlap, BubbleH and BubbleRAP perform identically, as we know/expect. I wonder if we can explicitly test the effect of hierarchy by finding an algorithm that partitions into a hierarchy (i.e. without overlap)?

CFA Types

CFA             Hierarchical  Overlapping
KCLIQUE         No            Yes
HGCE            Yes           Yes
InfoMap         No            No
LinkClustering  No            Yes
????            Yes           No

Next to do: test the hypothesis that a deep HGCE hierarchy beats a flat HGCE hierarchy.

Enron, HGCE, BubbleH Rankings - (Exp. Group: DATASET_EXPLORE2)

M  Delivery Ratio  Cost  Latency  Delivered Hops  Depth  Width
0.0000001 0.761918726 4.964306661 16372578807 5.097707153 3 87
0.0000002 0.733766234 4.909635526 16340070463 4.980074222 3 105
0.0000003 0.743108504 4.878466695 16159078259 5.010429586 2 98
0.0000004 0.66686217 4.599790532 15721736431 4.858462118 3 65
0.0000005 0.762253875 4.944449099 16483522910 5.035449299 3 87
0.0000006 0.755215752 4.998659405 16253907145 5.17961946 4 80
0.0000007 0.73150398 4.928529535 15895770224 5.107038543 3 91
0.0000008 0.750775031 4.871638039 16568740115 4.992132135 3 70
0.0000009 0.699329703 4.749811479 16050835473 4.977056251 3 95
0.000001 0.762337662 4.9228739 15679054807 5.05423971 3 88
0.000002 0.732258065 4.97666527 16414982185 5.081354769 3 74
0.000003 0.694888982 4.745789694 16134434295 4.914812805 3 90
0.000004 0.708127357 4.881650607 15674372660 5.067443649 2 84
0.000005 0.729032258 4.937704231 16027611756 5.112458338 5 61
0.00001 0.728320067 4.768831169 16197342711 4.934943917 3 76
0 0.706200251 4.959405111 15616885803 5.189772795 3 55

The table above shows the statistics for Enron, HGCE and BubbleH for DATASET_EXPLORE2, which is a new run using the original, simple datasets (without multiple runs over concatenated datasets). It still needs the depth/width data adding.

MIT-NOV Hierarchy Analysis

September 1st, 2011

I re-ran the simulations for a simplified set of parameters for MIT-NOV and Cambridge. To make better sense of the data, I ranked each parameter to HGCE (which correspond to different hierarchical structures) for each metric – see the table below:

MIT-NOV, HGCE, BubbleH Rankings - (Exp. Group: DATASET_EXPLORE)

HGCE M Parameter  Del. Ratio  Cost  Hops  Latency  Depth (not ranked)  Width (not ranked)  Score?
0.001 8 5 7 5 3 15  
0.002 2 7 6 14 5 11  
0.003 13 3 4 4 3 13  
0.004 11 4 3 3 4 20  
0.005 1 2 2 12 2 17  
0.006 3 9 8 10 5 10  
0.007 6 12 12 9 5 14  
0.008 9 13 13 5 4 11  
0.009 4 11 10 8 5 14  
0.01 7 14 14 13 4 15  
0.02 14 7 9 1 3 16  
0.0 10 9 11 2 3 16  
0.1 5 1 1 11 5 16  
0.2 12 5 5 7 3 11  

This table shows the relative rankings for Delivery Ratio, Cost, Hops and Latency for each value of the parameter M to HGCE (values were derived from the optimum parameters to KCLIQUE based on the structure of its output communities; the same parameters were used for each CFA).

Below is an image of the associated Community Hierarchies

MIT-NOV, HGCE Communities for different threshold values (parameter M to HGCE)

Shown below are the four metrics in bar chart form, for comparison of actual values:

Delivery Ratio vs Cost, and Hops vs Latency for MIT-NOV, HGCE, BubbleH

From the above we can see that, generally, delivery ratio improves with more depth to the hierarchy; however, in this instance a shallow, broad structure does best overall. When ranking by latency, it also appears that broad, shallow structures perform best.

Thought: should we run HGCE multiple times on the same data to see what range of different structures it comes up with? Also, we should get another hierarchical CFA (e.g. the other version of link clustering). (Did this, and will post results soon.)

Thought: is there a way of scoring hierarchy based on depth, width and population of communities?
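One entirely speculative way of doing it, just to make the idea concrete (the weights and the form of the formula are arbitrary placeholders, not something we have agreed on):

// Speculative sketch of a hierarchy "score" combining depth, width and community
// population: deeper trees score higher, very broad levels are discounted, and the
// mean community size scales the result. Everything here is a placeholder.
public class HierarchyScore {
    public static double score(int depth, int width, int[] communitySizes) {
        double meanSize = 0;
        for (int s : communitySizes) meanSize += s;
        meanSize /= communitySizes.length;
        return depth / (1.0 + Math.log(1 + width)) * meanSize;
    }
}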

post holiday meeting

August 25th, 2011

Main objectives:

Test the hypothesis that a deep hierarchical structure for BubbleH performs better than a flat structure (and compare to BubbleRAP)

Simplify the parameterisation of each community finding algorithm (I have already reduced this down to a range of threshold values used in each of the CFAs)

Work on analysing the Enron dataset as above

Try to incorporate the dataset from the paper (On the dynamics of human proximity for data diffusion in ad-hoc networks) – chase up email if no response in a few days

Update Pádraig with results

Categories: Uncategorized

pre-holiday update

July 26th, 2011

Met with Pádraig to catch up on where we are and what to be getting on with. Talked about the previous post.

Said that I am going through MIT-NOV and Cambridge, picking out a (range of?) best parameters for each CFA.

Need to pick the best hierarchy based on the outputs; we want to be able to compare flat structures to deep hierarchical structures, to see if hierarchical clustering really does improve results, or whether it is just overlap.

We need to finish testing/exploring Enron, and Pádraig suggests we cluster the Studivz dataset and pick out a sub-tree to get a manageable number of nodes, perhaps picking different sub-trees to make different datasets.

The plan would be to submit the Enron analysis to one of the NIPS workshops (Christmas in Spain) to get some feedback, which might give us a line on the thesis, and then write up the thesis in the new year.

Mentioned the ranking of community structures again.

We will meet again when I get back to talk about it in more detail.

Categories: Uncategorized

Odd results – an update

July 13th, 2011

Just before Pádraig went away, we realised that the results we were getting for Bubble and BubbleH were exactly the same in some cases. We also wanted to see whether hierarchy was making any difference to results, and so we set these goals for the period whilst Pádraig was away:

  • work out why bubbleH and bubble are the same in some cases
  • get enron and wall posts dataset
  • pick the most hierarchical looking clustering  in these
  • see whether BubbleH works better than BubbleRap
  • (concentrate on HGCE and KCLIQUE)

The first question: is Bubble (no global parent) always the same result as BubbleH (global parent)?

Dataset   InfoMap  KCLIQUE  HGCE  LinkClustering
MIT-NOV   Yes      No       No    No
CAMB      Yes      Yes      Yes   No
IC05      Yes      Yes      No    No
IC06      Yes      Yes      No    No
SS        Yes      Yes      No    No *
Enron     Yes      No       No    No
Studivz   ?        ?        ?     ?

* In the case of the Social Sensing Study with LinkClustering, all are the same for delivery ratio, apart from where the threshold is 0.0. (With Social Sensing we used multiple tuning values, hence multiple results; the others used only one set of parameters.)

Answer: Not always.

Is the answer overlap?
I think that these results are down to the structure of the communities. InfoMap ALWAYS produces the same results for BubbleH and BubbleRAP. I wonder if this is down to the fact that InfoMap partitions the network, and therefore there are no overlaps? Could it be that, for the most part, KCLIQUE creates non-overlapping communities, hence the identical results? HGCE creates a large number of highly overlapping communities, which are also hierarchical. LinkClustering also creates a large number of communities, and whilst edges cannot overlap, nodes can belong to multiple communities. Or is it really the inherent hierarchy in the community structure that is causing results to differ?

The question is also: why do BubbleH and BubbleRAP get EXACTLY the same results when there is no overlap? Well, this is because in the end there is no difference between them when there is no complicated structure to the network. When there is no overlap, each node belongs to EXACTLY one community and has a local rank in both BubbleH and BubbleRAP, and even though BubbleH uses the global parent community, that is EXACTLY the same as using the global ranking in BubbleRAP. In fact, we could re-write BubbleH to incorporate this explicitly and do away with a forced global parent, but this is just implementation detail; the end results would be the same.
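To make that concrete, here is a simplified sketch of the forwarding decision (not the simulator's actual classes): with a flat partition each node has exactly one community, so the only cases are "candidate shares the destination's community" and "fall back to the global comparison", and BubbleH's global parent ranking is the same thing as BubbleRAP's global ranking in that fallback.

import java.util.Map;

// Simplified sketch of why BubbleH and BubbleRAP coincide when communities do not
// overlap. Each node maps to a single community; outside the destination's community
// both algorithms compare the same global rank (BubbleH's "global parent" rank is
// just the global ranking). Class and field names are illustrative only.
public class ForwardingSketch {
    static boolean shouldForward(int carrier, int candidate, int dest,
                                 Map<Integer, Integer> community,     // node -> its single community
                                 Map<Integer, Double> localRank,
                                 Map<Integer, Double> globalRank) {
        boolean candidateWithDest = community.get(candidate).equals(community.get(dest));
        boolean carrierWithDest = community.get(carrier).equals(community.get(dest));
        if (candidateWithDest && carrierWithDest) {
            return localRank.get(candidate) > localRank.get(carrier); // both inside dest's community
        }
        if (candidateWithDest) {
            return true;                                              // hand over into dest's community
        }
        // Outside the destination's community: BubbleRAP compares global rank here, and
        // BubbleH compares rank within the all-node global parent, which is the same comparison.
        return globalRank.get(candidate) > globalRank.get(carrier);
    }
}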

Second Part: get enron and wall posts dataset

I used Conrad's version of the Enron dataset, as he had taken the time to remove irregularities, and in the case of future papers he would have in-depth knowledge of how he processed the data, saving me a lot of time!

The connected time graph is below, showing a decent number of clusters, hopefully with some nice hierarchy!

Connected Time Graph of the Enron dataset.

I explored this dataset in the same way as the previous ones, by experimenting with settings for the different algorithms. InfoMap is easy to visualise, so below is the InfoMap clustering of the dataset:

Connected Time Graph for Enron Dataset, coloured with InfoMap clustering

I also used Conrad's Studivz wall post dataset (see here). This dataset is huge, so I haven't worked out how to run the full set of simulations yet. I was able to create a connected-time edge list (connected time is based on wall post length, at 1000 ms per character). Below is the graph of all connections, with node size and colour related to degree, and edges removed for clarity.


Studivz connected time graph close-up with edges
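For reference, the wall-post to contact-event conversion is just the 1000 ms-per-character rule mentioned above; a sketch (the Post/Contact shapes are assumed, not Conrad's actual file format):

import java.util.*;

// Sketch of the conversion: each wall post becomes a contact event starting at the
// post time and lasting 1000 ms per character of the post.
public class WallPostEvents {
    static class Post { int from, to; long timeMs; int chars;
        Post(int from, int to, long timeMs, int chars) {
            this.from = from; this.to = to; this.timeMs = timeMs; this.chars = chars; } }
    static class Contact { int a, b; long startMs, endMs;
        Contact(int a, int b, long startMs, long endMs) {
            this.a = a; this.b = b; this.startMs = startMs; this.endMs = endMs; } }

    static List<Contact> toContacts(List<Post> posts) {
        List<Contact> contacts = new ArrayList<>();
        for (Post p : posts) {
            long duration = 1000L * p.chars;                 // 1000 ms per character
            contacts.add(new Contact(p.from, p.to, p.timeMs, p.timeMs + duration));
        }
        return contacts;
    }
}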

Enron Dataset

In order to get any clusters at all out of KCLIQUE and LinkClustering, I had to plug in some very small threshold values. This is probably due to the way in which I created the events (an event occurs when an email is sent, and the duration of every event is set to 2 seconds), so for the most part nodes are not connected, and therefore the overall connected times are small. (Conrad's dataset did not include the content of the messages, so I was not able to give contact events a length based on message size.) To simplify things, I used the interesting parameters from KCLIQUE and LinkClustering to inform the parameters for HGCE, specifically the --minCliqueEdgeWeight parameter (more info about HGCE parameters), which excludes edges based on weight, effectively thresholding the graph edges as with KCLIQUE and LinkClustering.

To recap, the threshold means that (in the case of KCLIQUE and LinkClustering and now HGCE) edges are removed where the connected time between individuals is lower than the threshold.
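In code terms, the thresholding step just drops weighted edges below the cut-off before the edge list is handed to the CFA; a sketch (Edge is an assumed simple triple, not the simulator's real edge-list class):

import java.util.*;

// Sketch of edge thresholding: keep only edges whose connected-time weight is at
// least the threshold; everything else is removed before community finding.
public class EdgeThreshold {
    static class Edge { int a, b; double connectedTime;
        Edge(int a, int b, double connectedTime) {
            this.a = a; this.b = b; this.connectedTime = connectedTime; } }

    static List<Edge> applyThreshold(List<Edge> edges, double threshold) {
        List<Edge> kept = new ArrayList<>();
        for (Edge e : edges) {
            if (e.connectedTime >= threshold) kept.add(e);
        }
        return kept;
    }
}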

Threshold parameters used for the Enron dataset:

0.0, 0.0000001, 0.0000002, 0.0000003, 0.0000004, 0.0000005, 0.0000006, 0.0000007, 0.0000008, 0.0000009, 0.000001, 0.000002, 0.000003, 0.000004, 0.000005, 0.00001

The plot below shows the results for delivery ratio for BubbleRAP using no global parent (NGP) and BubbleH using a global parent (GP). (In future we can safely ignore the global parent part, as BubbleRAP should always be run without a global parent and BubbleH with one.)

BubbleRAP vs BubbleH on the Enron Dataset, showing results for multiple parameters (of M). BubbleH beats BubbleRAP in all cases

This is a good result: it shows that in this dataset, for these parameters, BubbleH beats BubbleRAP, but now we need to consider why. I had done an earlier exploration of the Enron dataset with default values for M (0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9), so I looked back at the results for that, and was surprised to see that BubbleRAP does much better in some cases. Below is a plot with the new data included (the solid red BubbleRAP line and the blue dot-dash BubbleH line show this data).

BubbleRAP vs BubbleH for Enron Dataset, with extra (default) values for HGCE parameters (for M). Showing stronger results for BubbleRAP

So, BubbleRAP does better in some situations (PS: need to check this again to make sure BubbleH is properly plotted).

I started to look at the next step, hoping it will give some answers: pick the most hierarchical-looking clustering in [datasets].

I picked the best results for BubbleH, and mapped the communities to a hierarchical tree structure, shown below:

Community structure for best run of BubbleH on Enron dataset where M = 0.000001

So far so good: it seems to have a broad structure with some specific hierarchical clusters. I also mapped the worst run:

Community structure for worst run of BubbleH on Enron dataset where M = 0.0000007

This too has some good structure to it. Note, however, that this is a plot of the worst-performing run within a set of best parameters (we didn't run a whole range of values, so this is the worst of the best).

The next to show is the best BubbleRAP run, as below:

Community structure for best run of BubbleRAP on Enron dataset where M = 0.0000005

Interestingly, this has a broad set of high-level communities (as with the previous plots, though those had a global parent), but less breadth lower in the hierarchy (i.e. fewer lower-level communities).

TODO: plot the worst run of BubbleRAP

UPDATE: I found a better way to show the dendrograms of the community hierarchy, and have automated the process. This page shows all of the plots for Enron without a global parent, and this page WITH a global parent. To get a better idea of the relationship between hierarchical structure and results, I need to combine the results from runs with the structure in the same place, so we can compare side by side what happens when there are different types of community.