### Archive

Archive for February, 2012

## Updated Hierarchy Stats

It turns out that the two clumps for Studivz-three-B related to K=3 and K=4. So we ran more runs on Enron and Mit to see what happens, the following are only for K=3.

In the case of the MIT datasets, I included the 4 thresholds we have been using (20th,80th,mean and median thresholds)  and a range of 20 values from the lowest edge weight to the highest. I will also run this on the Enron Subsets.

MIT-OCT

MIT-NOV

MIT-ALL

Enron (all)

Categories:

## The effect of Hierarchy and Overlap on routing

Following on from the previous post, I implemented the algorithms that evaluate the structure of the hierarchy, and plotted this for all datasets against the main routing metrics.

However, this showed a confusing picture, so Pádraig suggested I run multiple parameters to HGCE over one dataset, and see if it made a difference. We chose the Studivz-three-B dataset, as it is large enough to give a nice general picture.

The parameters were as follows:

K=3,4

M (threshold) = 0,1e-05,2e-05,3e-05,4e-05,6e-05,6e-05,7e-05,8e-05,9e-05,6.83085891329e-05,2.04848750968e-05,8.6207182699e-05,5.97475523656e-06

the other parameters were kept at their default (see here)

Again, on first look, this seemed confusing, but by removing the two very low delivery ratio results (leaving 24 out of 26), the visualizations seemed to show a trend towards increased hierarchical structure meaning increased delivery ratio.

It is quite interesting to explore the data by filtering out different runs, e.g removing all values where K=3, or 4 makes a difference: Hierarchy Data Studivz-B

I suspect the two clumps in delivery ratio form either side of the point an important Threshold parameter.

Categories:

## Measuring Hierarchy

Pádraig and I discussed a mechanism for measuring hierarchy, to give a given structure a metric. Pádraig’s suggestion was to measure the average distance to the root, and I came up with a normalised version that should also take into account the relative size of a node (in this case a community), as follows:

The root is always assumed to be a node with a weight of exactly the total size. e.g. for community hierarchy, the size of the node represents the number of members of the community. The root node in this instance contains all members of of the network.

We measure the distance from the root to each community, and multiply the size of the community by the distance. We divide the sum of these values by the size of the root community. If the results is 1, then the community structure is shallow, if the result it less than 1, then the community structure does not contain all nodes, the further the result is upwards of 1 the more depth the structure has.

```\$depth_sum = 0; \$levels = array(); foreach(\$data as \$commid=&gt;\$commdata){ \$levels[\$commdata['level']] +=1; \$depth_sum += ((\$commdata['level'] + 1) * count(\$commdata['members'])); } \$depth_metric = \$depth_sum / count(\$nodes);```

It still needs something to balance for breadth, so,

At each level of the dendogram, we take the size of each node, multiply it by the number of nodes at this level, and divide by the size of the root node, then multiply by the depth. The sum of the results for each node divided by the size of the root node gives us the breadth metric.

```\$breadth_sum = 0; foreach(\$data as \$commid=&gt;\$commdata){ \$breadth_sum += ( (\$commdata['level']+1) * ( (\$levels[\$commdata['level']] * count(\$commdata['members'])) / count(\$nodes) ) ); } \$breadth_metric = \$breadth_sum / count(\$nodes);```

Average Overlap is simply the sum of the number of communities each node is a member of divided by the number of communities.

It will be interesting to calculate these metrics for each dataset and each threshold value, we can plot the delivery ratio (and other metrics)  in a scatter plot for breadth and depth, and see if there is a trend.

Categories: