Calculating betweenness centrality using NodeXL


For my dissertation, a statistic I used to determine who was at the “center” of the network was the centrality calculation. This was not done by hand. I used software called NodeXL to do the downloading, calculating, and visualization of the network. The statistic ended up choosing to use is called the Betweenness Centrality, which measures the typical shortest path length between each and all of the vertices. A vertex (or node) is an account on Twitter. I won’t call it a person, because it is possible that a vertex is a company, an organization, or even a bot.

In my data set, I had 1,319 vertices and 13,524 edges. Not a giant data set compared to some, but still, it was large enough to require some computing power and time to run the visualizations. After calculating the statistics for the data, the vertex with the largest betweenness centrality had a score of 215,883.10*. My advisor cried foul. He did not understand how the software could assign a score of 215,000 to a vertex in a data set with only 1,319 vertices and only 13,524 edges. Honestly, I didn’t either, at first. So I went back to the books and really dove into the calculation to figure out how betweenness centrality was calculated. One book I used was Hennig et al. (2012) who had this nice image and a wonderful table next to it which gave the raw score. I dove into the method of calculation to figure out how they came up with this score.

It wasn’t hard, but it was slightly complex. The formula Hennig et al. uses for the betweenness centrality is: (# of shortest paths through a vertex)/(# of shortest paths). The denominator in the given graph is 8 because they didn’t count the double arrows as two paths. The numerator is a little more complicated.

First, list every connected dyad (pair of vertices) with the number of shortest path edges it takes to connect them. Then, pick a vertex you are interested in. For this purpose I am interested in vertex 2, since it has the largest value.  Finally, count how many edges go through the selected vertex, and divide by how many vertices are immediately connected to the selected vertex. 

1-2-3: 2

1-2-3-4: 3

1-2-5: 2

4-2-5: 2

There are some tricky ones. For example, 2 to 6 can occur two different ways, 2-3-6 and 2-5-6. Each is a value of 2, but there are 2 equal ways, so the value of 2 is divided by 2, so the counted value is one for each path. The same thing for 1 to 6, except the value is 1.5, counted twice.

In the end, this method counts 30 paths crossed through vertex 2, divided by 8, resulting in the value of 3.75.

This is not how NodeXL counts. The same graph in NodeXL yields a value of 5 for vertex 2’s betweenness centrality.  This confused me greatly. Clearly NodeXL is doing something differently than Hennig et al. was doing.

Vertex Degree Betweenness Centrality
vertex 1 1 0
vertex 2 4 5
vertex 3 3 0.666667
vertex 4 3 0.666667
vertex 5 2 0.666667
vertex 6 3 1

What I figured out was NodeXL counts only the paths through a vertex, and does not divide by the number of shortest paths. This is easily confirmed by taking the six vertex sample, and simplifying it down to the simplest type of network, and then rebuilding it.

In this network, vertex 2 is at the heart of the star, and the other nodes are arranged around it. If vertex 1 is the right most arm of the star, and we count counter-clockwise, the pairs of dyads connected through vertex 2 are: 1-3; 1-4; 1-5; 1-6; 3-4; 3-5; 3-6; 4-5; 4-6; and 5-6. 

Calculating the values on this arrangement yields the following values.

Vertex Degree Betweenness Centrality
vertex 1 1 0
vertex 2 5 10
vertex 3 1 0
vertex 4 1 0
vertex 5 1 0
vertex 6 1 0

NodeXL did not add any values for 2,1; 2-3; 2-4; 2-5 or 2-6. It only counts the paths THROUGH the vertex in question, not terminating at the vertex in question. This can be checked by removing paths one by one, and calculating the values. For example, if vertices 4 and 6 are connected and 3 moved inward to start matching Hennig et al.’s graph above, the following results.

Vertex Degree Betweenness Centrality
vertex 1 1 0
vertex 2 5 9
vertex 3 1 0
vertex 4 2 0
vertex 5 1 0
vertex 6 2 0

This is what would be expected if the 4-6 dyad was removed from the list above. Connecting additional dyads follows the same pattern. For example, connecting dyad 3-4 should result in multiple paths being counted between 3 and 6, so we should get a fractional count.

Vertex Degree Betweenness Centrality
vertex 1 1 0
vertex 2 5 7.5
vertex 3 2 0
vertex 4 3 0.5
vertex 5 1 0
vertex 6 2 0

The fractional count is confirmed. 3-6 has two equal shortest paths. One through 2 (3-2-6) and one through 4 (3-4-6). Since only 1 of the pairs goes through vertex 2, only ½ is counted towards vertex 2’s centrality, the other ½ is counted towards vertex 4’s centrality.

That the calculation yields large values quite easily can also be checked. If a vertex 7 is added to the graph, in a similar way as vertex 1 is, the betweenness centrality should double. 

Vertex Degree Betweenness Centrality
vertex 1 1 0
vertex 2 5 10
vertex 3 3 1
vertex 4 3 1
vertex 5 2 1
vertex 6 3 1
vertex 7 1 0

It does. At this point, I am comfortably saying that the formula for betweenness centrality used by NodeXL is the sum of all shortest paths between two vertices that goes through (but does not terminate at) a particular vertex.

This method of calculating centrality makes much more sense for large data sets found on social media, and Twitter. The method Hennig et al. used may work when the graphs are small or compact, but once hundreds of vertices are included in a data set, counting the shortest paths between all of them would be prohibitively intensive. In addition, while the value of the count changes, the order is similar. Hennig et al.’s calculation was directional, which increased the value of betweenness centrality for vertex 4.

Hennig, M., Brandes, U., Pfeffer, J., & Mergel, I. (2012). Studying social networks: A guide to empirical research. Frankfurt am Main: Campus-Verlag.

*Who was this person with the amazingly huge betweenness centrality score? More on that to come! I have to keep some secrets right for the moment.

, ,

2 responses to “Calculating betweenness centrality using NodeXL”

  1. Ben, I have the data to check that, but have not done the longitudinal comparisons. That is one of the things I plan on doing in the near future.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.