NodeXL Data Analysis Guide
Overview
NodeXL is an Excel 2007 template for viewing and analyzing network graphs, along with a set of .NET Framework 3.5 class libraries that can be used to add network graphs to custom applications. NodeXL was formerly called ".NetMap".
A network graph is a set of vertices (sometimes called nodes) connected by edges. See Wikipedia for an overview of network graphs.
This is what NodeXL looks like:

A DRAFT slide deck that gives an overview of NodeXL is now available here. This document will be extended with a guide to the many core tasks and operations performed on network data sets.
Here is a sample network graph created by NodeXL. It shows an individual's network of friends within Facebook.com:

Here are some other NodeXL graphs:

NodeXL was created by Marc Smith's team while he was at Microsoft Research. Smith is now at Telligent Systems.
Contributors to NodeXL include:

Dan Fay (Microsoft Research Redmond)
Cody Dunne (University of Maryland)
Marc Smith (Telligent Systems)
Vladimir Barash (Microsoft Research Silicon Valley/Cornell)
Tony Capone (Microsoft Research Redmond)
Natasa Milic-Frayling (Microsoft Research Cambridge)
Eduarda Mendes Rodrigues (Microsoft Research Cambridge)
Eric Gleave (University of Washington)
Adam Perer (University of Maryland)
Ben Shneiderman (University of Maryland)
NodeXL Excel 2007 Template
The easiest way to use NodeXL is to install the NodeXL Excel 2007 Template. With the template installed, you can enter network data into a template-based Excel 2007 workbook and then view the network graph within the workbook's window:

The template requires only a simple two-column edge list, but a variety of optional columns can be used to customize the graph's appearance. These include edge color, width, and opacity, as well as vertex color, shape, radius, opacity, label, tooltip, and location. In this example, a few of the optional columns have been used to customize the previous graph:

Because you enter graph data in a familiar Excel workbook format, there is no need to learn a complex, arcane file format to display your graph. And because Excel 2007 is used as an application platform, the full power of Excel is available for filtering and computing vertex and edge data.
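To make the edge-list idea concrete, here is a minimal Python sketch that writes a tab-delimited edge list of the kind a workbook (or the delimited-text importer) could consume. The column headers "Vertex 1", "Vertex 2", "Color", and "Width" are assumptions chosen for illustration. Check the template's Edges worksheet for the exact names it expects.

```python
import csv
import io

# Assumed headers: two required vertex columns plus two optional edge columns.
HEADER = ["Vertex 1", "Vertex 2", "Color", "Width"]

def edge_list_to_tsv(edges):
    """Serialize (source, target, color, width) tuples as tab-delimited text.

    Optional fields may be None, in which case the cell is left empty.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for source, target, color, width in edges:
        writer.writerow([source, target,
                         "" if color is None else color,
                         "" if width is None else width])
    return buf.getvalue()

if __name__ == "__main__":
    edges = [
        ("Alice", "Bob", "Red", 2),
        ("Bob", "Carol", None, None),  # only the two required columns filled
    ]
    print(edge_list_to_tsv(edges), end="")
```

Empty cells leave an optional attribute unset, so the same writer handles both the bare two-column case and the customized case.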
Layout Algorithms
By default, the Fruchterman-Reingold layout algorithm is used to lay out the graph's vertices, but a variety of additional layout algorithms are provided as well. You can repeatedly lay out either all of the graph's vertices or just a selected subset.
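For readers curious how a force-directed layout works, here is a minimal sketch of the textbook Fruchterman-Reingold model. Every pair of vertices repels with force k*k/d. Every edge pulls its endpoints together with force d*d/k. A cooling "temperature" limits how far a vertex can move per iteration. The function name, frame size, iteration count, and linear cooling schedule are illustrative assumptions, and NodeXL's own implementation may use different parameters.

```python
import math
import random

def fruchterman_reingold(vertices, edges, width=1.0, height=1.0,
                         iterations=50, seed=0):
    """Return {vertex: (x, y)} using the Fruchterman-Reingold force model.

    k = sqrt(area / |V|) is the ideal edge length. Positions are clamped
    to the [0, width] x [0, height] frame after every iteration.
    """
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, width), rng.uniform(0, height)] for v in vertices}
    if not pos:
        return {}
    k = math.sqrt(width * height / len(pos))
    temperature = width / 10.0
    cooling = temperature / (iterations + 1)  # linear cooling, never reaches 0

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in pos}
        # Repulsive force between every pair of vertices.
        vs = list(pos)
        for i, v in enumerate(vs):
            for u in vs[i + 1:]:
                dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[v][0] += dx / d * f
                disp[v][1] += dy / d * f
                disp[u][0] -= dx / d * f
                disp[u][1] -= dy / d * f
        # Attractive force along each edge.
        for v, u in edges:
            dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[v][0] -= dx / d * f
            disp[v][1] -= dy / d * f
            disp[u][0] += dx / d * f
            disp[u][1] += dy / d * f
        # Move each vertex at most `temperature`, then keep it in the frame.
        for v in pos:
            dx, dy = disp[v]
            length = math.hypot(dx, dy)
            if length > 0:
                step = min(length, temperature)
                pos[v][0] += dx / length * step
                pos[v][1] += dy / length * step
            pos[v][0] = min(width, max(0.0, pos[v][0]))
            pos[v][1] = min(height, max(0.0, pos[v][1]))
        temperature -= cooling
    return {v: (x, y) for v, (x, y) in pos.items()}
```

In this sketch, the all-pairs repulsion loop costs O(|V|^2) per iteration. That is acceptable for small graphs, but larger implementations usually approximate it.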
Graph Metrics
A number of graph metrics can be computed and inserted into the Excel 2007 workbook on demand, including vertex degree, betweenness centrality, eigenvector centrality, closeness centrality, and clustering coefficient. The graph metric framework is extensible and other metrics will likely be added in future releases.
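To illustrate what three of these metrics measure, here is a small Python sketch that computes degree, closeness centrality, and clustering coefficient on an undirected graph. Betweenness and eigenvector centrality are omitted for brevity. The closeness normalization used here (reachable vertices minus one, divided by the sum of distances) is one common convention and is an assumption. NodeXL's values may be scaled differently.

```python
from collections import deque

def build_adjacency(edges):
    """Build undirected adjacency sets from (u, v) pairs; self-loops are skipped."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def degree(adj):
    """Number of distinct neighbors of each vertex."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def closeness(adj):
    """(reachable vertices - 1) / (sum of BFS distances); 0.0 if isolated."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

def clustering_coefficient(adj):
    """Fraction of each vertex's neighbor pairs that are linked to each other.

    Vertex names must be mutually comparable (e.g. all strings) so that
    each neighbor pair is counted once via a < b.
    """
    result = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            result[v] = 0.0
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        result[v] = links / (k * (k - 1) / 2)
    return result
```

Consider a triangle a-b-c with a pendant vertex d attached to c. Vertex c has degree 3. Its clustering coefficient is 1/3, because only one of its three neighbor pairs (a-b) is linked. Its closeness is 1.0, because every other vertex is one hop away.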
Data Import/Export
You can import graph data from delimited text files, Pajek files, and other Excel workbooks. Selected subgraphs can be exported to other Excel workbooks.
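Pajek's .net format is plain text. A *Vertices section lists 1-based ids with quoted labels. It is followed by an *Edges section (undirected) or an *Arcs section (directed) made of id pairs. The sketch below converts a labeled edge list to and from that layout. It handles only this basic subset: no weights, coordinates, comment lines, or labels containing double quotes. The function names are hypothetical.

```python
import shlex

def to_pajek(edges, directed=False):
    """Write (u, v) label pairs as a Pajek .net string with 1-based ids."""
    ids = {}
    for u, v in edges:
        for name in (u, v):
            if name not in ids:
                ids[name] = len(ids) + 1
    lines = [f"*Vertices {len(ids)}"]
    lines += [f'{i} "{name}"' for name, i in ids.items()]
    lines.append("*Arcs" if directed else "*Edges")
    lines += [f"{ids[u]} {ids[v]}" for u, v in edges]
    return "\n".join(lines) + "\n"

def from_pajek(text):
    """Parse the subset written by to_pajek back into (u, v) label pairs."""
    labels, edges, section = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):
            section = line.split()[0].lower()
            continue
        if section == "*vertices":
            parts = shlex.split(line)  # handles quoted labels with spaces
            labels[parts[0]] = parts[1] if len(parts) > 1 else parts[0]
        elif section in ("*edges", "*arcs"):
            parts = line.split()  # any trailing weight column is ignored
            edges.append((labels[parts[0]], labels[parts[1]]))
    return edges
```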
Built-In Data Sources
If you use Outlook, Outlook Express, Windows Mail, or a similar email client, and if you have Windows Desktop Search installed on your computer (it comes with Vista and can be installed separately on Windows XP), you can use the Excel 2007 template's "Analyze Email Network" feature to graph the network of people you communicate with via email.
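The email feature reads messages through Windows Desktop Search. The sketch below shows only the general idea of turning messages into a network, not NodeXL's implementation. It assumes each message has already been reduced to a hypothetical (sender, recipients) record. It then counts how many messages each sender addressed to each recipient.

```python
from collections import Counter

def email_edges(messages):
    """Turn (sender, recipients) records into weighted directed edges.

    Each message adds 1 to the weight of every sender -> recipient edge.
    Addresses are lower-cased so "Bob@x.com" and "bob@x.com" match, and
    messages a sender addresses to themselves are ignored.
    """
    counts = Counter()
    for sender, recipients in messages:
        s = sender.strip().lower()
        for rcpt in recipients:
            r = rcpt.strip().lower()
            if r != s:
                counts[(s, r)] += 1
    return sorted((s, r, n) for (s, r), n in counts.items())
```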
The "Analyze Twitter Network" feature displays a Twitter user's one- or two-degree social network, including the latest "tweet" from each person in the network.
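A one- or two-degree network can be gathered with a breadth-first search that stops at the requested depth. This sketch assumes the follow relationships have already been fetched into a dictionary. The `follows` structure and the `ego_network` name are hypothetical, and the sketch does not call Twitter or retrieve tweets.

```python
from collections import deque

def ego_network(follows, root, degree=1):
    """Return sorted edges among the vertices within `degree` hops of `root`.

    `follows` maps each user to the users they follow (directed edges).
    Every follows-edge between two included vertices is kept, so edges
    among the root's contacts appear as well as edges from the root.
    """
    if degree < 1:
        raise ValueError("degree must be at least 1")
    dist = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        if dist[v] == degree:
            continue  # do not expand past the requested depth
        for w in follows.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return sorted((u, w) for u in dist for w in follows.get(u, ()) if w in dist)
```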
Dynamic Filters
A set of sliding "range" controls lets you filter the graph's vertices and edges on the workbook's numeric and date/time columns. For example, you can easily hide all vertices with a degree less than three, or all edges with associated dates earlier than January 1, 2008.
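The effect of the two example filters can be sketched in a few lines of Python. The (u, v, date) edge tuples, the `filter_graph` name, and the rule that hiding a vertex also hides its edges are assumptions for illustration. In NodeXL, the sliders work on whatever numeric and date/time columns the workbook contains.

```python
from datetime import date

def filter_graph(edges, min_degree=0, min_date=None):
    """Hide vertices below `min_degree` and edges dated before `min_date`.

    `edges` is a list of (u, v, edge_date) tuples. Degree is computed on the
    full, unfiltered graph, as a precomputed workbook column would be.
    An edge whose date is None is hidden whenever a `min_date` is set.
    Returns (visible_vertices, visible_edges).
    """
    deg = {}
    for u, v, _ in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    visible_vertices = {x for x, d in deg.items() if d >= min_degree}
    visible_edges = [
        (u, v, when) for u, v, when in edges
        if u in visible_vertices and v in visible_vertices
        and (min_date is None or (when is not None and when >= min_date))
    ]
    return visible_vertices, visible_edges
```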
Clustering
A graph's vertices can be automatically or manually grouped into clusters that are distinguished by color and shape.
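As one simple example of automatic grouping, the sketch below treats each connected component as a cluster, using a union-find structure, and assigns each cluster a color. This is a deliberately basic, swapped-in technique. NodeXL's automatic clustering may use a different algorithm, and the palette names here are hypothetical.

```python
def connected_component_clusters(vertices, edges):
    """Group vertices into clusters = connected components (union-find).

    Every edge endpoint must appear in `vertices`. Returns
    {vertex: cluster_index}, with clusters numbered 0, 1, 2, ... in order
    of each cluster's first vertex in `vertices`.
    """
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    cluster_of_root, result = {}, {}
    for v in vertices:
        root = find(v)
        if root not in cluster_of_root:
            cluster_of_root[root] = len(cluster_of_root)
        result[v] = cluster_of_root[root]
    return result

PALETTE = ["Red", "Blue", "Green", "Orange"]  # hypothetical color names

def cluster_colors(clusters):
    """Map each vertex to a color, cycling through PALETTE by cluster index."""
    return {v: PALETTE[c % len(PALETTE)] for v, c in clusters.items()}
```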
Class Libraries
The Excel 2007 Template displays graphs using a custom Windows Presentation Foundation (WPF) control that can also be used in other applications. In fact, the template is just a Visual Studio Tools for Office 3.0 wrapper around a stack of reusable, prebuilt class libraries.
The WPF control is one of several graph "visualizers" that are packaged in a pair of Visualization assemblies. There is also an Adapters assembly for reading and writing graph data in various formats, a SocialNetworkLib assembly for analyzing social networks, a Core assembly that implements the low-level vertex, edge, and graph classes, a Layouts assembly that lays out graphs using various layout algorithms, and an Algorithms assembly that calculates graph metrics.
The class libraries are documented in a help file created with NDoc. Search for the NodeXLApi.chm file in the class libraries.
Important Note: You may see nothing but empty topics when you open the NodeXLApi.chm file. This is caused by a security restriction in Internet Explorer 7. To fix it, right-click the .chm file in Windows Explorer, click Properties, and then click the "Unblock" button on the General tab.
NodeXL Data Analysis Task List: Steps for data import, scrub, analysis, and visualization
Most data analysis tasks with NodeXL will follow a common set of steps:
- Import
- Merge duplicate edges
- Calculate (ALL) network metrics
- Create clusters
- Insert sub-graph images
- Sort the edge list in a way that usefully affects the order of layout in the graph display
- Auto-fill columns (and map data to display attributes): Set shape, color, opacity, size, and label/tooltip
- Show graph
- Read workbook
- Adjust layout
- Dynamic Filters – selectively hide edges and nodes
- Lay out again
- Return to spreadsheet to sort or calculate data
- Integrate additional edge lists
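Two of the steps above, "Merge duplicate edges" and "Auto-fill columns", can be sketched as small Python functions. The names are hypothetical and the behavior is an assumption. Duplicates are collapsed into a single edge with a count, and (a, b) and (b, a) are treated as the same edge unless the graph is directed. Auto-fill is modeled as a linear mapping from a numeric column onto a display range such as vertex radius.

```python
from collections import Counter

def merge_duplicate_edges(edges, directed=False):
    """Collapse repeated edges into one (u, v, count) tuple each.

    For undirected graphs the endpoints are put in sorted order, so vertex
    names must be mutually comparable. Output follows first-seen order.
    """
    counts = Counter()
    for u, v in edges:
        key = (u, v) if directed or u <= v else (v, u)
        counts[key] += 1
    return [(u, v, n) for (u, v), n in counts.items()]

def autofill(values, low, high):
    """Linearly map numeric values onto [low, high], e.g. onto vertex radius.

    If all values are equal, every value maps to `low`.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [low for _ in values]
    return [low + (x - lo) * (high - low) / (hi - lo) for x in values]
```

For example, merging the edge count first and then auto-filling edge width from that count produces thicker lines for more frequent connections.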
Did I miss any steps that you use? There are multiple analytic goals for network data sets that guide what happens next.
What are your goals for network analysis? Please share them in the discussion board!