We are launching Graph Viz 101, a series of posts that teach the basics of graph visualization, written by Sébastien Heymann in collaboration with Bénédicte Le Grand of Université de Paris 1. This is our first post; please discuss it below!
A graph (also called a network) is made of a set of entities, called nodes, and a set of relationships between these entities, called edges or links. The way nodes are connected constitutes the topology of the network. Additional information can be attached as properties, which are key-value pairs associated with each node or relationship. For example, individuals in a social network may be characterized by properties like gender, language, and age.
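To make the definition concrete, here is a minimal sketch of such a graph in plain Python. The node names, property values, and the `neighbors` helper are made up for illustration; they are assumptions, not part of any particular library.

```python
# Hypothetical toy social network; names and property values are made up.
# Each node maps to its properties (key-value pairs).
nodes = {
    "alice": {"gender": "female", "language": "French", "age": 34},
    "bob": {"gender": "male", "language": "English", "age": 28},
    "carol": {"gender": "female", "language": "English", "age": 41},
}

# Each relationship links two nodes and may carry its own properties.
edges = [
    ("alice", "bob", {"type": "friend"}),
    ("bob", "carol", {"type": "colleague"}),
]


def neighbors(node_id):
    """Return the ids of nodes sharing an edge with node_id (edges treated as undirected)."""
    result = set()
    for source, target, _props in edges:
        if source == node_id:
            result.add(target)
        elif target == node_id:
            result.add(source)
    return result


# The topology is the pattern of connections, independent of the properties.
topology = {node_id: sorted(neighbors(node_id)) for node_id in nodes}
```

Keeping properties separate from connections mirrors the distinction made above: the topology can be studied on its own, and properties can be brought in later to filter or color the nodes.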
The analysis of complex networks involves diverse tasks, such as understanding the statistical properties of their topologies, identifying significant nodes, and detecting anomalies. One of the biggest challenges is to build a good intuition of the network under study. Even when information such as node properties is available, extracting valuable knowledge and providing insights is challenging. Analysts may have to deal with multiple dimensions, including social, topical, geographical, and temporal data, which may also be aggregated at different levels of detail.
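As a rough illustration of two of these tasks, the sketch below computes a degree distribution (a basic statistical property of the topology) and picks the most connected node. The edge list is hypothetical, and using degree as the measure of significance is an assumption made for simplicity; many other measures exist.

```python
from collections import Counter

# Hypothetical undirected edge list, for illustration only.
edge_list = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e")]

# Degree of a node: the number of edges attached to it.
degree = Counter()
for u, v in edge_list:
    degree[u] += 1
    degree[v] += 1

# Degree distribution: how many nodes have each degree value.
degree_distribution = Counter(degree.values())

# One simple (assumed) notion of a significant node: the highest degree.
most_connected = max(degree, key=degree.get)
```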
Faced with such a diversity of data and the potentially unlimited number of analyses to perform in the first steps of a new project, analysts usually follow an exploratory approach: they inspect the data and outline interesting perspectives before drilling down to specific issues. When the datasets describe complex networks, this process is called Exploratory Network Analysis (ENA); it relies on data visualization and manipulation to analyze complex networks. This framework has its roots in the more general framework of Exploratory Data Analysis (EDA), which consists of performing a preliminary analysis guided by visualization before proposing a model or doing a statistical analysis. Described by J. Tukey in the paper “The Future of Data Analysis” (1962), the philosophy of EDA can be summed up as follows: