Linkurious Enterprise

Visualizing the network of Hillary Clinton’s emails

March 7, 2016

Courtesy of the US State Department we have access to part of Hillary Clinton’s emails. Using graph visualization we will explore this data, focusing not on the content of the emails but on their metadata. Let’s see what kind of information we can uncover about Hillary Clinton’s professional network.

Building a graph of Clinton’s emails with Neo4j

Clinton, then Secretary of State of the United States, had the habit of using a personal email server to exchange professional emails. When this was revealed, it caused a public controversy.

The data was later the object of a public records request. The State Department reviewed the emails to decide which were too sensitive to be turned over. The rest of the emails were published on a monthly basis as PDF documents. WSJ journalists, Ben Hamner and others have undertaken the task of turning them into a more exploitable format. For the purposes of this article we will use a cleansed version of the data prepared by Ryan Boyd. It consists of a single CSV file.

The script below (courtesy of Boyd) turns the CSV file into a Neo4j graph database:

// Creating the graph: one Person per sender/recipient alias, one Email per CSV row
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://s3-us-west-2.amazonaws.com/neo4j-datasets-public/Emails-refined.csv" AS line
MERGE (fr:Person {alias: COALESCE(line.MetadataFrom, line.ExtractedFrom, '')})
MERGE (to:Person {alias: COALESCE(line.MetadataTo, line.ExtractedTo, '')})
MERGE (em:Email {id: line.Id})
ON CREATE SET em.foia_doc = line.DocNumber,
              em.subject = line.MetadataSubject,
              em.to = line.MetadataTo,
              em.from = line.MetadataFrom,
              em.text = line.RawText,
              em.ex_to = line.ExtractedTo,
              em.ex_from = line.ExtractedFrom
MERGE (to)<-[:TO]-(em)-[:FROM]->(fr)
// Direct person-to-person link, counting the emails sent from fr to to
MERGE (fr)-[r:HAS_EMAILED]->(to)
ON CREATE SET r.count = 1
ON MATCH SET r.count = r.count + 1;

// Store on each person the number of emails sent or received
MATCH (a:Person)-[r]-(b:Email)
WITH a, count(r) AS count
SET a.count = count;

The result is a graph of 8,278 nodes and 16,335 edges.

In our graph we have 2 types of nodes: persons and emails. Persons are linked to emails by “from” and “to” relationships. In addition, persons are directly linked by a relationship when they have exchanged emails.

Schema of our data.
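A quick way to check the import against this schema is to count nodes per label and relationships per type. The two queries below are a minimal sketch that assumes only the labels and relationship types created by the import script (Person, Email, TO, FROM, HAS_EMAILED); the column names are arbitrary.

```cypher
// Count nodes per label (expected labels: Person and Email)
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes;

// Count relationships per type (expected types: TO, FROM and HAS_EMAILED)
MATCH ()-[r]->()
RETURN type(r) AS type, count(*) AS relationships;
```

The totals of each result should add up to the node and edge counts of the graph.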

Visualizing Clinton’s emails and her professional network

Now that we have prepared the data, we can use Linkurious to start investigating it. First let’s look up Hillary Clinton.

Hillary Clinton

Time to explore what Clinton is connected to. Instead of visualizing all 7,945 emails she has sent or received, let’s focus on the people she is connected to.

Who has Hillary Clinton exchanged emails with?
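The same question can be put to the database directly. The query below is a sketch, not the query Linkurious runs: it assumes the graph was built with the import script, and that `$alias` is a query parameter holding Clinton's alias exactly as it appears in the data (the article does not state that string, so it has to be looked up first). Older Neo4j releases write parameters as `{alias}` instead.

```cypher
// People directly linked to Clinton by a HAS_EMAILED relationship,
// in either direction. $alias is an assumed parameter, see above.
MATCH (hc:Person {alias: $alias})-[:HAS_EMAILED]-(other:Person)
WHERE other <> hc
RETURN count(DISTINCT other) AS contacts;
```

The relationship is matched without a direction so that people who only wrote to Clinton and people she only wrote to are both counted, and `DISTINCT` avoids counting a contact twice when emails went both ways.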

Clinton has exchanged emails with 210 persons. Already there are some interesting things to notice. We have a lot of isolated nodes (nodes which are only connected to Clinton) in the top right corner of the screen. At the bottom we have a group of nodes which are highly interconnected. Among them are Cheryl Mills, former Counselor and Chief of Staff, and Lona Valmoro, Special Assistant. The people in this group are in contact with one another and form a community. These are Clinton’s closest professional contacts.

Hillary Clinton's closest contacts
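The split between isolated nodes and the interconnected group can also be approximated with a query. The sketch below counts, for each of Clinton's contacts, how many of her other contacts that person has also exchanged emails with. It is a rough proxy for what the layout shows, not a community detection algorithm, and `$alias` is again an assumed parameter holding Clinton's alias as stored in the data.

```cypher
// For each contact of Clinton, count the other contacts of Clinton
// that this person has also exchanged emails with.
MATCH (hc:Person {alias: $alias})-[:HAS_EMAILED]-(a:Person)
WHERE a <> hc
WITH DISTINCT hc, a
OPTIONAL MATCH (a)-[:HAS_EMAILED]-(b:Person)-[:HAS_EMAILED]-(hc)
WHERE b <> a AND b <> hc
RETURN a.alias AS contact, count(DISTINCT b) AS shared_contacts
ORDER BY shared_contacts DESC;
```

Contacts with a `shared_contacts` value of 0 correspond to the isolated nodes, while the highest values point to members of the interconnected group.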

In this network, who are the most active persons? Let’s map the size of the nodes to the number of emails sent and received.

Who are the most active participants in the network?

We can see that Cheryl Mills, Huma Abedin and Jake Sullivan are the biggest nodes and thus the most active participants (after Clinton) in the network.
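The node size used here comes from the `count` property that the last statement of the import script stores on every person. A minimal sketch of the equivalent ranking in Cypher, assuming that property has been set as in the script (the limit of 10 is arbitrary):

```cypher
// Rank people by the number of emails they sent or received.
// The IS NOT NULL filter keeps nodes without a count out of the
// top rows, since nulls sort first in descending order.
MATCH (p:Person)
WHERE p.count IS NOT NULL
RETURN p.alias AS alias, p.count AS emails
ORDER BY emails DESC
LIMIT 10;
```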

Let’s shift our focus to the isolated nodes. They represent people who exchanged emails with Clinton but were not involved in her day-to-day work. For example, Cherie Blair, wife of former British PM Tony Blair, is one of these isolated nodes connected to Clinton.

Clinton and Blair are connected.

When we expand Blair’s connections, we see that she received four emails from Clinton with subjects “Confidential”, “Get well soon”, “Sorry to miss you” and “Get well soon”.

Clinton and Blair exchanged 4 emails.
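Expanding a node in the interface amounts to following the TO and FROM relationships created by the import script. The query below is a sketch of how the emails sent by one person to another could be listed; `$sender` and `$recipient` are assumed parameters holding the two aliases as they appear in the data, which the article does not spell out.

```cypher
// Emails sent by one person to another. In the import script each
// Email points to its sender with FROM and to its recipient with TO.
MATCH (s:Person {alias: $sender})<-[:FROM]-(em:Email)-[:TO]->(r:Person {alias: $recipient})
RETURN em.subject AS subject, em.foia_doc AS document, em.text AS text;
```

The `text` column holds the raw content loaded from the CSV file, so the same query can be used to read an individual email.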

We can select the “Confidential” email and read the content.

A “confidential” email.

We don’t have to look at the content of Clinton’s emails, though, to learn more about her activity at the State Department. Graph visualization helps us turn the emails’ metadata into a clear view of Clinton’s network. We can identify key people and communities quickly and easily.
