Analysing the Offshore Leaks with graphs
The Offshore Leaks, released in 2013 by the ICIJ, are a rarity: a large dataset of real information about some of the most secretive places on earth, the offshore financial centers. The ICIJ investigation brought many interesting stories to the surface, including the suspicious activities of the President of Azerbaijan. We are going to see how graph technologies can help us make sense of the complex data in the Offshore Leaks.
The Offshore Leaks: a treasure trove or a maze?
In April 2013, the International Consortium of Investigative Journalists revealed details about 130,000 offshore accounts in what is possibly the biggest hit against tax fraud of all time. The source of the data is still very mysterious. That hasn't stopped an international network of more than 100 journalists from exploring what is called the Offshore Leaks.
It is not an easy job. Offshore accounts are used by savvy people who sometimes use them to hide funds. These funds can be linked to tax fraud, corruption or criminal activities. Needless to say, a lot of precautions are taken to hide the origin of the funds, who controls them and what they are used for. The publicly released dataset alone contains around 250k nodes, 500k edges and 1.2 million properties. The size and complexity of the Offshore Leaks make it challenging to uncover information in the data.
That is where graph analytics and graph visualization really shine. These techniques can help us uncover hidden facts. To show this we are going to focus on Azerbaijan. The Offshore Leaks include information on how the President of Azerbaijan had a mogul secretly create offshore accounts for his adolescent daughters, possibly against the law. Ready to jump in?
Using graphs to model financial and personal ties: our ICIJ data model
We want to know how the President of Azerbaijan is connected to offshore accounts. This means that we need to focus on the network he uses to control his assets stored in offshore entities. This network includes family members and a complex set of intermediaries or partners. Since we want to see how things are connected, we will represent each of these entities as a distinct node in a graph.
Our graph will include:
- people: they are the ones building and using the network. Some of them are quite visible… some work behind the scenes;
- companies: offshore entities, banking services providers, businesses;
- addresses: they are interesting to track, especially as they have legal implications (e.g. a company registered in a tax haven pays no tax).
People and companies are connected to their addresses. People can be linked together directly by family ties. Links between companies are more complex. A company can be designated as the "master client", as the "offshore provider" or as the "records and registers" of another company. All of these relationships refer to diverse roles as intermediary or go-between. They are usually played by very discreet companies, which sell their know-how and contacts to the people who want to take advantage of the offshore system.
Here is a concrete example of the kind of network behind the ICIJ dataset:
John and Sam are married. They have stored assets in a company, Treasure ltd. They control the company: both are shareholders and John is a Director. To set it up, they used an address in Dubai. Treasure ltd is established in the Bahamas, which means that the assets it controls are private and tax-free. It was set up with the help of two companies: Good Advice Inc and Hide and Seek. Oleg, a business partner of John, is also a Director of Treasure ltd.
This schema is one of the many ways we can model the ICIJ dataset. I have added here the concept of family ties. That kind of data is not present in the Offshore leaks but will be key for our subject.
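As a rough sketch, the John and Sam example could be created in Neo4j with a single Cypher statement. The Person and Company labels and the first_name, name and role properties mirror the queries used later in this article. The Address label, the jurisdiction and city properties, every relationship type and the roles given to the two intermediaries are assumptions made for illustration, not the actual schema of the ICIJ data.

```cypher
// Hypothetical model of the example network.
// Relationship types (FAMILY, CONTROLS, REGISTERED_AT, SERVICES) are assumed names.
CREATE (john:Person {first_name: 'John'}),
       (sam:Person {first_name: 'Sam'}),
       (oleg:Person {first_name: 'Oleg'}),
       (treasure:Company {name: 'Treasure ltd', jurisdiction: 'Bahamas'}),
       (advice:Company {name: 'Good Advice Inc'}),
       (hide:Company {name: 'Hide and Seek'}),
       (dubai:Address {city: 'Dubai'}),
       // family tie (not present in the Offshore Leaks data itself)
       (john)-[:FAMILY {role: 'Spouse'}]->(sam),
       // control of the offshore company
       (john)-[:CONTROLS {role: 'Shareholder'}]->(treasure),
       (john)-[:CONTROLS {role: 'Director'}]->(treasure),
       (sam)-[:CONTROLS {role: 'Shareholder'}]->(treasure),
       (oleg)-[:CONTROLS {role: 'Director'}]->(treasure),
       // address used to set up the company
       (john)-[:REGISTERED_AT]->(dubai),
       (sam)-[:REGISTERED_AT]->(dubai),
       // intermediaries; the roles below are picked arbitrarily for the example
       (advice)-[:SERVICES {role: 'Master Client'}]->(treasure),
       (hide)-[:SERVICES {role: 'Records And Registers'}]->(treasure)
```

Keeping the role as a property on the relationship, rather than creating one relationship type per role, is one design choice among several; it keeps queries such as "everything connected to this company" short.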
The modeling phase we went through is a first step. It will allow us to start asking questions of the data. In order to do that, I have extracted from the ICIJ dataset the records linked to President Ilham Aliyev and loaded them into a Neo4j graph database. You can download it here.
Uncovering suspicious connections with Cypher
Neo4j comes with a powerful query language called Cypher. It is well suited to asking sophisticated questions about the connections in a graph. We want to know if the ICIJ Offshore Leaks prove that President Ilham Aliyev has an offshore bank account.
A traditional approach would be to look for occurrences of his name and start digging from there. That method would have us following a difficult paper trail, hopping from document to document. It would be time-consuming and error-prone. With graph analytics we can formulate the same question and get a precise answer very fast.
Here is how to see the account(s) President Ilham Aliyev controls directly according to the ICIJ dataset:
MATCH (president)-[r]->(offshoreaccount:Company)
WHERE president.first_name = 'Ilham'
RETURN offshoreaccount.name as company, offshoreaccount.form as form, offshoreaccount.incorporation as incorporation, offshoreaccount.status as status, r.date as date, r.role as role
The results are:
We have two results. In 2003, Ilham Aliyev became a shareholder and Director of a company called Rosamund International Ltd. That company has since ceased to be active. In 2003, however, Ilham Aliyev was serving in parliament, so he may have violated constitutional provisions against members of Parliament operating or owning businesses.
Rosamund International Ltd is no longer active, and at first glance President Ilham Aliyev has no other offshore accounts according to the ICIJ dataset. So far, however, we have only looked at direct paths between Ilham Aliyev and offshore assets. People who are trying to hide money tend to use proxies they can hide behind. That means that we must widen our search and look for indirect connections.
Using traditional data analysis techniques, we would embark on an evidence hunt, looking at who the president is linked to, who they are linked to in turn, and so on. Doing this manually is time-consuming and hard to do on large-scale data. Traditional databases and data analysis tools are not always more helpful: many lack the ability to express queries about the connections in the data. When they can, the process is cumbersome and slow.
That is where graph technologies shine. With a Neo4j database for example, finding all the foreign assets Ilham Aliyev controls directly or indirectly is as simple as adding a "*" to our first query. The search will follow every path in the data between Ilham Aliyev and offshore accounts.
MATCH (president)-[r*]->(offshoreaccount:Company)
WHERE president.first_name = 'Ilham'
RETURN offshoreaccount.name as company, offshoreaccount.form as form, offshoreaccount.incorporation as incorporation, offshoreaccount.status as status
The results are:
With the graph visualization interface of Linkurious we can directly visualize the result of the Cypher query. Just by looking, we understand that through his family connections, Ilham Aliyev has ties with 3 more companies: Harvard Management Limited, LaBelleza Holdings Limited and Arbor Investments Limited. They were set up in 2008 with the involvement of Leyla and Arzu Aliyeva, the President's daughters. At the time, the two women were 19 and 23 years old. Quite young to be interested in international finance…
We can also see that the Aliyev family shares connections with other actors. Portcullis Trustnet and Naziq & Partners in particular are linked with Harvard Management Limited, LaBelleza Holdings Limited and Arbor Investments Limited.
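Without a visualization tool, the same indirect ties can be read from the paths themselves. The sketch below is one possible variant of the query above, not the query used to produce the picture. It assumes the president's node carries the Person label (as in the shortest-path query further down), caps the search at four hops (an arbitrary bound chosen for illustration) and lists the nodes each path goes through.

```cypher
// Variant of the indirect-ownership query that also returns the route taken.
// Assumptions: the Person label on the start node and the 4-hop upper bound.
MATCH path = (president:Person)-[*1..4]->(offshoreaccount:Company)
WHERE president.first_name = 'Ilham'
RETURN offshoreaccount.name AS company,
       length(path) AS hops,
       // companies expose name, people expose first_name; other nodes yield null
       [n IN nodes(path) | coalesce(n.name, n.first_name)] AS via
ORDER BY hops
```

Bounding the pattern also keeps the search from wandering through the whole graph when the same query is run against a much larger extract.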
Naziq & Partners is a Malaysian law firm. It is listed as a "Master Client", an advisory role. Portcullis Trustnet appears as being in charge of "Records And Registers". This company has an amazing story that you can check. According to its website, it provides a wide range of services to high net worth individuals and funds. Its large presence in the ICIJ dataset attests to one thing: it specializes in setting up complex legal schemes for people and companies who want to use offshore accounts… discreetly.
If we want to measure the role of the two middlemen, we can use a Cypher query again.
MATCH (president)-[r*]->(offshoreaccount:Company)
WHERE president.first_name = 'Ilham'
WITH offshoreaccount
MATCH (offshoreaccount)-[t]-(middlemen:Company)
RETURN middlemen.name as name, count(DISTINCT t) as mentions, type(t) as type, t.role as role
ORDER BY mentions DESC
The results we get back are:
But what do these findings mean? I haven't selected mentions of the Aliyev family in the ICIJ dataset at random. Two ICIJ-associated journalists, Khadija Ismayilova and an anonymous Azerbaijani reporter, wrote an amazing article detailing the Aliyev story. Their take on the data is clear. The offshore accounts have been set up to funnel money to President Aliyev. The funds are payments made by Hassan and Abdolbari Gozal, two businessmen who have been awarded more than $4.5 billion in construction contracts in Azerbaijan. Cypher can show how Ilham Aliyev and Abdolbari Gozal are connected in the data:
MATCH (Ilham:Person {first_name:'Ilham'}),(Abdolbari:Person {first_name:'Abdolbari'}), p = shortestPath((Ilham)-[*]-(Abdolbari))
RETURN p
This simple query looks through all the data and finds the shortest path between Ilham Aliyev and Abdolbari Gozal. We can see that these two persons are separated by three connections. Even in a large dataset like the Offshore Leaks, bringing hidden connections to the surface is dead simple.
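To get the number of hops as data rather than reading it off a picture, the path can be unpacked in the RETURN clause. This is a sketch under assumptions: the upper bound of 10 hops is arbitrary, and node names are read from the name and first_name properties used elsewhere in this article.

```cypher
// Shortest path between the two men, returned as a hop count and a chain of names.
// Assumption: a 10-hop upper bound, added to keep the search bounded.
MATCH (ilham:Person {first_name: 'Ilham'}),
      (abdolbari:Person {first_name: 'Abdolbari'}),
      p = shortestPath((ilham)-[*..10]-(abdolbari))
RETURN length(p) AS hops,
       [n IN nodes(p) | coalesce(n.name, n.first_name)] AS chain
```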
For some of the stories the ICIJ has surfaced, the journalists were able to leverage technology. A story about the trade in dead bodies, for example, comes with a complete article about methodology. Not surprisingly, the ICIJ journalists relied heavily on network analysis software, including Palantir.
Palantir is a software company specialized in data analysis. Its main customer is the US government, and it works on fraud, cyber-terrorism and intelligence analysis. But that kind of technology is no longer confined to US government agencies. Today, with tools like Neo4j or Linkurious, most companies can store and analyse big graphs.
The ICIJ journalists did a great job of investigating the Offshore Leaks. Their work shows the suspicious dealings of the Aliyev family. Equipped with Neo4j and Linkurious, we have seen that we can quickly come to the same conclusions. Graph technologies offer an exciting way to bring information to the surface. If you have complex, highly connected data you struggle to understand, give it a try!