Cyber security: how to use graphs to do an attack analysis
Cyber security experts have a challenging job. They analyse huge datasets to track anomalies, find security holes and patch them. Reacting quickly to an attack is key. We are going to see how graphs can accelerate an attack analysis and help identify potential attack vectors before they are used.
In the past few years, organizations like Sony, LinkedIn, NASDAQ or the CIA have been hacked. For these organizations, the result has been exposure of private information, downtime, tarnished reputations and millions of dollars in lost revenue.
There is no sign that these attacks are going to stop either. Criminals are well aware of the value of information. Today, for example, there is a black market where zero-day exploits, attack methods that exploit a previously unknown security flaw, can be sold. The best hackers can sell their discoveries to the highest bidder. The market is booming, stimulated by governments who are looking to arm themselves.
Cyber security teams are under increasing pressure from these new threats. To defend their organizations, they can rely on a wealth of data. Typical monitoring systems can generate terabytes of data. Can it be used to thwart attacks?
Cyber security teams have a wealth of data at hand. IP logs, network logs, communication logs, server logs: the various tools they use to monitor their systems generate a lot of data. Large enterprises generate an estimated 10 to 100 billion events per day. That sort of volume is a challenge for traditional security information and event management (SIEM) tools. They are designed to analyse logs, network flows and system events for forensics and intrusion detection, but they are not equipped to handle big data.
Volume is an issue for security experts, but it is not necessarily the biggest one. Security data is often both large and unstructured, as it comes from heterogeneous, incomplete data sources. Working with unstructured data in table-oriented tools is rarely a good idea. It can work, but at a price:
- difficulty integrating new sources;
- complexity in structuring and querying the data;
- poor performance when querying the data.
In contrast, graph databases like Neo4j, Titan or InfiniteGraph make it easy to store and query unstructured data, even as the volume grows. That is why companies like Cisco are turning to graph technologies to design the next generation of cyber security solutions. With Titan, Cisco ingests 10 terabytes of security data per month.
To understand why graph technologies can help cyber security, we are going to use a concrete example. Cisco has published a blog post that details how its graph analytics capability can protect customers against zero-day exploits. A zero-day exploit takes advantage of a previously undiscovered security flaw in a piece of software. Between the moment the flaw is discovered and the moment the software is patched by those who use it, hackers can use the flaw to compromise systems.
A common technique is phishing: a criminal masquerades as a trustworthy entity to obtain sensitive information. We are going to study an example of this.
Recently, an Internet Explorer zero-day exploit (CVE-2014-1776) was used in phishing attacks. The hackers sent emails to victims, who were asked to log in to a website where their credentials were captured.
As we can see in the mail above, one of the domains used by the hackers was inform.bedircati.com. Among the other domains were profile.sweeneyphotos.com, web.neonbilisim.com and web.usamultimeters.com. Security providers quickly blocked these domains. In addition, Cisco was able to quickly identify other potential domains used by the hackers and protect its customers against them. Let's see how.
Like all web domains, the domains used in the phishing attack are linked to a few entities:
- an IP address: a numerical label assigned to each device (e.g., computer, printer) participating in a computer network that uses the Internet Protocol for communication;
- a name server: a computer hardware or software server that implements a network service for providing responses to queries against a directory service (it turns a domain name into an IP address);
- a registrar: an organization or commercial entity that manages the reservation of Internet domain names.
IP addresses are unique, but name servers and registrars can link domain names to other domain names. A graph model is ideal to represent these entities and their connections:
This model shows how easy it is to model our data with a graph. For Levi Gundert from Cisco:
Visually, we can interpret the data at a glance.
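The model described above (domains connected to an IP address, a name server and a registrar) can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not Cisco's implementation: the `RECORDS` list, the `build_graph` helper and every domain, address and registrar name in it are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical records: (domain, ip, name_server, registrar).
# All values are made-up placeholders, not data from the Cisco investigation.
RECORDS = [
    ("login.bad-one.example", "192.0.2.10", "ns1.shady-dns.example", "Registrar A"),
    ("mail.bad-two.example", "192.0.2.11", "ns1.shady-dns.example", "Registrar A"),
    ("shop.benign.example", "198.51.100.7", "ns.hosting.example", "Registrar B"),
]


def build_graph(records):
    """Return an undirected adjacency map whose nodes are (kind, value) pairs."""
    graph = defaultdict(set)
    for domain, ip, name_server, registrar in records:
        domain_node = ("domain", domain)
        neighbors = (
            ("ip", ip),
            ("name_server", name_server),
            ("registrar", registrar),
        )
        for neighbor in neighbors:
            # Store each edge in both directions so we can walk from a domain
            # to its infrastructure and back to other domains.
            graph[domain_node].add(neighbor)
            graph[neighbor].add(domain_node)
    return graph


graph = build_graph(RECORDS)

# Which domains sit behind the same name server?
shared = graph[("name_server", "ns1.shady-dns.example")]
print(sorted(value for _, value in shared))
# ['login.bad-one.example', 'mail.bad-two.example']
```

Storing each edge in both directions is what makes the question "which other domains use this name server?" a single lookup, whereas a tabular layout would need a self-join for the same answer.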
As a cyber security provider, Cisco keeps track of domain names. Through its data collection program, Cisco has good information on 25 to 30 million Internet domains. It knows which of these millions of domains are controlled by hackers and which are not. It might sound like a lot, but there are an additional 180 million domains on which Cisco has no information.
When Cisco was first alerted about the attack, it analyzed its data to find what the domains involved were connected to. Finding connections between entities in a large dataset is where graph databases are most useful.
The schema below represents the result of the investigation Cisco conducted after the zero-day attack. Notice all the domain names in blue. Cisco started with two domain names but used graph analytics to identify 21 other domain names suspiciously linked to the first two.
We can see:
- 23 domains (light blue);
- 3 name servers (pink);
- 2 IP addresses (green);
- 1 registrar (orange).
The suspicious domain names can now be monitored so that they cannot be used in other phishing attacks. What is really impressive in this investigation is that Cisco was able to quickly block domain names before they were used by the hackers.
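The pivot from a couple of known-bad domains to other suspicious ones can be sketched as follows. This is a simplified illustration of the idea, not the analytics Cisco actually runs: the `EDGES` data, the `related_domains` function and the scoring rule (count how many infrastructure nodes a domain shares with the seeds) are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical (domain, infrastructure) edges. The infrastructure node can be
# an IP address or a name server. None of this is real investigation data.
EDGES = [
    ("seed-one.example", "ns1.shady-dns.example"),
    ("seed-one.example", "192.0.2.10"),
    ("seed-two.example", "ns2.shady-dns.example"),
    ("lookalike-a.example", "ns1.shady-dns.example"),
    ("lookalike-b.example", "ns2.shady-dns.example"),
    ("lookalike-b.example", "192.0.2.10"),
    ("unrelated.example", "ns.hosting.example"),
]


def related_domains(edges, seeds):
    """Score each non-seed domain by the number of infrastructure nodes
    it shares with the seed domains."""
    domains_by_infra = defaultdict(set)
    infra_by_domain = defaultdict(set)
    for domain, infra in edges:
        domains_by_infra[infra].add(domain)
        infra_by_domain[domain].add(infra)

    # Every infrastructure node used by at least one seed domain.
    seed_infra = set().union(*(infra_by_domain[seed] for seed in seeds))

    scores = defaultdict(int)
    for infra in seed_infra:
        for domain in domains_by_infra[infra]:
            if domain not in seeds:
                scores[domain] += 1
    return dict(scores)


seeds = {"seed-one.example", "seed-two.example"}
related = related_domains(EDGES, seeds)
for domain, score in sorted(related.items()):
    print(domain, score)
# lookalike-a.example 1
# lookalike-b.example 2
```

In this sketch the registrar is deliberately left out of the pivot: a registrar usually serves a very large number of domains, so sharing one is probably a much weaker signal than sharing an IP address or a name server.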
Graph technologies like Neo4j, GraphLab or Titan can help analyse large graph datasets quickly. Graph visualization solutions like Linkurious complement this by making the insights derived from graph analytics easy to interpret.
The picture above represents data on IP addresses, domains, DNS records and WHOIS information. This information can be shown in lists or tables, but in that form it is hard to interpret. Without graph visualization, the analyst would struggle to grasp the connections.
Michael Howe from Cisco explains that using edges and nodes to represent the data is very important:
A common pitfall, though, is to try to always look at all the data at once. It might sound appealing ("I'm looking at everything, so I'm sure I'm not missing anything") but it is ill-advised, especially for large datasets. According to Michael Howe:
To get the most out of visualization, analysts should focus on specific subsets of their data, as in the graph visualization of the zero-day exploit.
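One simple way to focus on a subset is to keep only the nodes within a few hops of a starting point and visualize that neighborhood instead of the whole graph. The sketch below assumes an adjacency map held in memory. The `neighborhood` function and the `ADJACENCY` data are hypothetical, and a graph database would normally express the same idea as a bounded-depth traversal query.

```python
from collections import deque


def neighborhood(adjacency, start, max_hops):
    """Return {node: distance} for every node within max_hops edges of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# Hypothetical graph: domains linked through name servers and an IP address.
ADJACENCY = {
    "seed.example": ["ns1.example", "192.0.2.10"],
    "ns1.example": ["seed.example", "other.example"],
    "192.0.2.10": ["seed.example"],
    "other.example": ["ns1.example", "ns9.example"],
    "ns9.example": ["other.example", "far.example"],
    "far.example": ["ns9.example"],
}

subset = neighborhood(ADJACENCY, "seed.example", max_hops=2)
print(sorted(subset))
# ['192.0.2.10', 'ns1.example', 'other.example', 'seed.example']
```

With a limit of two hops, the view contains the seed domain, its infrastructure and the domains that share it, and leaves out everything further away.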
Cyber security is a huge challenge. Today, emerging graph technologies offer new ways to tackle security data and use it to prevent attacks or react faster.