Anomaly detection algorithms: How to find outliers in your connected data

May 29, 2024

Anomalies and outliers in your data often tell an important story. Understanding unusual behavior within your data helps you extract business insights and operational intelligence, manage risk, and get to the root cause of important changes within your organization. In areas such as fraud, money laundering, and cybersecurity, detecting these anomalies can be the key to stopping harmful activities before they do too much damage.

In a context where organizations are working with large networks of complex, interconnected data, graph technology stands out as a powerful tool for anomaly detection. It lets you quickly cut through the noise to find the signals, outliers, and unusual patterns that matter.

Here’s a look at some of the top graph algorithms for anomaly detection that help you get to the bottom of your connected data quickly.

Detecting anomalies with graph technology

Graph analytics is particularly well suited for anomaly detection in connected data because of its ability to capture and analyze relationships between data points. Traditional anomaly detection techniques often treat data points as independent instances, failing to account for the complex connections and dependencies that exist in many real-world scenarios.

In contrast, graph analytics relies on representing data as a network of nodes (entities) and edges (relationships), allowing for a more holistic and contextual understanding of the data. A graph representation enables the identification of anomalies not only based on the properties of individual nodes but also on the patterns of connections between nodes.

Graph analytics can uncover anomalies in various forms, such as:

  1. Structural anomalies: These involve unusual patterns in the graph structure itself, such as nodes with an abnormally high or low number of connections, or clusters of nodes with unexpected connectivity patterns.
  2. Attribute anomalies: In addition to structural anomalies, graph analytics can detect anomalies in the attributes or properties of nodes and edges, such as outliers in numerical attributes or unusual combinations of categorical attributes.
  3. Temporal anomalies: When dealing with dynamic graphs that evolve over time, graph analytics can identify anomalies in the temporal patterns of node and edge creation, deletion, or attribute changes.
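As an illustration of the first category, here is a minimal Python sketch that flags nodes with an abnormally high or low number of connections. It assumes the graph is available as a plain list of (source, target) pairs; the function name `degree_outliers` and the z-score threshold of 2.0 are illustrative choices, not part of any particular product or library.

```python
from collections import Counter
from statistics import mean, pstdev


def degree_outliers(edges, z_threshold=2.0):
    """Return {node: degree} for nodes whose degree lies more than
    z_threshold standard deviations away from the mean degree."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    avg = mean(degree.values())
    spread = pstdev(degree.values())
    if spread == 0:
        # Every node has the same degree, so nothing stands out.
        return {}
    return {
        node: d
        for node, d in degree.items()
        if abs(d - avg) / spread > z_threshold
    }


# Toy graph (assumed data): one hub linked to ten nodes, plus two ordinary edges.
edges = [("hub", f"n{i}") for i in range(10)] + [("n0", "n1"), ("n2", "n3")]
print(degree_outliers(edges))  # {'hub': 10}
```

A z-score is only one possible yardstick. On heavily skewed degree distributions, a percentile or median-based cutoff may behave better.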

Types of graph algorithms for anomaly detection

You can apply several broad types of algorithms for anomaly detection in graph analytics, for example:

  1. Community detection algorithms: These algorithms identify tightly-knit groups or communities within the graph, allowing for the detection of nodes or edges that do not conform to the expected community structure.
  2. Similarity-based algorithms: By measuring the similarity between nodes based on their structural properties, attribute values, or a combination of both, these algorithms can flag nodes or edges that deviate significantly from their peers.
  3. Pattern matching algorithms: These algorithms search the graph for specific, predefined patterns of nodes and relationships, such as several customers linked to the same phone number.
  4. Probabilistic models: Techniques like Bayesian networks or Markov models can be applied to graph data to model the expected distributions and dependencies, enabling the identification of anomalous deviations from these models.
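To make the similarity-based approach more concrete, the sketch below compares nodes by the Jaccard similarity of their neighbor sets and flags nodes that resemble none of their peers. The input format (a dictionary of neighbor sets), the function names, and the 0.25 cutoff are assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def low_similarity_nodes(neighbors, min_similarity=0.25):
    """Return nodes whose best Jaccard match with any other node
    is below min_similarity."""
    flagged = []
    for node, own in neighbors.items():
        best = max(
            (jaccard(own, other)
             for peer, other in neighbors.items() if peer != node),
            default=0.0,
        )
        if best < min_similarity:
            flagged.append(node)
    return flagged


# Assumed toy data: each customer mapped to the merchants they are connected to.
neighbors = {
    "alice": {"shop_a", "shop_b", "shop_c"},
    "bob": {"shop_a", "shop_b"},
    "carol": {"shop_b", "shop_c"},
    "mallory": {"shop_x", "shop_y"},
}
print(low_similarity_nodes(neighbors))  # ['mallory']
```

The pairwise comparison is quadratic in the number of nodes, so this form only suits small graphs or pre-filtered candidate sets.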

Anomaly detection algorithms for your complex connected data

The exact anomaly detection algorithms you use will depend on your specific graph data model, which may not be the same as the one used in these examples. The examples are meant as inspiration for some of the many ways you can query your graph to detect anomalies across use cases.

AML: Detecting a discrepancy between owner income and property value

In a real estate scenario, a discrepancy between the owner’s income and the monthly loan installments is an anomaly that can indicate financial crime. For example, if the monthly cost of the loan used to purchase the property exceeds one-third of the buyer’s monthly income, the loan is at risk of either going into default or being part of a money laundering scheme.

The pattern translates into Cypher as follows (other graph query languages can express the same pattern):

// annual_revenues / 36 = one-third of the monthly income
MATCH (l:MortgageLoan)<-[e:HAS_LOAN]-(p:Person)
WITH e, l, p, p.annual_revenues / 36 AS max_monthly_instalment
WHERE max_monthly_instalment < l.monthly_instalment
RETURN p, l, e
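For readers who want to test the rule without a graph database, the same check can be sketched in Python. The record layout (a dictionary of annual revenues and a list of loan tuples) and the function name `risky_loans` are assumptions for illustration; only the one-third threshold comes from the query.

```python
def risky_loans(annual_revenues, loans):
    """Return (person, loan) pairs where the monthly installment exceeds
    one-third of the person's monthly income (annual revenues / 36)."""
    flagged = []
    for person, loan_id, monthly_instalment in loans:
        max_monthly_instalment = annual_revenues[person] / 36
        if max_monthly_instalment < monthly_instalment:
            flagged.append((person, loan_id))
    return flagged


# Assumed sample data.
annual_revenues = {"p1": 36_000, "p2": 90_000}
loans = [
    ("p1", "loan-1", 1_200),  # ceiling is 1,000 per month, so this is flagged
    ("p2", "loan-2", 1_500),  # ceiling is 2,500 per month, so this passes
]
print(risky_loans(annual_revenues, loans))  # [('p1', 'loan-1')]
```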

Financial crime: Detecting a synthetic identity pattern for multiple customers sharing the same contact details

Fraudsters and money launderers may use multiple accounts to steal assets or conceal their origins. In many cases, one person operates these accounts while hiding behind synthetic identities. Without graph databases to reveal these networks, it’s hard to find these individuals in our data. With graph databases, a simple pattern that finds customers connected through shared personal information can reveal the synthetic identities.

Here’s how the pattern translates into Cypher:

MATCH path = (p:PERSON)-[:HAS_ADDRESS|HAS_IP_ADDRESS|HAS_PHONE_NUMBER*..10]-(b:PERSON)
WHERE length(path) > 5
RETURN p, path
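A simpler variant of this idea, covering only direct sharing rather than the multi-hop chains matched by the query, can be sketched in Python by inverting the person-to-detail links. The record format and the function name `shared_contact_details` are hypothetical.

```python
from collections import defaultdict


def shared_contact_details(records):
    """Map each (kind, value) contact detail to the set of persons using it,
    keeping only details used by at least two persons."""
    holders = defaultdict(set)
    for person, kind, value in records:
        holders[(kind, value)].add(person)
    return {
        detail: persons
        for detail, persons in holders.items()
        if len(persons) > 1
    }


# Assumed sample data: (person, kind of detail, value).
records = [
    ("p1", "phone", "555-0100"),
    ("p2", "phone", "555-0100"),
    ("p2", "address", "1 Main St"),
    ("p3", "address", "9 Oak Ave"),
]
print(shared_contact_details(records))  # {('phone', '555-0100'): {'p1', 'p2'}}
```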

Financial crime: Detecting potential communities of fraudsters

If multiple clients share personally identifiable information (PII) such as email addresses, home addresses, IP addresses, or phone numbers, something suspicious may be going on. Graph analytics can turn simple insights like this into a way of uncovering hidden risks.

Here’s how, in Cypher, to create a virtual graph of individual clients sharing personal information and then use a community detection algorithm to identify communities of suspicious clients:

// Create personal information graph
CALL gds.graph.project.cypher(
  'individual-graph',
  'MATCH (n) WHERE n:PERSON OR n:COMPANY RETURN id(n) AS id',
  'MATCH (a)-[:HAS_IP_ADDRESS|HAS_EMAIL|HAS_ADDRESS|HAS_PHONE_NUMBER]->(PI)<-[:HAS_IP_ADDRESS|HAS_EMAIL|HAS_ADDRESS|HAS_PHONE_NUMBER]-(b) RETURN id(a) AS source, id(b) AS target',
  {validateRelationships: false}
);

// Identify connected components based on personal information
// (IP address, email, address, phone number)
CALL gds.wcc.write('individual-graph', { writeProperty: 'componentId' })
YIELD nodePropertiesWritten, componentCount;

Once this information has been computed, a simple query can surface risky communities of clients, here those with more than five members:

MATCH (a:PERSON)
WITH a.componentId AS community,
     count(*) AS size_of_community,
     collect(DISTINCT a) AS clients
WHERE size_of_community > 5
RETURN community, clients, size_of_community;
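The two steps above (connected components over shared personal information, then a size filter) can also be sketched in plain Python with a union-find structure. This is an illustration under assumed inputs, not the library's implementation: the `(person, detail)` record format and the function names are hypothetical, and the example uses a lower size cutoff than the query so the toy data stays short.

```python
from collections import defaultdict


def find(parent, x):
    """Return the root of x's component, shortening the path as it goes."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def suspicious_communities(records, larger_than=5):
    """Group persons that share any detail, directly or through a chain of
    other persons, and return groups with more than larger_than members."""
    parent = {}
    first_holder = {}
    for person, detail in records:
        parent.setdefault(person, person)
        if detail in first_holder:
            # Merge this person's component with the earlier holder's.
            root_a = find(parent, person)
            root_b = find(parent, first_holder[detail])
            parent[root_a] = root_b
        else:
            first_holder[detail] = person
    groups = defaultdict(set)
    for person in parent:
        groups[find(parent, person)].add(person)
    return [members for members in groups.values() if len(members) > larger_than]


# Assumed sample data: (person, shared detail).
records = [
    ("p1", "phone:1"), ("p2", "phone:1"),
    ("p2", "mail:a"), ("p3", "mail:a"),
    ("p4", "addr:z"), ("p5", "addr:y"),
]
print(suspicious_communities(records, larger_than=2))  # [{'p1', 'p2', 'p3'}]
```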

How to leverage graph analytics for anomaly detection with Linkurious Enterprise

Linkurious Enterprise includes an alert system that helps turn graph queries into actionable alerts. As a data scientist, you can simply save a graph query as a model associated with an alert. As new data is ingested, a new case is created every time a subset of the graph matches the graph query.

End users can simply review cases within the Linkurious Enterprise UI and leverage advanced visualization capabilities to assess their relevance.

Linkurious Enterprise also includes an advanced anomaly detection feature: multi-model alerts. This feature aggregates multiple detection rules into a single alert. It uncovers more complex hidden patterns that may be a combination of weak signals, delivers a maximum amount of context to analysts and investigators, and reduces their backlog.
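The general idea of combining weak signals from several rules can be illustrated with a short Python sketch. This is not the product's implementation or API: the rule names, weights, threshold, and the function `aggregate_alerts` are all hypothetical and only show how individually inconclusive rule hits might add up to a single alert.

```python
from collections import defaultdict


def aggregate_alerts(rule_hits, weights, threshold):
    """Sum the weights of every rule that matched an entity and keep the
    entities whose combined score reaches the threshold."""
    scores = defaultdict(float)
    matched = defaultdict(list)
    for rule, entities in rule_hits.items():
        for entity in entities:
            scores[entity] += weights[rule]
            matched[entity].append(rule)
    return {
        entity: (scores[entity], matched[entity])
        for entity in scores
        if scores[entity] >= threshold
    }


# Hypothetical rule results and weights.
rule_hits = {
    "income_mismatch": ["p1", "p2"],
    "shared_phone": ["p1", "p3"],
    "large_component": ["p1"],
}
weights = {"income_mismatch": 0.5, "shared_phone": 0.25, "large_component": 0.5}
print(aggregate_alerts(rule_hits, weights, threshold=1.0))
# {'p1': (1.25, ['income_mismatch', 'shared_phone', 'large_component'])}
```

Keeping the list of matched rules next to the score gives a reviewer the context for why an entity was flagged.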

Conclusion

Throughout this article, we’ve seen how graph analytics can aid in anomaly detection and offset some of the shortcomings posed by traditional detection methods. By leveraging graph analytics, organizations can better extract valuable insights, manage risks, and make sound decisions to address potential issues such as fraud or cybersecurity threats.

Tools like Linkurious Enterprise further streamline the anomaly detection process by integrating multiple detection rules and providing comprehensive, context-rich alerts. Embracing graph-based anomaly detection can significantly bolster an organization's operational intelligence and security measures.

Watch our product tour to learn more. 
