As artificial intelligence and machine learning have become more accessible, the question has become less about whether to use this technology and more about how to use it. It has broad applications across industries, from recommendation engines to online chat features to risk scoring for fraud detection. And there is good reason to turn to AI and ML: according to Gartner, organizations describing AI as “strategic” outperformed their peers 80% of the time over the past 9 years.
Machine learning can be all the more powerful when paired with other types of technology. Graph technology, a relatively new type of analytics that thrives on connected data, can be naturally combined with machine learning in a complementary approach. Graph machine learning has the potential to be transformational for many businesses, bringing benefits such as improved efficiency and lower costs.
This article gives a brief introduction to graph analytics, then looks at how graph machine learning models can enhance artificial intelligence and machine learning, with a recommendation engine use case as an example of graph machine learning in action.
Let’s start with the basics of what graph analytics is. Graph technology is built to work on connected data. The essential components of a graph data model are nodes (also called vertices) and edges (also called relationships). A node represents an individual data point, such as a person, a place, or a phone number. An edge represents a connection between two nodes: for example, a person has a phone number.
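To make the model concrete, here is a minimal sketch of nodes and edges using plain Python structures — no graph database required. The node ids, labels, and the `HAS_PHONE` relationship type are all illustrative, not taken from any particular product.

```python
# Nodes: individual data points, keyed by an id, with a label and properties.
nodes = {
    "p1": {"label": "Person", "name": "Alice"},
    "n1": {"label": "PhoneNumber", "number": "+1-555-0100"},
}

# Edges: connections between two nodes, each with a relationship type.
edges = [
    ("p1", "HAS_PHONE", "n1"),
]

def neighbors(node_id):
    """Return (relationship, node) pairs reachable from node_id."""
    return [(rel, dst) for src, rel, dst in edges if src == node_id]

print(neighbors("p1"))  # [('HAS_PHONE', 'n1')]
```

A graph database stores the same two building blocks, but indexed so that traversing edges stays fast at scale.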
Graphs can be used to represent all kinds of networks: social networks, networks of fraudsters, IT infrastructure networks, etc. Because graphs are built on the idea of connections and links, they are a good choice for integrating data from across sources.
Graph technology and machine learning complement each other. Machine learning combines statistical and analytical techniques to classify information and spot patterns within data. In doing so, it can go beyond static rules and scale human insights by turning them into algorithms.
Graph is used to detect complex patterns and provide visual context to analysis. Graph data can be ingested into machine learning algorithms, and then be used to perform classification, clustering, regression, etc. Together, graph and machine learning provide greater analytical accuracy and faster insights.
Graph also increases the explainability of machine learning. It can demonstrate why an AI system arrived at a certain decision, for example assigning a specific risk score to a particular loan applicant.
Graph features can work as an input for machine learning. For example, you can use PageRank score as a feature in a machine learning model.
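To illustrate how a graph feature like PageRank is produced, here is a hedged sketch of the classic power-iteration computation over a small adjacency list. In practice you would use a graph library or database rather than this hand-rolled version; the graph and node names are made up for illustration.

```python
def pagerank(adj, damping=0.85, iters=50):
    """Compute PageRank over adj, which maps each node to the
    list of nodes it links to. Returns a node -> score dict
    that can be joined onto a feature table for a model."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for src, targets in adj.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
            else:  # dangling node: spread its rank evenly
                for v in nodes:
                    new[v] += damping * rank[src] / n
        rank = new
    return rank

adj = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(adj)
# "c" receives the most incoming links, so it scores highest
print(max(scores, key=scores.get))  # c
```

The resulting score is just a number per node, which is exactly why it slots in so easily as one more column in a machine learning feature set.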
On the other hand, machine learning can also enhance graph analytics. You can use machine learning to perform entity resolution, for example, combining different data sources into one single graph.
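As an illustrative sketch of that entity-resolution step, the snippet below uses the standard library's `difflib` to decide whether two records from different sources refer to the same person. Real pipelines use trained matching models over many attributes; the threshold and record fields here are assumptions for the example.

```python
from difflib import SequenceMatcher

def same_entity(a, b, threshold=0.85):
    """Treat two customer records as one entity if their names
    are similar enough (a stand-in for a learned matching model)."""
    score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return score >= threshold

crm_record = {"name": "Rebecca Smith"}
billing_record = {"name": "rebecca smith"}
print(same_entity(crm_record, billing_record))  # True
```

When two records are matched, their nodes can be merged (or linked with a "same as" edge) so the combined graph holds one consolidated view of the entity.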
Graph visualization, or network visualization, can then be used to assess anomalies, patterns, or insights identified using graph machine learning. A graph visualization tool also makes the information generated by a graph machine learning system easily accessible to analysts and other end users with limited technology skills.
A simple architecture mixing machine learning and graph analytics might look like the following. It starts with the data you already have, maybe in legacy systems such as relational databases. That data then flows into an AI platform, such as Dataiku. There it is cleaned and enriched.
That data can then be stored in a graph database such as Neo4j. Such a database can also run graph analytics. For example, the graph database and the Graph Data Science Platform can be used to compute generic graph metrics (e.g. betweenness centrality, PageRank score), domain-specific graph metrics (e.g. the number of known fraudsters a given client is indirectly connected to) or even graph embeddings (which translate graph data into a machine-learning-friendly format).
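A domain-specific metric like "number of known fraudsters a client is indirectly connected to" is, under the hood, a bounded graph traversal. The sketch below shows the idea with a breadth-first search over an in-memory adjacency list; in production this would be a query run inside the graph database, and all node names here are invented for illustration.

```python
from collections import deque

def connected_fraudsters(adj, client, fraudsters, max_hops=3):
    """Count fraudster nodes reachable from `client` within max_hops."""
    seen = {client}
    frontier = deque([(client, 0)])
    count = 0
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_hops:
            continue
        for nb in adj.get(node, []):
            if nb not in seen:
                seen.add(nb)
                if nb in fraudsters:
                    count += 1
                frontier.append((nb, dist + 1))
    return count

# client -> shared account -> shared phone -> known fraudster
adj = {
    "client": ["acct1"],
    "acct1": ["client", "phone1"],
    "phone1": ["acct1", "fraudster1"],
    "fraudster1": ["phone1"],
}
print(connected_fraudsters(adj, "client", {"fraudster1"}))  # 1
```

Like PageRank, the output is a single number per node, ready to be fed back into a model as a feature.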
This data can then be looped back into the AI platform as extra features. There it can be combined with the other data points to train a machine learning model and improve its performance. The machine learning model can in turn enrich the graph with extra properties, such as a risk score.
All of this information can be made readily available to end users through a graph visualization tool such as Linkurious Enterprise. Specifically, the risk score computed in the AI platform can be used to generate alerts in the Linkurious Enterprise case management system. Such alerts can be visually investigated by analysts to eventually confirm or dismiss them. These human insights can also be captured via the Linkurious Enterprise API, providing a feedback mechanism to the machine learning algorithm.
As an alternative to integrating your data science platform with your graph analytics stack, it’s also possible to run your graph analytics and machine learning workloads in a single integrated environment.
You can do this using a tool such as the Neo4j Graph Data Science library, which includes machine learning capabilities. This type of tool can be used to predict links and classify nodes.
The first step is to load your data into your graph database. You can then use the data science library to train a machine learning model on your graph data. The model can then be used to make predictions. In a fraud detection use case, those predictions might include whether two people are likely to know each other, or whether a node is likely to be similar to other nodes representing known fraudsters.
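To give a feel for link prediction, here is an illustrative sketch of a classic heuristic: Jaccard similarity over shared neighbors. A library such as the Neo4j Graph Data Science library trains full pipelines for this; the snippet below only conveys the intuition that people who share many contacts are likely to know each other. The social graph is invented for the example.

```python
def jaccard(adj, a, b):
    """Fraction of a's and b's combined neighbors that they share.
    Higher values suggest a missing link between a and b."""
    na, nb = set(adj.get(a, [])), set(adj.get(b, []))
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0

adj = {
    "alice": ["carol", "dave", "erin"],
    "bob": ["carol", "dave"],
    "carol": ["alice", "bob"],
    "dave": ["alice", "bob"],
    "erin": ["alice"],
}
# alice and bob share 2 of their 3 distinct neighbors
print(round(jaccard(adj, "alice", "bob"), 2))  # 0.67
```

Trained link-prediction models combine several such topological signals (plus node properties) instead of relying on any single one.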
Just like in the previous example, this information can be made readily available in a graph visualization tool. For instance, the likelihood that a given node is indeed similar to existing “fraud” nodes can be used to generate alerts in the case management system. These alerts can be visually investigated by analysts to eventually confirm or dismiss them.
Analysts can enrich the graph with extra information by flagging a transaction or individual as suspicious or specifying what type of risk it is related to (terrorism funding, trade-based money laundering, etc).
As these human insights are stored in the graph database, they can immediately be leveraged by machine learning, helping close the loop between the machine learning algorithm and human intelligence.
Recommendation engines are tools to help users find relevant information among many different options. In a world with increasing choices, recommendation engines are essential features of e-commerce platforms, social networking sites, media platforms, and more. But recommendation is challenging, since people tend to have diverse and varied interests and tastes.
Graph machine learning can help more easily tackle the problem of building a recommendation engine.
To start out, in the case of an e-commerce recommendation engine, you’d need to gather data around user features and around product features. Next, you’d build a graph where each user and each product is a node, and each interaction (a click, a purchase) is an edge. For example, a customer we’ll call Rebecca might have clicked on items 2, 5, and 15, and purchased item 5. Based on these past interactions, you want to recommend new products to customers.
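A minimal sketch of that idea: represent user-item interactions as a bipartite graph (here just a dict of sets) and recommend items that co-occur with the ones a customer already interacted with. "Rebecca" and items 2, 5, and 15 follow the example above; the other users and the scoring scheme are assumptions for illustration.

```python
from collections import Counter

interactions = {          # user -> items clicked or purchased
    "rebecca": {2, 5, 15},
    "user_b": {5, 7},
    "user_c": {2, 5, 7},
    "user_d": {15, 9},
}

def recommend(user, interactions, k=2):
    """Score unseen items by how often they co-occur with the
    user's items across other users (a simple co-visitation count)."""
    seen = interactions[user]
    scores = Counter()
    for other, items in interactions.items():
        if other == user:
            continue
        overlap = len(items & seen)
        if overlap:
            for item in items - seen:
                scores[item] += overlap
    return [item for item, _ in scores.most_common(k)]

print(recommend("rebecca", interactions))  # [7, 9]
```

Production recommendation engines replace this counting scheme with learned embeddings, but the graph framing — users and items as nodes, interactions as edges — stays the same.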
One option is to use knowledge graphs containing all kinds of information about a product. If we’re talking about clothing, that might include the brand, the size, the style, the color, etc.
Through an embedding layer, you can model the paths customers take from one product of interest to another. A long short-term memory (LSTM) neural network layer on top of that can then learn those paths and generate probabilities for each item for sale.
Linkurious is a software company providing technical and non-technical users alike with the next generation of detection and investigation solutions powered by graph technology. Simply powerful and powerfully simple, Linkurious Enterprise helps more than 3000 data-driven analysts and investigators globally in Global 2000 companies, governmental agencies, and non-profit organizations to swiftly and accurately find insights otherwise hidden in complex connected data so they can make more informed decisions, faster.