Entity resolution: how graph analytics helps build a holistic view of data
Data drives today's business operations and decision-making, but organizations often need to work with a complex and ever-growing array of data sources. With such a complex data landscape, entity resolution is an important asset in ensuring data quality. Entity resolution is the process of associating any type of related information across different data sources to build a holistic view of the underlying entities.
This article explores the ins and outs of entity resolution: how it works and why working with high-quality data leads to stronger and more reliable analysis. We'll also look at why graph visualization and analytics are particularly well suited to entity resolution across industries, including for financial crime use cases.
Across industries, most businesses work with data that is scattered across multiple systems. Entity resolution, also called data matching, is the process of determining whether records from one or more of these data sources represent the same entity, and then linking those records.
In an ideal world, each individual or company in your databases would appear only once. That is rarely the case, however. The John Smith in your retail bank database might have a different ID from the John Smith in your consumer credit database, or from the John Smith in a company ownership database you have purchased. In reality, all three records describe the same person, and entity resolution can tell you so.
Entity resolution is therefore critical when you are trying to build a holistic view of data scattered across different systems.
Technology can help perform this process at scale. However, entity resolution is based on probability: no matter how much data you have or how sophisticated your algorithm is, some matches may remain ambiguous.
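To make the probabilistic nature of matching concrete, here is a minimal Python sketch that scores a pair of records. It assumes records are dictionaries with `name`, `address`, and `phone` fields and uses the standard library's `difflib.SequenceMatcher` as a simple string-similarity measure. The field weights and the accept/reject thresholds are illustrative assumptions, not values taken from any particular product. Pairs that land between the two thresholds are the ambiguous cases that need a human decision.

```python
from difflib import SequenceMatcher

# Hypothetical weights: how much each field contributes to the overall score.
WEIGHTS = {"name": 0.5, "address": 0.3, "phone": 0.2}


def normalize(value: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in value.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def field_similarity(a: str, b: str) -> float:
    """Similarity between two field values, from 0.0 (nothing in common) to 1.0 (identical)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_score(r1: dict, r2: dict) -> float:
    """Weighted sum of per-field similarities (the weights sum to 1)."""
    return sum(weight * field_similarity(r1[field], r2[field])
               for field, weight in WEIGHTS.items())


def classify(score: float, accept: float = 0.9, reject: float = 0.6) -> str:
    """Auto-accept high scores, auto-reject low ones, and send the middle band to review."""
    if score >= accept:
        return "match"
    if score < reject:
        return "non-match"
    return "review"


bank = {"name": "John Smith", "address": "12 High Street", "phone": "+44 20 7946 0000"}
credit = {"name": "Jon Smith", "address": "12 High St", "phone": "+44 20 7946 0000"}
print(classify(match_score(bank, credit)))
```

This is a weighted similarity score rather than a full probabilistic record-linkage model such as Fellegi-Sunter; swapping in a dedicated measure like Jaro-Winkler, or learning the weights from labelled pairs, are common refinements.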
Entity resolution is an important step in getting a clear view of your data, especially when it comes from different sources. For example, you might complement internal data with external sources, which are likely to contain records that already exist in your own systems. You would then cross-reference the two to determine whether similar records actually refer to the same person, and link those records.
If you skip this step, you might not be able to accurately assess the risk of a given client, for example, or you might miss other key information you need to make good business decisions.
In short, entity resolution lets you gain a single view of your data, leading to better, more accurate decision-making.
Entity resolution has many use cases in anti-financial crime applications. Modern anti-financial crime systems rely on vast amounts of data from multiple sources to evaluate risk accurately, uncover suspicious activity, and more. Entity resolution is a key step in building a holistic view of these data sources so that analysts can carry out these processes effectively.
Identifying the Ultimate Beneficial Owner (UBO) of an organization is a key step in understanding who your organization is doing business with. It’s a standard component of know your customer (KYC) processes.
Many financial institutions and other businesses subject to anti-money laundering (AML) regulations obtain databases of ownership information for KYC purposes. Entity resolution can help you better leverage this information to enrich your understanding of your clients, make the right onboarding decisions, and minimize risk.
Financial institutions must constantly assess levels of risk, both at customer onboarding and as part of ongoing customer due diligence (CDD). A single client may have multiple records with slightly different information in multiple databases. Entity resolution helps look at clients holistically to properly assess risk. It helps ensure you see the full picture.
Fraudsters may create multiple fake identities or misrepresent certain information to avoid detection. When organizations are inundated with data, this can be an effective criminal technique. By connecting disparate pieces of information, entity resolution can help thwart this type of fraud and flag the suspicious behavior.
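One simple signal of this technique is a single attribute value, such as a phone number, that appears on several records under different names. The Python sketch below flags such values. The record fields and the `min_records` threshold are hypothetical, and a real system would combine many such signals rather than rely on one.

```python
from collections import defaultdict


def shared_attribute_alerts(records, attribute, min_records=3):
    """Flag attribute values used by at least `min_records` records under more than one name."""
    holders = defaultdict(set)  # attribute value -> ids of records using it
    names = defaultdict(set)    # attribute value -> distinct (lowercased) names using it
    for rec in records:
        value = rec.get(attribute)
        if value:
            holders[value].add(rec["id"])
            names[value].add(rec["name"].lower())
    return {
        value: sorted(ids)
        for value, ids in holders.items()
        if len(ids) >= min_records and len(names[value]) > 1
    }


clients = [
    {"id": "c1", "name": "Anna Lee", "phone": "555-0142"},
    {"id": "c2", "name": "Ann Leigh", "phone": "555-0142"},
    {"id": "c3", "name": "A. Li", "phone": "555-0142"},
    {"id": "c4", "name": "Bob Stone", "phone": "555-0188"},
]
print(shared_attribute_alerts(clients, "phone"))
```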
The records on which you want to perform entity resolution may contain tens of millions of data points that you need to analyze quickly. That’s where a graph visualization and analytics solution comes in. Graph analytics enables you to perform entity resolution at scale.
Graph analytics can quickly detect common links between entities to help identify potential duplicates, and then group those entities to provide a single source of truth for decision-making. A tool like Linkurious Enterprise can do this automatically through queries and alerts, performing entity resolution in near real time.
There are multiple ways to leverage graph analytics to perform entity resolution. In this article we highlight a concrete approach that can be used to kickstart your journey.
- The first step is to import your data into a graph. This can be data from internal sources such as databases containing KYC information or transaction data. You can also import external data, such as databases of company ownership or information on politically exposed persons (PEPs). Each entity is turned into a node connected to other entities via relationships. As an example, a client might be connected to a phone number. There may already be indirect connections between entities, such as two clients sharing the same phone number.
- The next step involves analyzing the connections between entities to identify similar entities. Simple business rules can be applied here. Two clients using the same ID number, for example, should be considered similar. The same goes for two clients sharing a name, an address, and a phone number. Graph algorithms quickly identify these similarities.
- These similar entities can then be grouped together using community detection algorithms. These groups are then resolved into a single entity called a “golden record”.
- Finally, all of this information can be made accessible to analysts via Linkurious Enterprise.
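The first three steps can be sketched in plain Python, assuming each record is a dictionary with hypothetical `id`, `name`, `address`, `phone`, and `id_number` fields. Clients are linked to attribute nodes, the two business rules above are applied only to clients that share at least one attribute node, and similar clients are grouped. For simplicity, the grouping step uses connected components rather than a full community detection algorithm, and the golden record keeps the most common value of each field. This is an illustrative sketch, not how Linkurious Enterprise implements entity resolution.

```python
from collections import Counter, defaultdict
from itertools import combinations

FIELDS = ("name", "address", "phone", "id_number")  # hypothetical record schema


def build_graph(records):
    """Step 1: link each client node to attribute nodes such as ("phone", "555-0100")."""
    graph = defaultdict(set)
    for rec in records:
        client = ("client", rec["id"])
        for field in FIELDS:
            if rec.get(field):
                attribute = (field, rec[field].strip().lower())
                graph[client].add(attribute)
                graph[attribute].add(client)
    return graph


def similar_pairs(graph):
    """Step 2: apply the business rules to clients that share at least one attribute node."""
    candidates = set()
    for node, neighbours in graph.items():
        if node[0] != "client":  # attribute node: the clients it connects are candidates
            candidates.update(combinations(sorted(neighbours), 2))
    pairs = set()
    for a, b in candidates:
        shared = {kind for kind, _ in graph[a] & graph[b]}
        # Rule 1: same ID number. Rule 2: same name, address, and phone number.
        if "id_number" in shared or {"name", "address", "phone"} <= shared:
            pairs.add((a, b))
    return pairs


def group_entities(pairs, clients):
    """Step 3: group similar clients (connected components as a simple stand-in)."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, groups = set(), []
    for start in clients:
        if start in seen:
            continue
        seen.add(start)
        component, stack = [], [start]
        while stack:  # iterative depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(component))
    return groups


def golden_record(group, records_by_id):
    """Resolve a group into one record, keeping the most common value of each field."""
    members = [records_by_id[client_id] for _, client_id in group]
    golden = {"source_ids": [rec["id"] for rec in members]}
    for field in FIELDS:
        values = [rec[field] for rec in members if rec.get(field)]
        golden[field] = Counter(values).most_common(1)[0][0] if values else None
    return golden


def resolve(records):
    graph = build_graph(records)
    clients = [("client", rec["id"]) for rec in records]
    groups = group_entities(similar_pairs(graph), clients)
    records_by_id = {rec["id"]: rec for rec in records}
    return [golden_record(group, records_by_id) for group in groups]
```

With two John Smith records that share a name, address, and phone number, plus a third record that shares only the ID number with one of them, all three end up in a single golden record, while a client who merely lives at the same address stays separate.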
Linkurious Enterprise is a powerful tool to help investigation teams with entity resolution challenges.
First, the connections between records identified during the entity resolution process are visible to the analyst. Visual aids such as size and color help indicate the level of confidence that multiple records represent the same entity. This helps open the black box of the entity resolution process so analysts can make informed decisions.
Second, the Linkurious Enterprise alert system helps detect potential duplicates and showcases these results to the analyst. The analyst can then confirm or dismiss these results. Human and algorithmic intelligence are thus combined in the same process.
Finally, analysts are able to delete or merge entities to manually resolve duplicates in Linkurious Enterprise, giving them as much control as they need over the process.
What does an entity resolution operation look like in practice? During our joint webinar on the topic, Softlink Analytics CTO Shrey Iyengar demonstrated how Softlink has set up the process with Linkurious Enterprise for one client working with some 1.1 billion nodes and 4.2 billion relationships, numbers that are growing all the time. The process operates in continuous integration/continuous delivery (CI/CD) mode. In other words, the system allows multiple data sources, including newly added ones, to feed into the graph subsystem on a continuous basis. Rules can also be modified or added, and these changes take effect in real time as the data is being ingested.
They have set up several rules for matching entities. For example, if two individuals share a social security number, they are considered the same. Linkurious Enterprise has alerts that can run queries automatically to find these specific attributes. When there is a match according to Softlink's rules, entities are grouped together to form a golden record. The alerts that cluster related entities are programmed to run every 5 minutes, an interval that can be adjusted to the client's needs.
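Softlink's actual rules and implementation are not described beyond the shared social security number example, so the following is only a hypothetical Python sketch of how such a rule could be applied incrementally as new batches of records arrive. It uses a union-find structure so that each run processes only the new batch instead of re-clustering everything. The `id` and `ssn` field names are assumptions, and scheduling (for example, every 5 minutes) is left to whatever calls `ingest`.

```python
from collections import defaultdict


class IncrementalResolver:
    """Keeps golden-record clusters up to date as batches of records arrive (hypothetical)."""

    def __init__(self):
        self.parent = {}        # record id -> parent record id (union-find forest)
        self.first_by_ssn = {}  # SSN -> first record id seen with that SSN

    def _find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def _union(self, a, b):
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            # Keep the smaller id as the cluster root, i.e. the golden-record key.
            if rb < ra:
                ra, rb = rb, ra
            self.parent[rb] = ra

    def ingest(self, batch):
        """Apply the shared-SSN rule to one batch, e.g. once per scheduled alert run."""
        for rec in batch:
            rid = rec["id"]
            self.parent.setdefault(rid, rid)
            ssn = rec.get("ssn")
            if ssn:
                if ssn in self.first_by_ssn:
                    self._union(rid, self.first_by_ssn[ssn])
                else:
                    self.first_by_ssn[ssn] = rid

    def clusters(self):
        """Map each cluster root to the sorted ids of its member records."""
        groups = defaultdict(list)
        for rid in list(self.parent):
            groups[self._find(rid)].append(rid)
        return {root: sorted(members) for root, members in groups.items()}


resolver = IncrementalResolver()
resolver.ingest([{"id": "a1", "ssn": "111"}, {"id": "b7", "ssn": "222"}])
resolver.ingest([{"id": "c3", "ssn": "111"}, {"id": "d4"}])
print(resolver.clusters())
```

Because each SSN is remembered alongside the first record that carried it, a record arriving in a later batch joins the existing cluster directly, which fits a setup where data sources feed the graph continuously.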
Softlink has found the system easy for their end users. “Linkurious Enterprise is able to house two different types of users,” says Shrey Iyengar. “The power user, who can modify the queries as the rules change. The second type of user is the business user, who will see these relationships being created automatically. For them, the system is almost magic, with golden records being formed out of thin air, and these entities are just being resolved automatically. This is what a good entity resolution system should be able to do.”
Watch our in-depth webinar and see for yourself how our partner Softlink Analytics is using Linkurious Enterprise to streamline the process.