Unlocking the power of unstructured data with integrated link analysis

July 4, 2023

Unstructured data is everywhere: it makes up some 90 percent of the digital universe (1). It has huge potential to deliver value and meaningful insights to data-driven organizations, for use cases such as facilitating R&D, improving risk management, optimizing customer experience, enabling knowledge discovery, and supporting security and intelligence work.

But it can be a real challenge to effectively manage and derive value from unstructured data. Its lack of defined structure and siloed nature make it difficult to query. And technical limitations within organizations can seriously limit the potential of the data at your disposal. 

However, recent innovations in Natural Language Processing (NLP) and AI can make your data searchable in a consistent way and help you understand its significance. One benefit of these advances is that it’s becoming possible to perform advanced link analysis on massive sets of unstructured data, exploring the relationships within them to gain additional insights.

In that respect, graph technology brings a lot of value: it enables organizations to swiftly and easily perform link analysis on huge volumes of connected data, generating important insights that would otherwise have been overlooked. Solutions that integrate NLP, AI, and graph technology are therefore opening new avenues for unstructured data analysis and enhanced intelligence. In this article, we explore the advantages such an integrated solution offers organizations.

To help tackle the topic of unstructured data and how to manage it, we spoke with Stephen Stewart, Chief Technology Officer at Nuix. Nuix is a leading provider of investigative analytics and intelligence software, and has developed an advanced NLP solution that is helping push the boundaries of complex data analysis.

What is unstructured data?

Unstructured data is data that does not have a defined structure or format. It comes from a variety of sources, including text files, emails, social media posts, images, videos, and audio recordings. This type of data is not easily searched or analyzed using traditional methods.

Unstructured data comes from a variety of sources, including text files, emails, social media posts, images, videos, and audio recordings.

For example, a PDF of scanned documents may have no predefined structure, making it difficult to extract meaningful insights from the data. Likewise, audio recordings of customer phone calls may contain hours of information that must be analyzed to surface key insights.

“I like to refer to unstructured data as ‘messy data’,” says Stephen Stewart, CTO at Nuix, a company that offers tools specialized in processing and understanding unstructured data. “The reality is that ‘unstructured data’ covers a huge landscape.  

“Structured and semi-structured data is easy to manage because you have a huge amount of context for the data AND both structured and semi-structured data is usually written and enriched by some type of process. We call this machine data. Where Nuix comes into its own is with messy, human generated, unstructured data – emails, text messages, documents, presentations, spreadsheets, etc. We can also get to it in a huge range of locations – everything from file servers, cloud services like M365 or AWS S3, corporate PCs and laptops – even mobile and digital forensic images.”

The challenges of using unstructured data

“Unstructured data presents two challenges,” says Stephen Stewart. “First, it’s messy. This has to do with the fact that humans are creating the information. We create, store, and share it in ways that help us get stuff done. You have to look no further than emojis to see how far we will go to save a little work. I send a thumbs-up instead of typing out ‘OK’.  

“Second, it’s stored in a huge range of places: everywhere from email servers to cloud repositories, enterprise CRM systems - even mobile phones. This second point about being stored in a huge number of places is the key to why organizations need solutions to the data silo problem of unstructured data.” 

These two foundational challenges lead to several difficulties for organizations looking to harness unstructured data. Because the data lacks a defined structure and sits in silos, it’s hard to extract, structure, and organize it in a way that makes it usable. Compounding that, the nature, quality, and reliability of the data vary widely. And since unstructured data contains text, audio, images, and more, you need systems that understand semantics and context. Finally, you need link analysis to connect the dots between data points and reach the deeper understanding that leads to actionable insights.

Overcoming these multi-layered challenges requires sophisticated techniques and powerful systems, as well as time and resources, all of which can represent a huge cost for an organization.

Using unstructured data to solve business problems

Despite the challenges organizations may run up against in processing and analyzing their unstructured data, the enormous value it can hold shouldn’t be overlooked. All kinds of organizations can benefit greatly from using unstructured data as it provides a wealth of information that can’t be obtained from structured data alone. 

“Our customers include corporations, federal regulators, law enforcement agencies, law firms, and advisories,” says Stephen Stewart. “In almost every instance, they are trying to combine data that is collected from a variety of sources to answer the basic questions of Who, What, Where, Why, When and How.”

And with technological innovations, organizations are now able to go beyond answering basic questions with their unstructured data, Stewart explains. “What is so interesting about the addition of AI to the mix is it allows us to move beyond answering questions through data retrospectively – but now we can start thinking about using AI and natural language processing to assign risk, priority, and significance. What is cool about assigning risk, priority, and significance is that it comes down to what matters most to the organization. Are you looking for internal/external fraud? Are you looking to understand trends in customer/consumer complaints? What about commonality across medical and scientific research reports? At the end of the day, when you can break down an organization's data silos, understand what people have written and start to prioritize risk OR significance – you solve a lot of problems.”

With unstructured data analysis, organizations can gain valuable insights into customer behavior, market trends, fraud risk, and more. Here are some common use cases that illustrate how unstructured data can be an asset to different types of organizations:

Unstructured data use case examples

Fraud detection: By analyzing data such as call center logs and customer emails, organizations can identify patterns and anomalies that may indicate fraudulent activity. For example, a bank may use unstructured data analysis to detect fraudulent transactions by identifying patterns in transaction descriptions and customer behavior.

Medical data analysis: By analyzing medical records, clinical notes, research papers, or imaging data, organizations can extract information about patient health, treatment effectiveness, or disease patterns. Unstructured data analysis can assist in medical diagnosis, drug discovery, or personalized medicine.

Social media analysis: Social media platforms generate vast amounts of unstructured data in the form of posts, comments, and reviews. By analyzing this data, organizations can gain insight into how customers perceive their brand, which products or services are most popular, and emerging market trends. 

Unstructured data clearly holds a lot of potential to help organizations make informed business decisions. But to access the insights unstructured data can provide, you’ll need the right tools and techniques.

Unleashing the potential of unstructured data with next-generation technologies

As data management and analytics technology advances, the ability of organizations to analyze and leverage unstructured data is growing. Two of those technologies are natural language processing (NLP), which makes unstructured data understandable and searchable, and graph technology, which makes it easy to explore the relationships between data elements.

A diagram of a tech stack to manage unstructured data: AI, graph database, and graph visualization

Understanding unstructured data with NLP

Natural language processing provides a solution for deciphering unstructured data. NLP technology uses artificial intelligence to not only understand the content of language in various forms but also identify patterns and themes within it. By being able to search through unstructured data with NLP, businesses and organizations can turn large amounts of seemingly meaningless data into actionable insights, ultimately improving decision-making processes. NLP technology has seen impressive advances over the past few years, making it all the more important for managing and analyzing unstructured data.

Nuix is a market leader in making unstructured data searchable with NLP, and provides a great example of what this kind of technology can offer. “Organizations using Nuix process petabytes of data annually,” says Stephen Stewart. “They rely on Nuix to make their unstructured data searchable in a consistent, repeatable fashion. And beyond searchability, our systems can point organizations to where to look first. 

“With Nuix’s AI (NLP Engine) we understand the text. We can tell you what it is about (topics like healthcare, government, science, arts and leisure). We can tell you what type of document it is (document types like tax forms, contracts, menus, resumes, research papers). We also extract facts using what we call cognitive expressions. These are basically like regular expressions on steroids. And then finally we can apply a risk/priority/significance score to point people to where they should look first.”
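
To make the idea of extracting facts and then scoring a document more concrete, here is a minimal sketch in Python. It is not Nuix's NLP engine and does not reproduce cognitive expressions, whose design is not described here. It uses ordinary regular expressions and a keyword weighting as a simple stand-in, and the pattern names, risk terms, and weights are all hypothetical.

```python
import re

# Hypothetical patterns: plain regular expressions standing in for
# richer extraction rules.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "amount": re.compile(r"\$\d[\d,]*(?:\.\d{2})?"),
}

# Hypothetical risk vocabulary; a real system would tune this to
# what matters most to the organization.
RISK_TERMS = {"wire": 2, "urgent": 1, "offshore": 3}


def extract_facts(text):
    """Return every match of each named pattern found in the text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


def risk_score(text):
    """Sum the weights of the risk terms that appear in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(RISK_TERMS.get(word, 0) for word in words)


doc = "Urgent: wire $25,000.00 to the offshore account, then email j.doe@example.com."
facts = extract_facts(doc)
score = risk_score(doc)
print(facts, score)
```

Sorting documents by such a score is one simple way to decide where to look first; a production system would rely on much richer language understanding than keyword matching.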

Graph visualization for unstructured data exploration

Loading data that has been processed by NLP tools into a graph can be a game changer for extracting insights from unstructured data, because it exposes the connections within that data. “Graph visualization is paramount to organizations being able to understand the hidden relationships between the unstructured data silos,” says Stephen Stewart. But for him, a tool like Linkurious Enterprise offers much more than simple visualization. It’s an ideal tool for exploring data.

“What started as Nuix needing a way to show documents and then expand nodes and edges – basic graph visualization – has evolved to an understanding of what it means to do graph exploration. This is something Linkurious offers. Graph exploration starts with being able to search node and edge names and metadata, being able to conditionally scale and highlight certain aspects of the visualization, filter based on metadata values and do both temporal and geospatial plots of the graph data.

“Where Linkurious really differentiates is its combination of extensibility and interoperability. Linkurious’s extensibility allows me to customize the experience to my data. I can use a combination of custom actions to reach other web services using templated URLs and custom queries to run targeted graph analytics. You have all of this power and control from a simple right-click context menu. And Linkurious’s interoperability allows me to embed Linkurious graphs in my application and easily jump from an interactive, but restricted visualization into the full graph exploration experience. In short, ‘graph exploration’ is being able to materialize all the intelligence you can infuse into the graph in an intelligent and thoughtful way – giving our user the flexibility and control they need to answer some of their most difficult questions.”

Overall, combining NLP and graph technology can revolutionize the way we analyze and interpret unstructured data, providing businesses with a competitive edge.

Integrated link analysis combining AI, NLP and graph technology

To simplify link analysis for organizations that need to mine insights from siloed, unstructured data, Linkurious, Nuix, and Memgraph offer a joint solution that combines NLP, AI, and graph technology.

By integrating advanced NLP and AI with powerful yet user-friendly graph technology, the joint solution called Nuix NLP AI empowers data-driven professionals to unlock the power of link analysis on complex connected data without the need for extensive resources and highly specialized expertise.

Nuix NLP AI is paving the way for a future where unstructured data becomes a source of actionable insights and informed decision-making across industries.

The components of the joint solution work together to resolve some of the concrete problems organizations face with their unstructured data. “One of the biggest challenges that organizations face when trying to create graphs from unstructured data is the ‘hairball problem’: too many nodes and edges,” says Stephen Stewart. “You don’t really have this problem with structured data because you have clear field definitions. With unstructured data you don’t have that luxury, so organizations often turn to using regular expressions. Nuix actually tried it: giant hairballs! To overcome the giant hairball problem, you need a bunch of things.”
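
The sketch below illustrates why a hairball appears when every extracted term is linked to every other term in the same document, and shows one generic way to thin the result. It is an assumption-laden illustration, not the approach Nuix uses: the documents, the entity names, and the `min_weight` threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical extraction output: document id -> entities found in it.
docs = {
    "d1": {"alice", "bob", "acme", "invoice"},
    "d2": {"alice", "bob", "acme"},
    "d3": {"alice", "carol", "invoice"},
    "d4": {"bob", "acme", "dave"},
}


def cooccurrence_edges(docs):
    """Link every pair of entities that share a document, counting repeats."""
    counts = Counter()
    for entities in docs.values():
        # A document with n entities contributes n * (n - 1) / 2 edges,
        # which is why naive linking grows into a hairball so quickly.
        for a, b in combinations(sorted(entities), 2):
            counts[(a, b)] += 1
    return counts


def prune(edges, min_weight):
    """Keep only the edges seen in at least min_weight documents."""
    return {pair: weight for pair, weight in edges.items() if weight >= min_weight}


edges = cooccurrence_edges(docs)
strong = prune(edges, min_weight=2)
print(len(edges), "edges before pruning,", len(strong), "after")
```

Even with four tiny documents, naive linking yields more edges than entities. Thresholding on edge weight is only one of several possible mitigations.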

The joint solution answers those needs through three complementary tools:

  • Nuix AI (NLP) converts mountains of messy, unstructured data into highly searchable, well-structured data.
  • Memgraph offers a graph database that seamlessly stores this data, along with its myriad relationships, in a scalable, high-performance graph structure.
  • Linkurious Enterprise enables dynamic, interactive exploration of those relationships, offering meaningful insights into the data that are simply impossible to achieve with other solutions. 
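
As a rough illustration of how these three stages hand data to one another, the following Python sketch turns structured records into a graph and then explores what is connected to a starting node. It uses an in-memory dictionary in place of a graph database and a breadth-first search in place of interactive exploration. It calls no Nuix, Memgraph, or Linkurious API, and the record format and names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical structured output of an NLP step: (source, relation, target).
records = [
    ("email-17", "SENT_BY", "alice"),
    ("email-17", "MENTIONS", "acme"),
    ("contract-3", "MENTIONS", "acme"),
    ("contract-3", "SIGNED_BY", "bob"),
]


def build_graph(records):
    """Store each record as an undirected link between two nodes."""
    graph = defaultdict(set)
    for source, _relation, target in records:
        graph[source].add(target)
        graph[target].add(source)
    return graph


def within_hops(graph, start, max_hops):
    """Breadth-first search: map each reachable node to its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]
    return seen


graph = build_graph(records)
linked = within_hops(graph, "alice", max_hops=3)
print(linked)
```

Here the email and the contract live in different silos, yet the shared mention of "acme" connects them. That kind of indirect relationship is what link analysis is meant to surface.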

“When you put all of this together and then layer in Linkurious’s alerting framework, you have jumped well beyond anything in the market,” says Stephen Stewart. “Nuix is able to handle the data, Memgraph can handle the data volumes and query performance required for real-time alerting and Linkurious provides an intuitive front end for graph exploration that makes it easy to interact with the entire ecosystem.”

Contact us to learn more about the joint solution today.

