Recommendation and graphs: an online dating use case
Graph technologies are very good for recommendation. It is no wonder that the biggest online dating websites are using them. We are going to see, through a concrete example, how to use graphs to find love with the Neo4j graph database.
Online dating and graphs: a love story?
In the last few months, a few of the major dating sites have switched to Neo4j. Why are they interested in graphs? Their business is match-making: helping singles find other people they'd like to meet. This is a typical recommendation problem, and these sites use graph databases like Neo4j to solve it. Graphs help them suggest potential dates to their customers in real time. The better the suggestions, the more chances people will want to meet…and enjoy doing so. We are going to see how to do recommendation with graphs, using an online dating example. Of course, the same approach could be applied to other domains like retail.
A graph data model for online dating
To show how to use graphs to write recommendation algorithms, we are going to use a fake dataset. It emulates the kind of data an online dating site would have. It has been prepared by Max de Marzi of Neo Technology, who used it to show how Neo4j can be used for match-making. Here is a quick overview of the underlying data model:
As you can see, the data can be modeled as a graph with people, locations and attributes.
The graph is centered around the people. People are linked to the locations where they live and to attributes. People can have two kinds of relationships with an attribute: they can "want" it (they want their potential dates to have that attribute) or they can "have" it (they have the attribute themselves). For example, in the graph, we can see that Nicole has the attributes "calm" and "smart". She wants someone who has the attribute "sweet".
At this point, all we have done is express the data in a way that makes sense. Now, let's start asking questions. What we want is to find good matches between people, and the data model is going to help us do that. It illustrates a simple truth: people are connected through the things they share.
Simply by looking at the data model, we can see that Nicole and John would be a good match. They both live in London, for one. In addition, John is "sweet", which is what Nicole wants, and Nicole is "smart", which is what John is looking for.
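The reasoning above can be written down as a minimal Python sketch. It is not how Neo4j stores or queries data: the list of triples and the helper names `targets` and `is_mutual_match` are assumptions made for illustration. Only Nicole's and John's city and attributes come from the example.

```python
# Edges of the example graph as (source, relationship, target) triples.
# The triple layout is an illustrative assumption, not Neo4j's storage model.
EDGES = [
    ("Nicole", "lives_in", "London"),
    ("Nicole", "has", "calm"),
    ("Nicole", "has", "smart"),
    ("Nicole", "wants", "sweet"),
    ("John", "lives_in", "London"),
    ("John", "has", "sweet"),
    ("John", "wants", "smart"),
]


def targets(person, relationship):
    """Return the set of nodes reached from `person` via `relationship`."""
    return {t for s, r, t in EDGES if s == person and r == relationship}


def is_mutual_match(a, b):
    """True if a and b share a city and each has something the other wants."""
    same_city = bool(targets(a, "lives_in") & targets(b, "lives_in"))
    a_gets = targets(a, "wants") & targets(b, "has")
    b_gets = targets(b, "wants") & targets(a, "has")
    return same_city and bool(a_gets) and bool(b_gets)
```

With this data, `is_mutual_match("Nicole", "John")` evaluates to `True`: they share London, John has "sweet", and Nicole has "smart".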
Using Neo4j as a recommendation engine
How can we apply this simple insight at scale? Old-school match-makers could consider individual people and their preferences, weigh them and come up with introductions. Online dating websites have to sort through hundreds of thousands of users and preferences. Thankfully, we are going to see that it's easy to write a quick recommendation algorithm using Cypher, the query language for Neo4j. Max de Marzi has written a nice recommendation query for the data we are using:
START me=node:users_index(name={user})
MATCH me-[:lives_in]->city<-[:lives_in]-person
WHERE me.orientation = person.orientation AND
((me.gender <> person.gender AND me.orientation = "straight") OR
(me.gender = person.gender AND me.orientation = "gay")) AND
me-[:wants]->()<-[:has]-person AND
me-[:has]->()<-[:wants]-person
WITH DISTINCT city.name AS city_name, person, me
MATCH me-[:wants]->attributes<-[:has]-person-[:wants]->requirements<-[:has]-me
RETURN city_name, person.name AS person_name,
COLLECT(attributes.name) AS my_interests,
COLLECT(requirements.name) AS their_interests,
COUNT(attributes) AS matching_wants,
COUNT(requirements) AS matching_has
ORDER BY matching_wants / (1.0 / matching_has) DESC
LIMIT 10
If you want to learn more about Cypher, I encourage you to go to Max’s blog and read the breakdown of the query.
The query outputs a list of potential matches for a given user. It takes a person and finds the people who live in the same city, have a compatible sexual orientation and are a good fit in terms of interests: they have at least one attribute the user wants, and they want at least one attribute the user has. That is the simple but powerful logic behind most dating sites.
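To make that logic easier to follow step by step, here is a rough Python sketch that mirrors the filters of the query in memory. It is not Max de Marzi's implementation and does not use Neo4j. The `Person` class, the `compatible` and `recommend` functions, and the sample genders, orientations and the extra person Paul are all hypothetical. The query's ordering expression `matching_wants / (1.0 / matching_has)` is algebraically the product of the two counts, so the sketch multiplies them. It counts distinct attributes, which may differ from how the Cypher query counts rows.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    gender: str
    orientation: str
    city: str
    has: set = field(default_factory=set)
    wants: set = field(default_factory=set)


def compatible(me, other):
    """Mirror the orientation filter in the query's WHERE clause."""
    if me.orientation != other.orientation:
        return False
    if me.orientation == "straight":
        return me.gender != other.gender
    if me.orientation == "gay":
        return me.gender == other.gender
    return False


def recommend(me, people, limit=10):
    """Rank candidates in the same city by mutual attribute overlap."""
    results = []
    for other in people:
        if other is me or other.city != me.city or not compatible(me, other):
            continue
        my_interests = me.wants & other.has      # what I want that they have
        their_interests = other.wants & me.has   # what they want that I have
        if not my_interests or not their_interests:
            continue  # the query requires a match in both directions
        score = len(my_interests) * len(their_interests)
        results.append(
            (score, other.name, sorted(my_interests), sorted(their_interests))
        )
    # Highest score first; ties broken by name to keep the order stable.
    results.sort(key=lambda r: (-r[0], r[1]))
    return results[:limit]


# Hypothetical sample data: genders, orientations and Paul are invented.
nicole = Person("Nicole", "female", "straight", "London", {"calm", "smart"}, {"sweet"})
john = Person("John", "male", "straight", "London", {"sweet"}, {"smart"})
paul = Person("Paul", "male", "straight", "Paris", {"sweet"}, {"smart"})
print(recommend(nicole, [nicole, john, paul]))
```

Here only John is returned for Nicole: Paul has the right attributes but lives in another city, so the location filter removes him before any scoring happens.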
With Neo4j, the algorithm can be expressed in a few lines of code and return results in real time. That is a big difference compared to what traditional relational databases offer. Typically, the same kind of query would involve multiple table joins: it would be harder to write and slower to run.
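To give a feel for what "multiple table joins" means here, the sketch below expresses only the mutual-match condition (same city, each person has something the other wants) against a relational schema, using Python's built-in `sqlite3` module. The tables `person` and `person_attribute` and their columns are hypothetical, and the sketch says nothing about performance: it only shows how many joins the relational form needs.

```python
import sqlite3

# In-memory database with a hypothetical two-table schema.
# person_attribute.kind is either 'has' or 'wants'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE person_attribute (person_id INTEGER, kind TEXT, attribute TEXT);
    INSERT INTO person VALUES (1, 'Nicole', 'London'), (2, 'John', 'London');
    INSERT INTO person_attribute VALUES
        (1, 'has', 'calm'), (1, 'has', 'smart'), (1, 'wants', 'sweet'),
        (2, 'has', 'sweet'), (2, 'wants', 'smart');
""")

# Five joins are needed just to express "same city and mutual want/has".
MUTUAL_MATCH_SQL = """
    SELECT DISTINCT other.name
    FROM person AS me
    JOIN person AS other
      ON other.city = me.city AND other.id <> me.id
    JOIN person_attribute AS my_want
      ON my_want.person_id = me.id AND my_want.kind = 'wants'
    JOIN person_attribute AS their_has
      ON their_has.person_id = other.id AND their_has.kind = 'has'
     AND their_has.attribute = my_want.attribute
    JOIN person_attribute AS their_want
      ON their_want.person_id = other.id AND their_want.kind = 'wants'
    JOIN person_attribute AS my_has
      ON my_has.person_id = me.id AND my_has.kind = 'has'
     AND my_has.attribute = their_want.attribute
    WHERE me.name = ?
"""

matches = [row[0] for row in conn.execute(MUTUAL_MATCH_SQL, ("Nicole",))]
print(matches)
```

The orientation filter and the scoring would add further conditions and aggregation on top of these joins, whereas the Cypher query states the same pattern as a path through the graph.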
How to apply it in production
A few online dating websites are already using Neo4j in production. It powers the recommendations their users are offered. Meetic (a French-based dating website), for example, uses Neo4j. In these contexts, tools like Linkurious can bring a lot of value:
- help refine the algorithm (looking at the data helps understand what's going on and how to improve the results);
- manage and correct the data to make sure it stays relevant;
- investigate suspicious activities (spammers, for example).
Graph data can be complex to work with. Linkurious and other graph visualization solutions make it easier for IT teams, data scientists and analysts to leverage graph data.
Graphs are not only useful for online dating. If your team is working on recommendation problems, graph technologies might be a good choice. With graphs, it is possible to store big data sets and analyse them to provide user-centered insights. Start doing it too!