Whitepages needed a system that could meet its demanding requirements:
- Scalable: distributed solution; just add nodes
- Available: AP design; robust fault tolerance
- High performance: more than 30,000 vertices/sec
- High ingest rate: 200+ updates/sec
The system would have to support a dataset that is naturally connected. It would also have to support queries centered on exploring the relationships between entities. Finally, it would have to be agile enough to adapt to new business and customer requirements.
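To make "queries centered on exploring relationships" concrete, here is a minimal in-memory sketch in Python. It does not use Titan or its query language. The entity names, the edge labels (`lives_at`, `has_phone`, `works_at`) and the helper functions are all hypothetical, chosen only to illustrate the kind of question a connected dataset answers naturally: which people are linked to a given person through a shared address or phone number?

```python
from collections import deque

# Hypothetical toy data: vertices are entities, edges are labeled relationships.
EDGES = [
    ("person:alice", "lives_at", "address:1"),
    ("person:bob", "lives_at", "address:1"),
    ("person:alice", "has_phone", "phone:555-0100"),
    ("person:carol", "has_phone", "phone:555-0100"),
    ("person:carol", "works_at", "business:acme"),
]


def build_adjacency(edges):
    """Index edges in both directions so traversals can follow either way."""
    adjacency = {}
    for source, label, target in edges:
        adjacency.setdefault(source, []).append((label, target))
        adjacency.setdefault(target, []).append((label, source))
    return adjacency


def neighbors_within(adjacency, start, max_hops):
    """Breadth-first search: return {vertex: hop_count} within max_hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if distances[vertex] == max_hops:
            continue  # do not expand beyond the hop limit
        for _label, other in adjacency.get(vertex, []):
            if other not in distances:
                distances[other] = distances[vertex] + 1
                queue.append(other)
    del distances[start]  # the start vertex is not its own neighbor
    return distances


def related_people(adjacency, person, max_hops=2):
    """People reachable from `person` through shared entities."""
    reached = neighbors_within(adjacency, person, max_hops)
    return sorted(v for v in reached if v.startswith("person:"))


if __name__ == "__main__":
    graph = build_adjacency(EDGES)
    print(related_people(graph, "person:alice"))
```

In a relational schema the same question would typically need one join per hop, whereas here the hop limit is just a parameter of the traversal.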
The team led by Devin Ben-Hur settled on Titan. Titan is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. Titan is a transactional database that can support thousands of concurrent users executing complex graph traversals in real time.
Whitepages decided to use Titan together with Cassandra. It tested this architecture in three steps:
- Local deployment (single-node cluster on a Mac)
- Small cluster (5 nodes in AWS, 7.5-10 GB of data)
- “Full” cluster (60 nodes in AWS, 3 TB of data)
The tests proved conclusive: at 400 requests per second, Titan delivered results in 47 ms, whereas the old system took 140 ms. Adding 200 simultaneous writes barely affected performance.
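The reported figures can be put in perspective with a little arithmetic. The sketch below assumes that 47 ms and 140 ms are average latencies measured at the same load of 400 requests per second, which the figures above do not state explicitly. Under that assumption it computes the speedup, the relative latency reduction and, using Little's law (requests in flight = arrival rate × average latency), the average number of requests each system would hold in flight.

```python
# Figures quoted in the text; treating them as averages is an assumption.
OLD_LATENCY_MS = 140.0
TITAN_LATENCY_MS = 47.0
REQUESTS_PER_SECOND = 400.0


def speedup(old_ms, new_ms):
    """How many times faster the new system responds."""
    return old_ms / new_ms


def latency_reduction(old_ms, new_ms):
    """Fraction of the old latency that was eliminated."""
    return (old_ms - new_ms) / old_ms


def in_flight_requests(rate_per_second, latency_ms):
    """Little's law: average concurrency = arrival rate * average latency."""
    return rate_per_second * latency_ms / 1000.0


if __name__ == "__main__":
    print(f"speedup: {speedup(OLD_LATENCY_MS, TITAN_LATENCY_MS):.2f}x")
    print(f"latency reduction: {latency_reduction(OLD_LATENCY_MS, TITAN_LATENCY_MS):.1%}")
    print(f"in flight (Titan): {in_flight_requests(REQUESTS_PER_SECOND, TITAN_LATENCY_MS):.1f}")
    print(f"in flight (old): {in_flight_requests(REQUESTS_PER_SECOND, OLD_LATENCY_MS):.1f}")
```

On these numbers Titan answers roughly three times faster, cutting latency by about two thirds, so at the same request rate it would hold about 19 requests in flight on average instead of 56.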
Whitepages is moving to Titan to handle more than a billion entities. When the project is complete, the Whitepages graph will store the most comprehensive and accurate data for people and businesses in North America, including the best mobile data available anywhere.
Using a graph database like Titan allows Whitepages to use a more natural model to store its data.