Find the distance between friends/connections - LinkedIn System Design with

Have you noticed a small icon on the right of LinkedIn profiles?

It tells you how closely you are related to a user (1st, 2nd or 3rddegree connection).

These distances are calculated using graph algorithms. Specifically, LinkedIn uses Bidirectional BFS.

This is a twoway algorithm where a search is started from both source and destination, and terminated midway. The total distance traveled from source+destination is the degree of the connection.

But at scale, this solution brings challenges. Let's limit ourselves to finding 3rddegree connections:

1. Every query needs a graph traversal on our social network.
2. A naive BFS would take O(n^3) for search. A bidirectional BFS needs O(n^2) for searching users + O(n^2) for merging results.
3. To avoid performing these queries repeatedly, we cache the seconddegree connections of users. This saves up on the first O(n^2) factor for subsequent queries.

Great, but how do we store all this data in a single cache?

Well, we can't. LinkedIn is forced to scale out, with multiple cache nodes storing seconddegree connections. UserConnection data is sharded into caches according to user_id.

But what if a machine crashes? All queries to the shards hosted by that machine will fail.

To avoid this, we replicate shards into different machines. Even if one of the machines fails, the shard can be found on another machine.

And here we meet our main villain: High Latency.

We don't want to hit all the machines with our shards. We want to hit the SMALLEST possible number of machines, that contain all our shards.

For the mathematically inclined, this is a setcover problem. LinkedIn uses a modified greedy version of the problem to reduce its compute costs in half!

The basic idea is to find machines that host most of the users who are your 2nddegree connections, and avoid those that have few 2nddegree connections.

Less effort, more work done!

00:00 Introduction
00:35 Capacity Estimation
01:20 Bruteforce SQL query
04:00 Bidirectional BFS
06:37 Caching connection data
10:00 Potential databases
11:11 Redundancy for fault tolerance
12:12 Cold starts
13:48 Removing connections
14:37 Replicated Caches
16:03 Ideal replication factor?
17:25 Summary

Here is the paper: https://www.usenix.org/system/files/c...

References:
Sharding: • What is DATABASE SHARDING?
System Design at InterviewReady: https://interviewready.io/
Designing DataIntensive Applications Book: https://amzn.to/3SyNAOy

You can follow me on:
Github: https://github.com/InterviewReady/sys...
Twitter: / gkcs_

#SystemDesign #InterviewReady #SocialNetworks