Online social networks have become important tools for networking, communication, sharing, and discovery. A considerable challenge for these networks is that the social graph is only partially observed: two individuals might know each other, but may not have established a connection on the site.
Therefore, link prediction and recommendations are important tasks for any online social network. We published a paper at the 22nd International World Wide Web Conference (WWW) in May 2013 that describes how we developed a novel organizational overlap model for link prediction between two users in a network.
In this post, I’ll briefly discuss some of the highlights from the paper, including link prediction, recommendations, and community detection.
Link prediction and recommendations
Social network sites use recommendation systems such as LinkedIn’s ‘People You May Know’ to enable a significant number of link creations.
A basic problem in network analysis is predicting links in partially observed networks: given a snapshot of connections at time t, can we predict the links at time t+1? On any online social network, two members might know each other but not have established a connection on the site. Link prediction and recommendations help address this problem and create a more complete social graph, which improves user engagement.
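To make the time t versus time t+1 framing concrete, here is a minimal sketch of how such an evaluation could be set up. It is not the setup from the paper: the function names, the `(u, v, timestamp)` edge format, and the use of precision over the top k predictions are all assumptions chosen for illustration.

```python
def split_snapshot(timed_edges, t):
    """Split (u, v, timestamp) edges into those observed by time t
    and those that only appear later (the prediction targets)."""
    observed, future = set(), set()
    for u, v, ts in timed_edges:
        pair = frozenset((u, v))  # undirected edge, order-independent
        if ts <= t:
            observed.add(pair)
        else:
            future.add(pair)
    # An edge already observed by time t is not a prediction target.
    return observed, future - observed


def precision_at_k(ranked_pairs, future, k):
    """Fraction of the top-k predicted pairs that really form after time t."""
    top = ranked_pairs[:k]
    if not top:
        return 0.0
    hits = sum(1 for pair in top if frozenset(pair) in future)
    return hits / len(top)


# Hypothetical toy data: integer timestamps, snapshot taken at t = 3.
edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5)]
observed, future = split_snapshot(edges, t=3)
score = precision_at_k([("a", "c"), ("a", "d")], future, k=2)
```

Any link prediction method can then be plugged in to produce `ranked_pairs` from the observed snapshot alone.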
As part of our research to understand edge affinity between users, we built a novel model that factors in when each member joined and left an organization. The logic is simple: the affinity between two members who worked together in an organization for 10 years is greater than the affinity between two members who worked together for only a few months. We built a mathematical model based on this organizational time overlap and validated the model with LinkedIn’s social network data.
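The raw ingredient of this idea, the length of time two members spent at the same organization, is easy to compute. The sketch below only measures that overlap; it does not reproduce the mathematical model from the paper, and the function name and date-based inputs are assumptions for illustration.

```python
from datetime import date


def overlap_days(a_start, a_end, b_start, b_end):
    """Number of days two tenures at the same organization overlap.

    Returns 0 when the two tenures do not intersect.
    """
    start = max(a_start, b_start)  # overlap begins when the later member joins
    end = min(a_end, b_end)        # overlap ends when the first member leaves
    return max((end - start).days, 0)


# Hypothetical tenures: one long overlap and one with no overlap at all.
long_overlap = overlap_days(date(2000, 1, 1), date(2010, 1, 1),
                            date(2005, 1, 1), date(2012, 1, 1))
no_overlap = overlap_days(date(2000, 1, 1), date(2001, 1, 1),
                          date(2003, 1, 1), date(2004, 1, 1))
```

A longer overlap would then be treated as evidence of higher affinity; how exactly overlap maps to an edge probability is what the model in the paper defines.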
We used this model to predict existing edges on LinkedIn and two other public networks and found that this method’s top-5 prediction accuracy was 42% better than Common Neighbor and Adamic-Adar based link prediction. We also showed empirically that our model works for diverse organizations such as companies, schools, and online groups.
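For readers unfamiliar with the two baselines, here is a minimal sketch of Common Neighbors and Adamic-Adar scoring on an undirected graph stored as a dictionary of neighbor sets. The graph representation, function names, and the `top_k_candidates` helper are assumptions for illustration, not code from the paper.

```python
import math


def common_neighbors(graph, u, v):
    """Score a candidate pair by the number of neighbors u and v share."""
    return len(graph[u] & graph[v])


def adamic_adar(graph, u, v):
    """Sum 1 / log(degree) over shared neighbors, so that rarely
    connected shared neighbors count more than hubs."""
    # A shared neighbor is linked to both u and v, so its degree is
    # at least 2 and the logarithm is always positive.
    return sum(1.0 / math.log(len(graph[z])) for z in graph[u] & graph[v])


def top_k_candidates(graph, u, score, k=5):
    """Rank members not yet connected to u by the given scoring function."""
    candidates = [v for v in graph if v != u and v not in graph[u]]
    # Sort by descending score, breaking ties by name for determinism.
    ranked = sorted(candidates, key=lambda v: (-score(graph, u, v), v))
    return ranked[:k]


# Hypothetical toy graph: "a" and "d" are not connected but share "b" and "c".
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
cn_score = common_neighbors(graph, "a", "d")
aa_score = adamic_adar(graph, "a", "d")
suggestions = top_k_candidates(graph, "a", adamic_adar)
```

Both baselines rely only on graph structure, which is why a signal such as organizational time overlap can add information they do not capture.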
Community detection
Detecting communities within an organization is another important challenge. On most online social networks, a user can follow an entity to receive updates on it within a personalized news feed. For example, members can follow a company on LinkedIn and receive company updates.
To recommend entities for a member to follow, we look at entities the member's community is already following. Simply using the entire organization yields inferior results, as most organizations are diverse and contain several orthogonal groups (for example, sales, marketing, engineering) and subgroups (for example, front-end, database, machine learning).
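The idea of recommending what a member's community already follows can be sketched as a simple count. This is an illustration only: the function name, the `follows` dictionary, and ranking by raw follower count are assumptions, and a production recommender would use many more signals.

```python
from collections import Counter


def recommend_follows(member, community, follows, k=3):
    """Suggest entities followed by the member's community
    that the member does not follow yet."""
    already = follows.get(member, set())
    counts = Counter()
    for peer in community:
        if peer == member:
            continue
        for entity in follows.get(peer, set()):
            if entity not in already:
                counts[entity] += 1
    # Most-followed entities first, ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [entity for entity, _ in ranked[:k]]


# Hypothetical members and companies.
follows = {
    "ann": {"AcmeCo"},
    "bob": {"AcmeCo", "DataCorp"},
    "cat": {"DataCorp", "WebInc"},
}
community = {"ann", "bob", "cat"}
suggested = recommend_follows("ann", community, follows)
```

Passing the whole organization as `community` would run the same code, but the counts would then mix the preferences of unrelated groups, which is the problem described above.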
As another example, consider a news feed generated by online activity and how its volume can quickly overwhelm a user. A key feature in ranking a news feed is to promote an update if the member is in the same community as the originator of the update.
The organizational overlap model also works well for detecting communities within an organization. It is usually hard to evaluate the quality of communities because of a lack of ground truth, so we used an indirect method: intuitively, information should propagate faster within a community, so we measured the quality of detected communities by the speed of information propagation within them.
We evaluated detected communities within the LinkedIn network by the propagation speed of company follows and sharing activity. Results show that communities detected by our method are up to 66% better than communities detected from links alone in terms of the propagation speed of shared articles, and 15% better in terms of the propagation speed of company follows.
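One way to picture a propagation-speed measurement is to look at how long it takes other members of a community to pick up an item after its first adopter. The sketch below is a simplified stand-in, not the metric from the paper: the `(member, item, timestamp)` event format, the function name, and the choice of mean delay are all assumptions.

```python
from collections import defaultdict


def mean_adoption_delay(events, community_of):
    """Average time between the first adoption of an item in a community
    and each later adoption in that same community.

    A lower value means faster propagation. Returns None when no
    community has more than one adopter of the same item.
    """
    times = defaultdict(list)
    for member, item, ts in events:
        if member in community_of:
            times[(community_of[member], item)].append(ts)

    delays = []
    for stamps in times.values():
        stamps.sort()
        first = stamps[0]
        delays.extend(ts - first for ts in stamps[1:])
    return sum(delays) / len(delays) if delays else None


# Hypothetical events: four members share the same article "x".
events = [("a", "x", 0), ("b", "x", 2), ("c", "x", 10), ("d", "x", 4)]
community_of = {"a": 1, "b": 1, "c": 2, "d": 2}
delay = mean_adoption_delay(events, community_of)
```

Running the same measurement on two different community assignments of the same members gives a way to compare them without ground-truth labels.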
For more details, check out the full paper: