Leveraging Relationships: Graph-Powered Data Science and Machine Learning - Part 1

Aoife McCardle
Digital Marketing Executive

One of the biggest challenges for any company is tracking a customer’s journey with them across multiple channels, devices, purchases and interactions. A 360-degree customer view collects all of the relevant data about a customer and keeps it together, usually in a CRM system, so that everything can be tracked in a single digital profile.

All of this data, from a customer’s previous purchases to their contact details and their interactions with customer service, may not seem relevant to the untrained eye, but to data scientists it is the key to predicting the future! That is to say, understanding the relationship cycle of a customer can lead to predictions about what they might like to purchase next.

In the 2019 Experience Index Digital Trends Report, large organisations, both B2B and B2C, named data-driven marketing as the most exciting opportunity for the coming year. The 2020 Digital Trends Report went on to state that the leading companies in customer experience were three times more likely than their peers to have significantly exceeded their 2019 business goals.

How did they achieve this? 

Data-driven marketing can massively enhance customer experience because it can target and personalise each touchpoint in the customer journey. This particular application is driving interest towards technologies that can manage masses of highly connected data, such as graph analytics and graph databases.

While SQL (relational) databases work well for other data-driven marketing tasks such as outlier detection, traditional content recommendation and customer segmentation, they are poorly suited to graph algorithms. This is because they store data in separate tables, e.g. a table of products, a table of customers and so on, and connecting the data requires a JOIN in SQL. This works for relatively small amounts of data, but it becomes incredibly slow as the data grows and more tables, or more steps along a chain of relationships, have to be joined together.
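To make the JOIN point concrete, here is a minimal sketch using Python's built-in `sqlite3` module and a hypothetical `friends` table that stores each friendship once in each direction. The helper `reachable_at_depth` is our own illustration, not part of any library: it chains one extra self-JOIN of the table for every additional hop. It shows how the query grows with the degree of separation; it is not a performance benchmark.

```python
import sqlite3

# Hypothetical toy data: a chain of friendships 1-2-3-4-5, stored in both directions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (user_id INTEGER, friend_id INTEGER)")
pairs = [(1, 2), (2, 3), (3, 4), (4, 5)]
conn.executemany(
    "INSERT INTO friends VALUES (?, ?)",
    pairs + [(b, a) for a, b in pairs],
)


def reachable_at_depth(conn, start, depth):
    """Return the users at the end of every `depth`-hop walk from `start`.

    Each extra hop adds one more self-JOIN of the friends table. Walks may
    double back, so earlier users (including `start`) can reappear.
    """
    joins = " ".join(
        f"JOIN friends AS f{i} ON f{i}.user_id = f{i - 1}.friend_id"
        for i in range(2, depth + 1)
    )
    sql = (
        f"SELECT DISTINCT f{depth}.friend_id FROM friends AS f1 {joins} "
        "WHERE f1.user_id = ? ORDER BY 1"
    )
    return [row[0] for row in conn.execute(sql, (start,))]


print(reachable_at_depth(conn, 1, 2))  # [1, 3]
print(reachable_at_depth(conn, 1, 4))  # [1, 3, 5]
```

A realistic friend-suggestion query would also have to filter out the user themselves and their existing friends (user 1 shows up in its own results above because walks can double back), which adds still more subqueries on top of the growing chain of JOINs.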

In comparison, graph databases are built around connections: a query starts from a node and follows its relationships, looking only at the connected parts of the data instead of scanning all of the possibilities, which makes it a significantly shorter process. For example, if you take the data of 1000 users on a social media platform and try to find “Friend Suggestions” up to five degrees of separation, the query could take hours to complete in an SQL database, whereas in a graph database it takes seconds.
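The graph-style approach can be sketched with a breadth-first search over an in-memory adjacency list. This is plain Python standing in for what a graph database does natively, using a hypothetical toy network; the function name `suggestions_within` is our own. The search only visits users reachable from the starting user, so the work depends on the size of that neighbourhood rather than on the total number of users, and the recorded hop counts are shortest-path distances (a simple form of path finding).

```python
from collections import deque


def suggestions_within(adjacency, start, max_depth):
    """Breadth-first search from `start`, returning {user: degree} for every
    user within `max_depth` hops who is not already a direct friend."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if distance[user] == max_depth:
            continue  # do not expand beyond the requested degree of separation
        for friend in adjacency.get(user, ()):
            if friend not in distance:  # first visit is the shortest hop count
                distance[friend] = distance[user] + 1
                queue.append(friend)
    # Degree 0 is the user themselves and degree 1 their existing friends.
    return {user: d for user, d in distance.items() if d >= 2}


# Hypothetical social network as an adjacency list.
adjacency = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan"],
    "dan": ["bob", "cat", "eve"],
    "eve": ["dan", "fay"],
    "fay": ["eve"],
    "gus": [],  # not connected to anyone, so never visited
}

print(suggestions_within(adjacency, "ann", 3))  # {'dan': 2, 'eve': 3}
```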

What is a Graph Database?

A graph is a collection of entities (nodes) and the relationships (edges) between them, where each relationship is a directional connection from one node to another. You can also include properties, which are extra details stored on nodes and relationships (such as a customer’s name or the date of a purchase), and constraints, which are rules the data must follow. By using a graph database, it is possible to explore the relationships in the data far more efficiently with the help of specialised algorithms.
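The model described above can be sketched as a minimal in-memory property graph in Python. The names `Node`, `Relationship`, `PropertyGraph` and `neighbours` are hypothetical and do not correspond to any particular graph database's API; the one constraint shown (both endpoints must exist before they are connected) is just an example of the kind of rule a graph can enforce.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str  # e.g. "Customer" or "Product"
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: str     # id of the node the relationship points from
    end: str       # id of the node the relationship points to
    rel_type: str  # e.g. "PURCHASED"
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Labelled nodes plus typed, directed relationships, both carrying properties."""

    def __init__(self):
        self.nodes = {}     # node id -> Node
        self.outgoing = {}  # node id -> list of Relationships leaving that node

    def add_node(self, node):
        self.nodes[node.node_id] = node
        self.outgoing.setdefault(node.node_id, [])

    def add_relationship(self, rel):
        # Example constraint: both endpoints must already exist.
        if rel.start not in self.nodes or rel.end not in self.nodes:
            raise KeyError(f"unknown endpoint in {rel.start!r} -> {rel.end!r}")
        self.outgoing[rel.start].append(rel)

    def neighbours(self, node_id, rel_type=None):
        """Ids reached by following outgoing relationships, optionally of one type."""
        return [
            rel.end
            for rel in self.outgoing[node_id]
            if rel_type is None or rel.rel_type == rel_type
        ]


graph = PropertyGraph()
graph.add_node(Node("c1", "Customer", {"name": "Ada"}))
graph.add_node(Node("p1", "Product", {"name": "Running shoes"}))
graph.add_node(Node("p2", "Product", {"name": "Water bottle"}))
graph.add_relationship(Relationship("c1", "p1", "PURCHASED", {"date": "2020-01-15"}))
graph.add_relationship(Relationship("c1", "p2", "VIEWED"))

print(graph.neighbours("c1", "PURCHASED"))  # ['p1']
```

Because relationships are directional, `p1` has no outgoing relationships of its own; answering "who bought this product?" would need either an incoming index or a traversal in the opposite direction.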

Graph Data Science Algorithms

There are a few key data science algorithms that can analyse graphs in order to reveal insights that are not immediately obvious:

  • Path Finding algorithms are able to answer questions such as "what is the fastest way from point A to point B?". 
  • Community Detection algorithms find partitions within the graph which can be used to analyse specific sub-communities. 
  • Link Prediction algorithms look for nodes which are likely to become connected in the future, or which should already be connected; for example, the "people you may know" sections on Facebook and LinkedIn, or recommendation sections in general.
  • Similarity algorithms measure how similar two nodes are based on either their properties or their relationships. For example, one similarity measure is Jaccard similarity: the number of ‘neighbours’ two nodes have in common, divided by the total number of distinct ‘neighbours’ of either node.
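Two of these ideas can be sketched together in a few lines of Python on a hypothetical friendship graph. `jaccard` computes the Jaccard similarity of two nodes' neighbour sets (shared neighbours divided by distinct neighbours of either node), and `predict_links` uses that score as a simple neighbourhood-based link-prediction heuristic, ranking pairs of nodes that are not yet connected. Both function names are our own; Jaccard is only one of several possible link-prediction scores (common neighbours and Adamic-Adar are others), and comparing every pair like this is only sensible for small examples.

```python
def jaccard(adjacency, a, b):
    """|N(a) & N(b)| / |N(a) | N(b)| for the neighbour sets N of nodes a and b."""
    na, nb = set(adjacency[a]), set(adjacency[b])
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def predict_links(adjacency, top_k=3):
    """Score every not-yet-connected pair by Jaccard similarity and return
    the top_k as (score, a, b) tuples, highest score first."""
    nodes = sorted(adjacency)
    candidates = []
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if b in adjacency[a]:
                continue  # already connected: nothing to predict
            score = jaccard(adjacency, a, b)
            if score > 0:
                candidates.append((score, a, b))
    # Highest score first; ties broken alphabetically so the output is stable.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return candidates[:top_k]


# Hypothetical undirected friendship graph, stored as neighbour sets.
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"bob", "cat", "eve"},
    "eve": {"dan"},
}

print(round(jaccard(friends, "ann", "dan"), 3))  # 0.667
for score, a, b in predict_links(friends):
    print(f"{a} - {b}: {score:.2f}")
```

Here `ann` and `dan` share two of their three distinct neighbours, so they come out as the strongest "people you may know" candidate, followed by `bob - eve` and `cat - eve` at 0.33 each.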

The traditional approach to processing data in most mathematical models assumes a fully connected data set, when in reality most data points are connected to only a few others, with some overlap between them.
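A quick back-of-the-envelope calculation shows how sparse such data usually is. Assuming, purely for illustration, 1,000 users with 50 friends each, the share of possible friendships that actually exist (the graph's density, |E| / (n(n-1)/2) for an undirected graph with n nodes) is only about 5%. An adjacency list stores just the connections that exist, whereas a dense n × n matrix reserves a cell for every possible pair.

```python
def density(num_nodes, num_edges):
    """Fraction of all possible undirected edges that actually exist."""
    possible = num_nodes * (num_nodes - 1) // 2
    return num_edges / possible


# Hypothetical network, for illustration only: 1,000 users with 50 friends each.
users = 1_000
avg_friends = 50
edges = users * avg_friends // 2  # each friendship links two users, so count it once

matrix_cells = users * users      # dense adjacency matrix: one cell per possible pair
list_entries = 2 * edges          # adjacency list: each friendship stored once per endpoint

print(edges)                            # 25000
print(round(density(users, edges), 3))  # 0.05
print(matrix_cells, list_entries)       # 1000000 50000
```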

------------  Part 2 coming soon -------------

Written by:
Aoife McCardle
Contact email: