To use the pandemic as an example, the traditional method assumes all members of the population are mixing at once, meaning every infected individual has an equal chance of infecting anyone else in the population or sub-population they belong to. In truth, most of us only have daily contact with our family, co-workers and people on public transport, so the rest of the population can be ignored because we have no direct connection with them. This produces graphs in which each individual is connected to only a few others.
When examining data in a graph, we can look at the degree of a node: the number of neighbouring nodes it is connected to. The more connections a node has, the higher its degree and the more susceptible it is to influence from other nodes. We can also examine a node's betweenness, which is how many shortest paths pass through it. Another feature is the number of triangles a node is part of, where three nodes are all connected to one another. Finally, there is transitivity, which shows how many of a node's neighbours are also neighbours of each other. If you were to use a graph database to track the spread of a virus, these features could be combined with the number of cases in each area and its nearby areas in a simple linear model to estimate future cases.
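As a minimal sketch of the features above, the widely used networkx library (assumed available; the contact network here is invented for illustration) computes degree, betweenness, triangle counts and per-node clustering on a small graph:

```python
import networkx as nx

# Hypothetical contact network: an edge means two people interacted.
G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
    ("carol", "dave"), ("dave", "erin"),
])

degree = dict(G.degree())                   # number of direct contacts
betweenness = nx.betweenness_centrality(G)  # share of shortest paths through each node
triangles = nx.triangles(G)                 # triangles each node participates in
clustering = nx.clustering(G)               # fraction of a node's neighbours that are also neighbours

print(degree["carol"], triangles["carol"])  # carol has 3 contacts and sits in 1 triangle
```

Note that `nx.clustering` gives the per-node measure described above, while `nx.transitivity(G)` returns the graph-wide equivalent.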
Graphs can also be used to suggest recommendations. Previously, Collaborative Filtering and Content-Based Filtering were the main methods of recommending products. Collaborative filtering makes recommendations based on what similar users like, while content-based filtering searches for products with similar attributes, e.g. a user might be recommended Interstellar because it shares some genres with Star Wars. However, once you start to consider relationships beyond these primary ones, the need for graphs becomes apparent.
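The content-based case can be sketched in a few lines. The catalogue and genre sets below are invented, and Jaccard similarity is just one simple choice of attribute-overlap measure:

```python
# Hypothetical catalogue: each title mapped to its set of genres.
movies = {
    "Star Wars": {"sci-fi", "adventure", "action"},
    "Interstellar": {"sci-fi", "adventure", "drama"},
    "Notting Hill": {"romance", "comedy"},
}

def jaccard(a, b):
    # Overlap of two attribute sets: |intersection| / |union|.
    return len(a & b) / len(a | b)

liked = "Star Wars"
scores = {title: jaccard(movies[liked], genres)
          for title, genres in movies.items() if title != liked}
best = max(scores, key=scores.get)
print(best)  # Interstellar: it shares two of four combined genres
```

Collaborative filtering works the same way, except the similarity is computed between users' rating histories rather than between product attributes.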
A knowledge graph shows how many ways you can make a recommendation: if a user likes the movie "Cloud Atlas", they may also like "Catch Me If You Can" because Tom Hanks stars in both; alternatively, if they are looking to buy a book, we can recommend the novel "Cloud Atlas". Although these suggestions are technically possible in a traditional SQL database, the queries become increasingly complicated and extremely slow. Graphs allow you to focus only on the nodes with connections, ignoring the irrelevant observations and saving significant resources.
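A toy version of that traversal can be sketched with networkx (assumed available; the nodes and relationship labels are invented for illustration, and a real deployment would express this as a Cypher query in Neo4j):

```python
import networkx as nx

# Tiny hypothetical knowledge graph mixing movies, people and books.
kg = nx.Graph()
kg.add_edge("Cloud Atlas (film)", "Tom Hanks", rel="ACTED_IN")
kg.add_edge("Catch Me If You Can", "Tom Hanks", rel="ACTED_IN")
kg.add_edge("Cloud Atlas (film)", "Cloud Atlas (book)", rel="BASED_ON")

def related(node):
    # Walk two hops out: anything sharing a neighbour with the start node.
    # Only connected nodes are ever visited; the rest of the catalogue
    # is never touched, which is the efficiency win over SQL joins.
    return {m for n in kg[node] for m in kg[n] if m != node}

print(related("Cloud Atlas (film)"))  # {'Catch Me If You Can'}
```

The book recommendation is the direct BASED_ON neighbour, while the co-star recommendation comes from the two-hop walk through Tom Hanks.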
Another option for running graph algorithms is GraphX in Apache Spark. It is not graph-native: it lets you run the algorithms in a distributed fashion, but its data storage is not optimised for graphs. However, data can be transferred between Neo4j and GraphX if you want to use both systems, with Neo4j for storage and GraphX for computation.
While this is a very effective strategy, maintaining its benefits requires freeing up human resources through Artificial Intelligence (AI) and Machine Learning (ML) automation.
AI/ML-based technologies not only process data for easier human consumption; they can also learn and adapt from the data they process, taking automation further by constantly testing what works in order to make smarter assumptions in future experiments.
Using a real-time ML engine to run graph data science algorithms means that millions of nodes and relationships can be analysed quickly and quietly in the background, allowing employees to focus on development, creativity and future strategy while significantly reducing costs.