A graph database is a special type of database that stores data in the form of nodes and edges. This approach enables efficient modelling and querying of complex relationships. Graph databases are therefore particularly suitable for applications that map highly interconnected information.

What is a graph database made of and what is it used for?

A graph database is, as the name suggests, based on graphs. A graph represents complex, interconnected information together with the relationships between its elements and stores everything as one large, connected dataset.

The graphs consist of nodes, which are uniquely designated and identifiable data entities or objects, and edges, which represent the relationships between these objects. Visually, the two components are represented as points and lines. Edges each have a start and end point, while each node always has a certain number of relationships—whether incoming, outgoing, or undirected—with other nodes.
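The node-and-edge structure described above can be sketched in a few lines of Python. This is only an illustration, not how any particular graph database stores its data; the `Graph` class, its method names, and the sample names are assumptions made for this example.

```python
from collections import defaultdict


class Graph:
    """Minimal directed graph: nodes have unique IDs, edges are (start, end) pairs."""

    def __init__(self):
        self.nodes = set()
        self.outgoing = defaultdict(set)  # node -> nodes it points to
        self.incoming = defaultdict(set)  # node -> nodes pointing to it

    def add_edge(self, start, end):
        """Add a directed edge and register both end points as nodes."""
        self.nodes.update((start, end))
        self.outgoing[start].add(end)
        self.incoming[end].add(start)

    def degree(self, node):
        """Return (incoming, outgoing) relationship counts for a node."""
        return len(self.incoming[node]), len(self.outgoing[node])


# Hypothetical sample data
graph = Graph()
graph.add_edge("alice", "bob")
graph.add_edge("carol", "bob")
graph.add_edge("bob", "dave")

print(graph.degree("bob"))  # (2, 1): two incoming edges, one outgoing edge
```

Keeping both an incoming and an outgoing index means the relationships of a node can be read in either direction without scanning every edge.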

Graph databases are used, for example, to analyse user relationships in social networks or the purchasing behavior of customers in online shops. By storing relationships, it is possible to provide product and friendship recommendations and build individual person and product networks.
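As a rough illustration of how stored relationships can lead to friendship recommendations, the following Python sketch ranks people by the number of friends they share with a given user. The data and the function name `recommend_friends` are hypothetical, and real recommendation systems typically weigh many more signals.

```python
from collections import Counter

# Hypothetical undirected friendship data: each person maps to a set of friends.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def recommend_friends(person, friends):
    """Rank people who are not yet friends by the number of friends they share with `person`."""
    mutual = Counter()
    for friend in friends[person]:
        for candidate in friends[friend]:
            if candidate != person and candidate not in friends[person]:
                mutual[candidate] += 1
    # Sort by shared-friend count (descending), then by name for a stable order.
    return sorted(mutual.items(), key=lambda item: (-item[1], item[0]))


print(recommend_friends("alice", friends))  # [('dave', 2), ('erin', 1)]
```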

Note

Relational databases store data in tables and use SQL for queries. In contrast, graph databases belong to the NoSQL family and offer a more flexible structure for efficiently handling complex relationships between data.

Examples of graph databases

There are different concepts that describe how a graph DBMS is structured. The best-known are the labeled property graph and the Resource Description Framework (RDF).

Labeled property graph

In the labeled property graph (LPG), each node and edge of the graph is assigned labels and properties. Labels serve for categorisation so that, for example, a node can be marked as a ‘person’ or ‘company’, while properties store specific information about the entities or relationships as additional attributes such as names, ages, or geographic coordinates.

This structure enables very flexible and powerful data querying because relationships and properties are stored directly in the database and can be retrieved through simple queries. LPGs are particularly well-suited for modelling complex networks in which entities and their connections are described in different contexts.
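A labeled property graph could be modelled in Python roughly as follows. The `Node` and `Edge` classes, the `WORKS_AT` label, and all sample values are assumptions chosen for illustration; actual graph databases use their own storage formats and query languages.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    labels: set = field(default_factory=set)        # categories, e.g. {"person"}
    properties: dict = field(default_factory=dict)  # attributes, e.g. {"name": "Alice"}


@dataclass
class Edge:
    start: str
    end: str
    label: str
    properties: dict = field(default_factory=dict)


# Hypothetical sample data
nodes = {
    "n1": Node("n1", {"person"}, {"name": "Alice", "age": 34}),
    "n2": Node("n2", {"company"}, {"name": "Example Ltd"}),
    "n3": Node("n3", {"person"}, {"name": "Bob", "age": 41}),
}
edges = [
    Edge("n1", "n2", "WORKS_AT", {"since": 2019}),
    Edge("n3", "n2", "WORKS_AT", {"since": 2021}),
]


def employees_of(company_name):
    """Return names of 'person' nodes with a WORKS_AT edge to the named company."""
    result = []
    for edge in edges:
        start, end = nodes[edge.start], nodes[edge.end]
        if (edge.label == "WORKS_AT"
                and "person" in start.labels
                and "company" in end.labels
                and end.properties.get("name") == company_name):
            result.append(start.properties["name"])
    return sorted(result)


print(employees_of("Example Ltd"))  # ['Alice', 'Bob']
```

Both nodes and edges carry their own properties here, so a fact such as the start year of an employment sits on the relationship itself rather than on either entity.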

Resource description framework

In the Resource Description Framework (RDF), information is organised in triples consisting of subject, predicate, and object, providing a simple structure for representing relationships between entities. Each triple represents a statement in which the subject designates the resource, the predicate describes the property or relationship, and the object represents the value or another resource.

With RDF, data can be linked in a standardised way, allowing it to be combined and retrieved across different systems. This flexibility makes RDF particularly useful for applications that depend on connecting data from various sources, such as knowledge graphs.
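The triple structure can be illustrated with plain Python tuples. This is a simplified sketch: real RDF identifies resources with IRIs and is usually queried with SPARQL, whereas the `ex:` names and the `match` helper below are hypothetical stand-ins.

```python
# Hypothetical triples (subject, predicate, object); "ex:" stands in for a namespace.
triples = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:worksAt", "ex:acme"),
    ("ex:bob", "ex:name", "Bob"),
}


def match(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    )


# Everything that is stated about ex:alice
print(match(subject="ex:alice"))
# All name statements, regardless of subject
print(match(predicate="ex:name"))
```

Because every statement has the same three-part shape, triples from different sources can be merged into one set and queried with the same patterns.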

How do queries work in a graph database?

Queries against a graph-based database are written in different ways depending on the product, primarily because there is no unified query language. Unlike traditional models, graph databases also rely on special algorithms to fulfill their primary task: simplifying and accelerating complex data queries.

The most important algorithms include depth-first search and breadth-first search: depth-first search follows one path as deep as possible before backtracking, while breadth-first search explores the graph level by level. These algorithms make it possible to find patterns (called graph patterns) as well as direct and indirect neighboring nodes. Other algorithms allow calculation of the shortest path between two nodes and identification of cliques (subsets of nodes in which every node is connected to every other) and hotspots (highly connected data). One strength of graph databases is that relationships are stored directly in the database, so they do not need to be computed at query time. This results in high performance even for complex queries.
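The traversal strategies named above can be sketched in Python as follows. The sample graph and function names are assumptions for illustration, and the shortest-path function counts edges only (an unweighted graph), which is the case breadth-first search handles directly.

```python
from collections import deque

# Hypothetical undirected graph as adjacency lists (neighbour order is fixed).
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def depth_first(graph, start):
    """Follow one branch as deep as possible before backtracking."""
    visited, stack = [], [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.append(node)
        # Push neighbours in reverse so the first-listed neighbour is explored first.
        stack.extend(reversed(graph[node]))
    return visited


def breadth_first(graph, start):
    """Visit all nodes at distance 1, then distance 2, and so on."""
    visited, queue = [start], deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.append(neighbour)
                queue.append(neighbour)
    return visited


def shortest_path(graph, start, goal):
    """Breadth-first shortest path by edge count; returns None if unreachable."""
    parents, queue = {start: None}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


print(depth_first(graph, "A"))         # ['A', 'B', 'D', 'C', 'E']
print(breadth_first(graph, "A"))       # ['A', 'B', 'C', 'D', 'E']
print(shortest_path(graph, "A", "E"))  # ['A', 'B', 'D', 'E']
```

The only structural difference between the two traversals is the container: a stack yields depth-first order, a queue yields breadth-first order.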

Advantages and disadvantages of graph databases

The strength of a database can primarily be measured by four factors: integrity, performance, efficiency, and scalability. Graph databases aim to make data queries faster and easier; that is essentially their main purpose. Where relational databases reach their performance limits, the graph-based database model remains agile because query cost depends mainly on the relationships actually traversed rather than on the overall size of the dataset.

In addition, the graph database model allows real-world scenarios to be stored in a natural way. The structure is similar to human thinking, which makes the connections easy to understand. However, graph databases are not all-encompassing solutions. They reach their limits in terms of scalability, for example, because they are primarily designed for a single-server architecture, and splitting a highly connected graph across several servers is a hard partitioning problem. There is also no single standardised query language.

The advantages and disadvantages of graph databases at a glance:

Advantages:

  • query speed depends only on the number of specific relationships, not on the amount of data
  • results delivered in real time
  • clear and intuitive representation of relationships
  • flexible and agile structures

Disadvantages:

  • poor scalability because of single-server architecture

Graph databases should not be considered an absolute or better replacement for traditional databases. Relational structures remain useful standard models that guarantee high integrity and stability of data and allow flexible scalability. As is often the case, the key is the intended use!

Graph database comparisons

There are various graph database examples suited for different use cases. Below are four popular models:

  • Neo4j: Neo4j is the most popular graph DBMS and is available in an open-source edition.
  • Amazon Neptune: This graph database is available through the Amazon Web Services public cloud and was released in 2018 as a high-performance database.
  • SAP HANA Graph: With SAP HANA, the developer SAP created a platform built on a relational database management system and enhanced it with the integrated graph-based model SAP HANA Graph.
  • OrientDB: This database combines document-oriented and graph-based database approaches and is considered one of the fastest currently available models.

A direct comparison shows that these databases offer various features that can be helpful depending on the specific use case:

| | Neo4j | Amazon Neptune | SAP HANA Graph | OrientDB |
| --- | --- | --- | --- | --- |
| type | native | managed (cloud) | graph extension | multi-model |
| query languages | Cypher | SPARQL, Gremlin, openCypher | SQL-based | SQL-like, Gremlin |
| data model(s) | property graph | property graph, RDF | relational, graph model | graph, documents |
| typical use cases | social networks, fraud detection, recommendation services, network management | knowledge graphs, identity and access management, cloud-native apps | business analytics, IoT, financial analysis, SAP applications | content management, complex data relationships, distributed systems |