A graph database is a special type of database that stores data in the form of nodes and edges. This approach enables efficient modelling and querying of complex re­la­tion­ships. Graph databases are therefore par­tic­u­larly suitable for ap­plic­a­tions that map highly in­ter­con­nec­ted in­form­a­tion.

What is a graph database made of and what is it used for?

A graph database is, as the name suggests, based on graphs. A graph represents interconnected pieces of information together with the relationships between them and stores everything as one large, connected dataset.

The graphs consist of nodes, which are uniquely des­ig­nated and iden­ti­fi­able data entities or objects, and edges, which represent the re­la­tion­ships between these objects. Visually, the two com­pon­ents are rep­res­en­ted as points and lines. Edges each have a start and end point, while each node always has a certain number of re­la­tion­ships—whether incoming, outgoing, or un­dir­ec­ted—with other nodes.
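
As a rough illustration of the node and edge structure described above, the following Python sketch keeps a tiny directed graph in memory. The class name `Graph`, its methods, and the sample data are assumptions made for this example and do not correspond to the API of any particular graph database.

```python
from collections import defaultdict


class Graph:
    """A minimal in-memory directed graph (illustrative only)."""

    def __init__(self):
        self.nodes = set()
        # Each edge is recorded twice so that both directions can be looked up.
        self.outgoing = defaultdict(list)  # start node -> [(relationship, end node)]
        self.incoming = defaultdict(list)  # end node -> [(relationship, start node)]

    def add_edge(self, start, relationship, end):
        """Add a directed edge and register both of its end points as nodes."""
        self.nodes.update((start, end))
        self.outgoing[start].append((relationship, end))
        self.incoming[end].append((relationship, start))

    def degree(self, node):
        """Number of relationships (incoming plus outgoing) attached to a node."""
        return len(self.outgoing[node]) + len(self.incoming[node])


graph = Graph()
graph.add_edge("Alice", "FOLLOWS", "Bob")
graph.add_edge("Bob", "FOLLOWS", "Carol")
graph.add_edge("Carol", "FOLLOWS", "Alice")
graph.add_edge("Alice", "BOUGHT", "Laptop")

print(graph.outgoing["Alice"])  # [('FOLLOWS', 'Bob'), ('BOUGHT', 'Laptop')]
print(graph.degree("Alice"))    # 3
```

Storing every edge under both its start and its end node is one possible design choice; it makes incoming and outgoing relationships equally cheap to read at the cost of some extra memory.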

Graph databases are used, for example, to analyse user relationships in social networks or the purchasing behaviour of customers in online shops. Because the relationships themselves are stored, it is possible to generate product and friendship recommendations and to build networks of individual people and products.
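
A friendship recommendation of this kind can be sketched as a "friends of friends" lookup. The data, the dictionary `friends`, and the function `recommend_friends` below are hypothetical and only meant to show the idea; a production system would typically weigh many more signals.

```python
from collections import Counter

# Hypothetical, symmetric friendship data: person -> set of friends.
friends = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Alice", "Dave"},
    "Carol": {"Alice", "Dave", "Erin"},
    "Dave": {"Bob", "Carol"},
    "Erin": {"Carol"},
}


def recommend_friends(person):
    """Rank people who are not yet friends by the number of mutual friends."""
    mutual = Counter()
    for friend in friends[person]:
        for candidate in friends[friend]:
            if candidate != person and candidate not in friends[person]:
                mutual[candidate] += 1
    # Most mutual friends first; ties are broken alphabetically.
    return sorted(mutual.items(), key=lambda item: (-item[1], item[0]))


print(recommend_friends("Alice"))  # [('Dave', 2), ('Erin', 1)]
```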

Note

Re­la­tion­al databases store data in tables and use SQL for queries. In contrast, graph databases belong to the NoSQL family and offer a more flexible structure for ef­fi­ciently handling complex re­la­tion­ships between data.

Examples of graph databases

There are different concepts that describe how such graph DBMS are struc­tured. The best-known are the labeled property graph and the resource de­scrip­tion framework (RDF).

Labeled property graph

In the labeled property graph (LPG), each node and edge of the graph is assigned labels and properties. Labels serve for categorisation so that, for example, a node can be marked as a ‘person’ or ‘company’, while properties store specific information about the entities or relationships, such as names, ages, or geographic coordinates.

This structure enables very flexible and powerful data querying because re­la­tion­ships and prop­er­ties are stored directly in the database and can be retrieved through simple queries. LPGs are par­tic­u­larly well-suited for modelling complex networks in which entities and their con­nec­tions are described in different contexts.
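
To make the idea of labels and properties more concrete, here is a minimal Python sketch of a labeled property graph. The classes `Node` and `Relationship`, the helper `employees_of`, and all sample values are assumptions for illustration and not the storage format of any real product.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set = field(default_factory=set)         # e.g. {"Person"}
    properties: dict = field(default_factory=dict)   # e.g. {"name": "Alice"}


@dataclass
class Relationship:
    start: int   # node_id of the start node
    end: int     # node_id of the end node
    rel_type: str
    properties: dict = field(default_factory=dict)


nodes = {
    1: Node(1, {"Person"}, {"name": "Alice", "age": 34}),
    2: Node(2, {"Person"}, {"name": "Bob", "age": 41}),
    3: Node(3, {"Company"}, {"name": "Acme"}),
}
relationships = [
    Relationship(1, 3, "WORKS_AT", {"since": 2019}),
    Relationship(2, 3, "WORKS_AT", {"since": 2021}),
    Relationship(1, 2, "KNOWS"),
]


def employees_of(company_name):
    """Names of Person nodes with a WORKS_AT relationship to the named Company."""
    result = []
    for rel in relationships:
        start, end = nodes[rel.start], nodes[rel.end]
        if (rel.rel_type == "WORKS_AT"
                and "Person" in start.labels
                and "Company" in end.labels
                and end.properties.get("name") == company_name):
            result.append(start.properties["name"])
    return result


print(employees_of("Acme"))  # ['Alice', 'Bob']
```

In a query language for property graphs such as Cypher, a comparable pattern would look roughly like `MATCH (p:Person)-[:WORKS_AT]->(c:Company {name: 'Acme'}) RETURN p.name`.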

Resource de­scrip­tion framework

In the resource de­scrip­tion framework (RDF), in­form­a­tion is organised in triples con­sist­ing of subject, predicate, and object, providing a simple structure for rep­res­ent­ing re­la­tion­ships between entities. Each triple rep­res­ents a statement where the subject des­ig­nates the resource, the predicate describes the property or re­la­tion­ship, and the object rep­res­ents the value or another resource.

With RDF, data can be linked in a stand­ard­ised way, allowing it to be combined and retrieved across different systems. This flex­ib­il­ity makes RDF par­tic­u­larly useful for ap­plic­a­tions that depend on con­nect­ing data from various sources, such as knowledge graphs.
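
The triple structure can be sketched in a few lines of Python. The `ex:` identifiers, the sample statements, and the helper `match` are made up for this example; real RDF stores use full IRIs and are usually queried with SPARQL rather than with a hand-written function.

```python
# Each statement is a (subject, predicate, object) triple.
triples = [
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Alice", "ex:worksAt", "ex:Acme"),
    ("ex:Alice", "ex:name", "Alice"),
    ("ex:Acme", "rdf:type", "ex:Company"),
]


def match(subject=None, predicate=None, obj=None):
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Everything that is stated about ex:Alice:
print(match(subject="ex:Alice"))

# All resources that have a declared type:
print([s for (s, _, _) in match(predicate="rdf:type")])  # ['ex:Alice', 'ex:Acme']
```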

How do queries work in a graph database?

Queries against a graph database can be written in several different languages, primarily because no single query language is supported by every system. Unlike traditional models, graph databases also rely on special algorithms to fulfil their primary task: simplifying and accelerating complex data queries.

The most important algorithms include depth-first search and breadth-first search: depth-first search follows one path as deep as possible before backtracking, while breadth-first search visits all nodes at one distance from the start before moving on to the next level. These algorithms make it possible to find patterns (called graph patterns) as well as direct and indirect neighbouring nodes. Other algorithms calculate the shortest path between two nodes or identify cliques (subsets of nodes in which every node is connected to every other) and hotspots (highly connected nodes). One strength of graph databases is that relationships are stored directly in the database, so they do not need to be computed at query time. This can yield high performance even for complex queries.
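
The two traversal strategies and a shortest-path search can be sketched as follows. The sample graph and the function names are assumptions for this example; graph databases implement such algorithms internally and in far more optimised form.

```python
from collections import deque

# Hypothetical directed graph as an adjacency list: node -> neighbours.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}


def depth_first(start):
    """Visit nodes by following each path as deep as possible first."""
    visited, order = set(), []

    def visit(node):
        visited.add(node)
        order.append(node)
        for neighbour in graph[node]:
            if neighbour not in visited:
                visit(neighbour)

    visit(start)
    return order


def breadth_first(start):
    """Visit nodes level by level, nearest nodes first."""
    visited, order = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order


def shortest_path(start, goal):
    """Path with the fewest edges, found by breadth-first search, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back from goal to start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


print(depth_first("A"))         # ['A', 'B', 'D', 'F', 'C', 'E']
print(breadth_first("A"))       # ['A', 'B', 'C', 'D', 'E', 'F']
print(shortest_path("A", "F"))  # ['A', 'B', 'D', 'F']
```

Breadth-first search finds the shortest path here only because every edge counts the same; for weighted edges, an algorithm such as Dijkstra's would be needed instead.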

Ad­vant­ages and dis­ad­vant­ages of graph databases

The strength of a database can primarily be measured by four factors: integrity, performance, efficiency, and scalability. Graph databases aim to make data queries faster and easier; that is essentially their main purpose. Where relational databases reach their performance limits on heavily connected data, the graph-based model holds up well because the cost of a query depends mainly on the number of relationships it traverses rather than on the total size of the dataset.
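
The difference can be illustrated with a deliberately simplified Python sketch. The table `friendships`, the dictionary `adjacency`, and both functions are hypothetical, and real relational systems use indexes to avoid the full scan shown here, so this only conveys the basic intuition.

```python
# Relational style: relationships live in a separate table that must be searched.
friendships = [("Alice", "Bob"), ("Alice", "Carol"), ("Dave", "Erin")]


def friends_via_scan(person):
    # Without an index, the work grows with the total number of rows.
    return [b for (a, b) in friendships if a == person]


# Graph style: each node holds direct references to its neighbours.
adjacency = {"Alice": ["Bob", "Carol"], "Dave": ["Erin"]}


def friends_via_adjacency(person):
    # The work grows only with the number of relationships of this one node.
    return adjacency.get(person, [])


print(friends_via_scan("Alice"))       # ['Bob', 'Carol']
print(friends_via_adjacency("Alice"))  # ['Bob', 'Carol']
```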

In addition, the graph database model allows real-world scenarios to be stored in a natural way. The structure is similar to human thinking, which makes the connections easy to understand. However, graph databases are not all-encompassing solutions. They reach their limits in terms of scalability, for example: many are designed primarily for a single-server architecture, and distributing a highly connected graph across several servers is difficult because splitting it without cutting through many relationships is a hard mathematical problem. There is also no single standardised query language.

The advantages and disadvantages of graph databases at a glance:

| Advantages | Disadvantages |
| --- | --- |
| query speed depends only on the number of specific relationships, not on the amount of data | poor scalability because of single-server architecture |
| results delivered in real time | |
| clear and intuitive representation of relationships | |
| flexible and agile structures | |

Graph databases should not be con­sidered an absolute or better re­place­ment for tra­di­tion­al databases. Re­la­tion­al struc­tures remain useful standard models that guarantee high integrity and stability of data and allow flexible scalab­il­ity. As is often the case, the key is the intended use!

Graph database com­par­is­ons

Various graph databases are available for different use cases. Below are four popular systems:

  • Neo4j: Neo4j is the most popular graph DBMS and is available in an open-source edition.
  • Amazon Neptune: This graph database is available through the Amazon Web Services public cloud and was released in 2018 as a high-performance database.
  • SAP HANA Graph: With SAP HANA, the developer SAP created a platform built on a relational database management system and enhanced it with the integrated graph-based model SAP HANA Graph.
  • OrientDB: This database combines document-oriented and graph-based database approaches and is considered one of the fastest systems currently available.

A direct com­par­is­on shows that these databases offer various features that can be helpful depending on the specific use case:

| | Neo4j | Amazon Neptune | SAP HANA Graph | OrientDB |
| --- | --- | --- | --- | --- |
| type | native | managed (cloud) | graph extension | multi-model |
| query languages | Cypher | SPARQL, Gremlin, openCypher | SQL-based | SQL-like, Gremlin |
| data model(s) | property graph | property graph, RDF | relational, graph model | graph, documents |
| typical use cases | social networks, fraud detection, recommendation services, network management | knowledge graphs, identity and access management, cloud-native apps | business analytics, IoT, financial analysis, SAP applications | content management, complex data relationships, distributed systems |