Databases are essential for organising information in a practical manner, but they can be structured in various ways. In electronic data processing, relational databases are particularly common and widespread. Besides these, there are document-based databases, which use a simple structure in which information is stored in documents. How do these databases work and what are their advantages?

What is a Document Store?

Document-oriented databases, also known as document stores, are used to manage semi-structured data. This data does not adhere to a fixed structure; instead, it carries its own structure. The information can be ordered using markers within the semi-structured data, such as tags or field names. Because it lacks a predefined structure, this data is not well suited to relational databases, since its information cannot be neatly arranged in tables.

A document database creates a simple pair: a key is assigned to a specific document. The actual information is then located within this document, which may be formatted as XML, JSON, or YAML, for example. Since the documents do not require a specific schema, different types of documents can be stored together in one document store. Changes to a document’s structure do not have to be communicated to the database.
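
The key-document pairing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `store` dictionary, the keys `customer:1` and `order:17`, and all field names are hypothetical and do not correspond to any particular product.

```python
import json

# Hypothetical in-memory store: each key maps to exactly one JSON document.
# The two documents deliberately have different fields (no shared schema).
store = {
    "customer:1": json.dumps({"name": "Ada", "email": "ada@example.com"}),
    "order:17": json.dumps({"items": ["keyboard", "mouse"], "total": 60}),
}

# The store knows nothing about the fields; the structure lives in each document.
customer = json.loads(store["customer:1"])
order = json.loads(store["order:17"])

customer_fields = sorted(customer)  # field names found in the customer document
order_fields = sorted(order)        # field names found in the order document
print(customer_fields, order_fields)
```

Both documents sit side by side in the same store even though they share no fields, which is what the absence of a fixed schema means in practice.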

Note

Document-based databases are very similar to other database models: they are considered a subcategory of NoSQL databases and are closely related to key-value databases due to the combination of keys and documents. As row-oriented systems, they stand in contrast to column-oriented databases.

How Do Document Databases Work?

In theory, data in all sorts of formats, even without a consistent schema, can be stored in a document-based database. In practice, however, a single file format is typically used for the documents, and the information is ordered in a certain structure. This makes it easier to work with the information and the database. For example, consistent data structures allow search queries to be processed more efficiently. You can generally perform the same actions in a document-based database as in a relational system: information can be added, changed, deleted, and queried.

To allow these actions to take place, each document is given a unique ID. How this identifier is constituted is not particularly important: either a simple sequential number or the complete path can be used to address the document. When searching for information, the documents themselves are checked. In other words, the data is read directly from the documents rather than from columns in a table.
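
The four basic actions and the role of the unique ID can be illustrated with a toy store. This is a minimal sketch, not how any real document database is implemented: the `DocumentStore` class, its method names, and the sample fields are all hypothetical, and a random UUID is just one possible choice of identifier.

```python
import uuid


class DocumentStore:
    """Toy document store: maps a unique ID to one document (a Python dict)."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc):
        doc_id = uuid.uuid4().hex  # assumption: any unique value would do
        self._docs[doc_id] = dict(doc)
        return doc_id

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def update(self, doc_id, changes):
        self._docs[doc_id].update(changes)

    def delete(self, doc_id):
        self._docs.pop(doc_id, None)

    def find(self, **criteria):
        # The documents themselves are inspected; there are no columns to consult.
        return [
            doc
            for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


store = DocumentStore()
ada_id = store.insert({"name": "Ada", "city": "London"})
store.insert({"name": "Linus", "city": "Helsinki", "os": "Linux"})
store.update(ada_id, {"city": "Paris"})
matches = store.find(city="Paris")
print(matches)
```

Note that `find` scans every document, which is the simplest possible approach; real systems typically add indexes so that queries do not have to read each document.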

What Are the Pros and Cons of Document Stores?

In conventional relational databases, a field has to exist for each piece of information in every entry. If the information is not available, the cell stays empty, but it must still exist. Document-oriented databases are much more flexible: the structure of individual documents does not have to be consistent. Even large volumes of unstructured data can be accommodated in the database.

It’s also easier to integrate new information. In a relational database, a new information criterion must be added to all datasets, whereas in a document store it only needs to be included in a few datasets. The additional content can be added to further documents, but it’s not required.
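
As a small illustration of this flexibility, the following sketch adds a new field to a single document and leaves the others untouched. The `documents` dictionary and the `newsletter` field are hypothetical examples chosen for this sketch.

```python
# Three hypothetical documents that initially share the same single field.
documents = {
    1: {"name": "Ada"},
    2: {"name": "Grace"},
    3: {"name": "Linus"},
}

# A new piece of information is added to one document only.
# The other documents are not changed and need no empty placeholder.
documents[2]["newsletter"] = True

# Code that reads the field has to tolerate its absence, here via dict.get().
subscribers = [doc["name"] for doc in documents.values() if doc.get("newsletter")]
print(subscribers)
```

The trade-off is that the application, rather than the database, becomes responsible for handling documents in which the field is missing.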

Moreover, with document stores the information is not distributed over multiple linked tables. Everything is contained in a single location, which can result in better performance. However, this speed advantage only holds as long as you don’t attempt to use relational elements: references don’t really suit the concept of document stores. If you do try to interlink the documents, the system becomes highly complex and cumbersome. A relational database system is therefore more advisable for highly interconnected data.
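
The difference between keeping everything in one document and interlinking documents can be sketched as follows. All collections, keys, and the `resolve_order` helper are hypothetical; the sketch only shows that references force the application to perform extra lookups that an embedded document avoids.

```python
# Variant 1: everything about the order is embedded in one document.
embedded = {
    "order:1": {
        "customer": {"name": "Ada"},
        "items": [
            {"product": "keyboard", "price": 40},
            {"product": "mouse", "price": 20},
        ],
    }
}

# Variant 2: the order only references other documents by their keys.
customers = {"customer:1": {"name": "Ada"}}
products = {
    "product:1": {"product": "keyboard", "price": 40},
    "product:2": {"product": "mouse", "price": 20},
}
orders = {
    "order:1": {
        "customer_id": "customer:1",
        "product_ids": ["product:1", "product:2"],
    }
}


def resolve_order(order_id):
    """Follow the references by hand, one extra lookup per linked document."""
    order = orders[order_id]
    return {
        "customer": customers[order["customer_id"]],
        "items": [products[pid] for pid in order["product_ids"]],
    }


# One lookup is enough for the embedded variant ...
total_embedded = sum(item["price"] for item in embedded["order:1"]["items"])
# ... while the referenced variant needs the helper and several lookups.
total_resolved = sum(item["price"] for item in resolve_order("order:1")["items"])
print(total_embedded, total_resolved)
```

Both variants yield the same result, but the second one reimplements in application code what a relational system would express as a join.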

The Most Well-Known Document Databases

Document databases are especially important for the development of web apps. Driven by the increased demand from web development, numerous database management systems (DBMSs) have since been released. The most well-known examples are outlined below:

  • BaseX: This open-source project uses Java and XML. BaseX is supplied with a graphical user interface.
  • CouchDB: The Apache Software Foundation released the open-source software CouchDB. The database management system is written in Erlang, uses JavaScript, and is used in Ubuntu and Facebook applications, among others.
  • Elasticsearch: This search engine is built on a document-oriented database and stores its data as JSON documents.
  • eXist: This open-source DBMS runs on a Java virtual machine and can therefore be used regardless of the operating system. XML documents are primarily used.
  • MongoDB: MongoDB is by far the most widespread NoSQL database. The software is written in C++ and uses JSON-like documents.
  • SimpleDB: With SimpleDB (written in Erlang), Amazon developed its own DBMS for the company’s cloud services. The provider charges a fee for use.