Ceph is a comprehensive storage solution that comes with its own file system, the Ceph File System (CephFS). It lets you store data across the components of a distributed network, and the data can be physically kept in separate storage areas. Ceph supports a wide variety of storage devices and scales well.

What you should know about Ceph and its most important features

Ceph was conceived by Sage A. Weil, who developed it as part of his doctoral dissertation and published it in 2006. He then led the project with his company Inktank Storage. In 2014, the company was acquired by Red Hat, with Weil staying on as chief architect in charge of the software’s development.

Ceph runs on Linux systems such as CentOS, Debian, Fedora, Red Hat Enterprise Linux (RHEL), openSUSE, and Ubuntu. Windows systems cannot access the software directly, but they can do so through iSCSI (Internet Small Computer System Interface). As such, Ceph is particularly suitable for data centers that make their storage space available over servers, and for cloud solutions of any kind that use software to provide storage.

We have compiled a list of the most important features of Ceph:

  • Open-source
  • High scalability
  • Data security through redundant storage
  • High reliability through distributed data storage
  • Software-based increase in availability through an integrated algorithm for locating data
  • Continuous memory allocation
  • Minimal hardware requirements (setup is possible with 1 GB of RAM on a computer with a single-core processor and only a few GB of available storage space, depending on the node's task in the network)

Ceph functionalities

Ceph requires several computers that are connected to one another in what is called a cluster. Each connected computer within that network is referred to as a node.

The following tasks must be distributed among the nodes within the network:

  • Monitor nodes: Monitor the status of individual nodes in the cluster, especially the managers, object storage devices, and metadata servers (MDSs). To ensure maximum reliability, at least three monitor nodes are recommended.
  • Managers: Keep track of storage usage, system load, and the capacity of the nodes.
  • Ceph OSDs (Object Storage Devices): The background applications for the actual data management; they are responsible for storing, duplicating, and restoring data. At least three OSDs are recommended for a cluster.
  • Metadata servers (MDSs): Store the metadata of files kept in CephFS, including storage paths, file names, and time stamps. This metadata is held separately for performance reasons. The servers are POSIX-compatible and can be queried using Unix commands such as ls, find, and the like.
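The recommendation of at least three monitor nodes follows from majority voting: the monitors can only agree on the cluster state while more than half of them can reach each other. The following is a minimal sketch of that arithmetic. The function names are illustrative and not part of any Ceph tool.

```python
def quorum_size(monitors: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    if monitors < 1:
        raise ValueError("a cluster needs at least one monitor")
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors may fail while a majority is still reachable."""
    return monitors - quorum_size(monitors)


if __name__ == "__main__":
    # Print monitor count, required majority, and tolerated failures.
    for count in (1, 2, 3, 5):
        print(count, quorum_size(count), tolerated_failures(count))
```

With one or two monitors, no failure can be tolerated. Three monitors are the smallest setup that survives the loss of one.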

The centerpiece of the data storage is an algorithm called CRUSH (Controlled Replication Under Scalable Hashing). It uses an allocation table called the CRUSH map to determine which OSD holds the requested file.

Ceph distributes files pseudo-randomly, meaning that they appear to be placed indiscriminately. In fact, CRUSH chooses the most suitable storage location based on fixed criteria, after which the files are duplicated and saved on physically separate media. The network administrator can set the relevant criteria.
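To make the idea of deterministic, pseudo-random placement on physically separate media more concrete, here is a small sketch. It is not the CRUSH algorithm. It uses highest-random-weight (rendezvous) hashing as a simplified stand-in, and the cluster layout, host names, and function names are assumptions made for illustration.

```python
import hashlib

# Hypothetical cluster layout: host name -> OSD ids located on that host.
CLUSTER = {
    "host-a": [0, 1],
    "host-b": [2, 3],
    "host-c": [4, 5],
    "host-d": [6, 7],
}


def score(key: str, candidate: str) -> int:
    """Deterministic pseudo-random score for a (key, candidate) pair."""
    digest = hashlib.sha256(f"{key}/{candidate}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place(key: str, replicas: int = 3) -> list:
    """Pick one OSD on each of `replicas` distinct hosts for `key`."""
    if replicas > len(CLUSTER):
        raise ValueError("not enough hosts for the requested replica count")
    # Rank hosts by score and keep the best ones, so that every copy
    # ends up on a different physical machine.
    hosts = sorted(CLUSTER, key=lambda h: score(key, h), reverse=True)[:replicas]
    # On each chosen host, take the OSD with the highest score.
    return [max(CLUSTER[h], key=lambda osd: score(key, f"osd.{osd}")) for h in hosts]


if __name__ == "__main__":
    print(place("example-object"))
```

The result looks random, but every client that knows the layout computes the same OSDs for the same key, so no central lookup is needed.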

Files are organized into placement groups, and file names are processed as hash values. Another organizational property is the number of file duplicates.
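The step from a file name to a placement group can be pictured as a hash followed by a modulo operation. The sketch below assumes CRC32 as a stand-in hash and a hypothetical function name. Ceph itself uses its own hash function and a more careful mapping.

```python
import zlib


def placement_group(object_name: str, pg_count: int) -> int:
    """Map an object name to a placement group index via hash modulo."""
    if pg_count < 1:
        raise ValueError("pg_count must be positive")
    # CRC32 is only a stand-in here, not the hash Ceph actually uses.
    return zlib.crc32(object_name.encode("utf-8")) % pg_count


if __name__ == "__main__":
    for name in ("report.pdf", "image.png", "backup.tar"):
        print(name, placement_group(name, 128))
```

Because the mapping depends only on the name and the number of placement groups, the same file always lands in the same group.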

Note

Hash values are strings of numeric values returned when input is processed by certain computing operations. A simple example is a checksum generated from the raw data. In practice, however, highly complex algorithms are used that create practically unique digital fingerprints from data of any length. The output always has the same compact length and contains no unwanted symbols, which also makes it suitable for processing file names.
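The fixed output length is easy to observe with Python's standard hashlib module. The example uses SHA-256 purely as an illustration and does not claim that Ceph uses this particular hash function.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    short = fingerprint(b"a")
    long = fingerprint(b"a" * 1_000_000)
    # Both digests have 64 hexadecimal characters, regardless of input size.
    print(len(short), len(long))
```

Inputs of one byte and of one million bytes both yield 64 characters drawn only from 0-9 and a-f.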

To guarantee data security, journaling is used at the OSD level: every file to be saved is stored temporarily until it has been properly saved on the intended OSD.

Accessing stored data

The foundation of the Ceph storage architecture is RADOS (Reliable Autonomic Distributed Object Store), a reliable, distributed object store made up of self-healing, self-managing, intelligent storage nodes.

There are several ways to access stored data:

  • librados: Native access is possible through the librados software libraries, which provide APIs for programming and scripting languages such as C/C++, Python, Java, and PHP.
  • radosgw: This gateway allows data to be read or written over the HTTP protocol.
  • CephFS: This is Ceph’s own POSIX-compatible file system. It offers a kernel module for client computers and also supports FUSE (Filesystem in Userspace, an interface for creating file systems that does not require administrator rights).
  • RADOS Block Device: Integration as block storage through a kernel module or a virtualization system such as QEMU or KVM.
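As an illustration of native access, the following sketch writes an object and reads it back with the Python librados binding (the rados module that ships with Ceph). The pool name, object name, and configuration path are assumptions, and the function can only run on a machine that has the binding installed and can reach a cluster.

```python
def roundtrip(pool: str, name: str, payload: bytes,
              conffile: str = "/etc/ceph/ceph.conf") -> bytes:
    """Write `payload` as object `name` into `pool` and read it back."""
    # Imported here so this file still loads on machines without the Ceph bindings.
    import rados

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)  # I/O context bound to one pool
        try:
            ioctx.write_full(name, payload)  # replace the whole object
            return ioctx.read(name, length=len(payload))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

A call such as roundtrip("mypool", "greeting", b"hello") would return the stored bytes, assuming a pool with the hypothetical name mypool exists and the client has permission to write to it.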

Alternatives to Ceph

The most popular alternative is GlusterFS, which also belongs to the Linux distributor Red Hat and can likewise be used at no cost. Gluster follows a similar approach of aggregating distributed storage into a unified storage location within the network. Both solutions have their own pros and cons.

There are other free alternatives, such as XtreemFS and BeeGFS. Microsoft offers commercial, software-based storage solutions for Windows servers, including Storage Spaces Direct (S2D).

Pros and cons of Ceph

Ceph is a good choice in many situations, but this method of storing data also comes with some disadvantages.

Pros of Ceph

Ceph is free and well established, despite its comparatively short development history. A large amount of helpful information on its setup and maintenance is available online, and the developers have documented the application extensively. The acquisition by Red Hat suggests that it will continue to be developed for the foreseeable future. Its scalability and integrated redundancy provide data security and flexibility within the network. On top of that, the CRUSH algorithm supports high availability.

Note

Redundancy in this sense means “surplus.” In computing, it designates additional, surplus data. Here, data redundancy is created deliberately in order to ensure data security and reliability. This is possible at both the software and hardware levels: on the one hand, data or information required for data reconstruction can be stored several times; on the other hand, physically separate storage components can be made available in several places to compensate for the failure of individual computers.
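Redundancy has a price in capacity. As a rough sketch, assuming simple replication in which every piece of data is stored a fixed number of times and ignoring any other overhead, the usable space is the raw space divided by the number of copies. The function name is illustrative.

```python
def usable_capacity(raw_tb: float, copies: int) -> float:
    """Usable capacity in TB when every object is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return raw_tb / copies


if __name__ == "__main__":
    # Three nodes with 4 TB each, keeping three copies of all data.
    print(usable_capacity(3 * 4, 3))
```

In this example, 12 TB of raw storage leaves 4 TB for actual data.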

Cons of Ceph

Because of the variety of components involved, a comprehensive network is required to make full use of Ceph’s functionality. In addition, the setup is relatively time-consuming, and the user cannot be entirely sure where the data is physically stored.
