GlusterFS is a distributed, arbitrarily scalable file system that aggregates storage components from several servers into one uniform file system. File systems work in the background, and hardly anyone thinks about them once they have been installed. That usually changes, though, when data is lost or the file system reaches its limits, for example because the maximum size of a partition has been reached or because of limitations on the depth of storage paths.

Who and what is behind GlusterFS?

The name “Gluster” is a combination of “GNU” (itself an acronym for “GNU’s not Unix!”) and “cluster.” The system was published under the GNU General Public License (GNU GPL), making it free of charge to use. In relation to data carriers, the term “cluster” describes a combination of physical storage units. In relation to computers, it indicates a connected network of several systems. GlusterFS merges these concepts by combining storage space from computers connected over a network and using it as a single logical entity.

The project was published in 2005 by Gluster Inc. In 2011, the Linux distributor Red Hat took over the company and has since continued to develop the file system. Version 7 of GlusterFS made its debut in January 2020, and precompiled packages are available for the following Linux distributions:

  • CentOS
  • Debian
  • Fedora
  • Red Hat/RHEL
  • SUSE
  • Ubuntu

GlusterFS is limited to Unix-like systems because its storage is integrated through the FUSE module, which has yet to be made adequately stable for Windows systems.

Note

FUSE is an acronym for Filesystem in Userspace. Operating systems are usually subdivided into user mode and kernel mode. The latter is particularly well protected; for example, it can only be accessed with administrator rights. As a result, mounting and managing drives can normally only be done by an administrator. FUSE, however, allows non-privileged users to mount and manage file systems as well.

Computers can function both as servers and as clients. The file system can also be accessed through other supported protocols, such as NFS (Network File System) and SMB/CIFS (Server Message Block/Common Internet File System).

GlusterFS functionalities

A distributed file system only really makes sense when several computers are connected to each other. The documentation published by GlusterFS states that at least three servers are required. However, the term “server” in this sense should not be taken literally. Virtually any kind of physical or emulated hardware can be integrated. Besides normal computers, the use of virtual machines is also feasible. This also comes with many benefits, especially with regard to flexibility.

Integrated servers act as nodes, which are connected to each other over a TCP/IP network. The integrated devices form a so-called trusted storage pool, whose storage is provided in the form of bricks. Volumes are then built from these bricks and can subsequently be mounted and used like normal data carriers. Computers that access a volume are called clients, but one machine can be both a server and a client.
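The sequence described above (pool, bricks, volume, mount) can be sketched as a list of command-line calls. The following Python snippet only builds and prints the commands; it does not run them. The host names, the brick path, the volume name gv0, and the mount point are assumptions chosen for illustration, and the exact options you need may differ depending on your GlusterFS version and volume type.

```python
from typing import List


def pool_and_volume_commands(
    volume: str, peers: List[str], bricks: List[str]
) -> List[List[str]]:
    """Build the CLI calls (as argv lists) that form a trusted storage
    pool and create and start a volume. Nothing is executed here."""
    # Each remaining server is added to the pool from the first server.
    commands = [["gluster", "peer", "probe", host] for host in peers]
    # A volume is assembled from bricks given as host:/path pairs.
    commands.append(["gluster", "volume", "create", volume, *bricks])
    commands.append(["gluster", "volume", "start", volume])
    return commands


def mount_command(server: str, volume: str, mount_point: str) -> List[str]:
    """Build the client-side mount call that uses the FUSE-based client."""
    return ["mount", "-t", "glusterfs", f"{server}:/{volume}", mount_point]


# Hypothetical three-node setup; all names below are placeholders.
commands = pool_and_volume_commands(
    "gv0",
    peers=["server2", "server3"],
    bricks=[f"server{i}:/data/brick1/gv0" for i in (1, 2, 3)],
)
commands.append(mount_command("server1", "gv0", "/mnt/gluster"))

for argv in commands:
    print(" ".join(argv))
```

Keeping the commands as argument lists rather than shell strings makes it straightforward to pass them to subprocess.run later without quoting issues.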

A special feature is the software’s tremendous scalability. Any number of nodes and bricks can be added later on, and the size of the storage space can be adjusted to new requirements. The managed storage space can grow to several petabytes.
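As a rough illustration of growing an existing volume, the snippet below builds the two CLI calls typically involved: adding a brick and then rebalancing the data. It assumes a plain distributed volume named gv0 and a new server that has already joined the trusted storage pool; both names are placeholders. The commands are only printed, not executed.

```python
from typing import List


def expand_volume_commands(volume: str, new_bricks: List[str]) -> List[List[str]]:
    """Build the CLI calls that add bricks to a volume and then
    redistribute existing files across the enlarged set of bricks."""
    if not new_bricks:
        raise ValueError("at least one new brick is required")
    return [
        ["gluster", "volume", "add-brick", volume, *new_bricks],
        ["gluster", "volume", "rebalance", volume, "start"],
    ]


# Hypothetical fourth server contributing one more brick.
for argv in expand_volume_commands("gv0", ["server4:/data/brick1/gv0"]):
    print(" ".join(argv))
```

For replicated volumes, bricks generally have to be added in multiples of the replica count, which this sketch does not check.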

In addition, GlusterFS increases reliability through redundancy. The risk of failure is first of all spread across several systems, which can also be spatially separated from one another. RAID-like setups are possible as well. For this, a replicated volume must be created instead of the default distributed volume. Each file is then saved twice, which corresponds to RAID mirroring.
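The difference between a distributed and a replicated volume can be made concrete with a small toy model. The Python sketch below is not GlusterFS code: the hash-based placement merely stands in for Gluster's own distribution logic, and all function and brick names are hypothetical. It shows that a distributed volume stores each file on exactly one brick, while a replicated volume stores each file on every brick of the replica set and therefore offers less usable capacity.

```python
import hashlib
from typing import Dict, List, Set


def place_distributed(files: List[str], bricks: List[str]) -> Dict[str, List[str]]:
    """Put each file on exactly one brick, chosen by hashing its name."""
    placement: Dict[str, List[str]] = {brick: [] for brick in bricks}
    for name in files:
        digest = hashlib.sha256(name.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(bricks)
        placement[bricks[index]].append(name)
    return placement


def place_replicated(files: List[str], bricks: List[str]) -> Dict[str, List[str]]:
    """Put a copy of every file on every brick (mirroring)."""
    return {brick: list(files) for brick in bricks}


def surviving_files(placement: Dict[str, List[str]], failed_brick: str) -> Set[str]:
    """Return the files still readable after one brick has failed."""
    survivors: Set[str] = set()
    for brick, names in placement.items():
        if brick != failed_brick:
            survivors.update(names)
    return survivors


def usable_capacity_tb(brick_size_tb: float, brick_count: int, replica: int = 1) -> float:
    """Usable space when every file is stored `replica` times."""
    if replica < 1 or brick_count % replica != 0:
        raise ValueError("brick_count must be a multiple of replica")
    return brick_size_tb * brick_count / replica


files = [f"report-{i}.txt" for i in range(10)]
bricks = ["brick-a", "brick-b"]
for label, placement in (
    ("distributed", place_distributed(files, bricks)),
    ("replicated", place_replicated(files, bricks)),
):
    left = len(surviving_files(placement, "brick-a"))
    print(f"{label}: {left} of {len(files)} files readable without brick-a")
```

In this model, mirroring across two bricks halves the usable capacity but keeps every file readable when one brick fails.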

Fact

A Redundant Array of Independent Disks (RAID) is a group of physically independent hard drives that are combined into one unified drive. Depending on your objective, the focus can be on speed or on data security. Usable storage space is correspondingly reduced by saving data multiple times or by storing the additional information needed to restore a file.

For operations within the storage space, GlusterFS offers ten predefined translators, which translate the commands given by users into operations to be executed. Two examples are the “storage” translator, which stores data on the local file system and controls access to it, and the “encryption” translator.
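The idea of translators passing requests on to one another can be illustrated with a toy model. The following Python sketch is not the GlusterFS translator API: the class names are hypothetical, the storage layer is an in-memory dictionary, and the XOR step is only a stand-in for real encryption and offers no security. It shows how one layer can transform data before handing it to the layer that actually stores it.

```python
from typing import Dict


class StorageTranslator:
    """Bottom layer: keeps file contents (here simply in memory)."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def read(self, path: str) -> bytes:
        return self._files[path]


class XorEncryptionTranslator:
    """Upper layer: transforms data on the way down and back up.
    XOR with a single byte is a placeholder, not real encryption."""

    def __init__(self, child: StorageTranslator, key: int) -> None:
        if not 0 <= key <= 255:
            raise ValueError("key must fit into one byte")
        self._child = child
        self._key = key

    def _xor(self, data: bytes) -> bytes:
        return bytes(byte ^ self._key for byte in data)

    def write(self, path: str, data: bytes) -> None:
        self._child.write(path, self._xor(data))

    def read(self, path: str) -> bytes:
        return self._xor(self._child.read(path))


storage = StorageTranslator()
stack = XorEncryptionTranslator(storage, key=0x5A)
stack.write("/docs/note.txt", b"hello")
print(stack.read("/docs/note.txt"))    # what the client sees
print(storage.read("/docs/note.txt"))  # what the storage layer holds
```

Because both layers expose the same read and write methods, further layers could be stacked on top without changing the client code.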

A newer function is geo-replication, which distributes data asynchronously among servers in different locations. This provides additional protection against external, physical impacts on the servers, such as fire or theft. In this setup, one computer acts as the master and another as the slave. Data transfer is secured by SSH (Secure Shell).
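What “asynchronous” means here can be shown with a small toy model. The Python sketch below is not how GlusterFS implements geo-replication; the class and method names are hypothetical, and the SSH transport is left out entirely. It only demonstrates the timing: a write is complete on the master immediately, while the slave receives it later, when the queued changes are shipped.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class GeoReplicatedVolume:
    """Toy model of a master volume with an asynchronously updated slave."""

    def __init__(self) -> None:
        self.master: Dict[str, bytes] = {}
        self.slave: Dict[str, bytes] = {}
        self._pending: Deque[Tuple[str, bytes]] = deque()

    def write(self, path: str, data: bytes) -> None:
        """Apply the write on the master and queue it for the slave."""
        self.master[path] = data
        self._pending.append((path, data))

    def sync(self) -> int:
        """Ship all queued changes to the slave in their original order
        and return how many changes were transferred."""
        shipped = 0
        while self._pending:
            path, data = self._pending.popleft()
            self.slave[path] = data
            shipped += 1
        return shipped


volume = GeoReplicatedVolume()
volume.write("/backup/db.dump", b"v1")
print("on slave before sync:", "/backup/db.dump" in volume.slave)
volume.sync()
print("on slave after sync:", "/backup/db.dump" in volume.slave)
```

The gap between write and sync is the window in which a failure of the master site could lose recent changes, which is the trade-off of asynchronous replication.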

Pros and cons of GlusterFS

We’ve compiled a few pros and cons of a distributed file system in comparison to conventional network storage in the table below:

Pros of Gluster | Cons of Gluster
Good utilisation of existing capacities | Creation of a complex network structure
Increased reliability | Increased administrative effort during set-up
Network load distribution | Fast network infrastructure is needed
Very good scalability | Additional effort required for technical security

Applications of GlusterFS

GlusterFS basically creates a classic cloud. Storage space within a network is made available to connected clients. This is particularly suitable for large networks that already have sufficient resources available for building such a storage cluster.

Since devices are connected through the Internet Protocol, a distributed file system is especially suitable for companies with several branch offices. However, it can also replace dedicated network storage in local networks, without having to forgo redundancy.

Tip

Would you like to work with GlusterFS yourself? IONOS has written a comprehensive GlusterFS how-to article for installing and setting up the file system.

GlusterFS alternatives

One notable alternative to GlusterFS is Ceph, which is freely available and also offers many of the aforementioned benefits of distributed file systems. Ceph and Gluster each have their own pros and cons.

BeeGFS (formerly FhGFS) was developed by the Fraunhofer Society in Germany specifically for powerful computer systems. It is available free of charge and focuses on easy usability.

In the commercial sector, there are additional systems such as Storage Spaces Direct (S2D) by Microsoft. However, the use of this system is limited to fee-based, licensed Windows servers.
