Data security is of paramount importance in many industries. Companies that handle important business processes rely on data stability, for example from a hosting service that stores their customers’ information. A serious memory error can cause more than a financial loss; in the worst case, it can seriously weaken a company’s position in the market. The more data that is held in memory, the more likely it is that errors will occur. This is why comprehensive data protection matters so much in work and server environments that require high data integrity. One measure is to use ECC RAM in place of ordinary memory so that single-bit errors can be detected and corrected.

ECC RAM: background and definition

Random Access Memory (RAM) is the working memory of a computer system. It’s also known as the main memory and holds running programs together with the data they produce. The volatile contents of the main memory are stored as binary code, which consists solely of zeros and ones and is therefore easy for the computer to process. A single binary digit is called a 'bit'. Various causes, such as

  • voltage variations,
  • overclocking,
  • defective or ageing memory modules,
  • or high-energy radiation

can lead to a bit error, in which a memory entry is changed. A bit then assumes the wrong value, i.e. '1' instead of '0' or vice versa. In many applications this is hardly noticeable. If a bit error occurs while working with an image-editing program, for example, one pixel might receive a slightly different colour, which isn’t noticeable to the human eye. In complex databases or calculation applications, by contrast, a single bit error can have serious consequences. A bit error can also cause a system crash when it occurs in a part of the memory used by the operating system.
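To make the effect of a single flipped bit concrete, here is a minimal Python sketch. The helper name flip_bit and both sample values are illustrative assumptions; real memory is not accessed this way.

```python
def flip_bit(value: int, bit_index: int) -> int:
    """Return value with the bit at bit_index (0 = least significant) inverted."""
    return value ^ (1 << bit_index)


# Hypothetical 8-bit colour channel of a single pixel
pixel = 200
print(flip_bit(pixel, 0))  # 201: the shade changes by one step

# The same kind of error in a hypothetical stored account balance
balance = 1_000
print(flip_bit(balance, 20))  # 1049576: one flipped bit, very different value
```

The operation is identical in both cases; only the meaning of the affected bit decides whether the error goes unnoticed or corrupts a result.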

The solution to the problem is error correcting code (ECC), a coding scheme that can detect and correct single-bit errors. In addition, ECC can detect the rarer two-bit errors. To benefit from this error correction method, ordinary RAM modules are extended by an additional memory chip that stores the check bits, which is where ECC RAM comes into play.

How the error correction process works

The error correction process for single-bit errors used in RAM modules was developed in 1950 by the mathematician Richard Hamming, which is why the code is called the Hamming code. Its special feature is that it uses several parity bits. These are also known as control bits, and each one forms a validation group with some of the actual useful bits. The simplest Hamming code for single-bit error correction is a seven-digit binary code word consisting of three parity bits (P) and four useful bits (N), arranged in three validation groups. The parity bits are placed at the code word positions whose number is a power of 2, in this example positions 1, 2, and 4, which leaves positions 3, 5, 6, and 7 for the useful bits.

When a bit sequence is read, the parity of each validation group is recalculated and checked. With even parity, a validation group indicates an error whenever it contains an odd number of bits with the value 1.
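The layout described above can be sketched in Python. This is a minimal illustration, assuming even parity and the commonly used validation groups (1, 3, 5, 7), (2, 3, 6, 7), and (4, 5, 6, 7); the names GROUPS and hamming74_encode are hypothetical and not taken from any library.

```python
# Validation groups, keyed by the position of their parity bit.
# Positions are 1-based, as in the text.
GROUPS = {1: (1, 3, 5, 7), 2: (2, 3, 6, 7), 4: (4, 5, 6, 7)}


def hamming74_encode(data_bits: str) -> str:
    """Encode four useful bits into a 7-bit code word with even parity."""
    if len(data_bits) != 4 or set(data_bits) - {"0", "1"}:
        raise ValueError("expected a string of exactly four 0/1 characters")
    word = [0] * 8  # index 0 is unused so indices match code word positions
    # The useful bits go to the positions that are not powers of 2
    for pos, bit in zip((3, 5, 6, 7), data_bits):
        word[pos] = int(bit)
    for parity_pos, group in GROUPS.items():
        # Set the parity bit so the whole group holds an even number of ones
        word[parity_pos] = sum(word[p] for p in group if p != parity_pos) % 2
    return "".join(str(b) for b in word[1:])


print(hamming74_encode("1001"))  # 0011001
```

Every code word produced this way has an even number of ones in each of the three validation groups, which is the property the check relies on.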

Applied to the example bit sequence 0001001, the Hamming code determines the error as follows:

  • The validation group of parity bit 1 (positions 1, 3, 5, 7) contains one bit with the value 1 and is therefore incorrect.
  • The validation group of parity bit 2 (positions 2, 3, 6, 7) contains one bit with the value 1 and is therefore incorrect.
  • The validation group of parity bit 3 (positions 4, 5, 6, 7) contains two bits with the value 1 and is therefore correct.

Code word position 3 is the only position that belongs to both incorrect validation groups but not to the correct one, so this is where the error is. Inverting that bit gives the correct bit sequence, 0011001.
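The check for the sequence 0001001 can be reproduced with a short Python sketch. It assumes even parity and the validation groups (1, 3, 5, 7), (2, 3, 6, 7), and (4, 5, 6, 7); the names GROUPS and hamming74_correct are hypothetical. The sketch relies on the fact that the parity bit positions (1, 2, 4) of the failing groups add up to the position of the faulty bit.

```python
from typing import Tuple

# Validation groups, keyed by the position of their parity bit (1-based)
GROUPS = {1: (1, 3, 5, 7), 2: (2, 3, 6, 7), 4: (4, 5, 6, 7)}


def hamming74_correct(received: str) -> Tuple[str, int]:
    """Return (corrected code word, error position); position 0 means no error found."""
    if len(received) != 7 or set(received) - {"0", "1"}:
        raise ValueError("expected a string of exactly seven 0/1 characters")
    bits = [0] + [int(c) for c in received]  # index 0 unused, positions are 1-based
    # Add up the parity positions of every group with an odd number of ones
    error_pos = sum(
        parity_pos
        for parity_pos, group in GROUPS.items()
        if sum(bits[p] for p in group) % 2 == 1
    )
    if error_pos:
        bits[error_pos] ^= 1  # invert the faulty bit
    return "".join(str(b) for b in bits[1:]), error_pos


print(hamming74_correct("0001001"))  # ('0011001', 3)
```

Note that this seven-bit code on its own can only repair a single flipped bit. If two bits are wrong, it will point at a third position and "correct" the wrong one, which is why detecting two-bit errors requires an additional check bit.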

ECC RAM – also suitable for personal use?

ECC protects the main memory against single-bit errors and thereby prevents a large portion of possible memory errors. Closely linked to this is a reduction in system crashes, which is particularly important for services or applications that guarantee high availability and have to serve a large number of users. These advantages make ECC modules particularly popular as server RAM and standard equipment in high-performance data centres.

ECC RAM has minor disadvantages compared to non-ECC RAM, however. The error-correcting memory modules are somewhat more expensive than ordinary memory modules, and the error detection process reduces system performance by around 2% on average. In addition, not all mainboards support ECC RAM. So, if you plan on using ECC RAM on a normal board, you should first check compatibility and weigh up the benefits. ECC RAM and non-ECC RAM cannot be combined. By default, your personal computer or server comes with ordinary memory modules without error correction.
