Long short-term memory (LSTM) is a neural network architecture that stores important information over long periods of time. The technology combines a short-term and a long-term memory for this purpose and is of crucial importance for the further development of artificial intelligence.

What is long short-term memory (LSTM)?

Long short-term memory (LSTM) is a technique for storing information within a neural network over a longer period of time. This is particularly important when processing sequential data, because it allows the network to access previous events and take them into account in new calculations. LSTMs are a special kind of recurrent neural network (RNN) that addresses the short memory of simple RNNs: instead of only a simple ‘short-term memory’, an LSTM cell has an additional ‘long-term memory’ in which selected information is retained over a longer period of time.

Networks with long short-term memory can therefore store information over long periods of time and thus recognise long-term dependencies. This is particularly important in the field of deep learning and AI. The basis for this is a set of gates, which we’ll explain in more detail later in this article. LSTM networks provide efficient models for prediction and processing based on time series data.


Which elements make up an LSTM cell?

A cell with long short-term memory consists of different building blocks that give the network various capabilities. It must be able to store information over a long period of time and link it to new information as required. At the same time, it’s important that the cell independently deletes unimportant or outdated knowledge from its ‘memory’. For this reason, it consists of four different components:

  • Input gate: The input gate decides which new information is to be added to the memory and how.
  • Forget gate: The forget gate determines which information should be stored in a cell and which should be removed.
  • Output gate: The output gate determines how values are output from a cell. The decision is based on the current state and the respective input information.

The fourth component is the cell interior. This follows its own update logic, which regulates how the other components interact and how information flows and storage processes are handled.
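In the standard textbook formulation (not spelled out in this article), the interplay of these four components can be written as a set of update equations, where $\sigma$ is the sigmoid function, $\odot$ denotes element-wise multiplication, and the $W$, $b$ are learned weights and biases:

$$
\begin{aligned}
i_t &= \sigma(W_i\,[x_t, h_{t-1}] + b_i) && \text{input gate}\\
f_t &= \sigma(W_f\,[x_t, h_{t-1}] + b_f) && \text{forget gate}\\
o_t &= \sigma(W_o\,[x_t, h_{t-1}] + b_o) && \text{output gate}\\
\tilde{c}_t &= \tanh(W_c\,[x_t, h_{t-1}] + b_c) && \text{candidate cell update}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state (long-term memory)}\\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state (short-term memory)}
\end{aligned}
$$

Here $x_t$ is the current input, $h_{t-1}$ the previous hidden state and $c_{t-1}$ the previous cell state.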

How does long short-term memory work?

Similar to the aforementioned recurrent neural network or the simpler feedforward neural network (FNN), cells with long short-term memory are also arranged in layers. Unlike other networks, however, they store information over long periods of time and can subsequently process or retrieve it. To do this, each LSTM cell uses the three gates mentioned above as well as a type of short-term memory and long-term memory.

  • Short-term memory, i.e., the memory in which information from previous calculation steps is briefly stored, is also known from other networks. In long short-term memory, it’s called the hidden state. Unlike other networks, however, an LSTM cell can also retain information in the long term. This information is stored in the so-called cell state. New information passes through the three gates.
  • In the input gate, the current input and the last hidden state are combined with learned weights to decide how valuable the new input is. Important information is then added to the previous cell state to create the new cell state.
  • The forget gate decides which information should continue to be used and which should be removed, again taking the last hidden state and the current input into account. The decision is made using a sigmoid function (an S-shaped function), which outputs values between 0 and 1: a value close to 0 means that previous information is forgotten, while a value close to 1 means that it is retained. The result is multiplied element-wise by the current cell state, so values gated at 0 are dropped.
  • The final output is then calculated in the output gate. A sigmoid over the hidden state and the current input determines which information may pass; the cell state is activated with a tanh function (hyperbolic tangent) and multiplied by this gate value to produce the new hidden state.
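The steps above can be sketched as a single forward step in NumPy. This is a minimal illustration of the standard LSTM cell, not a production implementation; all variable names are ours, and the four gate computations are stacked into one weight matrix for brevity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One forward step of a standard LSTM cell.

    x: input vector (n_in,); h_prev, c_prev: previous hidden and
    cell state (n_hid,); W: stacked gate weights (4*n_hid, n_in+n_hid);
    b: stacked gate biases (4*n_hid,).
    """
    z = W @ np.concatenate([x, h_prev]) + b
    n = len(h_prev)
    f = sigmoid(z[0:n])          # forget gate: 0 = drop, 1 = keep
    i = sigmoid(z[n:2*n])        # input gate: how much new info to write
    g = np.tanh(z[2*n:3*n])      # candidate values for the cell state
    o = sigmoid(z[3*n:4*n])      # output gate: what to reveal
    c = f * c_prev + i * g       # new long-term memory (cell state)
    h = o * np.tanh(c)           # new short-term memory (hidden state)
    return h, c

# Tiny demo with random weights: run a short sequence through one cell.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in + n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x, h, c, W, b)
```

Note how the hidden state is bounded by the tanh activation, while the cell state can accumulate values over many steps — this is what lets the cell carry information across long sequences.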

What different architectures are there?

While this basic mode of operation is the same for all networks with long short-term memory, there are sometimes considerable differences in the architecture of LSTM variants. Peephole LSTMs, for example, are widely used; they owe their name to the fact that the individual gates can ‘peek’ at the state of the respective cell. An alternative is the peephole convolutional LSTM, which uses discrete convolution in addition to matrix multiplication to calculate the activity of a neuron.
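The peephole idea can be sketched in a few lines of NumPy: besides the usual input/hidden combination, each gate additionally receives the cell state through an extra element-wise weight vector. This is an illustrative sketch with names of our choosing, not a specific library’s API:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def peephole_gates(x, h, c, W, p, b):
    """Compute the three gates of a peephole LSTM cell.

    W, p, b are dicts keyed by 'f' (forget), 'i' (input), 'o' (output).
    The term p[k] * c is the 'peephole': the gate sees the cell state.
    """
    xh = np.concatenate([x, h])
    gates = {}
    for k in ("f", "i", "o"):
        gates[k] = sigmoid(W[k] @ xh + p[k] * c + b[k])
    return gates["f"], gates["i"], gates["o"]

# Tiny demo with random weights.
rng = np.random.default_rng(1)
n_in, n_hid = 2, 3
W = {k: rng.normal(scale=0.1, size=(n_hid, n_in + n_hid)) for k in "fio"}
p = {k: rng.normal(scale=0.1, size=n_hid) for k in "fio"}
b = {k: np.zeros(n_hid) for k in "fio"}
f, i, o = peephole_gates(rng.normal(size=n_in), np.zeros(n_hid),
                         rng.normal(size=n_hid), W, p, b)
```

Because the gates condition directly on the cell state, a peephole cell can, for example, learn to count time steps more precisely than a standard cell whose gates only see the hidden state.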

What are the most important areas of application for long short-term memory?

Countless applications now rely entirely or partially on neural networks with long short-term memory. The areas of application are very diverse. The technology makes a valuable contribution in the following areas:

  • Automated text generation
  • Analysis of time series data
  • Speech recognition
  • Forecasting stock market developments
  • Music composition

Long short-term memory is also used to identify anomalies, for example in cases of attempted fraud or attacks on networks. Corresponding applications can also recommend media such as films, series, bands or books based on user data, or analyse videos, images and songs. In this way, the technology not only increases security but can also significantly reduce costs.

Numerous large corporations use long short-term memory for their services and products. Google uses corresponding networks for its smart assistance systems, the translation program Google Translate, the gaming software AlphaGo and speech recognition in smartphones. The two voice-controlled assistants Siri (Apple) and Alexa (Amazon) are also based on long short-term memory, as is Apple’s keyboard completion.
