Longest common sub­sequences (LCS) are a typical problem in IT . Ap­proaches to solve the LCS problem often appear in in­ter­views for pro­gram­mers and algorithm courses.

What is the longest common sub­sequence?

The aim is to find the longest common sub­sequence in two sequences. This is where a sub­sequence is derived from a sequence. The sub­sequence has the same elemental order, in some instances when elements are removed. Let’s take a look at some examples to get a better idea of the principle:

Sequence X Sequence Y LCS(X, Y)
FATHER MATTER ATER
MOTHER HERBAL HER
DAVID DANIEL DAI
Note

There is an important dif­fer­ence between the longest common sub­sequence and the longest common substring being that a substring may not have any gaps. In the example of ‘DAVID’ and ‘DANIEL’ the longest common substring would be ‘DA’ since ‘I’ is separated by ‘V’ and ‘N’.

Are there any practical examples of the LCS problem?

The longest common sub­sequence problem is an important issue in all areas in which sequences which stem from each other are used. There are certain ways to find LCS to see whether there are sim­il­ar­it­ies or dif­fer­ences and, in turn, discover any pla­gi­ar­ism. The well-known ‘diff’ utility which checks for changes to source text files is based on the LCS problem.

In bioin­form­at­ics the longest common sub­sequence problem is often used when analysing DNA sequences. DNA bases change in certain positions over time due to mutations. The presence of a long common sub­sequence in two sequences would suggest a genetic re­la­tion­ship. This can allow us to retrace evolution between species thanks to genetic de­vel­op­ment.

Linguists can also use the longest common sub­sequence problem to research lin­guist­ic de­vel­op­ment over centuries. If you have two words from different languages which have the same meaning and a long common sub­sequence, it suggests that they have the same root and etymology. This would be con­sidered, then, a similar his­tor­ic­al de­vel­op­ment.

What are the solutions to the longest common sub­sequence problem?

First of all, you can solve the LCS problem with a naïve approach. This is a simple approach without using any op­tim­isa­tions or special methods. To do so you simply compute all sub­sequences from two sequences and find the longest common sub­sequence. Un­for­tu­nately, this approach isn’t very efficient and is, therefore, only suitable for short sequences.

Below you will find three efficient solutions to the LCS problem:

  1. Recursive approach
  2. Op­tim­isa­tion through memoisa­tion
  3. Dynamic pro­gram­ming

What all ap­proaches have in common is that in regard to the two sequences, they differ in three cases:

  • The last letter is the same.
  • The last letter is not the same.
  • One of the sequences’ length is zero.

The ap­proaches each differ in their time com­plex­ity (asymp­tot­ic run time) and space com­plex­ity (memory use):

Approach Run time Memory
Naive approach O(n * n²) O(n)
Recursive approach O(n²) O(1)
Op­tim­isa­tion via memoisa­tion O(n *m) O(n* m)
Dynamic pro­gram­ming O(n *m) O(n* m)
Note

The al­gorithms set out below all compute the length of the longest common sub­sequence. There are, if ap­plic­able, multiple set sub­sequences of this length which can be de­term­ined using other steps.

Recursive de­term­in­a­tion of the longest common sub­sequence

If you look at the LCS problem, you can see that it has an ‘optimal sub­struc­ture’. This means that the problem can be reduced to sub­prob­lems. As a solution you can use a recursive approach. Below you can find some examples of al­gorithms to implement this in three of the most common pro­gram­ming languages.

Python

def lcs(X, Y, m, n):
    # Base case
    if m == 0 or n == 0:
        return 0
    # Last elements are equal
    elif X[m-1] == Y[n-1]:
        return 1 + lcs(X, Y, m-1, n-1)
    # Last elements differ
    else:
        return max(lcs(X, Y, m, n-1), lcs(X, Y, m-1, n))
# Let's test
X, Y = "DANIEL", "DAVID"
lcs_len = lcs(X, Y, len(X), len(Y))
print(f"Length of LCS is: {lcs_len}")
python

Java

import java.io.*;
class LCS {
    public static int lcs(String A, String B, int m, int n)
    {
        // Base case
        if (m == 0 || n == 0)
            return 0;
        // Last elements are equal
        if (A.charAt(m - 1) == B.charAt(n - 1))
            return 1 + lcs(A, B, m - 1, n - 1);
        // Last elements differ
        else
            return Math.max(lcs(A, B, m, n - 1),
             lcs(A, B, m - 1, n));
    }
    
    // Let's test
    public static void main(String[] args)
        {
            String X = "DAVID";
            String Y = "DANIEL";
            int lcsLength = LCS.lcs(X, Y, X.length(), Y.length());
            System.out.println("Length of LCS is: " + lcsLength);
        }
}
java

C++

#include <iostream>
using namespace std;
int lcs(string X, string Y, int m, int n)
{
    // Base case
    if (m == 0 || n == 0) {
        return 0;
    }
    // Last elements are equal
    if (X[m - 1] == Y[n - 1]) {
        return 1 + lcs(X, Y, m - 1, n - 1);
    }
    // Last elements differ
    else {
        return max(lcs(X, Y, m, n - 1), lcs(X, Y, m - 1, n));
    }
}
// Let's test
int main()
{
    // Initialize variables
    string X = "DAVID";
    string Y = "DANIEL";
    // Compute and output length of LCS
    cout << "Length of LCS is " << lcs(X, Y, X.size(), Y.size());
    return 0;
}
c++

Op­tim­ising the recursive approach using memoisa­tion

When looking again at the recursive approach, you can see that the over­lap­ping parts are computed. These char­ac­ter­ist­ics, called ‘over­lap­ping sub­prob­lems’ are known from Fibonacci sequences. In this case as well recursive sequences are con­stantly computed for a solution. To make the process more efficient, it’s worth­while using memoisa­tion. In other words, you can cache the sub­prob­lems that have already been computed in a two-di­men­sion­al matrix.

Python

def lcs(X, Y, m, n, table):
    
    # Base case
    if (m == 0 or n == 0):
        return 0
    # Already computed value at given position
    if (table[m][n] != -1):
        return table[m][n]
    # Last elements are equal
    if X[m - 1] == Y[n - 1]:
        table[m][n] = 1 + lcs(X, Y, m - 1, n - 1, table)
    # Last elements differ
    else:
        table[m][n] = max(lcs(X, Y, m, n - 1, table), lcs(X, Y, m - 1, n, table))
    return table[m][n]
# Let's test
X = "DAVID"
Y = "DANIEL"
m, n = len(X), len(Y)
# Initialize table fields to `-1`
table = [[-1 for i in range(n + 1)] for j in range(m + 1)]
// Compute and output length of LCS
print(f"Length of LCS is: {lcs(X, Y, m, n, table)}")
python

Java

import java.io.*;
class LCS {
    public static int lcs(String X, String Y, int m, int n, int[][] table) {
        // Base case
        if (m == 0 || n == 0) {
            return 0;
        }
        // Already computed value at given position
        if (table[m][n] != -1) {
            return table[m][n];
        }
        // Last elements are equal
        if(X.charAt(m - 1) == Y.charAt(n - 1)) {
            table[m][n] = 1 + lcs(X, Y, m - 1, n - 1, table);
            return table[m][n];
        }
        // Last elements differ
        else {
            table[m][n] = Math.max(lcs(X, Y, m, n - 1, table),lcs(X, Y, m - 1, n, table));
            return table[m][n];
        }
    }
    // Let's test
    public static void main(String args[]){
        // Initialize variables
        String X = "DAVID";
        String Y = "DANIEL";
        int m = X.length();
        int n = Y.length();
        int[][] table = new int[m + 1][n + 1];
        
        // Initialize table fields to `-1`
        for(int i=0; i < m + 1; i++) {
            for(int j=0; j < n + 1; j++) {
                table[i][j] = -1;
            }
        }
        // Compute and output length of LCS
        int lcsLength = lcs(X, Y, m, n, table);
        System.out.println("Length of LCS is: " + lcsLength);
    }
}
java
Cheap domain names – buy yours now
  • Free website pro­tec­tion with SSL Wildcard included
  • Free private re­gis­tra­tion for greater privacy
  • Free Domain Connect for easy DNS setup

C++

#include <bits/stdc++.h>
using namespace std;
int lcs(char *X, char* Y, int m, int n, vector<vector<int>>& table)
{
    // Base case
    if (m == 0 || n == 0)
        return 0;
    // Already computed value at given position
    if (table[m][n] != -1) {
        return table[m][n];
    }
    // Last elements are equal
    if (X[m - 1] == Y[n - 1]) {
        table[m][n] = 1 + lcs(X, Y, m - 1, n - 1, table);
        return table[m][n];
    }
    // Last elements differ
    else { 
        table[m][n] = max(lcs(X, Y, m, n - 1, table), lcs(X, Y, m - 1, n, table));
        return table;
    }
}
// Let's test
int main()
{
    // Initialize variables
    char X[] = "DAVID";
    char Y[] = "DANIEL";
    int m = strlen(X);
    int n = strlen(Y);
    // Initialize table with `-1`
    vector<vector<int>> table(m + 1, vector<int>(n + 1, -1));
    
    // Compute and output length of LCS
    cout << "Length of LCS is: " << lcs(X, Y, m, n, table);
    return 0;
}
c++

Dynamic pro­gram­ming for the longest common sub­sequence

Dynamic pro­gram­ming is a non-recursive way to solve op­tim­isa­tion problems by dividing them into smaller sub­prob­lems and then solving them ‘bottom up’ Among other things, dynamic pro­gram­ming is applied as a solution for pathfind­ing al­gorithms. The longest common sub­sequence problem can also be solved using dynamic pro­gram­ming when using a two-di­men­sion­al matrix.

Python

def lcs(X , Y, m, n): 
    
    # Initialize dynamic programming table fields to `None`
    table = [[None] * (n + 1) for _ in range(m + 1)]
    
    # Compute table values
    for i in range(m + 1):
        for j in range(n + 1):
            # Base case
            if i == 0 or j == 0 :
                table[i][j] = 0
            # Last elements are equal
            elif X[i - 1] == Y[j - 1]:
                table[i][j] = table[i - 1][j - 1]+ 1
            # Last elements differ
            else:
                table[i][j] = max(table[i - 1][j] , table[i][j - 1])
    
    return table[m][n]
# Let's test
X = "DAVID"
Y = "DANIEL"
# Compute and output length of LCS
lcs_len = lcs(X, Y, len(X), len(Y))
print(f"Length of LCS is: {lcs_len}")
python

Java

import java.io.*;
class LCS {
    public static int lcs(String X, String Y, int m, int n)
    {
        // Initialize dynamic programming table fields
        int table[][] = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) {
            for (int j = 0; j <= n; j++) {
                // Base case
                if (i == 0 || j == 0)
                    table[i][j] = 0;
                // Last elements are equal
                else if (X.charAt(i - 1) == Y.charAt(j - 1))
                    table[i][j] = table[i - 1][j - 1] + 1;
                // Last elements differ
                else
                    table[i][j] = Math.max(table[i - 1][j], table[i][j - 1]);
            }
        }
        return table[m][n];
    }
    // Let's test
    public static void main(String args[]){
        // Initialize variables
        String X = "DAVID";
        String Y = "DANIEL";
        int m = X.length();
        int n = Y.length();
        // Compute and output length of LCS
        int lcsLength = lcs(X, Y, m, n);
        System.out.println("Length of LCS is: " + lcsLength);
    }
}
java

C++

#include <bits/stdc++.h>
using namespace std;
int lcs(string X, string Y, int m, int n) {
	// Initialize dynamic programming table
	int table[m + 1][n + 1];
	// Compute table values
	for (int i = 0; i <= m; i++) {
		for (int j = 0; j <= n; j++) {
			// Base case
			if (i == 0 || j == 0)
				table[i][j] = 0;
			// Last elements are equal
			else if (X[i - 1] == Y[j - 1])
				table[i][j] = table[i - 1][j - 1] + 1;
			// Last elements differ
			else
				table[i][j] = max(table[i - 1][j], table[i][j - 1]);
		}
	}
	return table[m][n];
}
// Let's test
int main() {
  // Initialize variables
  string X = "DAVID";
  string Y = "DANIEL";
  int m = X.size();
  int n = Y.size();
  // Compute and output length of LCS
  cout << "Length of LCS is " << lcs(X, Y, m, n);
  return 0;
}
c++
Go to Main Menu