SQL JOIN is an operation in re­la­tion­al databases that allows queries across multiple database tables. JOINs merge data stored in different tables and output it in filtered form in a results table.

The principle of SQL JOIN is based on the re­la­tion­al algebra operation of the same name – a com­bin­a­tion of Cartesian product and selection. The user de­term­ines which data from the output tables is trans­ferred to the results table by selecting a JOIN type and defining a selection condition.

We will introduce you to the math­em­at­ic­al principle of SQL JOINs, compare different JOIN types, and show you how to implement JOINs in the context of database queries via SQL using practical examples.

Tip

This article on SQL JOIN requires knowledge of certain concepts of the re­la­tion­al database model, es­pe­cially regarding relations, tuples, at­trib­utes, or keys. Our basis article on re­la­tion­al databases will give you a solid in­tro­duc­tion.

How do SQL JOINs work?

The basic principle of SQL JOIN can be il­lus­trated by deriving the database operation from its sub-op­er­a­tions. The following re­la­tion­al algebra op­er­a­tions form the basis of each JOIN:

  • Cartesian product
  • Selection

The Cartesian product

The Cartesian product (also called the cross product) is a set theory operation whereby two or more sets are linked together. In the re­la­tion­al database model, the Cartesian product is used to connect tuple sets in the form of tables. The result of this operation is a set of ordered tuples, where each tuple consists of one element of each initial set.

The mul­ti­plic­a­tion sign (x) is used as an operator for the Cartesian product in re­la­tion­al algebra.

Here is an example:

The Cartesian product A × B of both sets A = {x, y, z} and B = {1, 2, 3} is:

A × B = {(x,1), (x,2), (x,3), (y,1), (y,2), (y,3), (z,1), (z,2), (z,3)}

The cal­cu­la­tion can be il­lus­trated using the following graphic:

Please note the sequence of pair formation. For example, the Cartesian product A x B does not cor­res­pond to the same quantity as the Cartesian product B x A.

A × B = {(x,1), (x,2), (x,3), (y,1), (y,2), (y,3), (z,1), (z,2), (z,3)}

B × A = {(1,x), (1,y), (1,z), (2,x), (2,y), (2,z), (3,x), (3,y), (3,z)}

In SQL ter­min­o­logy, an operation in which the Cartesian product is formed from two database tables is called “cross JOIN.” In practice, cross JOINS are rarely used due to the un­filtered result set.

The selection

Selection is a re­la­tion­al algebra operation that allows you to select specific tuples in an initial set and output them as a result set. Which tuples are included in the result set can be de­term­ined by a com­par­is­on ex­pres­sion. The result of the selection is a set of tuples that fulfill the selection condition defined in the com­par­is­on ex­pres­sion. The Greek letter sigma (σ) is used as an operator. The operation is noted as follows:

σF (R)

The place­hold­er “F” cor­res­ponds to the re­la­tion­al ex­pres­sion, a formula of logical pre­dic­ates that define the selection con­di­tions. R stands for the dataset to be selected. Al­tern­at­ively, the linear notation R[F] can be used.

The usual com­par­is­on operators are available for for­mu­lat­ing selection con­di­tions: for example, equal to (=), greater than (>) or less than (<).

We will explain the selection using an example that we have already in­tro­duced in the basic text of the re­la­tion­al database model. The following table shows fic­ti­tious personnel data that a company could have entered for its employees. The personnel number (e_id), the name (surname, first name), the social security number (ssn), the address (street, number, zip code) and the assigned company car (company car ID) are specified for each employee.

Table: employees

e_id Surname First name ssn Street No. Zip code location vehicle_id
1 Schmidt Jack 25 120512 S 477 Main St. 1 11111 Denver 3
2 Muller Blain 25 100615 M 694 Station St. 2 22222 Boulder 1
3 McClain Walker 25 091225 M 463 Market Alley 3 33333 Denver 1
4 Cohn Greg 25 170839 K 783 Forest Way 4 44444 Niwot 2

If we want to select the employee table so that only the employees who drive the vehicle with the vehicle_id 1 are displayed, we can proceed as follows.

σvehicle_id=1(employee)

We only retrieve the tuples where the value in the column vehicle_id is 1.

The results are shown in the following table:

Table: employee (selected)

e_id Surname First name svn Street No. Zip code location vehicle_id
2 Muller Blain 25 100615 M 694 Station St. 2 22222 Boulder 1
3 McClain Walker 25 091225 M 463 Market Alley 3 33333 Denver 1

In the SQL database language, selection con­di­tions are defined using the “WHERE” command.

SELECT * FROM employee WHERE employee.vehicle_id = 1;

If a tuple fulfils the condition vehicle_id=1, the values of all columns should be shown for this tuple.

Note

The asterisk (*) rep­res­ents all columns in a table in the SQL syntax.

Com­bin­a­tion of Cartesian product and selection

All common JOIN types combine the Cartesian product with a selection condition. To explain this kind of database operation, we will reduce the “employees” table to four columns for the sake of clarity. In addition, we will introduce the car table, where detailed in­form­a­tion on the company’s vehicle fleet is kept.

Both tables are linked to each other using a foreign key re­la­tion­ship. The primary key of the “car” table functions as a foreign key in the “employees” table.

Table: employee

e_id Surname First name vehicle_id
1 Schmidt Jack 3
2 Muller Blain 1
3 McClain Walker 1
4 Cohn Greg 2

Table: car

vehicle_id Make Model Re­gis­tra­tion Year State in­spec­tion
1 VW Caddy B KH 778 2016 18.12.2018
2 Opel Astra B PO 654 2010 12.08.2019
3 BMW X6 B MW 780 2017 01.09.2018
Note

Storing in­form­a­tion in different database tables is a basic concept of the re­la­tion­al database model. The ad­vant­ages of this kind of database design and its im­ple­ment­a­tion are discussed in an article on nor­mal­iz­a­tion of databases.

If you want to merge the two tables and select relevant tuples at the same time, you combine the pre­vi­ously in­tro­duced database op­er­a­tions:

σvehicle_id=vehicle_id(employee × vehicle)

First, the Cartesian product is formed by employee x vehicle. The (intermin) result is a cross JOIN – a results table in which every tuple in the “employees” table is combined with every tuple in the “vehicle” table.

Table: Cartesian product “employee” × “car”

e_id Surname First name employee.vehicle_id vehicle.vehicle_id Make Model Re­gis­tra­tion Year State in­spec­tion
1 Schmidt Jack 3 1 VW Caddy B KH 778 2016 18.12.2018
1 Schmidt Jack 3 2 Opel Astra B PO 654 2010 12.08.2019
1 Schmidt Jack 3 3 BMW X6 B MW 780 2017 01.09.2018
2 Muller Blain 1 1 VW Caddy B KH 778 2016 18.12.2018
2 Muller Blain 1 2 Opel Astra B PO 654 2010 12.08.2019
2 Muller Blain 1 3 BMW X6 B MW 780 2017 01.09.2018
3 McClain Walker 1 1 VW Caddy B KH 778 2016 18.12.2018
3 McClain Walker 1 2 Opel Astra B PO 654 2010 12.08.2019
3 McClain Walker 1 3 BMW X6 B MW 780 2017 01.09.2018
4 Cohn Greg 2 1 VW Caddy B KH 778 2016 18.12.2018
4 Cohn Greg 2 2 Opel Astra B PO 654 2010 12.08.2019
4 Cohn Greg 2 3 BMW X6 B MW 780 2017 01.09.2018

Then only the tuples are selected which show that the vehicle_id of the “vehicle” table matches the vehicle_id of the “employees” table. The selection condition is that the foreign key of the “employee” table cor­res­ponds to the primary key of the “vehicle” table.

The (final) result is a table that combines both output tables without re­dund­an­cies.

Table: JOIN between “employee” and “car”

e_id Surname First name employee.vehicle_id vehicle.vehicle_id Make Model Re­gis­tra­tion Year State in­spec­tion
1 Schmidt Jack 3 3 BMW X6 B MW 780 2017 01.09.2018
2 Muller Blain 1 1 VW Caddy B KH 778 2016 18.12.2018
3 McClain Walker 1 1 VW Caddy B KH 778 2016 18.12.2018
4 Cohn Greg 2 2 Opel Astra B PO 654 2010 12.08.2019

As a com­bin­a­tion of Cartesian product and sub­sequent selection, JOINs combine both op­er­a­tions into one. The bowtie symbol (⋈) is used as an operator.

As seen here:

σvehicle_id=vehicle_id(employee × vehicle) := employee⋈vehicle_id=vehicle_idvehicle

Operation σvehicle_id=vehicle_id(employee × vehicle)cor­res­ponds to a JOIN via the tables “employee” and “vehicle” with the condition vehicle_id=vehicle_id.

Trans­ferred to the SQL syntax, the above operation would cor­res­pond to the following statement:

SELECT * FROM employee INNER JOIN car ON employee.vehicle_id = vehicle.vehicle_id;

The inner JOIN is one of the most important JOINS used in database queries. However, special JOIN types are sometimes required to achieve the desired result.

SQL JOIN types

Different SQL JOIN types are used in the re­la­tion­al database model, which enables queries to be executed using a group of database tables. The pre­requis­ite for this is that the selected tables are linked to one another using foreign key re­la­tion­ships.

The most important JOIN types include the following:

  • INNER JOINs: An inner JOIN is a filtered form of the cross JOIN, in which only the tuples of both output tables that fulfil the selection condition defined by the user are combined in the results.
  • OUTER JOINs: The outer JOIN is an extension of the inner JOIN result and contains the tuples of both output tables that fulfill the selection condition defined by the user, as well as all remaining tuples from the first table, the second table, or both tables. Outer JOINS are realized as left outer JOIN, right outer JOIN, or full outer JOIN.
Tip

You can find a detailed de­scrip­tion of INNER JOINs and OUTER JOINS in our series of articles on these JOIN types.

The dif­fer­ences between inner JOINs and the different variants of the outer JOIN can be il­lus­trated by quantity diagrams. The following graphic is a pictorial rep­res­ent­a­tion of the JOIN types presented:

Re­gard­less of the dis­tinc­tion between inner JOIN and outer JOIN, SQL JOINs can also be clas­si­fied in the following JOIN types:

  • EQUI JOIN
  • NON EQUI JOIN
  • SELF JOIN

Inner JOINs and outer JOINs can be im­ple­men­ted as equi JOINs and non equi JOINs. All JOIN examples presented so far represent equi JOINs. Equi JOINs are char­ac­ter­ised in that they only allow the equal sign as the re­la­tion­al operator (=).

The selection condition of an equi JOIN is always that column values need to be equal.

In principle, however, JOINs (like selection in re­la­tion­al algebra) are not limited to column equality. Possible com­par­is­on operators are:

Com­par­is­on operator Meaning
= Equal to
< Smaller than
> Greater than
≤ Small than or equal to
≥ Greater than or equal to
<> unequal
!= unequal

Since SQL-92, the database language with the keyword USING provides a short form for EQUI JOINS. However, this pre­sup­poses that the relevant columns have the same name, which does not ne­ces­sar­ily have to be the case.

The following example shows two different SQL state­ments that lead to the same result. In the first statement, the JOIN operation is ex­pli­citly defined using the keyword ON. For the second statement, we use the short notation with the key USING.

SELECT * FROM employee INNER JOIN vehicle ON employee.vehicle_id = vehicle.vehicle_id;
SELECT * FROM employee INNER JOIN vehicle USING vehicle_id;

NON EQUI JOINs on the other hand, exclude op­er­a­tions based on the equality of columns. All com­par­is­on op­er­a­tions are allowed except the equal sign (=).

Note

As re­la­tion­ships in re­la­tion­al databases are usually defined by the equality of primary and foreign keys, NON EQIO JOINs are of secondary im­port­ance in the re­la­tion­al database model. Not least because, just like with CROSS JOINs, these often lead to a large number of result data records.

Finally, a SELF JOIN is a special form of SQL JOIN in which a database table is linked to itself. In principle, any JOIN type can be executed as SELF JOIN.

If two tables are linked by columns with the same name, then it is called a NATURAL JOIN. A NATURAL JOIN is im­ple­men­ted by default as an INNER JOIN using the keyword with the same name. NATURAL JOINs are not set to this JOIN type. A NATURAL LEFT OUTER JOIN or a NATURAL RIGHT OUTER JOIN is also possible.

Since NATURAL JOINs are linked using columns with the same names, the re­spect­ive values are not output twice in the result set, but are instead combined into a common column. Examples of NATURAL JOINs can be found in the articles on INNER JOINs and OUTER JOINs.

Go to Main Menu