Convolutional neural networks (ConvNets, CNNs) are artificial neural networks whose layers (convolutional layers) are applied to input data to extract features and ultimately identify an object. ConvNets are essential to deep learning.

What are convolutional neural networks (CNNs)?

A convolutional neural network is a specialised type of artificial neural network that is particularly effective at processing and analysing visual data such as images and videos. CNNs are crucial in machine learning, especially in deep learning, a subset of machine learning.

Like other neural networks, ConvNets consist of layers of nodes: an input layer, one or more hidden layers and an output layer. The individual nodes are interconnected; each connection has a weight, and each node has a threshold associated with it. If the output of a node exceeds its specified threshold, the node activates and sends data to the next layer of the network.
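
As a rough illustration of the simplified node model described above, the following Python sketch computes a node's weighted sum and applies a hard threshold. The function name `node_output` and the example values are hypothetical; real networks usually replace the hard threshold with a smooth activation function such as ReLU.

```python
def node_output(inputs, weights, threshold):
    """Simplified node: weighted sum of the inputs compared against a threshold.

    Returns the weighted sum if it exceeds the threshold (the node activates
    and passes data on), otherwise 0.0 (nothing is passed on).
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum if weighted_sum > threshold else 0.0


# Hypothetical example: three inputs feeding one node.
print(node_output([1.0, 0.5, 0.0], [0.5, 0.25, 0.75], threshold=0.5))  # 0.625
print(node_output([0.0, 1.0, 0.0], [0.5, 0.25, 0.75], threshold=0.5))  # 0.0
```

In the second call the weighted sum (0.25) stays below the threshold, so the node does not activate.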

Different types of neural networks are used for different types of data and use cases. For example, recurrent neural networks are often used for processing natural language and speech recognition, while convolutional neural networks (CNNs) are more commonly employed for classification and computer vision tasks. The ability of neural networks to recognise complex patterns in data makes them a significant tool in artificial intelligence.

How do convolutional neural networks work?

ConvNets stand out from other neural networks through their superior performance in processing image, speech and audio signals. They have three main types of layers: convolutional layers, pooling layers and fully-connected layers. With each successive layer, the features the CNN detects become more complex, so that it can identify progressively larger parts of an image, for example.

How the ConvNet algorithm processes images

Computers represent images as numbers, more specifically as pixel values, and the CNN algorithm works with the same representation. For example, a greyscale (black-and-white) image with height m and width n is represented as a two-dimensional array of size m × n, in which each cell contains the corresponding pixel value. A colour image of the same size is represented as a three-dimensional array of size m × n × 3: each pixel has one value in each of three channels, corresponding to red, green and blue.
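
A minimal pure-Python sketch of the two representations, using nested lists instead of a dedicated array library. The tiny 2 × 3 image, its pixel values and the helper `shape` are hypothetical illustrations.

```python
def shape(array):
    """Return the dimensions of a regular nested list, e.g. (m, n) or (m, n, 3)."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        if not array:
            break
        array = array[0]
    return tuple(dims)


# Greyscale image, m=2 rows by n=3 columns: one intensity value (0-255) per pixel.
grey = [
    [0, 128, 255],
    [64, 192, 32],
]

# Colour image of the same size: each pixel holds three values (R, G, B).
colour = [
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],        # red, green, blue pixels
    [[255, 255, 255], [0, 0, 0], [128, 128, 128]],  # white, black, grey pixels
]

print(shape(grey))    # (2, 3)
print(shape(colour))  # (2, 3, 3)

# Taking one value per pixel splits out a single channel as an m x n array.
red_channel = [[pixel[0] for pixel in row] for row in colour]
print(red_channel)    # [[255, 0, 0], [255, 0, 128]]
```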

Next, the most important features of the image are identified. They are extracted using a mathematical operation known as convolution, which combines two functions into a third one that expresses how the shape of one function is modified by the other. In image processing, convolutions are commonly used for sharpening, smoothing and enhancing images. In CNNs, however, convolutions are employed to extract significant features from images.
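
For discrete images, the operation can be written as a double sum. In the formulas below, I is the input image, K is a k × k kernel and S is the resulting feature map; these symbols are our own notation. Strictly speaking, most deep learning libraries compute the second, unflipped variant (cross-correlation) but still call it convolution.

```latex
% True 2D convolution: the kernel is flipped relative to the image
S(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(i - m,\, j - n)\, K(m, n)

% Cross-correlation, the variant implemented in most CNN libraries (no flip)
S(i, j) = \sum_{m=0}^{k-1} \sum_{n=0}^{k-1} I(i + m,\, j + n)\, K(m, n)
```

Because a CNN learns its kernel values during training, the flip makes no practical difference: the network simply learns the mirrored kernel.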

A filter (or kernel) is used to extract key features from an image. A filter is a small array of weights that represents a specific feature to be extracted. The filter is slid across the input array; at each position, the filter values are multiplied element-wise with the pixel values beneath them and the products are summed into a single number. The resulting two-dimensional array shows where and how strongly the feature appears in the image and is known as a feature map.
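
The sliding-window step can be sketched in plain Python as below. It computes a "valid" feature map (stride 1, no padding) and, like most deep learning libraries, does not flip the kernel. The 4 × 5 test image and the hand-picked vertical-edge filter are hypothetical examples; in a trained CNN the filter values are learned rather than chosen by hand.

```python
def feature_map(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map.

    At each position, the kernel is multiplied element-wise with the image
    patch underneath it and the products are summed into one output value.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            total = 0
            for m in range(k_rows):
                for n in range(k_cols):
                    total += image[i + m][j + n] * kernel[m][n]
            row.append(total)
        output.append(row)
    return output


# Hypothetical greyscale image: dark left part (0), bright right part (9).
image = [
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
]

# Vertical-edge filter: responds where brightness rises from left to right.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

for row in feature_map(image, vertical_edge):
    print(row)  # [27, 27, 0] on both rows
```

The large values mark the columns where the dark-to-bright edge lies, and the 0 marks the uniform bright area. Note also that the 4 × 5 input yields a 2 × 3 feature map: without padding, each dimension shrinks by the kernel size minus one.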

Characteristics of the different CNN layers

By applying filters, the convolution process transforms the input into a typically smaller representation that still preserves the spatial relationships between pixels. Below we'll take a look at the three main types of layers in a CNN.

  • Convolutional layer: This is the first layer of a convolutional network. Using filters (small weight matrices) that slide over the image, the layer recognises local features such as edges, corners and textures. Each filter creates a feature map that highlights specific patterns. Several convolutional layers can be stacked, creating a hierarchical structure in which each subsequent layer can see the pixels located in the receptive fields of the previous layers.
  • Pooling layer: This layer reduces the size of the feature maps by summarising local areas, for example by keeping only the maximum or the average value of each small region, and discarding less relevant detail. This reduces computational complexity while retaining the most important information.
  • Fully-connected layer: As in a conventional neural network, each neuron in this layer is connected to every neuron in the previous layer. This layer makes the final classification, combining the extracted features to identify an object in an image.
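
The pooling layer can be illustrated with a short Python sketch. The text does not name a specific pooling operation; max pooling with a 2 × 2 window and stride 2 is a common choice and is assumed here. Each output value keeps only the strongest response in its window, so the feature map shrinks to a quarter of its size. The sample feature map is made up.

```python
def max_pool(feature_map, size=2, stride=2):
    """Downsample a 2D feature map by taking the maximum of each size x size window."""
    out_rows = (len(feature_map) - size) // stride + 1
    out_cols = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            window = [
                feature_map[i * stride + m][j * stride + n]
                for m in range(size)
                for n in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4 x 4 feature map, e.g. the output of a convolutional layer.
fmap = [
    [1, 3, 0, 2],
    [5, 2, 1, 0],
    [0, 1, 8, 6],
    [2, 4, 7, 3],
]

print(max_pool(fmap))  # [[5, 2], [4, 8]]
```

Average pooling would follow the same pattern, replacing `max(window)` with the mean of the window.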

A more detailed look at the convolution process

Imagine you are trying to determine whether an image contains a human face. You can think of the face as the sum of its parts: two eyes, a nose, a mouth, two ears and so on. The convolution process then works roughly as follows:

  1. First convolutional layers: The first convolutional layers use filters to recognise simple features in small groups of neighbouring pixels. For example, a filter might recognise a vertical edge that forms part of the outline of an eye. As mentioned, these local features are recorded as feature maps during convolution. In this case, the feature maps might represent the edges of the eyes, the nose and the mouth.
  2. Additional convolutional layers: The first convolutional layers can be followed by further convolutional layers or by pooling layers. Subsequent convolutional layers combine simple features into more complex patterns, which gradually form a face. For example, edges and corners can be combined into shapes like eyes. Deeper layers see larger areas of the image (larger receptive fields) and recognise composite structures; this build-up is known as a feature hierarchy. A layer further down the network would be able to recognise that two eyes and a mouth arranged in a certain way form a face.
  3. Pooling layers: These reduce the size of the feature maps and further abstract the features. This reduces the amount of data that needs to be processed while still retaining the essential features.
  4. Fully-connected layer: The last layer of the ConvNet is the fully-connected layer. It combines all the extracted features and outputs the final result, for example the probability that the image contains a human face. A network trained for facial recognition can, thanks to the features learned in the convolutional layers, also distinguish one face from another.
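
To make step 4 more concrete, the sketch below flattens a small pooled feature map into a vector, feeds it through one fully-connected layer and converts the two resulting scores into probabilities. Using softmax for this conversion, the two class labels "face" and "no face", and all weights and biases are assumptions for illustration; a real network learns its weights during training.

```python
import math


def fully_connected(inputs, weights, biases):
    """Every output neuron sees every input: output_k = sum_i w[k][i] * x[i] + b[k]."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        for neuron_weights, bias in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1 (shifted by the max for stability)."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 2 x 2 pooled feature map, flattened row by row into 4 features.
pooled = [[5, 2], [4, 8]]
features = [value for row in pooled for value in row]

# Made-up weights for two classes: index 0 = "face", index 1 = "no face".
weights = [
    [0.5, 0.0, 0.25, 0.5],
    [-0.25, 0.5, 0.0, -0.5],
]
biases = [0.0, 1.0]

scores = fully_connected(features, weights, biases)
probabilities = softmax(scores)
print(scores)  # [7.5, -3.25]
print(["face", "no face"][probabilities.index(max(probabilities))])  # face
```
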
Image: Diagram of a convolutional neural network
ConvNets automatically extract the features needed to identify objects in an image.

Techniques such as dropout and regularisation further improve CNNs by helping to prevent overfitting. Activation functions like ReLU (Rectified Linear Unit) introduce non-linearity: without them, a stack of layers would collapse into a single linear transformation and could not learn complex patterns. In addition, batch normalisation stabilises and speeds up training by normalising the inputs to a layer across each mini-batch so that they have a consistent mean and variance.
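
ReLU, dropout and batch normalisation can be sketched on plain Python lists as follows. These are simplified versions under stated assumptions: dropout uses the common "inverted dropout" scaling during training, batch normalisation is shown for a single feature across one mini-batch, and its learnable scale and shift parameters (`gamma`, `beta`) are fixed to 1 and 0 for brevity.

```python
import random


def relu(values):
    """ReLU: keep positive values and replace negative ones with 0 (a non-linear function)."""
    return [max(0.0, v) for v in values]


def dropout(values, p, rng):
    """Inverted dropout during training: zero each value with probability p and
    scale the survivors by 1 / (1 - p) so the expected sum stays the same."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in values]


def batch_norm(batch, eps=1e-5, gamma=1.0, beta=0.0):
    """Normalise one feature across a mini-batch to mean 0 and variance close to 1,
    then apply the scale (gamma) and shift (beta)."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / (variance + eps) ** 0.5 + beta for x in batch]


activations = [-2.0, -0.5, 0.0, 1.5, 3.0]
print(relu(activations))  # [0.0, 0.0, 0.0, 1.5, 3.0]
print(dropout(activations, p=0.5, rng=random.Random(0)))  # each value is 0.0 or doubled
print(batch_norm([2.0, 4.0, 6.0, 8.0]))  # mean 0, variance close to 1
```

At inference time, dropout is switched off, which is why the inverted variant rescales the surviving values during training instead.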

What can convolutional neural networks be used for?

Before CNNs existed, objects in images were identified using time-consuming feature extraction methods that had to be carried out manually. Convolutional neural networks offer a more scalable approach to image classification and object detection. Using principles of linear algebra, in particular matrix multiplication, CNNs are able to recognise patterns in an image. They are now widely used in:

  • Image and speech recognition: CNNs automatically recognise objects or people in images and videos, for example for photo tagging on smartphones and in facial recognition systems. They are also used for speech recognition in voice assistants like Siri or Alexa.
  • Medical diagnostics: AI image recognition technology supports diagnosis by aiding in the analysis of medical images such as X-rays, CT scans and MRIs.
  • Autonomous vehicles: ConvNets are used, for example, in self-driving cars to recognise road features and obstacles.
  • Social media: CNNs are used for text mining, which allows social media platforms to automatically moderate content and create personalised advertising.
  • Marketing and retail: CNNs are used to mine data, enabling visual product searches and product placement.
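
The link between convolution and matrix multiplication mentioned above can be illustrated with the "im2col" idea: every image patch is unrolled into one row of a matrix, the kernel is unrolled into a vector, and a single matrix-vector product then yields all feature-map values at once. This is an illustrative sketch; some implementations use variants of this approach, but it is not a description of how any particular library works internally. The image and kernel values are hypothetical, and the result is checked against the direct sliding-window computation.

```python
def im2col(image, k):
    """Unroll every k x k patch of `image` (stride 1, no padding) into one row."""
    rows, cols = len(image), len(image[0])
    return [
        [image[i + m][j + n] for m in range(k) for n in range(k)]
        for i in range(rows - k + 1)
        for j in range(cols - k + 1)
    ]


def matvec(matrix, vector):
    """Plain matrix-vector multiplication."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def conv_direct(image, kernel):
    """Reference: slide the kernel over the image and sum the element-wise products."""
    k = len(kernel)
    return [
        [
            sum(image[i + m][j + n] * kernel[m][n] for m in range(k) for n in range(k))
            for j in range(len(image[0]) - k + 1)
        ]
        for i in range(len(image) - k + 1)
    ]


image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
]
kernel = [
    [1, 0],
    [0, -1],
]

patches = im2col(image, k=2)                      # 6 patches x 4 values each
flat_kernel = [w for row in kernel for w in row]  # kernel as a vector of 4 weights
flat_result = matvec(patches, flat_kernel)        # one matrix-vector product

# Reshape the flat result back into the 2 x 3 feature map.
out_cols = len(image[0]) - len(kernel) + 1
feature_map = [flat_result[r:r + out_cols] for r in range(0, len(flat_result), out_cols)]

print(feature_map)                                # [[-4, -4, 2], [-4, -4, 4]]
print(feature_map == conv_direct(image, kernel))  # True
```

With many filters, the kernel vectors are stacked into a matrix, turning the whole layer into one matrix-matrix multiplication, an operation GPUs execute very efficiently.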

What are the advantages and disadvantages of convolutional neural networks?

Convolutional neural networks can automatically extract relevant features from data and achieve a high level of accuracy. However, training CNNs effectively requires substantial resources: large volumes of labelled data and considerable computing power, typically in the form of powerful GPUs.

Advantages                   | Disadvantages
---------------------------- | -------------------------------
Automatic feature extraction | High computational requirements
High level of accuracy       | Large datasets needed

Summary

CNNs have revolutionised the field of artificial intelligence and offer significant benefits across various sectors. Further developments, such as improved hardware, new data collection methods and advanced architectures like capsule networks, could make CNNs more efficient and integrate them into more technologies, extending them to a wider range of use cases.
