Aggregation in MongoDB is a valuable tool for analysing and filtering the data in a database. Its pipeline system lets you refine queries step by step, allowing for highly customised outputs.

What is ag­greg­a­tion in MongoDB?

MongoDB is a non-re­la­tion­al and document-oriented database that is designed for use with large and diverse amounts of data. By forgoing rigid tables and using tech­niques like sharding (storing data on different nodes), the NoSQL solution can scale ho­ri­zont­ally while remaining highly flexible and resilient to failures.

Documents in the binary JSON format BSON are bundled in collections and can be queried and edited using the MongoDB Query Language (MQL). Although this language offers many options, its basic queries are of limited use for data analysis. That’s why MongoDB provides aggregation.

In computer science, this term covers a range of processes. In MongoDB, aggregation means analysing and summarising data with a sequence of operations to produce a single, clear result. During this process, data from one or more documents is analysed and filtered according to user-defined criteria.

In the following sections, we look at the possibilities that MongoDB aggregation offers for comprehensive data analysis and show examples of how to use the aggregate() method in the database management system.

What do I need for MongoDB ag­greg­a­tion?

There are only a few re­quire­ments for using ag­greg­a­tion in MongoDB. The method is executed in the shell and works according to logical rules that you can tailor to meet the needs of your analysis.

To use aggregation in MongoDB, you need to have MongoDB installed on your computer. If it isn’t, you can find out how to download, install and run the database in our comprehensive MongoDB tutorial.

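You should also use a powerful firewall and make sure your database is set up according to all current security standards. To run aggregation in MongoDB, your database user needs at least read access to the collections being analysed. Stages that write results, such as $out, also require write access.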

The database works across all platforms, so the steps described below apply to all operating systems.

What is the pipeline in the MongoDB ag­greg­a­tion framework?

In MongoDB, you can carry out simple searches or queries, with the database im­me­di­ately dis­play­ing the results. However, this method is very limited, as it can only display results that already exist within the stored documents. This type of query is not intended for in-depth analysis, recurring patterns or for deriving further in­form­a­tion.

Sometimes different sources within a database need to be taken into account in order to draw meaningful conclusions. MongoDB aggregation is used for situations like these. To achieve such results, the aggregate() method uses pipelines.

Role of the pipeline

Ag­greg­a­tion pipelines in MongoDB are processes in which existing data is analysed and filtered with the help of various steps in order to display the result users are looking for. These steps are referred to as stages. Depending on the re­quire­ments, one or more stages can be initiated. These are executed one after the other and change your original input so that the output (the in­form­a­tion you are looking for) can be displayed at the end.

While the input is made up of many documents, the output is a single result set. We’ll explain some of the most common stages of MongoDB aggregation below.
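
The idea of stages handing documents to one another can be sketched in plain JavaScript. The snippet below is only an analogy and does not use MongoDB at all: the three sample documents and the three stage functions are made up for illustration, with each function standing in for a stage such as $match, $sort or $project.

```javascript
// Plain JavaScript analogy, not MongoDB code.
// Each "stage" is a function that receives the documents
// produced by the previous stage and returns a new list.
const documents = [
  { name: "Smith", country: "Scotland", quantity: 14 },
  { name: "Rossi", country: "Italy", quantity: 10 },
  { name: "Mancini", country: "Italy", quantity: 21 }
];

const stages = [
  // behaves like $match: keep only customers from Italy
  (docs) => docs.filter((d) => d.country === "Italy"),
  // behaves like $sort: highest quantity first
  (docs) => [...docs].sort((a, b) => b.quantity - a.quantity),
  // behaves like $project: keep only name and quantity
  (docs) => docs.map((d) => ({ name: d.name, quantity: d.quantity }))
];

// Run the stages one after the other, feeding each result into the next stage.
const result = stages.reduce((docs, stage) => stage(docs), documents);
console.log(result);
// result holds Mancini (21) first, then Rossi (10), each with only name and quantity
```

The original array is left untouched, which mirrors the way a pipeline reads documents without modifying what is stored.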

Syntax of the MongoDB ag­greg­a­tion pipeline

First, it’s worth taking a brief look at the syntax of ag­greg­a­tion in MongoDB. The method is always struc­tured according to the same format and can be adapted to your specific re­quire­ments. The basic structure looks like this:

db.collection_name.aggregate ( pipeline, options )

Here, collection_name is the name of the collection in question. pipeline is an array that lists the stages of the aggregation in the order in which they should run. The optional options document passes further parameters that control how the operation is executed.
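
As a rough sketch of how the two arguments fit together, the call below passes a one-stage pipeline and an options document. The collection name customers and the field country are placeholders chosen for illustration. allowDiskUse and maxTimeMS are two of the options the method accepts; the values used here are arbitrary.

```javascript
// First argument: the pipeline, an array of stage documents.
const examplePipeline = [
  { $match: { country: "Italy" } }
];

// Second argument (optional): settings for how the operation runs.
const exampleOptions = {
  allowDiskUse: true, // allow stages to use temporary files for large data sets
  maxTimeMS: 5000     // stop the operation if it runs longer than 5 seconds
};

// db only exists inside the MongoDB shell; the check lets the snippet load elsewhere too.
if (typeof db !== "undefined") {
  printjson(db.customers.aggregate(examplePipeline, exampleOptions).toArray());
}
```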

Pipeline stages

There are numerous stages for the ag­greg­a­tion pipeline in MongoDB. Most of them can be used multiple times within a pipeline. It would go beyond the scope of this article to list all the options here, es­pe­cially as some are only required for very specific op­er­a­tions. However, to give you an idea of the stages, we’ll list a few of the most fre­quently used ones here:

  • $count: Returns the number of documents that reach this stage of the pipeline.
  • $group: Groups documents by a specified key and can calculate values such as sums or averages for each group.
  • $limit: Limits the number of documents passed to the next stage in the pipeline.
  • $match: Filters the documents so that only those meeting the specified conditions are passed to the next stage.
  • $out: Writes the results of the MongoDB aggregation to a collection. This stage can only be used at the end of a pipeline.
  • $project: Use $project to select which fields of the documents are passed on. It can also add newly computed fields.
  • $skip: This stage skips a specified number of documents and passes the remaining ones to the next stage.
  • $sort: This stage sorts the documents passing through the pipeline by one or more fields. The stored documents themselves are not changed.
  • $unset: $unset removes the specified fields from the documents. It is the counterpart to selecting fields with $project.
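
Some of these stages do not appear in the examples further down, so here is a minimal sketch of $skip, $limit and $count in use. It assumes the customers collection with the fields name and quantity that this article creates in its example section. The page size of three and the threshold of ten units are arbitrary choices.

```javascript
// Second "page" of customers: skip the first three, then return the next three.
const secondPage = [
  { $sort: { name: 1 } }, // a fixed order makes paging predictable
  { $skip: 3 },
  { $limit: 3 }
];

// Count how many customers bought at least ten units.
const countLargeOrders = [
  { $match: { quantity: { $gte: 10 } } },
  { $count: "largeOrders" } // output: one document with the field largeOrders
];

// db only exists inside the MongoDB shell; the check lets the snippet load elsewhere too.
if (typeof db !== "undefined") {
  printjson(db.customers.aggregate(secondPage).toArray());
  printjson(db.customers.aggregate(countLargeOrders).toArray());
}
```

With the ten sample documents from the example section, the count should come out as 7.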

An example of ag­greg­a­tion in MongoDB

To help you better understand how aggregation in MongoDB works, we’ll show you some examples of different stages and how to use them. To use MongoDB aggregation, open the shell and connect as a user with sufficient rights. By default, the shell starts in a database named test. If you want to work with a different database, switch to it with the use command.

For this example, let’s imagine a database that contains the data of customers who have purchased a specific product. To keep things simple, this database has just ten documents, which are all struc­tured the same:

{
	"name" : "Smith",
	"city" : "Glasgow",
	"country" : "Scotland",
	"quantity" : 14
}

The following in­form­a­tion about the customers has been included: their name, place of residence, country and the number of products they have purchased.

If you want to try aggregation in MongoDB, you can use the insertMany() method to add all documents with customer data to the collection named ‘customers’:

db.customers.insertMany ( [
	{ "name" : "Smith", "city" : "Glasgow", "country" : "Scotland", "quantity" : 14 },
	{ "name" : "Meyer", "city" : "Hamburg", "country" : "Germany", "quantity" : 26 },
	{ "name" : "Lee", "city" : "Birmingham", "country" : "England", "quantity" : 5 },
	{ "name" : "Rodriguez", "city" : "Madrid", "country" : "Spain", "quantity" : 19 },
	{ "name" : "Nowak", "city" : "Krakow", "country" : "Poland", "quantity" : 13 },
	{ "name" : "Rossi", "city" : "Milano", "country" : "Italy", "quantity" : 10 },
	{ "name" : "Arslan", "city" : "Ankara", "country" : "Turkey", "quantity" : 18 },
	{ "name" : "Martin", "city" : "Lyon", "country" : "France", "quantity" : 9 },
	{ "name" : "Mancini", "city" : "Rome", "country" : "Italy", "quantity" : 21 },
	{ "name" : "Schulz", "city" : "Munich", "country" : "Germany", "quantity" : 2 }
] )

The shell confirms the insert and lists the object ID generated for each individual document.

How to use $match

To illustrate the possibilities of aggregation in MongoDB, we’ll first apply the $match stage to our ‘customers’ collection. With an empty filter, this would simply output the complete list of customer data listed above.

In the following example, however, we’ve in­struc­ted it to only show us customers from Italy. Here’s the command:

db.customers.aggregate ( [
	{ $match : { "country" : "Italy" } }
] )

You’ll now only be shown the object IDs and in­form­a­tion of the two customers from Italy.
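
A $match filter is not restricted to exact values. The following sketch combines two conditions using the query operators $in and $gte. It assumes the ‘customers’ collection created above; the countries and the threshold of ten units are arbitrary choices.

```javascript
// Customers from Italy or Germany who bought at least ten units.
const italyOrGermany = [
  {
    $match: {
      country: { $in: ["Italy", "Germany"] }, // country is one of the listed values
      quantity: { $gte: 10 }                   // and quantity is 10 or more
    }
  }
];

// db only exists inside the MongoDB shell; the check lets the snippet load elsewhere too.
if (typeof db !== "undefined") {
  printjson(db.customers.aggregate(italyOrGermany).toArray());
}
```

With the sample data, this should return the documents for Meyer, Rossi and Mancini. Schulz is also from Germany but falls below the threshold.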

Use $sort for a better overview

If you want to organise your customer database, you can use the $sort stage. In the following example, we instruct the system to sort all customer data according to the number of units purchased, starting with the highest number. The input looks like this:

db.customers.aggregate ( [
	{ $sort : { "quantity" : -1 } }
] )
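
Sorting is often paired with $limit to return only the top entries. The sketch below assumes the same ‘customers’ collection. Using name as a second sort key is an illustrative choice that keeps the order stable if two customers have bought the same quantity.

```javascript
// The three customers with the highest number of units purchased.
const topThree = [
  { $sort: { quantity: -1, name: 1 } }, // quantity descending, then name ascending
  { $limit: 3 }
];

// db only exists inside the MongoDB shell; the check lets the snippet load elsewhere too.
if (typeof db !== "undefined") {
  printjson(db.customers.aggregate(topThree).toArray());
}
```

With the sample data, the result should be Meyer (26), Mancini (21) and Rodriguez (19).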

Limit the output with $project

With the stages used so far, you’ll see that the output is relatively extensive. For example, in addition to the actual information within the documents, the object ID is also always output. You can use $project in the MongoDB aggregation pipeline to determine which information should be output. To do this, set the value 1 for each field you want to include. Fields that are not listed are left out automatically, so values of 1 and 0 can’t be combined for regular fields within the same $project stage. The only exception is _id, which is output by default and has to be switched off explicitly with 0. In our example, we only want to see the customer name and the number of products purchased. To do this, we enter the following:

db.customers.aggregate ( [
	{ $project : { _id : 0, name : 1, quantity : 1 } }
] )
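
If the aim is to leave out a few fields rather than to pick the ones to keep, the $unset stage from the list of stages is an alternative. The sketch below assumes the ‘customers’ collection from this example and a MongoDB version that supports $unset (4.2 or later).

```javascript
// Remove the object ID and the location fields; name and quantity remain.
const withoutLocation = [
  { $unset: ["_id", "city", "country"] }
];

// db only exists inside the MongoDB shell; the check lets the snippet load elsewhere too.
if (typeof db !== "undefined") {
  printjson(db.customers.aggregate(withoutLocation).toArray());
}
```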

Combine multiple stages with ag­greg­a­tion in MongoDB

MongoDB ag­greg­a­tion also gives you the option of applying several stages in suc­ces­sion. These are then run through one after the other, and at the end there is an output that takes all the desired para­met­ers into account. For example, if you only want to display the names and purchases of Scottish customers in des­cend­ing order, you can use the stages described above as follows:

db.customers.aggregate ( [
	{ $match : { "country" : "Scotland" } },
	{ $project : { _id : 0, name : 1, quantity : 1 } },
	{ $sort : { "quantity" : -1 } }
] )
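
Multi-stage pipelines become especially useful once $group is involved. As a sketch, the pipeline below adds up the units purchased per country and lists the countries from highest to lowest total. It assumes the ‘customers’ collection from this example; the output field names totalQuantity and country are chosen for illustration.

```javascript
const totalsPerCountry = [
  // One output document per country, with the sum of all quantities.
  { $group: { _id: "$country", totalQuantity: { $sum: "$quantity" } } },
  // Highest total first.
  { $sort: { totalQuantity: -1 } },
  // Rename the grouping key from _id to country for readability.
  { $project: { _id: 0, country: "$_id", totalQuantity: 1 } }
];

// db only exists inside the MongoDB shell; the check lets the snippet load elsewhere too.
if (typeof db !== "undefined") {
  printjson(db.customers.aggregate(totalsPerCountry).toArray());
}
```

With the sample data, Italy (31) and Germany (28) should appear at the top, since they are the only countries with two customers each.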