A repository stores data so that it can be retrieved and modified later. Several types of repositories exist, serving purposes such as version control, metadata management and software distribution.

What is a repository?

‘Repository’ means ‘storage’ and comes from the Latin word repositorium. In software technology, a repository is a digital archive in which data, documents, development progress, metadata and programs can be stored and shared. Many repositories also provide version control. Depending on the intended use, this technology enables large teams or communities working all over the world to collaborate on a shared project. The available types of repositories differ in their approach and structure. The best-known examples include GitHub and the Google Repository.

The basis for a repository is usually a database which, depending on requirements, can be set up on a local hard disk or a server, or distributed across numerous servers in a content delivery network (CDN). Data catalogues are created that describe the forms and representations of the stored objects and provide information about their relationships to each other. All this information is stored as metadata and, with the appropriate authorisation, can be searched, retrieved and modified at any time.

How is a repository structured?

To illustrate how a repository is structured, let’s visualise a tree. In software development, you can even see this reflected in the terminology. Here a distinction is made between the trunk, which contains the current version of a project and its source code, and the branches, where edits are stored. Changes are later merged back into the trunk so that all participants have access to them. Tags mark specific states of the project, such as a release, so that they can be found again later.
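
The relationship between trunk, branches and tags can be sketched in a few lines of Python. This is a toy model for illustration only: the class `ToyRepository` and its method names are assumptions made for this example and do not correspond to the interface of any real version control system.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepository:
    """Toy model: the trunk is a list of snapshots, oldest first."""

    trunk: list[str] = field(default_factory=lambda: ["initial version"])
    # Each branch holds only the snapshots made since it was created.
    branches: dict[str, list[str]] = field(default_factory=dict)
    tags: dict[str, str] = field(default_factory=dict)

    def create_branch(self, name: str) -> None:
        self.branches[name] = []

    def edit(self, branch: str, snapshot: str) -> None:
        # Edits are recorded on the branch, not on the trunk.
        self.branches[branch].append(snapshot)

    def merge(self, branch: str) -> None:
        # Add the branch's snapshots back to the trunk and close the branch.
        self.trunk.extend(self.branches.pop(branch))

    def tag(self, label: str) -> None:
        # A tag is a fixed name for the trunk's current state.
        self.tags[label] = self.trunk[-1]


repo = ToyRepository()
repo.create_branch("feature")
repo.edit("feature", "add login form")
repo.merge("feature")
repo.tag("v1.0")
print(repo.trunk)
print(repo.tags)
```

Real systems store far more than a text label per snapshot and have to reconcile branches that changed the same files, which this sketch leaves out.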

What types of repositories are there?

Not all repositories are the same. They differ in what they archive and how they archive it. The following are the best-known types.

Repository for version management

In version management, the aim is to store data clearly in a common archive while keeping individual steps and their connections traceable. Source code files and other data are stored and archived. Data can be copied from the repository to a local hard drive so that developers can continue working with it. This process is referred to as ‘checking out’. The developer then works with the local data, making changes or discarding previous changes. Once the work is complete, the latest state of the project is uploaded back to the repository, which is referred to as ‘checking in’. All changes and comments are logged during this process.
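
The check-out and check-in cycle described above can be sketched as follows. This is a minimal in-memory model, not how any particular version control system is implemented; the names `VersionedRepository`, `Revision`, `check_out`, `check_in` and `log` are hypothetical and chosen only for this example.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Revision:
    number: int
    author: str
    comment: str
    files: dict[str, str]


@dataclass
class VersionedRepository:
    revisions: list[Revision] = field(default_factory=list)

    def check_out(self) -> dict[str, str]:
        # Hand out a private working copy of the latest state.
        if not self.revisions:
            return {}
        return copy.deepcopy(self.revisions[-1].files)

    def check_in(self, files: dict[str, str], author: str, comment: str) -> int:
        # Store the working copy as a new revision; older ones stay untouched.
        number = len(self.revisions) + 1
        self.revisions.append(
            Revision(number, author, comment, copy.deepcopy(files))
        )
        return number

    def log(self) -> list[str]:
        # Every check-in is logged with its author and comment.
        return [f"r{r.number} {r.author}: {r.comment}" for r in self.revisions]


repo = VersionedRepository()
repo.check_in({"main.py": "print('hello')"}, "alice", "initial import")

working_copy = repo.check_out()
working_copy["main.py"] = "print('hello, world')"  # local change only
repo.check_in(working_copy, "bob", "extend greeting")

print(repo.log())
```

Because each check-in is stored as a separate revision, the first version of `main.py` is still available after the second check-in.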

This approach has several advantages. For one, users can collaborate on a project without overwriting older versions. Instead, all status updates are logged, making it possible to return to a previous version. A repository enables small and large teams to collaborate on the same project. Updates can be made simultaneously without statuses being overwritten or changes being lost. In principle, every user can continue the project from any state without risk.
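
One way to picture how simultaneous updates avoid overwriting each other is a check-in that is only accepted if it builds on the latest revision. The sketch below assumes this simple rejection strategy for illustration; real version control systems typically offer merging instead. The names `History` and `StaleWorkingCopy` are hypothetical.

```python
class StaleWorkingCopy(Exception):
    """Raised when a check-in is based on an outdated revision."""


class History:
    def __init__(self, initial: str) -> None:
        self.versions = [initial]  # list index = revision number

    def head(self) -> int:
        return len(self.versions) - 1

    def check_in(self, base: int, content: str) -> int:
        # Refuse to overwrite work that was checked in after `base`.
        if base != self.head():
            raise StaleWorkingCopy(
                f"based on r{base}, repository is at r{self.head()}"
            )
        self.versions.append(content)
        return self.head()

    def restore(self, revision: int) -> int:
        # Returning to an old state adds a new revision; nothing is deleted.
        self.versions.append(self.versions[revision])
        return self.head()


history = History("draft")
base = history.head()  # two users both start from r0

history.check_in(base, "draft + Alice's edit")  # accepted as r1
try:
    history.check_in(base, "draft + Bob's edit")  # still based on r0
except StaleWorkingCopy as error:
    conflict = str(error)

history.restore(0)  # go back to the original draft as r2
print(conflict)
print(history.versions)
```

The second user's change is not lost silently: the check-in is refused, and the user can update the working copy and try again.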

The most popular version control systems include CVS, Git and SVN (Subversion). GitHub is a hosting platform built on Git.

Repository for metadata

A repository for metadata tends to be used in highly complex IT infrastructures. Such a repository contains the data of the entire system as well as information about the infrastructure’s context and environment. The advantage of this type of repository is that changes can be made without altering the source code or needing to implement additional programs. Instead, the database table on which the respective system is based is adapted directly. The metadata repository tends to be used in enterprise application integration (EAI) and data warehousing.
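
The idea of changing behaviour by adapting a database table rather than the source code can be sketched with Python’s built-in `sqlite3` module. The table `field_metadata`, its columns and the function `is_valid` are assumptions made for this example and are not taken from any specific product.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE field_metadata ("
    "entity TEXT, field TEXT, max_length INTEGER, "
    "PRIMARY KEY (entity, field))"
)
connection.execute("INSERT INTO field_metadata VALUES ('customer', 'name', 10)")


def is_valid(entity: str, field: str, value: str) -> bool:
    # The rule is looked up at run time instead of being hard-coded.
    row = connection.execute(
        "SELECT max_length FROM field_metadata WHERE entity = ? AND field = ?",
        (entity, field),
    ).fetchone()
    if row is None:
        raise KeyError(f"no metadata for {entity}.{field}")
    return len(value) <= row[0]


before = is_valid("customer", "name", "Alexandria Smith")  # 16 characters

# Change the rule by updating the metadata table; the code stays the same.
connection.execute(
    "UPDATE field_metadata SET max_length = 50 "
    "WHERE entity = 'customer' AND field = 'name'"
)
after = is_valid("customer", "name", "Alexandria Smith")

print(before, after)
```

The validation function is never edited: only the row in the metadata table changes, and the same call returns a different result afterwards.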

Repository for software

A software repository is particularly important for Linux users. It contains application packages and the corresponding metadata such as explanations, annotations, dependencies and changes. Installation and updates are performed using a package manager. In this way, users don’t have to worry about updating each application individually. Instead, the package manager updates the system as a whole, often automatically. The updates themselves are often provided by the community. Users maintaining packages, known as package maintainers, typically provide the updated data and carry out the maintenance of the respective software repository.
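
How a package manager might use the dependency metadata in a software repository can be sketched as follows. The package names, versions and the functions `install_order` and `outdated` are hypothetical; real package managers use richer metadata and proper version comparison rather than the simple inequality check assumed here.

```python
# Hypothetical package index: name -> (version, dependencies).
PACKAGE_INDEX = {
    "editor": ("2.1", ["libui", "libtext"]),
    "libui": ("1.4", ["libcore"]),
    "libtext": ("0.9", ["libcore"]),
    "libcore": ("3.0", []),
}


def install_order(package, index, installed=None, pending=None):
    """Return packages in an order that installs dependencies first."""
    installed = [] if installed is None else installed
    pending = set() if pending is None else pending
    if package in installed:
        return installed
    if package in pending:
        raise ValueError(f"circular dependency involving {package}")
    if package not in index:
        raise KeyError(f"{package} is not in the repository")
    pending.add(package)
    _, dependencies = index[package]
    for dependency in dependencies:
        install_order(dependency, index, installed, pending)
    pending.discard(package)
    installed.append(package)
    return installed


def outdated(installed_versions, index):
    # Simplification: any version that differs from the index counts as outdated.
    return sorted(
        name
        for name, version in installed_versions.items()
        if index[name][0] != version
    )


order = install_order("editor", PACKAGE_INDEX)
stale = outdated({"libcore": "2.9", "libui": "1.4"}, PACKAGE_INDEX)
print(order)
print(stale)
```

The user only asks for `editor`; the order in which its dependencies are installed is derived entirely from the repository’s metadata.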

Repository for document servers

The term repository is also applied, at least figuratively, to extensive network publications and document servers. Not every feature of the repository principle carries over one to one, but the basic procedure is adapted. Well-known document servers such as arXiv make available publications from the fields of biology, computer science, mathematics, physics and statistics. Moderators check new submissions and approve or reject them. The scientific works can then be made available for download. However, in contrast to a version control repository, users cannot edit the published documents.

Repository for CASE

A repository is also frequently used in computer-aided software engineering (CASE). It’s mainly used to store project data, documentation and source code.

Which repositories are useful?

Numerous repositories are available for different purposes. A distinction is made between solutions that are open source and those offered commercially. The most popular platform for open-source projects is GitHub, which is itself a commercial service built on the open-source version control system Git. However, there are various GitHub alternatives such as Apache Allura, Bazaar, Gitolite, Mercurial or SourceForge. A detailed comparison of GitHub and GitLab is available in our Digital Guide. Among the best-known proprietary repositories are Alienbrain, Bitkeeper, IBM Rational Synergy and MySQL Yum.

Whether a repository is suitable for your project depends on your requirements and your way of working. For teamwork, a repository can improve work processes and optimise workflow. Even if employees access a project and make changes at different times and from different locations, the trunk remains intact. Solutions can be tested without jeopardising previous progress. It’s a good idea to test an open-source solution before purchasing a commercial option.

How does a repository work?

Used correctly, a repository offers several advantages. GitHub is a great example of this. Once you’ve installed Git and set up a GitHub account, you can use the intuitive web interface to assign and process tasks. Changes are recorded as commits and proposed for integration via pull requests. In this way, a team leader can track individual progress steps and members can follow the project down to the smallest detail. To learn more, have a look at our Git tutorial.
