Microdata is an HTML5 specification from WHATWG (Web Hypertext Application Technology Working Group). The data format offers a meta syntax for marking up, or tagging, structured data. By annotating semantic contexts so that they’re machine readable, this HTML extension allows website operators to enrich content with metadata.
Read out by programs, like web crawlers or browsers, this metadata forms the basis for extended content presentations and for processing web content through assistive systems, like screen readers. Structured data is especially relevant for search engine optimisation, because semantic annotation makes indexing websites easier and allows search results to be enhanced with additional information. For data structuring, microdata is typically used with the unified vocabulary defined by Schema.org.
Microdata in comparison to other data formats
While the internet community agrees that HTML has to become more semantic, choosing the right data format for marking up metadata remains a controversial topic. As a separate module for HTML, microdata was originally introduced as an alternative to RDFa, the standard at the time. The goal of this format was to achieve a simplified syntax with functionality comparable to the status quo. One additional advantage is its close fit with HTML5.
Microdata came to prominence through the Schema.org project, a joint initiative organised by the leading search engine providers Google, Bing, Yahoo!, and Yandex; together they provide a unified vocabulary for semantic annotation that was originally based on microdata. Whereas microdata was once the preferred data format of the market leader, today’s recommendation from the Mountain View tech giant doesn’t seem as steadfast as it once did. In addition to microdata, the Schema.org vocabulary also continues to support markup with RDFa. JSON-LD is also increasingly coming into the limelight as a new script-based markup format. That said, Google doesn’t support this option for all data types, which keeps microdata a relevant choice.
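To illustrate what ‘script-based’ means here: JSON-LD places the metadata in a separate script block instead of in attributes on the visible elements. The following is only a minimal sketch; the company name and the example.com URLs are placeholder assumptions, not taken from a real site.

```html
<!-- JSON-LD sketch: the metadata sits in its own script block,
     separate from the visible page content.
     Name and URLs below are hypothetical placeholders. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Company",
  "url": "http://www.example.com/",
  "logo": "http://www.example.com/logo.png"
}
</script>
```

Microdata, by contrast, annotates the HTML elements that users actually see, as the following sections show.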
Microdata syntax is based on items, which are groups of name-value pairs, and can best be described in three steps: first, an element is defined and labelled as an item. This item is then assigned to a certain type from Schema.org’s vocabulary. Once the item type is defined, different properties can then be allocated to it. Markup is carried out with the HTML5 attributes itemscope, itemtype, and itemprop:
- itemscope: the HTML5 attribute itemscope is added to an element, typically a div tag, in order to label the area it encloses as an item. This item is then further specified with the help of itemtype and itemprop.
- itemtype: the HTML5 attribute itemtype can be applied to any element tagged with the itemscope attribute; its purpose is to assign the item a predefined type. This allows relevant elements of a website to be assigned to universal types according to Schema.org. These can then be read by all the conventional search engines.
- itemprop: the HTML5 attribute itemprop indicates a property of the item whose type was previously named with itemtype. Which properties can be assigned to a given itemtype is documented on Schema.org’s website.
Incorporating the attributes itemscope, itemtype, and itemprop into the HTML code can be carried out according to the following basic structure:
<div itemscope itemtype="http://schema.org/type">
  <span itemprop="property">Item</span>
</div>
Microdata markup in practice
Like other formats for semantically tagging HTML documents, microdata is attached to ordinary HTML tags. In principle, microdata attributes are independent of the tags that carry them. Semantically empty HTML tags, like <div> and <span>, are therefore especially well suited to serve as support elements for microdata attributes.
| Element | Use with microdata |
| <div></div> | The div element is a generic block-level container; generally, an itemscope is both opened and closed with this tag. |
| <span></span> | The span element defines a generic inline area without any influence on browser rendering. For this reason, it’s used to mark up an itemprop. |
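For div and span, the property value is simply the element’s text content. For some other elements, the microdata specification takes the value from an attribute instead: href for a, src for img, content for meta, and datetime for time. A minimal sketch of this behaviour, in which the event name, date, and example.com URLs are hypothetical placeholders:

```html
<div itemscope itemtype="http://schema.org/Event">
  <!-- span: the value is the element's text content -->
  <span itemprop="name">Example Conference</span>
  <!-- time: the value is the datetime attribute, not the visible text -->
  <time itemprop="startDate" datetime="2030-05-14">14 May 2030</time>
  <!-- a: the value is the href attribute -->
  <a itemprop="url" href="http://www.example.com/conference/">Event page</a>
  <!-- img: the value is the src attribute -->
  <img itemprop="image" src="http://www.example.com/conference.png" alt="Conference banner" />
</div>
```

This is why links, images, and meta elements can carry an itemprop directly, without an extra span wrapped around them.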
Tagging images with microdata
Embedding company logos constitutes a typical application for semantically annotating website content. While human readers are able to identify a website graphic as a company logo, programs like web crawlers have to rely on microdata in order to ‘understand’ such contexts:
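A minimal sketch of such logo markup, assuming placeholder example.com addresses for the company’s website and logo file:

```html
<div itemscope itemtype="http://schema.org/Organization">
  <a itemprop="url" href="http://www.example.com/">Home</a>
  <img itemprop="logo" src="http://www.example.com/logo.png" />
</div>
```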
In the first line of code, a new div element is opened that encompasses both the URL in line two and the embedded image in line three. This unspecific div tag is labelled with the itemscope attribute as being an information-bearing element. The itemtype attribute marks the type ‘Organization’ according to Schema.org. Web crawlers are hence able to infer that the content of the div tag is information about a company. Additionally, the item is assigned the properties ‘url’ and ‘logo’ along with their corresponding values. This allows search engines to identify the graphic as a logo and to connect it to the company. Search engines, like Google, can use this type of tagged graphic for their knowledge graphs.
In the case of a brand logo, the following Schema.org markup is used:
<div itemscope itemtype="http://schema.org/Brand">
  <span itemprop="name">brand name</span>
  <img itemprop="logo" src="http://www.examplebrand.com/logo.png" />
</div>
The element within the itemscope has been labelled as a brand according to Schema.org. The brand’s name and logo (including its web location) are stated as properties.
Labelling contact data with microdata
In addition to labelling images, semantically annotating contact data is also important for companies, because this information forms the basis for extended search engine results, like sitelinks or the knowledge graph. An extensive microdata markup of contact information can be seen in an example on Schema.org:
<div itemscope itemtype="http://schema.org/Organization">
  <span itemprop="name">Google.org (GOOG)</span>
  Contact Details:
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    Main address:
    <span itemprop="streetAddress">38 avenue de l'Opera</span>
    <span itemprop="postalCode">F-75002</span>
    <span itemprop="addressLocality">Paris, France</span>
  </div>
  Tel: <span itemprop="telephone">( 33 1) 42 68 53 00 </span>,
  Fax: <span itemprop="faxNumber">( 33 1) 42 68 53 01 </span>,
  E-mail: <span itemprop="email">secretary(at)google.org</span>
</div>
In the first line of code, the itemtype attribute defines the element in the div tag from line one to line 13 as ‘Organization’. With different itemprop attributes, this item is assigned the properties ‘name’, ‘address’, ‘telephone’, ‘faxNumber’, and ‘email’ with their corresponding values. So far, nothing is different from what was explained in the previous examples. There is, however, an outlier in line four: the microdata syntax allows the value of a property to be an item itself. Here, the information under ‘Main address’ is nested via a second div element with its own itemscope and given the itemtype ‘PostalAddress’. This nested item is then described in more detail through the properties ‘streetAddress’, ‘postalCode’, and ‘addressLocality’.
Tagging website content for rich snippets with microdata
Rich snippets are a feature that allows excerpts of site content to be displayed in the search results. Semantically tagging certain information is crucial in order for it to be correctly processed by the search engine providers. With rich snippets, details like product information, recipes, user reviews, events, software applications, videos, and news articles can be displayed directly in the search engine result pages (SERPs), provided the corresponding source information has been tagged to be machine readable. The following example of a mock hotel offer shows schematically how such information is tagged with Schema.org microdata syntax.
Normally, on travel and vacation portals, prospective guests are provided with the hotel’s name, a photo, and a description of the location. Here’s how the HTML code of this basic information would look when marked up with microdata according to Schema.org:
<div itemscope itemtype="http://schema.org/Hotel">
  <span itemprop="name">hotel name</span>
  <span itemprop="description">hotel description</span>
  <img itemprop="image" src="http://Images/hotel.jpg" />
</div>
The itemtype attribute in line one refers to the predefined type ‘Hotel’. In lines two to four, the properties ‘name’, ‘description’, and ‘image’ are assigned to ‘Hotel’, along with their corresponding values.
Additional information in the form of itemprops or subordinate itemscopes can be added to this basic framework at users’ discretion. For this step, it’s important to make sure that subordinate div elements are placed within the div tag of the higher-ranking itemscope. The following code adds a price quotation to the semantic annotation for hotel offers.
<div itemprop="makesOffer" itemscope itemtype="http://schema.org/Offer">
  <span itemprop="price">400 pounds</span>
</div>
The first line of code defines the value of the property ‘makesOffer’ as a nested item of the itemtype ‘Offer’. As offers generally also require prices, one is added in line two with the itemprop ‘price’ and paired with the value ‘400 pounds’.
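Schema.org’s documentation for ‘price’ advises keeping currency names and symbols out of the value and stating the currency separately via ‘priceCurrency’ as an ISO 4217 code. One possible refinement of the offer markup could therefore look like the following sketch; the amount and the currency code GBP are example values only:

```html
<div itemprop="makesOffer" itemscope itemtype="http://schema.org/Offer">
  <!-- meta: the currency is machine readable but not displayed -->
  <meta itemprop="priceCurrency" content="GBP" />
  <!-- the pound sign stays outside the span, so the value is purely numeric -->
  £<span itemprop="price">400.00</span>
</div>
```

Visitors still see the price with its currency symbol, while crawlers receive a numeric value and an unambiguous currency code.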
Additionally, information on payment methods (itemprop="paymentAccepted"), locations (itemprop="map"), or user experiences (itemprop="reviews") can also be tagged with Schema.org’s vocabulary. The markup now looks like this:
<div itemscope itemtype="http://schema.org/Hotel">
  <span itemprop="name">hotel name</span>
  <span itemprop="description">hotel description</span>
  <img itemprop="image" src="http://Images/hotel.jpg" />
  <div itemprop="makesOffer" itemscope itemtype="http://schema.org/Offer">
    <span itemprop="price">400 pounds</span>
  </div>
  <span itemprop="paymentAccepted">credit, debit, etc.</span>
  <span itemprop="map">map URL</span>
  <div itemprop="reviews" itemscope itemtype="http://schema.org/Review">
    <span itemprop="name">review title</span>
    <span itemprop="author">author</span>
    <span itemprop="reviewBody">review text</span>
    <span itemprop="datePublished">review date</span>
  </div>
</div>
In the first line, the item ‘Hotel’ is defined as the higher-ranking type for all subsequent entries up to the closing tag in line 16. After that, the basic information (name, description, and image) is labelled as properties of the item ‘Hotel’. The hotel price is then labelled in lines 5 to 7 via the subordinate itemscope with the type ‘Offer’. The payment terms and the location follow in lines 8 and 9; both are assigned to the item ‘Hotel’. The subordinate itemscope with the type ‘Review’ labels the information in lines 10 to 15 as belonging to a user review. This review contains the properties title, author, review text, and date, each of which is labelled with its own itemprop.
If a hotel review should also appear with a star rating in the rich snippets, Google recommends the following markup:
<div itemscope itemtype="http://schema.org/Review">
  <div itemprop="itemReviewed" itemscope itemtype="http://schema.org/Hotel">
    <span itemprop="name">hotel name</span>
  </div>
  <span itemprop="reviewRating" itemscope itemtype="http://schema.org/Rating">
    <span itemprop="ratingValue">4</span>
  </span> stars -
  <b>"<span itemprop="name">title of review</span>"</b>
  <span itemprop="author" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">author</span>
  </span>
  <span itemprop="reviewBody">review text</span>
</div>
Breadcrumbs with microdata tagging
Breadcrumbs offer a further option for displaying additional information in the SERPs. This feature revolves around the markup of a site’s navigation structure, which lets search engine users see exactly where a webpage is located within a given website. Here is a typical example of a breadcrumb trail in HTML code:
<ol>
  <li>
    <a href="http://www.provider.com/hotels/">Hotels</a>
  </li>
  <li>
    <a href="http://www.provider.com/hotels/UK/">Hotels in the UK</a>
  </li>
  <li>
    <a href="http://www.provider.com/hotels/UK/London/">Hotels in London</a>
  </li>
</ol>
The example above shows a list element (ordered list, ol) that contains the links to a website’s various subpages. In order to mark up this navigation structure for programs, like web browsers or search engine crawlers, the following Schema.org microdata syntax is used:
<ol itemscope itemtype="http://schema.org/BreadcrumbList">
  <li itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">
    <a itemprop="item" href="http://www.provider.com/hotels/">
      <span itemprop="name">Hotels</span></a>
    <meta itemprop="position" content="1" />
  </li>
  <li itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">
    <a itemprop="item" href="http://www.provider.com/hotels/UK/">
      <span itemprop="name">Hotels in the UK</span></a>
    <meta itemprop="position" content="2" />
  </li>
  <li itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">
    <a itemprop="item" href="http://www.provider.com/hotels/UK/London/">
      <span itemprop="name">Hotels in London</span></a>
    <meta itemprop="position" content="3" />
  </li>
</ol>
The list element <ol> is defined with the itemscope attribute as an item and is assigned via itemtype to the schema ‘BreadcrumbList’. For every ‘breadcrumb’ located within the site’s navigation structure, an individual itemscope with the itemtype ‘ListItem’ is opened and attached to the list via the itemprop ‘itemListElement’. Each of the breadcrumb trail’s list items is then assigned the itemprops ‘name’, ‘item’, and ‘position’ as properties. Marked up in machine-readable form, these contain the name, the URL, and the position of a given breadcrumb within the breadcrumb trail.