Microdata is an HTML5 specification from the WHATWG (Web Hypertext Application Technology Working Group). The format offers a meta syntax for marking up, or tagging, structured data. By annotating semantic contexts so that they’re machine readable, this HTML extension allows website operators to enrich content with metadata. When read by programs such as web crawlers or browsers, this metadata forms the basis for extended content presentations and for processing web content with assistive technologies such as screen readers. Structured data is especially relevant for search engine optimisation, because semantic annotation makes websites easier to index and allows search results to be enhanced with additional information. To structure the data, microdata is usually combined with the unified vocabulary defined by Schema.org.

Microdata in comparison to other data formats

While the internet community agrees that HTML needs to become more semantic, choosing the right data format for marking up metadata remains a controversial topic. Microdata was originally introduced as a separate HTML module and as an alternative to RDFa, the standard at the time. Its goal was to offer a simplified syntax with functionality comparable to the status quo. An additional advantage is its close integration with HTML5. Microdata came to prominence through Schema.org, a joint initiative of the leading search engine providers Google, Bing, Yahoo!, and Yandex, which together provide a unified vocabulary for semantic annotation that was originally built around microdata. Whereas microdata was once the preferred data format of the market leader, the recommendation from the tech giant in Mountain View no longer seems as firm as it once was. In addition to microdata, the Schema.org vocabulary also continues to support markup with RDFa. JSON-LD, a script-based markup format, is also increasingly coming into the limelight. At the time of writing, however, Google did not support this option for all data types, which keeps microdata a very current option.

Microdata syntax

Microdata syntax is based on name-value pairs, called properties, which are grouped into items. It can best be described in three steps: first, an element is defined and labelled as an item. This item is then assigned to a certain type from Schema.org’s vocabulary. Once the item type is defined, different properties can be allocated to it. Markup is carried out with the HTML5 attributes itemscope, itemtype, and itemprop:

  • itemscope: the HTML5 attribute itemscope labels an element as an item. It is typically placed on a div tag, although any HTML element can carry it. The item is then specified in more detail with the help of itemtype and itemprop.
  • itemtype: the HTML5 attribute itemtype can be applied to any element carrying the itemscope attribute; its purpose is to mark predefined types. This allows relevant elements of a website to be assigned to universal types according to Schema.org, which the major search engines can read.
  • itemprop: the HTML5 attribute itemprop indicates a property of an item whose type was previously named with itemtype. The properties available for each itemtype are listed on Schema.org’s website.

The attributes itemscope, itemtype, and itemprop are incorporated into the HTML code according to the following basic structure:

Basic structure of microdata markup for an item:

<div itemscope itemtype="http://schema.org/type">
    <span itemprop="property">Item</span>
</div>
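To make this structure more concrete, the following minimal Python sketch shows how a program might read such markup into name-value pairs. It uses only the standard library’s html.parser module. The class name FlatMicrodataParser is hypothetical, and the sketch assumes flat items (no nesting) whose property values are plain text; it is not a full implementation of the WHATWG microdata parsing rules.

```python
from html.parser import HTMLParser


class FlatMicrodataParser(HTMLParser):
    """Minimal sketch: collects itemtype and text-valued itemprops of flat items."""

    def __init__(self):
        super().__init__()
        self.items = []            # one dict per element carrying itemscope
        self._current_prop = None  # itemprop whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # boolean attributes such as itemscope map to None
        if "itemscope" in attrs:
            self.items.append({"type": attrs.get("itemtype"), "properties": {}})
        elif "itemprop" in attrs and self.items:
            # Assumption: the most recently opened item is the enclosing one
            self._current_prop = attrs["itemprop"]
            self.items[-1]["properties"][self._current_prop] = ""

    def handle_data(self, data):
        # Text only counts while we are inside an itemprop element
        if self._current_prop is not None:
            self.items[-1]["properties"][self._current_prop] += data

    def handle_endtag(self, tag):
        self._current_prop = None


snippet = """
<div itemscope itemtype="http://schema.org/type">
    <span itemprop="property">Item</span>
</div>
"""
parser = FlatMicrodataParser()
parser.feed(snippet)
parser.close()
print(parser.items)
# [{'type': 'http://schema.org/type', 'properties': {'property': 'Item'}}]
```

The item thus ends up as its type plus a dictionary of property names and values, which is essentially what a crawler extracts before interpreting the Schema.org type.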

Microdata markup in practice

Like other formats for semantically tagging HTML documents, microdata uses classic HTML tags as carriers. In principle, microdata attributes are independent of specific HTML tags and can be added to any element. Semantically neutral HTML tags, like <div> and <span>, are therefore especially well suited to serve as support elements for microdata attributes.

<div></div> The div element defines a generic block; an itemscope is typically opened and closed with this tag.
<span></span> The span element defines a generic inline area that has no effect on how the browser renders the content. For this reason, it’s often used to mark up an itemprop.

Tagging images with microdata

Embedding company logos is a typical application for semantically annotating website content. While human readers are able to identify a website graphic as a company logo, programs like web crawlers have to rely on microdata in order to ‘understand’ such contexts:

Markup of a company logo:

<div itemscope itemtype="http://schema.org/Organization">
    <a itemprop="url" href="http://www.example.com/">Home</a>
    <img itemprop="logo" src="http://www.example.com/logo.png" />
</div>

In the first line of code, a new div element is opened that encompasses both the URL in line two and the embedded image in line three. This generic div tag is labelled with the itemscope attribute as an information-bearing element. The itemtype attribute marks it with the type ‘Organization’ according to Schema.org. Web crawlers are hence able to infer that the content within the div tag refers to a company. In addition, the item is assigned the properties ‘url’ and ‘logo’ along with their corresponding values. This allows search engines to identify the graphic as the logo of the company in question. Search engines such as Google can use graphics tagged this way, for example in the Knowledge Graph.

In the case of a brand logo, the following Schema.org markup is used:

Markup of brand logos:

<div itemscope itemtype="http://schema.org/Brand">
    <span itemprop="name">brand name</span>
    <img itemprop="logo" src="http://www.examplebrand.com/logo.png" />
</div>

The element within the itemscope has been labelled as a brand according to Schema.org. The brand’s name and logo (including the image URL) are stated as properties.
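The logo is a case where the property value does not come from the element’s text. In the WHATWG microdata specification, the value of an itemprop depends on the element: for img it is the src attribute, for a and link the href attribute, for meta the content attribute, and so on. The following Python sketch applies a subset of these rules to the brand markup above. The names ATTRIBUTE_VALUES and PropertyValueParser are hypothetical; nested items are not handled, and relative URLs are not resolved to absolute ones as the specification requires.

```python
from html.parser import HTMLParser

# Elements whose property value is read from an attribute instead of the text
# (a subset of the rules in the WHATWG microdata specification).
ATTRIBUTE_VALUES = {
    "meta": "content",
    "img": "src", "audio": "src", "video": "src", "source": "src",
    "embed": "src", "iframe": "src", "track": "src",
    "a": "href", "area": "href", "link": "href",
    "object": "data",
    "data": "value", "meter": "value",
}


class PropertyValueParser(HTMLParser):
    """Sketch: reads the properties of a single flat item using the value rules."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._text_prop = None  # itemprop whose value is its text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None:
            return
        if tag in ATTRIBUTE_VALUES:
            # Value comes from the attribute, e.g. src for img
            self.properties[name] = attrs.get(ATTRIBUTE_VALUES[tag]) or ""
        elif tag == "time" and attrs.get("datetime"):
            self.properties[name] = attrs["datetime"]
        else:
            # All other elements: the value is the text content
            self._text_prop = name
            self.properties[name] = ""

    def handle_data(self, data):
        if self._text_prop is not None:
            self.properties[self._text_prop] += data

    def handle_endtag(self, tag):
        self._text_prop = None


snippet = """
<div itemscope itemtype="http://schema.org/Brand">
    <span itemprop="name">brand name</span>
    <img itemprop="logo" src="http://www.examplebrand.com/logo.png" />
</div>
"""
parser = PropertyValueParser()
parser.feed(snippet)
parser.close()
print(parser.properties)
# {'name': 'brand name', 'logo': 'http://www.examplebrand.com/logo.png'}
```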

Labelling contact data with microdata

In addition to labelling images, semantically annotating contact data is also important for companies, because this information forms a basis for extended search results, like sitelinks or the Knowledge Graph. An extensive microdata markup of contact information can be seen in an example on Schema.org:

Markup of Google contact data:

<div itemscope itemtype="http://schema.org/Organization">
    <span itemprop="name">Google.org (GOOG)</span>
    Contact Details:
    <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
        Main address:
            <span itemprop="streetAddress">38 avenue de l'Opera</span>
            <span itemprop="postalCode">F-75002</span>
            <span itemprop="addressLocality">Paris, France</span>
    </div>
    Tel: <span itemprop="telephone">( 33 1) 42 68 53 00 </span>,
    Fax: <span itemprop="faxNumber">( 33 1) 42 68 53 01 </span>,
    E-mail: <span itemprop="email">secretary(at)google.org</span>
</div>

In the first line of code, the itemtype attribute defines the element in the div tag from line one to line 13 as ‘Organization’. Via different itemprop attributes, it is assigned the properties ‘name’, ‘address’, ‘telephone’, ‘faxNumber’, and ‘email’ with their corresponding values. So far, nothing differs from the previous examples. There is, however, an outlier in line four: the microdata syntax allows a property’s value to itself be an item. Here, the information under ‘Main address’ is nested via a second div element with its own itemscope and defined as the itemtype ‘PostalAddress’. This item is then specified in more detail through the properties ‘streetAddress’, ‘postalCode’, and ‘addressLocality’.
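How a program might turn such nested markup into a tree can be sketched in Python as follows. The hypothetical NestedMicrodataParser keeps a stack of open elements: an element with itemscope starts a new item, and if it also carries itemprop, that item becomes the property value of the enclosing item. This is a simplified sketch: values are taken from text only, repeated property names overwrite each other, and the itemref attribute is ignored. The input is a shortened version of the Schema.org example above.

```python
import json
from html.parser import HTMLParser

# Void elements have no end tag, so they must not be pushed onto the stack
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class NestedMicrodataParser(HTMLParser):
    """Sketch: builds nested dicts from itemscope/itemprop (text values only)."""

    def __init__(self):
        super().__init__()
        self.items = []  # top-level items
        self._open = []  # per open element: {"item": dict or None, "prop": (item, name) or None}

    def _owner(self):
        # The innermost open item is the one new properties belong to
        for entry in reversed(self._open):
            if entry["item"] is not None:
                return entry["item"]
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        owner = self._owner()
        name = attrs.get("itemprop")
        entry = {"item": None, "prop": None}
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
            entry["item"] = item
            if name and owner is not None:
                owner["properties"][name] = item  # nested item as a property value
            else:
                self.items.append(item)
        elif name and owner is not None:
            owner["properties"][name] = ""
            entry["prop"] = (owner, name)
        if tag not in VOID:
            self._open.append(entry)

    def handle_data(self, data):
        # Text belongs to the innermost itemprop, unless an item boundary comes first
        for entry in reversed(self._open):
            if entry["prop"] is not None:
                owner, name = entry["prop"]
                owner["properties"][name] += data
                return
            if entry["item"] is not None:
                return

    def handle_endtag(self, tag):
        if tag in VOID or not self._open:
            return
        entry = self._open.pop()
        if entry["prop"] is not None:
            owner, name = entry["prop"]
            owner["properties"][name] = owner["properties"][name].strip()


snippet = """
<div itemscope itemtype="http://schema.org/Organization">
    <span itemprop="name">Google.org (GOOG)</span>
    Contact Details:
    <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
        Main address:
        <span itemprop="streetAddress">38 avenue de l'Opera</span>
        <span itemprop="postalCode">F-75002</span>
        <span itemprop="addressLocality">Paris, France</span>
    </div>
    Tel: <span itemprop="telephone">( 33 1) 42 68 53 00 </span>
</div>
"""
parser = NestedMicrodataParser()
parser.feed(snippet)
parser.close()
print(json.dumps(parser.items, indent=2))
```

Free text such as ‘Contact Details:’ or ‘Main address:’ is skipped because it sits inside an item but outside any itemprop, mirroring how the markup separates visible labels from machine-readable values.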

Tagging website content for rich snippets with microdata

Rich snippets are a feature that allows excerpts of site content to be shown in the search results. Semantically tagging certain information is crucial for it to be processed correctly by search engine providers. With rich snippets, details like product information, recipes, user reviews, events, software applications, videos, and news articles can be displayed directly on the search engine result pages (SERPs), provided the corresponding source information has been tagged to be machine readable. The following example of a mock hotel offer shows schematically how such information is tagged with Schema.org microdata syntax.

On travel and vacation portals, prospective guests are normally given the hotel’s name, a photo, and a description of the location. Here’s how the HTML code for this basic information looks when marked up with microdata according to Schema.org:

Markup of basic information for a hotel offer:

<div itemscope itemtype="http://schema.org/Hotel">
    <span itemprop="name">hotel name</span>
    <span itemprop="description">hotel description</span>
    <img itemprop="image" src="http://www.example.com/images/hotel.jpg" />
</div>

The itemtype attribute in line one refers to the predefined type ‘Hotel’. In lines two to four, the properties ‘name’, ‘description’, and ‘image’ are assigned to ‘Hotel’, along with their corresponding values.

Additional information in the form of itemprops or subordinate itemscopes can be added to this basic framework as needed. When doing so, make sure that subordinate div elements are placed within the div tag of the higher-ranking itemscope. The following code adds a price quotation to the semantic annotation for hotel offers.
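Whether subordinate items really sit inside the higher-ranking itemscope can be checked mechanically. The following Python sketch, built around the hypothetical helper find_orphan_props, reports every itemprop that has no enclosing itemscope; an Offer div accidentally placed after the closing tag of the Hotel div shows up this way. It assumes well-formed markup and ignores the itemref attribute, through which the specification also allows properties outside an item’s element.

```python
from html.parser import HTMLParser

# Void elements have no end tag, so they are never pushed onto the stack
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ScopeChecker(HTMLParser):
    """Sketch: records itemprop names that have no enclosing itemscope."""

    def __init__(self):
        super().__init__()
        self._open = []    # one flag per open element: does it carry itemscope?
        self.orphans = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An itemprop belongs to the nearest enclosing item, so check before pushing
        if "itemprop" in attrs and not any(self._open):
            self.orphans.append(attrs["itemprop"])
        if tag not in VOID:
            self._open.append("itemscope" in attrs)

    def handle_endtag(self, tag):
        if tag not in VOID and self._open:
            self._open.pop()


def find_orphan_props(markup):
    checker = ScopeChecker()
    checker.feed(markup)
    checker.close()
    return checker.orphans


good = """
<div itemscope itemtype="http://schema.org/Hotel">
    <span itemprop="name">hotel name</span>
    <div itemprop="makesOffer" itemscope itemtype="http://schema.org/Offer">
        <span itemprop="price">400 pounds</span>
    </div>
</div>
"""
bad = """
<div itemscope itemtype="http://schema.org/Hotel">
    <span itemprop="name">hotel name</span>
</div>
<div itemprop="makesOffer" itemscope itemtype="http://schema.org/Offer">
    <span itemprop="price">400 pounds</span>
</div>
"""
print(find_orphan_props(good))  # []
print(find_orphan_props(bad))   # ['makesOffer']
```

In the faulty version the price is still correctly part of the Offer, but the Offer itself is no longer connected to the hotel.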

Markup for hotel prices:

<div itemprop="makesOffer" itemscope itemtype="http://schema.org/Offer"> 
    <span itemprop="price">400 pounds</span>
</div>

The first line of code declares that the value of the property ‘makesOffer’ is an item of the type ‘Offer’. Since offers generally also require prices, one is added in line two with the itemprop ‘price’ and paired with the value ‘400 pounds’.

Additionally, information on payment methods (itemprop="paymentAccepted"), locations (itemprop="map"), or user experiences (itemprop="reviews") can also be tagged with Schema.org’s vocabulary. Current versions of Schema.org mark ‘map’ and ‘reviews’ as superseded by ‘hasMap’ and ‘review’. The markup now looks like this:

Extended markup for a hotel offer:

<div itemscope itemtype="http://schema.org/Hotel">
    <span itemprop="name">hotel name</span>
    <span itemprop="description">hotel description</span>
    <img itemprop="image" src="http://www.example.com/images/hotel.jpg" />
    <div itemprop="makesOffer" itemscope itemtype="http://schema.org/Offer"> 
        <span itemprop="price">400 pounds</span>
    </div>
    <span itemprop="paymentAccepted">credit, debit, etc.</span> 
    <span itemprop="map">map URL</span> 
    <div itemprop="reviews" itemscope itemtype="http://schema.org/Review"> 
        <span itemprop="name">review title</span>
        <span itemprop="author">author</span>
        <span itemprop="reviewBody">review text</span>
        <span itemprop="datePublished">review date</span>
    </div>
</div>

In the first line, the item ‘Hotel’ is defined as the higher-ranking type for the subsequent entries up to line 15. After that, the basic information (name, description, and image) is labelled as properties of the item ‘Hotel’. The hotel price is then labelled in lines 5 to 7 via the subordinate itemscope ‘Offer’. This is followed by the payment methods and the map location, which are assigned to the item ‘Hotel’. The subordinate itemscope with the type ‘Review’ labels the information in lines 10 to 15 as belonging to a user review. This review contains the properties title, author, review text, and date, each of which is labelled with its own itemprop.
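Because subordinate items must always be written inside their parent’s div, markup like this lends itself to being generated from a nested data structure. The following Python sketch renders a dictionary as microdata; render_item and the convention of using (type, properties) tuples for subordinate items are hypothetical choices for illustration. Every plain value is rendered as a span, so properties such as ‘image’ or ‘map’, which would normally use img or a elements, are left out here.

```python
from html import escape

SCHEMA = "http://schema.org/"


def render_item(item_type, properties, itemprop=None, indent=0):
    """Render a (possibly nested) item as microdata.

    `properties` maps property names to strings or to (type, properties)
    tuples; tuples become subordinate itemscopes inside the parent div.
    """
    pad = " " * indent
    prop_attr = f' itemprop="{escape(itemprop)}"' if itemprop else ""
    lines = [f'{pad}<div{prop_attr} itemscope itemtype="{SCHEMA}{escape(item_type)}">']
    for name, value in properties.items():
        if isinstance(value, tuple):
            # Nested item: rendered recursively, so it stays inside the parent div
            sub_type, sub_props = value
            lines.append(render_item(sub_type, sub_props, itemprop=name, indent=indent + 4))
        else:
            lines.append(f'{pad}    <span itemprop="{escape(name)}">{escape(value)}</span>')
    lines.append(f"{pad}</div>")
    return "\n".join(lines)


hotel = {
    "name": "hotel name",
    "description": "hotel description",
    "makesOffer": ("Offer", {"price": "400 pounds"}),
    "paymentAccepted": "credit, debit, etc.",
    "reviews": ("Review", {
        "name": "review title",
        "author": "author",
        "reviewBody": "review text",
        "datePublished": "review date",
    }),
}
markup = render_item("Hotel", hotel)
print(markup)
```

Generating the markup recursively means the closing tag of a subordinate item is always emitted before the parent’s, so the nesting error described earlier cannot occur.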

If a hotel user review should also appear with a star rating in the rich snippets, Google recommends markup along the following lines:

Markup for a user report with a rating:

<div itemscope itemtype="http://schema.org/Review">
    <div itemprop="itemReviewed" itemscope itemtype="http://schema.org/Hotel">
        <span itemprop="name">hotel name</span>
    </div>
    <span itemprop="reviewRating" itemscope itemtype="http://schema.org/Rating">
        <span itemprop="ratingValue">4</span>
    </span> stars -
    <b>"<span itemprop="name">title of review</span>"</b>
    <span itemprop="author" itemscope itemtype="http://schema.org/Person">
        <span itemprop="name">author</span>
    </span>
    <span itemprop="reviewBody">review text</span>
</div>

Breadcrumbs with microdata tagging

Breadcrumbs offer a further option for displaying additional information in the SERPs. This feature is based on marking up a site’s navigation structure, which shows search engine users exactly where a webpage is located within a website. Here is a typical example of a breadcrumb trail in HTML code:

HTML markup of a breadcrumb trail:

<ol>
    <li>
        <a href="http://www.provider.com/hotels/">Hotels</a>
    </li>
    <li>
        <a href="http://www.provider.com/hotels/UK/">Hotels in the UK</a>
    </li>
    <li>
        <a href="http://www.provider.com/hotels/UK/London/">Hotels in London</a>
    </li>
</ol>

The example above shows an ordered list (ol) containing links to a website’s various subpages. To mark up this navigation structure for programs, like web browsers or search engine crawlers, the following Schema.org microdata syntax is used:

Microdata markup for breadcrumbs with Schema.org:

<ol itemscope itemtype="http://schema.org/BreadcrumbList">
    <li itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">
        <a itemprop="item" href="http://www.provider.com/hotels/">
            <span itemprop="name">Hotels</span></a>
        <meta itemprop="position" content="1" />
    </li>
    <li itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">
        <a itemprop="item" href="http://www.provider.com/hotels/UK/">
            <span itemprop="name">Hotels in the UK</span></a>
        <meta itemprop="position" content="2" />
    </li>
    <li itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">
        <a itemprop="item" href="http://www.provider.com/hotels/UK/London/">
            <span itemprop="name">Hotels in London</span></a>
        <meta itemprop="position" content="3" />
    </li>
</ol>

The list element <ol> is defined as an item with the itemscope attribute and assigned via itemtype to the type ‘BreadcrumbList’. For every ‘breadcrumb’ within the site’s navigation structure, an individual itemscope with the itemtype ‘ListItem’ is opened. Each list item of the breadcrumb trail is then assigned the itemprops ‘name’, ‘item’, and ‘position’ as properties. In machine-readable form, these contain the name, the URL, and the position of the given breadcrumb within the trail.
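Since the position values simply count up along the trail, breadcrumb markup is easy to generate. The following Python sketch, built around the hypothetical function breadcrumb_microdata, reproduces the Schema.org markup above from a list of (name, URL) pairs and escapes both with html.escape.

```python
from html import escape


def breadcrumb_microdata(trail):
    """Render (name, url) pairs as a Schema.org BreadcrumbList in microdata."""
    lines = ['<ol itemscope itemtype="http://schema.org/BreadcrumbList">']
    # Positions start at 1 and follow the order of the trail
    for position, (name, url) in enumerate(trail, start=1):
        lines += [
            '    <li itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">',
            f'        <a itemprop="item" href="{escape(url)}">',
            f'            <span itemprop="name">{escape(name)}</span></a>',
            f'        <meta itemprop="position" content="{position}" />',
            '    </li>',
        ]
    lines.append("</ol>")
    return "\n".join(lines)


trail = [
    ("Hotels", "http://www.provider.com/hotels/"),
    ("Hotels in the UK", "http://www.provider.com/hotels/UK/"),
    ("Hotels in London", "http://www.provider.com/hotels/UK/London/"),
]
markup = breadcrumb_microdata(trail)
print(markup)
```

Adding another level to the navigation only requires appending one more pair; the position is derived from its place in the list rather than maintained by hand.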
