The struggle for the top spot in the search results of Google and other search engines is an ongoing battle. Cramming as many keywords as possible into content used to be considered almost an SEO sport, but the art of search engine optimisation now revolves around creating unique texts. Whether on the homepage, a subpage, a product page or a category page of your website: exclusive, relevant content that differs from competing pages in terms of copywriting and keyword usage is key when it comes to outperforming the competition and placing first in the results. A term that is increasingly used in this context is the WDF*IDF analysis or formula.

What is WDF*IDF?

WDF*IDF is an analysis method that can be used in search engine optimisation to determine keywords and terms that sustainably increase the relevance of published texts and, therefore, of the entire website. The formula multiplies two values: Within Document Frequency (WDF) and Inverse Document Frequency (IDF). The result is the relative term frequency (also called term weighting) of a term in a document, relative to all other web documents that also contain the keyword included in the analysis. Before the WDF*IDF analysis can be run, you first need to determine these two factors.

How to determine the Within Document Frequency (WDF) value

The WDF describes how often a particular term occurs in a document compared to all other terms it contains. To increase the validity of the determined value, the formula is based on a logarithm that prevents the central term from being weighted too heavily. The concept was first described in 1992 by Donna Harman, whose article “Ranking Algorithms” presents WDF as a way to give the words of a particular document a weighting value useful for information science. In website optimisation, the WDF value has been used for some time as an alternative to the less flexible keyword density value, which merely reflects the relative abundance of a key term. The formula for determining the Within Document Frequency is:

$$\mathrm{WDF}(i) = \frac{\log_2(\mathrm{Freq}(i,j) + 1)}{\log_2(L_j)}$$

The in­di­vidu­al com­pon­ents of the equation can be explained as follows:

i | Term whose Within Document Frequency is being determined
j | Document to be analysed
Lj | Total number of words in the document "j"
Freq(i,j) | Frequency of the term "i" in the document "j"
log2 | Logarithm to base 2

Therefore, the WDF value for a term "i" in the document "j" is determined by adding 1 to the frequency of the term, taking the base-2 logarithm of the result, and dividing it by the base-2 logarithm of the total number of words in that document. Because both values are passed through the logarithm “log2”, the result is more meaningful than pure keyword density or relative frequency. An example can illustrate this:

An examined term that appears 50 times in a 1,000-word document has a Within Document Frequency of 0.57. The relative frequency, in this case, is 5 percent. If you now increase the frequency of the term for optimisation purposes, say to 500, you get a WDF value of 0.9 (rounded), i.e. a value that is around 1.6 times as high as in the original text. On the other hand, if you take the relative frequency (which has now risen to 50 percent) as the basis, you will see an increase to 10 times the original value.
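The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch that assumes the formula as described in words above, log2(Freq(i,j) + 1) / log2(Lj); the function name `wdf` and its parameter names are chosen here for illustration and do not come from any particular tool.

```python
import math


def wdf(term_count: int, total_words: int) -> float:
    """Within Document Frequency: log2(count + 1) / log2(total words)."""
    if total_words < 2:
        raise ValueError("total_words must be at least 2")
    return math.log2(term_count + 1) / math.log2(total_words)


original = wdf(50, 1000)   # term appears 50 times in 1,000 words
stuffed = wdf(500, 1000)   # same document length, 500 occurrences

print(round(original, 2))            # 0.57
print(round(stuffed, 2))             # 0.9
print(round(stuffed / original, 2))  # 1.58
```

A tenfold increase in raw occurrences raises the WDF value by a factor of only about 1.6, which is the dampening effect of the logarithm.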

How to determine the Inverse Document Frequency (IDF) value

Inverse Document Frequency (IDF) is a value that measures the significance of a term not by its frequency in a particular document, but by its distribution and use across the entire body of documents: the more potential a term has, the higher its Inverse Document Frequency. The optimal case is a term that is very common in just a few documents. On the other hand, words that appear in almost every document or appear only rarely are of minor importance. For example, the word “imprint” has a very low IDF value because it is used on almost every website.

To calculate the Inverse Document Frequency value, the following formula is needed (it also uses a logarithm to adjust the results):

$$\mathrm{IDF}(i) = \log\left(1 + \frac{N_D}{f_i}\right)$$

The different com­pon­ents of the IDF equation can be explained as follows:

i | Term for which the Inverse Document Frequency is being determined
log | Logarithm to base 10 or to any other base b
ND | Number of documents in the result sets (containing relevant terms)
fi | Number of documents in which the term "i" occurs

Therefore, to determine the IDF value of a term "i", divide the total number of (relevant) documents contained in the result sets by the number of documents containing the term and then add the number 1. Finally, take the logarithm “log” of the result of that calculation.
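The calculation just described can be sketched in Python as follows. The function name `idf`, its parameters and the corpus figures are hypothetical and serve only as an illustration; the base of the logarithm is left adjustable and defaults to 10.

```python
import math


def idf(total_documents: int, documents_with_term: int, base: float = 10.0) -> float:
    """Inverse Document Frequency: log(1 + ND / fi) to the given base."""
    if documents_with_term <= 0:
        raise ValueError("the term must occur in at least one document")
    return math.log(1 + total_documents / documents_with_term, base)


# Hypothetical result set of 1,000 documents
rare = idf(1000, 10)     # term occurs in only 10 documents
common = idf(1000, 900)  # term occurs in 900 documents

print(rare > common)  # True
```

A term found in only a few documents receives a clearly higher IDF value than one found in almost every document.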

How is the number of all relevant documents in the result set cal­cu­lated?

Because it includes ND, the IDF formula cannot be determined uniformly. Instead, its result depends on the frequency of all meaningful words in the examined document, as well as on the underlying absolute number of documents. When analysing web documents for SEO purposes, the number of potential documents is huge, as all pages indexed by Google (or other search engines) are eligible. Nevertheless, to obtain a specific value, the number of search results for each relevant term in the document is determined and the counts are added together. For example, a highly simplified document that only contains the terms “Search Engine Optimisation” (17,300,000 search results, December 2017) and “Web Analytics” (2,200,000 search results, December 2017) has an ND value of 19,500,000.
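The simplified example can be reproduced in Python. This sketch takes the two search result counts quoted in the text, reading the first one as 17,300,000 because that is the value consistent with the stated total, and assumes a base-10 logarithm for the IDF step. The variable names are illustrative only.

```python
import math

# Search result counts quoted in the text (December 2017)
search_results = {
    "Search Engine Optimisation": 17_300_000,
    "Web Analytics": 2_200_000,
}

# ND is taken as the sum of the result counts of all relevant terms
n_d = sum(search_results.values())
print(n_d)  # 19500000

# IDF per term, assuming base 10: log10(1 + ND / fi)
idf_values = {
    term: math.log10(1 + n_d / f_i) for term, f_i in search_results.items()
}
for term, value in idf_values.items():
    print(term, round(value, 3))
```

The term with fewer search results ends up with the higher IDF value, in line with the idea that less widespread terms carry more weight.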

WDF*IDF: The com­bin­a­tion of both formulae

Because Within Document Frequency represents the relevance of a term within a particular document and Inverse Document Frequency reflects the role of a term relative to all of the search result documents, merging both values provides deep insights into the actual term frequency and into the potential of the term for optimising existing text content. For this purpose, it is only necessary to multiply both values, which results in the following overall formula for the WDF*IDF analysis and helps determine the most exact, usable term frequency:

$$\mathrm{WDF}(i) \cdot \mathrm{IDF}(i) = \frac{\log_2(\mathrm{Freq}(i,j) + 1)}{\log_2(L_j)} \cdot \log\left(1 + \frac{N_D}{f_i}\right)$$

In principle, it means bringing all the important components together and using them to determine the validity of terms used in web texts. Of course, the bigger the database, the more meaningful the results are. However, to make the WDF*IDF analysis useful for search engine optimisation, it must be applied to all meaningful words within a document. This would simply be too much effort to do manually, which is why a WDF*IDF tool is part of any serious repertoire for calculating term weighting. On the one hand, these programs (see below) help to analyse the existing textual material. On the other hand, they also provide clues as to which concepts a document lacks in order to be as unique and relevant as possible.
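To give a rough idea of what such a tool does internally, the following Python sketch applies the combined calculation to every term of one document. It rests on several assumptions that are not part of the original text: the three sample documents are invented, the tokenizer is deliberately naive, ND is simply the number of documents in this small local corpus rather than a sum of search result counts, and the IDF uses base 10. The names `tokenize` and `wdf_idf` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def wdf_idf(documents: list[str], doc_index: int) -> dict[str, float]:
    """WDF*IDF weight of every term in documents[doc_index]."""
    tokenized = [tokenize(doc) for doc in documents]
    target = tokenized[doc_index]
    if len(target) < 2:
        raise ValueError("the analysed document needs at least two words")
    counts = Counter(target)
    n_d = len(documents)  # assumption: ND = size of the local corpus
    log_length = math.log2(len(target))

    weights = {}
    for term, freq in counts.items():
        # fi: number of documents in which the term occurs
        f_i = sum(1 for tokens in tokenized if term in tokens)
        wdf = math.log2(freq + 1) / log_length
        idf = math.log10(1 + n_d / f_i)
        weights[term] = wdf * idf
    return weights


# Invented sample corpus
corpus = [
    "search engine optimisation improves search rankings",
    "web analytics measures traffic on a website",
    "search engine marketing and web analytics",
]
weights = wdf_idf(corpus, 0)
for term, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}: {weight:.3f}")
```

Real tools work with far larger document sets and usually filter out stop words first, so the absolute numbers from a toy corpus like this one carry little meaning.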

Con­clu­sion

The weighted frequency of the term "i" in the document "j" can be determined by multiplying the Within Document Frequency of the term "i" in the document "j" by the Inverse Document Frequency of the term "i" throughout the result set.

The benefits of WDF*IDF for Search Engine Op­tim­isa­tion

The advantages of a comprehensive WDF*IDF analysis are clear: the values obtained for weighting key terms serve as useful guideposts for writing texts so that:

  • they have high relevance for search engines
  • they cover topics which do not have a lot of competition
  • they do not contain any keyword spam
  • they are as unique as possible

Anyone who is dissatisfied with their own website rankings or strives for improved optimisation has a helpful ally in WDF*IDF values. Based on the analysis data, copywriters can create concrete guidelines for revising their content that aren’t just aimed at increasing the keyword density or incorporating other keywords into the text.

Note

For all the usefulness of a thorough WDF*IDF analysis, you should never forget that content is written primarily for readers and not for search engines. In addition, since search engines are getting better and better at capturing texts semantically, in the long run there is simply no way around strong content in which keywords and other technical additions play just a minor part.

What are the weak aspects of WDF*IDF analyses?

Although WDF*IDF provides very valuable input for website optimisation, there are a few issues that should be considered before analysing and evaluating results. For example, a fundamental problem is that a WDF*IDF analysis always includes all the textual elements of a document, whether they are headings, category or product descriptions, or captions. The individual components are not differentiated. Even if only one paragraph is too keyword-heavy or contains too few essential terms, the analysis method won’t provide a satisfactory answer, since the frequency weighting is always evaluated for the entire document.

Tip

Before con­sid­er­ing a WDF*IDF analysis for your own website, you should carefully check whether the embedded content is suitable for the term frequency analysis method. In addition, the results obtained should be carefully scru­tin­ised in order to detect potential fallacies (too small a database, for example) that need to be avoided.

Another weakness of the WDF*IDF formula is that it only becomes really informative with a high word count. For shorter passages like product descriptions, smaller blog entries or news articles, the analysis does not provide meaningful, usable results. This is why it’s often unsuitable for certain websites like online stores or news portals. For sites that rely on heavy editorial work, the drawback is that WDF*IDF analysis is difficult to incorporate into the standard workflow. Since fast response times and timeliness are particularly in demand here, optimising the texts after publishing would be a practical, if complex, solution.

An overview of the ad­vant­ages and dis­ad­vant­ages of the WDF*IDF analysis

Advantages of the WDF*IDF analysis | Disadvantages of the WDF*IDF analysis
Provides a great opportunity to expose existing keyword spam | Always examines the complete text content of a document
Makes relevance and uniqueness the central criteria for frequency weighting | Provides no information about specific paragraphs or passages that are worth optimising
Rates terms with lower competition better than highly competitive ones | Not suitable for short texts with few words
Combines document-specific and cross-document analysis | Hard to integrate into work processes which prioritise timeliness and responsiveness
Flattens results through logarithms for more meaningful results | Precise number of all relevant documents is difficult to determine

What WDF*IDF tools are there?

There are several tools that can be used to perform a WDF*IDF analysis. It is important to dis­tin­guish between ap­plic­a­tions that are only part of an SEO suite and those that are available as stan­dalone solutions. Depending on the range of functions and the usage options, the in­di­vidu­al tools differ in terms of cost. To give a brief overview of the variety of ap­plic­a­tions, we have compiled some of the best WDF*IDF tools in the following list:

  • OnpageDoc: If you would like to analyse and optimise your website’s SEO status, you can use OnpageDoc, the complete package from SAC Solutions GmbH in Cologne, Germany. If you take out a monthly subscription, you’ll have access to a variety of features to review and improve keywords, meta tags, backlinks and more. A WDF*IDF tool for term weighting analysis and targeted competitive comparison is also part of the portfolio. Those who do not want to access the entire suite can also download the tool for free at wdfidf-tool.com. The drawback, however, is that the number of possible queries is limited to 100 per hour, a quota shared by all users.
  • SEOlyze: Semantic analysis and research based on the WDF*IDF principle can also be done with the paid content analysis section of SEOlyze. Helminger GmbH, which is based in Austria, focuses on helping clients perfect website content and offers various tools like a W-questions tool for research, a duplicate content checker or read­ab­il­ity analyses (Flesch/Wiener factual text formula) to achieve this. The center­piece, however, is the com­pre­hens­ive WDF*IDF analysis function, the results of which can be im­ple­men­ted directly into the SEOlyze interface, thanks to the in­teg­rated editor. In addition to the WDF*IDF tool, the SEO suite includes various rank-tracking features, as well as several other tools for general on-page op­tim­isa­tion (keyword analysis, metadata checker, images, links, etc.).
     
  • XOVI: XOVI, a subsidiary of Plesk since 2017, provides its customers with an SEO suite that leaves little to be desired. The paid XOVI Toolbox, which is available in multiple languages, has three different models on offer (Pro, Business and Enterprise). It also includes tools to keep track of ads, traffic, keywords, backlinks and social signals. The XOVI TextOptimizer also includes a WDF*IDF text tool that not only calculates the relevance of terms used and suggests other terms based on the first ten Google search results pages, but also allows for direct editing.
     
  • Seobility: Seobility offers numerous SEO tools free of charge on its homepage, including a simple WDF*IDF tool. The web application allows users to check the weighting of a term based on the WDF*IDF formula. In addition, the tool displays other terms (including their frequency values) that match the word you are looking for. Access to the Seobility program is limited to five analyses per day per user. Users who create an account get access to the advanced search settings in order to, for example, adjust the base of the logarithm, increase the number of search results considered or select the platform (desktop/mobile) to optimise for.