Whether you’re sitting at a desktop PC, reading the news on a tablet, or running a website on a server, many different processes take place in the background of these devices. Should an error occur, or should you simply want to find out more about which actions a given operating system or program is executing, log files can help. These are automatically recorded by virtually every application, server, and database system.

Generally, log files are rarely read and evaluated; think of them as a virtual black box of sorts that is inspected only in the most urgent cases. Because of the manner in which they capture data, log files are an excellent source for tracking down program and system errors; they also lend themselves particularly well to gathering information on user behavior. The ability to find out more about users makes this technology especially interesting for website operators, who can gain useful data from the log files located on their web servers.

What is a log file?

Log files, which are sometimes referred to as event files, are generally plain text files. They contain information on all processes that the corresponding programmers have defined as relevant. A database’s log file, for example, records all changes made by correctly executed transactions. If part of a database is lost, e.g. in the course of a system crash, log files act as a basis for recovering the data set to its proper state.

Log files are automatically generated according to how they’ve been programmed. It’s also possible to create your own log files, provided you’re familiar enough with the technical aspects involved. Generally, a line within a log file contains the following information:

  • The recorded event (e.g. program start)
  • A time stamp, which assigns a date and time to the event

Normally, the time stamp comes first so that the chronological sequence of events is easy to follow.
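This timestamp-first structure can be sketched with Python’s standard logging module; the file name and message below are hypothetical examples, not part of the original article:

```python
import logging

# Minimal sketch: build a timestamp-first log line, as described above.
# The format string places the time stamp before the event message.
formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

record = logging.LogRecord(
    name="app", level=logging.INFO, pathname="", lineno=0,
    msg="program start", args=(), exc_info=None,
)

line = formatter.format(record)
print(line)  # e.g. "2024-01-01 12:00:00,000 INFO program start"
```

In real applications, `logging.basicConfig(filename=..., format=...)` would write such lines to a file automatically.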

Typical applications for log files

Operating systems generally create multiple log files by assigning the different process types to fixed categories. Windows systems, for example, record information on application events, system events, security-related events, setup events, and forwarded events. This gives administrators insight into the corresponding log file information, which can assist them in troubleshooting; Windows log files also show which users have logged on and off the system. Beyond the operating system itself, the following programs and systems each collect very different data:

  • Background programs, like e-mail, database, or proxy servers, generate log files that primarily record error and event messages as well as other notices. These records help secure data and, in the event of a crash, restore it.

  • Installed software, such as office programs, games, instant messengers, firewalls, or virus scanners, saves many different types of data in log files. This can include configuration settings or chat messages. Records of program crashes are also compiled and used to speed up troubleshooting efforts.

  • Servers (especially web servers) record relevant network activity; this information contains useful data on users and their behavior within networks. What’s more, authorized administrators can see which users started an application or requested a file, when and for how long they did so, and which operating system was used. Web log analysis is one of the oldest web analytics methods and one of the best examples of the many uses of log files.

Web server log files: the textbook example for the potential of log files

Originally, log files of web servers such as Apache or Microsoft IIS were used chiefly for recording and repairing processing errors. It quickly became apparent, however, that web server log files contain much more valuable data: information on the usability and popularity of the websites hosted on the server, as well as user data such as:

  • Time of page view
  • Number of page views
  • Session duration
  • IP address and user’s host name
  • Information on the requesting client (usually the browser)
  • Search engine used, including search queries
  • Operating system used

A typical entry of a web server log file looks as follows:

183.121.143.32 - - [18/Mar/2003:08:04:22 +0200] "GET /images/logo.jpg HTTP/1.1" 200 512 "http://www.wikipedia.org/" "Mozilla/5.0 (X11; U; Linux i686; de-DE;rv:1.7.5)"

Detailed overview of the individual parameters:

  • IP address (183.121.143.32): the requesting host’s IP address
  • Identity (-): the client’s RFC 1413 identity; generally unknown, hence the “-”
  • User (-): the user name, provided HTTP authentication has taken place; otherwise, as in this example, it remains empty
  • Time stamp ([18/Mar/2003:08:04:22 +0200]): date, time, and time zone offset of the request
  • Request ("GET /images/logo.jpg HTTP/1.1"): the recorded event, in this case an image requested via HTTP
  • Status code (200): confirms a successful request (HTTP status code 200)
  • Data volume (512): where applicable, the amount of transferred data in bytes
  • Referrer ("http://www.wikipedia.org/"): the web address from which the file was requested
  • User agent ("Mozilla/5.0 (X11; U; Linux i686; de-DE;rv:1.7.5)"): technical information about the client, such as browser, operating system, language, and version
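These fields can be extracted programmatically; here is a minimal sketch in Python, assuming the common Apache-style “combined” log format shown in the example entry:

```python
import re

# Regular expression for one line of an Apache-style "combined" log.
# Named groups mirror the fields described above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

entry = ('183.121.143.32 - - [18/Mar/2003:08:04:22 +0200] '
         '"GET /images/logo.jpg HTTP/1.1" 200 512 '
         '"http://www.wikipedia.org/" '
         '"Mozilla/5.0 (X11; U; Linux i686; de-DE;rv:1.7.5)"')

fields = LOG_PATTERN.match(entry).groupdict()
print(fields["ip"])      # 183.121.143.32
print(fields["status"])  # 200
```

Real log analysis tools apply a parser like this to every line of the file, then aggregate the results.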

To effectively evaluate this flood of information, tools like Webalizer have been developed. These take the collected data and turn it into informative statistics, tables, and graphics. Trends in a website’s growth, the user friendliness of individual pages, or relevant keywords and topics can all be determined from this information. Although web server log file analyses are still carried out, this tried-and-true method has lost some of its former shine to increasingly popular web analysis methods such as cookies and page tagging. Factors driving this trend include the error-prone nature of log file analysis when assigning sessions, as well as the fact that website operators often cannot access a web server’s log files. Log file analysis retains advantages, however: all error reports are registered immediately, and the collected data stays directly within the company.
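At its core, a tool like Webalizer aggregates parsed log entries into such statistics. A toy sketch of the counting step, using hypothetical request paths rather than real log data:

```python
from collections import Counter

# Hypothetical request paths extracted from parsed log entries.
requests = [
    "/index.html", "/images/logo.jpg", "/index.html", "/about.html",
]

# Count page views per path, as a log analysis tool would.
page_views = Counter(requests)
top = page_views.most_common(1)
print(top)  # [('/index.html', 2)]
```

The same counting approach extends to IP addresses, referrers, or user agents, yielding the kinds of tables and charts these tools produce.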
