Web servers automatically create log files that record every access. These files provide valuable insights into visitors, their origin, and their behaviour. With focused log file analysis, you can detect errors, identify bots, and optimise your SEO strategy.

What is log analysis?

Log analysis is the targeted evaluation of log files—records automatically created by a web server or application. It can be applied in many areas, including:

  • tracking database or email transmission errors,
  • monitoring firewall activity,
  • detecting security issues or attacks,
  • and analysing website visitor behaviour.

In the context of web analytics and search engine optimisation (SEO), log file analysis is especially valuable. Reviewing server logs can provide details such as:

  • IP address and hostname
  • access times
  • browser and operating system used
  • referrer link or search engine, including keywords
  • approximate session length (derived from timestamps)
  • number of pages viewed and their sequence
  • last page visited before exit

This data makes it possible to spot crawling problems, find error sources, and analyse mobile versus desktop usage. Because log files are often very large, manual evaluation is impractical. Specialised tools help by processing and visualising the information, leaving you with the main task: interpreting the results and acting on them to improve SEO, security, or performance.

Common issues and solutions in web server log analysis

When analysing log files, you quickly run into methodological limits. The main reason is that the HTTP protocol is stateless—each request is logged independently. Several approaches help generate reliable insights despite this.

Tracking sessions

By default, the server treats every page view as a separate request. To capture a visitor’s entire journey, session IDs can be applied. These are usually stored in cookies or added to the URL as parameters. Cookies, however, are not included in log files, while URL parameters require more programming effort and can cause duplicate content, which poses an SEO risk.
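
To illustrate how URL-based session IDs could then be evaluated, here is a minimal sketch in Python. It assumes the Common Log Format, a local access.log, and a hypothetical query parameter named sid; none of these names are prescribed by any standard.

import re
from collections import defaultdict
from urllib.parse import parse_qs, urlparse

# Matches the quoted request of a Common Log Format entry,
# e.g. "GET /index.html?sid=abc123 HTTP/1.1"
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) HTTP/[\d.]+"')

def sessions_from_log(path="access.log", param="sid"):
    """Group requested paths by a session ID passed as a URL parameter."""
    sessions = defaultdict(list)
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = REQUEST_RE.search(line)
            if not match:
                continue
            url = urlparse(match.group(1))
            session_id = parse_qs(url.query).get(param)
            if session_id:
                sessions[session_id[0]].append(url.path)
    return sessions

for sid, pages in sessions_from_log().items():
    print(f"{sid}: {len(pages)} page views: {' -> '.join(pages)}")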

Uniquely identify users

Another approach is to link accesses by IP address. However, this method is limited since many users have dynamic IPs or share an address—for example, when using proxy servers. In addition, full IP addresses are classified as personal data under the GDPR. For this reason, they should either be anonymised or stored only for a short period.
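
As a rough illustration of this approach, the client address can be read from the first field of each log line and used to count accesses per IP. This is only a sketch, subject to the limitations just described; the file name access.log is an assumption.

from collections import Counter

def requests_per_ip(path="access.log"):
    """Count requests per client IP, i.e. the first field of each log line."""
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:
            counts[line.split(" ", 1)[0]] += 1
    return counts

# Show the ten most active client addresses
for ip, hits in requests_per_ip().most_common(10):
    print(ip, hits)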

Detect bots and crawlers

Server logs record not only visits from real users but also requests from search engine crawlers and bots. These can be spotted using User-Agent headers, specific IP address ranges, or distinct access patterns. For accurate results, it’s essential to identify bots and filter them out from genuine user activity.
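
A simple heuristic sketch of such filtering, assuming the Combined Log Format (where the user agent is the last quoted field) and a short, non-exhaustive list of crawler keywords; verified bot detection, for example via reverse DNS lookups for Googlebot, goes beyond this.

import re

UA_RE = re.compile(r'"([^"]*)"\s*$')             # last quoted field = User-Agent
BOT_HINTS = ("bot", "crawl", "spider", "slurp")  # common substrings, not exhaustive

def split_bot_traffic(path="access.log"):
    """Separate log lines into human and bot requests based on the User-Agent."""
    humans, bots = [], []
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = UA_RE.search(line)
            agent = match.group(1).lower() if match else ""
            (bots if any(hint in agent for hint in BOT_HINTS) else humans).append(line)
    return humans, bots

humans, bots = split_bot_traffic()
print(f"{len(humans)} human requests, {len(bots)} bot requests")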

Limitations due to caching and resources

Because of caching by browsers or proxy servers, not every user request reaches the web server. As a result, some visits only appear in the log as status code 304 (‘Not Modified’). In addition, log files from high-traffic websites can grow very large, taking up storage and processing resources. Techniques such as log rotation, data aggregation, or scalable solutions like the Elastic Stack (ELK) help manage these challenges effectively.
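
To see how strongly caching shows up in practice, status codes can be aggregated while streaming the file line by line, which keeps memory use low even for very large logs. A sketch, again assuming the Common or Combined Log Format and a local access.log:

import re
from collections import Counter

STATUS_RE = re.compile(r'" (\d{3}) ')   # status code directly after the quoted request

def status_breakdown(path="access.log"):
    """Stream the log once and count responses per HTTP status code."""
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = STATUS_RE.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts

counts = status_breakdown()
total = sum(counts.values())
print(f"{counts.get('304', 0)} of {total} requests were answered with 304 (Not Modified)")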

Missing metrics

Log files deliver valuable technical insights but don’t cover all metrics important for web analysis. Figures like bounce rate or precise time on site are either missing or can only be approximated indirectly. For this reason, log file analysis is best used as a complement to other analytics methods.

How to analyse log files

To see how log file analysis works in practice, it helps to look at the structure of a typical log file. A common example is the Apache web server log (access.log), which is automatically generated in Apache’s log directory.

What information does the Apache log contain?

Entries are saved in the Common Log Format (also known as the NCSA Common Log Format). Each entry follows a defined syntax:

%h %l %u %t "%r" %>s %b

Each component of the log entry represents specific information:

  • %h: IP address of the client
  • %l: Identity of the client as reported by the identd service. This is usually not determined and often appears as a hyphen (-), indicating missing information.
  • %u: User ID of the client. Assigned when directory protection with HTTP authentication is used; typically not provided.
  • %t: Timestamp of the access
  • %r: Details of the HTTP request (method, requested resource, and protocol version)
  • %>s: Status code returned by the server
  • %b: Size of the response in bytes

An example of a complete entry in the access.log might look like this:

203.0.113.195 - user [10/Sep/2025:10:43:00 +0200] "GET /index.html HTTP/2.0" 200 2326

This entry shows the following: a client with the IP address 203.0.113.195 requested the file index.html via HTTP/2.0 on 10 September 2025 at 10:43 AM. The server returned status code 200 (‘OK’) and delivered 2,326 bytes.
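
For illustration, such an entry can be split into its fields with a regular expression that mirrors the %h %l %u %t "%r" %>s %b directives. The pattern below is a sketch for well-formed lines, not an official parser:

import re

# One named group per Common Log Format directive: %h %l %u %t "%r" %>s %b
CLF_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

line = '203.0.113.195 - user [10/Sep/2025:10:43:00 +0200] "GET /index.html HTTP/2.0" 200 2326'
entry = CLF_RE.match(line).groupdict()
print(entry["host"], entry["status"], entry["size"])  # 203.0.113.195 200 2326
print(entry["request"])                               # GET /index.html HTTP/2.0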

In the extended Combined Log Format, additional details can be recorded, such as the referrer (%{Referer}i) and the user-agent (%{User-agent}i). These reveal the page from which the request originated and the browser or crawler used. Beyond access.log, Apache also generates other log files, including error.log, which records error messages, server problems, or failed requests. Depending on the configuration, SSL logs and proxy logs can also be available for analysis.
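
For reference, the combined format mentioned above simply extends the common format with the Referer and User-agent directives:

%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"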

Initial evaluations with spreadsheets

For smaller datasets, log files can be converted into CSV format and imported into tools such as Microsoft Excel or LibreOffice Calc. This allows you to filter entries by criteria like IP address, status code, or referrer. However, because log files can grow very large, spreadsheets are only practical for short-term snapshots.
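
One possible way to do this conversion, reusing the regular expression approach shown above; the column names and the output file access.csv are arbitrary choices:

import csv
import re

CLF_RE = re.compile(r'(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')

# Convert access.log into a CSV file that Excel or LibreOffice Calc can open.
with open("access.log", encoding="utf-8", errors="replace") as log, \
     open("access.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["ip", "identity", "user", "timestamp", "request", "status", "bytes"])
    for line in log:
        match = CLF_RE.match(line)
        if match:
            writer.writerow(match.groups())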

Specialised log file analysis tools

For larger projects or ongoing evaluation, specialised tools are more effective. Examples include:

  • GoAccess: An open-source tool that generates real-time dashboards directly in the browser.
  • Matomo Log Analytics (Importer): Imports log files into Matomo, enabling analysis without page tagging.
  • AWStats: Delivers clear reports and statistics with a strong focus on efficiency.
  • Elastic Stack (ELK: Elasticsearch, Logstash, Kibana): Designed for scalable storage, querying, and visualisation of large log datasets.
  • Grafana Loki + Promtail: Well-suited for centralised log collection and analysis using Grafana dashboards.

For very large environments, it’s also important to use log rotation. This process automatically archives or deletes older log files, freeing up storage space and maintaining stable performance. When combined with solutions like the ELK Stack or Grafana, millions of log entries can be processed and analysed efficiently.
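
In practice, rotation is usually handled by tools such as logrotate or the web server itself rather than by hand. Purely to illustrate the principle, here is a small Python sketch that compresses the current log into a numbered archive and keeps a limited number of old files; the file name and the number of archives kept are assumptions, and truncating a log that the server still holds open needs extra care in real deployments.

import gzip
import shutil
from pathlib import Path

def rotate_log(path="access.log", keep=14):
    """Compress the current log into access.log.1.gz and shift older archives."""
    log = Path(path)
    if not log.exists():
        return
    # Shift existing archives: access.log.1.gz becomes access.log.2.gz, and so on.
    for i in range(keep - 1, 0, -1):
        old = log.with_name(f"{log.name}.{i}.gz")
        if old.exists():
            old.replace(log.with_name(f"{log.name}.{i + 1}.gz"))
    # Compress the active log into the first archive slot and empty the original.
    with log.open("rb") as src, gzip.open(log.with_name(f"{log.name}.1.gz"), "wb") as dst:
        shutil.copyfileobj(src, dst)
    log.write_text("")

rotate_log()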

Log file analysis and data protection

Analysing server log files always touches on data protection, since personal data is regularly processed. Two aspects are particularly important:

1. Storage and server location

A major benefit of log file analysis is that the data can remain entirely within your own infrastructure. By storing and processing the logs on your own servers, you retain complete control over sensitive information such as IP addresses or access patterns. This significantly reduces the risk of data leaks, unauthorised third-party access, or compliance breaches.

If you rely on external hosting providers, the server location becomes crucial. Data centres located in your own country or region usually make it easier to comply with local data protection laws and industry regulations. For example, U.S.-based companies must ensure that their provider complies with U.S. privacy regulations, while European companies are bound by GDPR and often prefer EU-based servers.

2. Handling IP addresses

IP addresses are generally classified as personal data under data protection laws. Their processing must therefore have a legal basis—typically ‘legitimate interest’, such as ensuring IT security or troubleshooting.

Best practices include:

  • anonymising or truncating IP addresses as early as possible (a brief sketch follows this list),
  • limiting retention periods (e.g., 7 days),
  • implementing clear deletion policies,
  • and transparently informing users in the privacy policy.
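
One way the truncation mentioned above could look in practice, using Python’s ipaddress module as a sketch: the host portion of each address is zeroed before the entry is stored or analysed. The prefix lengths used here (/24 for IPv4, /48 for IPv6) are common choices, not legal requirements.

import ipaddress

def anonymise_ip(address, v4_prefix=24, v6_prefix=48):
    """Zero the host bits of an IP address so it no longer identifies a single user."""
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)

print(anonymise_ip("203.0.113.195"))            # 203.0.113.0
print(anonymise_ip("2001:db8::8a2e:370:7334"))  # 2001:db8::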

In addition, Germany’s Telecommunications-Telemedia Data Protection Act (TTDSG) applies whenever information from a user’s device is accessed, for instance through cookies or pixels.

By collecting data sparingly, anonymising it promptly, and being transparent with users, log file analysis can be performed in compliance with data protection laws—allowing you to benefit from its insights without legal risk.

Analyse server log files as a solid foundation for your web analysis

Log file analysis is a reliable way to measure the success of a web project. By continuously monitoring traffic and user behaviour, you can adapt your content and services to better match your target audience’s needs. One advantage over JavaScript-based tracking tools like Matomo or Google Analytics is that log files still record data even if scripts are blocked. However, metrics such as bounce rate or exact time on site are missing, and factors like caching or dynamic IP addresses can reduce accuracy.

Even with these limitations, server log files provide a strong and privacy-friendly basis for web analysis. They are especially useful for distinguishing between desktop and mobile access, detecting bots and crawlers, or identifying errors such as 404 pages. When combined with other analytics methods, they offer a more complete picture of how your website is being used.
