Web servers automatically create log files that record every access. These files provide valuable insights into visitors, their origin, and their behaviour. With focused log file analysis, you can detect errors, identify bots, and optimise your SEO strategy.

What is log analysis?

Log analysis is the targeted evaluation of log files—records automatically created by a web server or application. It can be applied in many areas, including:

  • tracking database or email transmission errors,
  • monitoring firewall activity,
  • detecting security issues or attacks,
  • and analysing website visitor behaviour.

In the context of web analytics and search engine optimisation (SEO), log file analysis is especially valuable. Reviewing server logs can provide details such as:

  • IP address and hostname
  • access times
  • browser and operating system used
  • referrer link or search engine, including keywords
  • approximate session length (estimated from timestamps)
  • number of pages viewed and their sequence
  • last page visited before exit

This data makes it possible to spot crawling problems, find error sources, and analyse mobile versus desktop usage. Because log files are often very large, manual evaluation is impractical. Specialised tools help by processing and visualising the information—leaving the main task of interpreting results and taking action to improve SEO, security, or performance.

Common issues and solutions in web server log analysis

When analysing log files, you quickly run into methodological limits. The main reason is that the HTTP protocol is stateless—each request is logged independently. To still generate reliable insights, several approaches are available.

Tracking sessions

By default, the server treats every page view as a separate request. To capture a visitor’s entire journey, session IDs can be used. These are usually stored in cookies or added to the URL as parameters. However, cookie-based session IDs do not appear in the log files themselves, while URL parameters require more programming effort and can cause duplicate content, which poses an SEO risk.
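
As an illustration, the following Python sketch groups already-parsed log entries into sessions using the common 30-minute inactivity heuristic. The entry structure (dictionaries with ip, user_agent and timestamp keys) is an assumption made for this example, not something the log format prescribes.

from datetime import timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # common heuristic, not a fixed standard

def group_into_sessions(entries):
    """Group parsed log entries into per-visitor sessions.

    `entries` is assumed to be a list of dicts with the hypothetical keys
    'ip', 'user_agent' and 'timestamp' (a datetime), sorted by timestamp.
    """
    sessions = {}
    for entry in entries:
        visitor = (entry["ip"], entry["user_agent"])
        visitor_sessions = sessions.setdefault(visitor, [])
        last_entry = visitor_sessions[-1][-1] if visitor_sessions else None
        # Start a new session if there is no previous request or the gap
        # since the last request exceeds the timeout.
        if last_entry is None or entry["timestamp"] - last_entry["timestamp"] > SESSION_TIMEOUT:
            visitor_sessions.append([entry])
        else:
            visitor_sessions[-1].append(entry)
    return sessions

Because an IP address and user agent can be shared by several people, this grouping is only an approximation of real sessions.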

Uniquely identify users

Another approach is to link accesses by IP address. However, this method is limited since many users have dynamic IPs or share an address—for example, when using proxy servers. In addition, full IP addresses are classified as personal data under the GDPR. For this reason, they should either be anonymised or stored only for a short period.
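
Where IP addresses are processed at all, truncating them early is a common safeguard. The Python sketch below zeroes the host part of an address before it is stored; the chosen prefix lengths (/24 for IPv4, /48 for IPv6) are typical examples rather than legal requirements.

import ipaddress

def anonymise_ip(raw_ip: str) -> str:
    """Truncate an IP address before storing or analysing it.

    IPv4: zero the last octet (203.0.113.195 -> 203.0.113.0).
    IPv6: keep only the first 48 bits of the address.
    """
    ip = ipaddress.ip_address(raw_ip)
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{raw_ip}/{prefix}", strict=False)
    return str(network.network_address)

print(anonymise_ip("203.0.113.195"))  # 203.0.113.0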

Detect bots and crawlers

Server logs record not only visits from real users but also requests from search engine crawlers and bots. These can be spotted using User-Agent headers, specific IP address ranges, or distinct access patterns. For accurate results, it’s essential to identify bots and filter them out from genuine user activity.
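
A simple starting point is matching the User-Agent header against known crawler signatures, as in the Python sketch below. The signature list is a small illustrative sample; production filters are far longer and, where accuracy matters, also verify requests against the crawler’s published IP ranges.

# Illustrative sample of substrings that identify well-known crawlers.
BOT_SIGNATURES = ("googlebot", "bingbot", "duckduckbot", "baiduspider",
                  "yandex", "ahrefsbot", "semrushbot", "crawler", "spider", "bot")

def is_bot(user_agent: str) -> bool:
    """Return True if the User-Agent string looks like a known crawler."""
    ua = user_agent.lower()
    return any(signature in ua for signature in BOT_SIGNATURES)

# Example: split parsed entries (hypothetical 'user_agent' key) into two groups.
# humans = [e for e in entries if not is_bot(e["user_agent"])]
# bots = [e for e in entries if is_bot(e["user_agent"])]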

Limitations due to caching and resources

Because of caching by browsers or proxy servers, not every user request reaches the web server. As a result, some visits only appear in the log as status code 304 (‘Not Modified’). In addition, log files from high-traffic websites can grow very large, taking up storage and processing resources. Techniques such as log rotation, data aggregation, or scalable solutions like the Elastic Stack (ELK) help manage these challenges effectively.
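
To gauge how strongly caching affects your figures, a quick aggregation of status codes helps. The sketch below assumes entries already parsed into dictionaries with a status key (as in the parsing example later in this article); a high share of 304 responses points to heavy browser or proxy caching.

from collections import Counter

def status_code_summary(entries):
    """Print how often each HTTP status code occurs in the parsed entries."""
    counts = Counter(entry["status"] for entry in entries)
    total = sum(counts.values())
    for status, count in counts.most_common():
        print(f"{status}: {count} ({count / total:.1%})")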

Missing metrics

Log files deliver valuable technical insights but don’t cover all metrics important for web analysis. Figures like bounce rate or precise time on site are either missing or can only be approximated indirectly. For this reason, log file analysis is best used as a complement to other analytics methods.

How to analyse log files

To see how log file analysis works in practice, it helps to look at the structure of a typical log file. A common example is the Apache access log (access.log), which the web server writes automatically to its log directory.

What information does the Apache log contain?

Entries are saved in the Common Log Format (also known as the NCSA Common Log Format). Each entry follows a defined syntax:

%h %l %u %t "%r" %>s %b

Each component of the log entry represents specific information:

  • %h: IP address of the client
  • %l: Identity of the client. This is usually not determined and often appears as a hyphen (-), indicating missing information.
  • %u: User ID of the client. Assigned when directory protection with HTTP authentication is used; typically not provided.
  • %t: Timestamp of the access
  • %r: Details of the HTTP request (method, requested resource, and protocol version)
  • %>s: Status code returned by the server
  • %b: Size of the response in bytes

An example of a complete entry in the access.log might look like this:

203.0.113.195 - user [10/Sep/2025:10:43:00 +0200] "GET /index.html HTTP/2.0" 200 2326

This entry shows the following: a client with the IP address 203.0.113.195 requested the file index.html via HTTP/2.0 on 10 September 2025 at 10:43 AM. The server returned status code 200 (‘OK’) and delivered 2,326 bytes.
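
As a minimal sketch, the following Python snippet parses such a line with a regular expression that mirrors the Common Log Format; the field names (ip, request, status and so on) are chosen for this example.

import re
from datetime import datetime

# Regex for the Common Log Format: %h %l %u %t "%r" %>s %b
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line: str):
    """Parse one access.log line into a dict, or return None if it is malformed."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["timestamp"] = datetime.strptime(entry.pop("time"), "%d/%b/%Y:%H:%M:%S %z")
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

line = '203.0.113.195 - user [10/Sep/2025:10:43:00 +0200] "GET /index.html HTTP/2.0" 200 2326'
print(parse_line(line))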

In the extended Combined Log Format, additional details can be recorded, such as the referrer (%{Referer}i) and the user-agent (%{User-agent}i). These reveal the page from which the request originated and the browser or crawler used. Beyond access.log, Apache also generates other log files, including error.log, which records error messages, server problems, or failed requests. Depending on the configuration, SSL logs and proxy logs can also be available for analysis.
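
For reference, the Combined Log Format mentioned above corresponds to the following pattern, which simply appends the referrer and user-agent fields to the Common Log Format:

%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"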

Initial evaluations with a spreadsheet

For smaller datasets, log files can be converted into CSV format and imported into tools such as Microsoft Excel or LibreOffice Calc. This allows you to filter entries by criteria like IP address, status code, or referrer. However, because log files can grow very large, spreadsheets are only practical for short-term snapshots.
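
As a sketch of this workflow, the following Python script converts an access.log in Common Log Format into a CSV file; the file names and column selection are placeholders to adapt to your setup.

import csv
import re

# Same Common Log Format pattern as in the parsing example above.
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def log_to_csv(log_path="access.log", csv_path="access.csv"):
    """Write the main fields of each log line into a CSV file for spreadsheet use."""
    with open(log_path, encoding="utf-8") as log_file, \
         open(csv_path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["ip", "time", "request", "status", "size"])
        for line in log_file:
            match = CLF_PATTERN.match(line)
            if match:
                writer.writerow(match.group("ip", "time", "request", "status", "size"))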

Specialised log file analysis tools

For larger projects or ongoing evaluation, specialised tools are more effective. Examples include:

  • GoAccess: An open-source tool that generates real-time dashboards directly in the browser.
  • Matomo Log Analytics (Importer): Imports log files into Matomo, enabling analysis without page tagging.
  • AWStats: A long-established tool that generates clear graphical reports and statistics from server logs.
  • Elastic Stack (ELK: Elasticsearch, Logstash, Kibana): Designed for scalable storage, querying, and visualisation of large log datasets.
  • Grafana Loki + Promtail: Well-suited for centralised log collection and analysis using Grafana dashboards.

For very large environments, it’s also important to use log rotation. This process automatically archives or deletes older log files, freeing up storage space and maintaining stable performance. When combined with solutions like the ELK Stack or Grafana, millions of log entries can be processed and analysed efficiently.

Log file analysis and data protection

Analysing server log files always touches on data protection, since personal data is regularly processed. Two aspects are particularly important:

1. Storage and server location

A major benefit of log file analysis is that the data can remain entirely within your own infrastructure. By storing and processing the logs on your own servers, you retain complete control over sensitive information such as IP addresses or access patterns. This significantly reduces the risk of data leaks, unauthorised third-party access, or compliance breaches.

If you rely on external hosting providers, the server location becomes crucial. Data centres located in your own country or region usually make it easier to comply with local data protection laws and industry regulations. For example, U.S.-based companies must ensure that their provider complies with U.S. privacy regulations, while European companies are bound by GDPR and often prefer EU-based servers.

2. Handling IP addresses

IP addresses are generally classified as personal data under data protection laws. Their processing must therefore have a legal basis—typically ‘legitimate interest’, such as ensuring IT security or troubleshooting.

Best practices include:

  • anonymising or truncating IP addresses as early as possible,
  • limiting retention periods (e.g., 7 days),
  • implementing clear deletion policies,
  • and transparently informing users in the privacy policy.

In addition, in Germany the Telecommunications-Telemedia Data Protection Act (TTDSG) applies whenever information from a user’s device is accessed, for instance through cookies or pixels.

By collecting data sparingly, anonymising it promptly, and being transparent with users, log file analysis can be performed in compliance with data protection laws—allowing you to benefit from its insights without legal risk.

Analyse server log files as a solid foundation for your web analysis

Log file analysis is a reliable way to measure the success of a web project. By continuously monitoring traffic and user behaviour, you can adapt your content and services to better match your target audience’s needs. One advantage over JavaScript-based tracking tools like Matomo or Google Analytics is that log files still record data even if scripts are blocked. However, metrics such as bounce rate or exact time on site are missing, and factors like caching or dynamic IP addresses can reduce accuracy.

Even with these limitations, server log files provide a strong and privacy-friendly basis for web analysis. They are especially useful for distinguishing between desktop and mobile access, detecting bots and crawlers, or identifying errors such as 404 pages. When combined with other analytics methods, they offer a more complete picture of how your website is being used.
