Lean web analytics

I don't like Google Analytics. It is slow and invasive. So I'm removing it completely.

I'm still curious about analytics, because they can answer questions like:

  • Which blog posts and articles are interesting to people?
  • Are there any broken links that I missed?
  • How do people find my content?

At the same time, I don't want to call third-party services, let companies set tracking cookies, or store personal information for mining.

Here is the current setup:

  • Caddy - open source web server with automatic HTTPS
  • GoAccess - open source web log analyzer
  • Blog - custom static site generator similar to Pelican

Web server

Web traffic is served by Caddy. It is a lean web server that handles TLS certificates out of the box.

/etc/caddy/Caddyfile looks like this:

abdullin.com {
  root * /var/www/abdullin.com
  file_server
  encode zstd gzip

  handle_errors {
    @404 {
      expression {http.error.status_code} == 404
    }
    rewrite @404 /404.html
    file_server
  }

  log {
    output file /var/log/caddy/abdullin.com-access.json
  }
}

This serves the contents of /var/www/abdullin.com. It also records structured access logs, one JSON object per line, to /var/log/caddy/abdullin.com-access.json. These logs are rotated and eventually cleaned up.
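Because each log line is a standalone JSON object, the questions from the introduction can be answered with a few lines of scripting even before any analytics tool is involved. The following is a minimal sketch, not part of the original setup. It assumes the default Caddy v2 JSON access log fields (`status`, `request.uri`, and `request.headers.Referer` as a list of values); the function name `summarize` is hypothetical.

```python
import json
from collections import Counter


def summarize(lines):
    """Tally page hits, 404 URIs and referrers from Caddy JSON log lines.

    Assumes Caddy's default JSON access log layout; adjust the field
    names if your log encoder is configured differently.
    """
    pages, not_found, referrers = Counter(), Counter(), Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupt lines
        if not isinstance(entry, dict):
            continue
        request = entry.get("request", {})
        uri = request.get("uri", "")
        status = entry.get("status")
        if status == 404:
            not_found[uri] += 1  # candidates for broken links
        elif status == 200:
            pages[uri] += 1
        # Caddy stores each request header as a list of values.
        for ref in request.get("headers", {}).get("Referer", []):
            referrers[ref] += 1
    return pages, not_found, referrers
```

Passing an open file handle for /var/log/caddy/abdullin.com-access.json to `summarize` and calling `most_common(10)` on each counter gives a rough view of popular pages, missing URLs and traffic sources. Unlike GoAccess, this sketch does not filter out crawlers.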

Web analytics

Analytics can be done with GoAccess, which understands Caddy's log format out of the box. Just install the latest version and execute:

goaccess abdullin.com-access.json --log-format CADDY --ignore-crawlers

[Screenshot: GoAccess terminal dashboard]

Or you can generate an HTML report:

goaccess access.json --log-format CADDY --ignore-crawlers -o report.html

[Screenshot: GoAccess HTML report]
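If the HTML report should be refreshed on a schedule (for example from cron), the command above can be wrapped in a small script. This is only a sketch under assumptions: the helper names `goaccess_command` and `write_report` are hypothetical, and the flags are exactly the ones used in the commands above.

```python
import shutil
import subprocess


def goaccess_command(log_path, report_path=None):
    """Build the goaccess argument list used in the commands above."""
    cmd = ["goaccess", log_path, "--log-format", "CADDY", "--ignore-crawlers"]
    if report_path is not None:
        cmd += ["-o", report_path]  # write a report file instead of the terminal UI
    return cmd


def write_report(log_path, report_path):
    """Run goaccess if it is installed; return True when it ran successfully."""
    if shutil.which("goaccess") is None:
        return False  # goaccess is not on PATH, nothing to do
    subprocess.run(goaccess_command(log_path, report_path), check=True)
    return True
```

Calling `write_report("access.json", "report.html")` mirrors the command above and raises `subprocess.CalledProcessError` if goaccess exits with an error.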

The GoAccess configuration is located at /etc/goaccess/goaccess.conf. You can enable referral details there. My overrides:

exclude-ip MY_IP

#comment these out
#ignore-panel REFERRERS
#ignore-panel KEYPHRASES

It is possible to download a MaxMind GeoIP database and use it to aggregate visits by country or city:

goaccess access.json --log-format CADDY --ignore-crawlers --geoip-database City.mmdb

Next

This approach is nice, but it stores IP addresses and doesn't display user interaction flows. We try to improve things in (Over) Designing privacy-first analytics.

Published: June 10, 2022.

Next post in Opinionated Tech story: Analyze website logs with clickhouse-local
