
Hitman counts your hits, man.

Recently, someone in a community I'm part of asked the following:

I was thinking about how we used to have website hit counters in the 2000s and I was wondering -- has anyone put a hit counter on your personal website?

Some people had, it turns out, but many had not. Among the had-nots was me, and I decided to do something about it. The bottom line up front is that you can see it in action right now at the bottom of this very page, and if you want, check out the code here; it's called Hitman!

What's the problem?

Back in the day1, there was basically only one way to have a website: a Linux box running the Apache webserver, with PHP enabled and a MySQL database to hold state; this is your classic LAMP stack, obviously. If this is your website, adding a visible counter is trivial: you just use PHP to server-side render the count after a quick SQL query. And because this was basically the only way to have a website, lots of "website operators" put hit counters on their sites, because why not?

But this is the year 2024, and we do things differently these days. This blog, for example, is built with a "static site generator" called Zola, which means that there's no server side rendering, or any other kind of dynamic behavior from the backend. It's served by a small Linux VPS that's running the Caddy webserver, and costs about five bucks a month to run. If I wanted to have a hit counter, I'd have to do something non-traditional.

What's the solution?

For me, the solution turned out to be a sidecar microservice for counting and reporting the hits. As usual these days, my first instinct was to reach for Axum, a framework for building servers in Rust, and to use SQLite for the database. Caddy proxies all requests to the hit-counting URL to Hitman, which listens only on localhost.

That sounds simple

Ha ha, it does, doesn't it? And in the end, it actually kinda is. But there are a few nuances to consider.

Privacy

The less I know the better, as far as I'm concerned, and I saw no reason to learn more about my visitors than I already did. Still, de-duplicating views means tracking something about the client's IP. Someone linked to this post, whose author uses a notional CSS load to register a hit, and who hashes the IP with the date to keep the counts down to one per IP per day. They're doing quite a bit more actual "analytics" than I'm interested in, but I liked the hashing idea. They mention scrubbing the hashes from their DB every night to pre-emptively satisfy an overzealous GDPR regulator2, but I had a better idea: hash the IP and date together with a random secret that is never disclosed and is regenerated every time the server restarts.

I wound up hashing the page, the IP, and the date + hour together with the secret. This buckets views to one per IP per page per hour, versus the once per day from the bearblog post.
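Hitman is written in Rust, so here's a minimal sketch of that bucketing idea. Everything here is hypothetical: I'm using std's `DefaultHasher` as a stand-in for whatever hash the real code uses, and `hit_key` is an invented name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::IpAddr;

// Sketch: hash (secret, page, ip, date+hour) so a given IP counts at most
// once per page per hour, and the secret (regenerated on every server
// restart) makes the stored hashes impossible to reverse offline.
fn hit_key(secret: u64, page: &str, ip: IpAddr, date_hour: &str) -> u64 {
    let mut h = DefaultHasher::new();
    secret.hash(&mut h);
    page.hash(&mut h);
    ip.hash(&mut h);
    date_hour.hash(&mut h); // e.g. "2024-03-31T12"
    h.finish()
}

fn main() {
    let secret: u64 = 0xDEAD_BEEF; // in practice: a random value drawn at startup
    let ip: IpAddr = "203.0.113.7".parse().unwrap();
    let a = hit_key(secret, "hitman", ip, "2024-03-31T12");
    let b = hit_key(secret, "hitman", ip, "2024-03-31T12");
    let c = hit_key(secret, "hitman", ip, "2024-03-31T13");
    assert_eq!(a, b); // same hour: same bucket, so the hit de-duplicates
    assert_ne!(a, c); // next hour: new bucket, so the hit counts again
    println!("ok");
}
```

A cryptographic, keyed hash would be a more defensible choice for real code; the point here is just that the key changes every hour and every restart.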

Security?

I spent some time on this, but ultimately realized that there's

  • not much I can do, but
  • not much they can do, either.

The server rejects remote origins, but the Origin header can be trivially forged. On the other hand, the worst someone could do is add a bunch of junk to my DB, and I don't care about the data that much; this is all just for funsies, anyway!

Still, after writing this out, I realized that someone could send a bunch of junk slugs from a single IP and fill my disk, so I added a check against a set of allowed slugs to guard against that. Beyond that, I'd need to start thinking about being robust against a targeted and relatively sophisticated distributed attack, and it's definitely not worth it.
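The slug check is the simplest possible gate; a sketch of the shape of it, with hypothetical names (the real code presumably loads the set from the site's actual post list):

```rust
use std::collections::HashSet;

// Sketch: only slugs that correspond to real posts may register a hit,
// so junk slugs can't grow the database unboundedly.
fn is_allowed(allowed: &HashSet<&str>, slug: &str) -> bool {
    allowed.contains(slug)
}

fn main() {
    // Hypothetical allowlist; in practice this comes from the site itself.
    let allowed: HashSet<&str> = ["hitman", "about"].into_iter().collect();
    assert!(is_allowed(&allowed, "hitman"));
    assert!(!is_allowed(&allowed, "aaaa-junk-slug"));
    println!("ok");
}
```

Requests for unknown slugs would get a 404 and, crucially, never touch the database.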

The front end

I mentioned that this blog is made using Zola, a static site generator. Zola has a built-in templating system, so the following bit of HTML with inlined JavaScript is enough to register a hit and return the latest count:

<div class="hias-footer">
    <p>There have been <span id="hitman-count">no</span> views of this page.</p>
</div>

<script defer>
    const hits = document.getElementById('hitman-count');
    fetch("/hit/{{ page.slug }}").then((resp) => {
        if (resp.ok) {
            return resp.text();
        } else {
            return "I don't even know how many";
        }
    }).then((data) => {
        hits.textContent = data;
    });
</script>

Putting it all together

OK, all the pieces are laid out; here's the actual setup on the backend:

Caddy

The Caddy configuration has the following:

proclamations.nebcorp-hias.com {
    handle /hit/* {
        reverse_proxy localhost:5000
    }
    handle {
        <all the other routes on the site>
    }
}

This means that requests to, e.g., https://proclamations.nebcorp-hias.com/hit/hitman will register a hit for this post and return the number of views so far.

systemd

I created a system user for the service, hitman, with a homedir in /var/lib/hitman, and added the following systemd unit file into /etc/systemd/system/hitman.service:

[Unit]
Description=Hitman
After=network.target network-online.target
Requires=network-online.target

[Service]
Type=exec
User=hitman
Group=hitman
ExecStart=/var/lib/hitman/hitman -e /var/lib/hitman/.env
Restart=on-failure
TimeoutStopSec=5s
LimitNOFILE=1048576
LimitNPROC=512
PrivateTmp=true
ProtectSystem=full

[Install]
WantedBy=multi-user.target

This will ensure the hitman service is running after boot, and will be restarted if it crashes:

$ systemctl status hitman.service
● hitman.service - Hitman
     Loaded: loaded (/etc/systemd/system/hitman.service; enabled; preset: enabled)
     Active: active (running) since Sun 2024-03-31 12:12:14 PDT; 4h 0min ago
   Main PID: 46338 (hitman)
      Tasks: 2 (limit: 1018)
     Memory: 948.0K
        CPU: 53ms
     CGroup: /system.slice/hitman.service
             └─46338 /var/lib/hitman/hitman -e /var/lib/hitman/.env

Hitman

Inside the /var/lib/hitman directory there's a .env file with the following content:

DATABASE_URL=sqlite:///${HOME}/.hitman.db
DATABASE_FILE=${HOME}/.hitman.db
LISTENING_ADDR=127.0.0.1
LISTENING_PORT=5000
HITMAN_ORIGIN=https://proclamations.nebcorp-hias.com

Coda

When I got this working, a friend said, "Drat, that means I need to follow through on my goal to write a little web-ring server." Something like two hours later, she had a working webring, and indeed, if you look at the bottom of this very page, you'll see the webring links; as she says, this Web 1.0 stuff is fun!


1. I think of the hit counter era as the 90s, but that's because I'm older than the person who asked the question.

2. They don't mention scrubbing IPs from their logs, but they do mention having logs, so clearly the job to scrub the hit DB of hashes is just privacy kabuki.
