
One of the saddest trends in web dev over the past decade or so is the shift away from hosting on a server you have access to and toward something opaque like Netlify or Vercel. Pretty much all of the data you can get from a JS-plugin-based third party like Counter is sitting in your Apache/nginx/IIS log files if you host your site on a server you control, without the need to stuff yet another piece of client-side code that has no benefit to the user into the HTML.

It's not likely to change so I'm just yelling at a cloud right now, but website stats is one of the biggest losses in modern web dev so it bugs me. Maybe more than it should. For what it's worth, Counter looks to be a much better option than Google-"here's 200KB of glacially slow JS because screw users"-Tag-Manager.



> but website stats is one of the biggest losses in modern web dev

That seems a bit overly dramatic. If you look at your log files these days you'll see that a high percentage of the traffic is just scrapers, bots, and scripts trying to access wp-admin, etc.

Collecting this information somewhere else (instrumented backend, client-side script) makes a lot more sense now, since you can filter out noise more easily. There are also very light client-side scripts like https://plausible.io, which are a nice trade-off between privacy and useful information while not being too heavy.


Request logs are a source of truth for certain metrics. A tracker service may get you more metadata than the request line in the logs, but something like 25% of internet users use an ad blocker, which often blocks trackers as well (I block Plausible). Not seeing data for bots, scrapers, and x% of users can really mess up certain metrics.


25% seems way too high if you are not talking about a tech-audience-focused website. Especially on mobile, not many people install an ad blocker; I would be surprised if it's more than 1% globally.


I'm not sure about the number, but I was quoting https://www.statista.com/statistics/804008/ad-blocking-reach...


Plausible is not lightweight. Its codebase is large for the problem it's solving. JavaScript is not the way for analytics.


I actually agree. And with more and more users (personal impression) using blockers like uBlock Origin, the meaningfulness of data collected with JS gets eroded. But if there is no alternative, what some web analytics providers offer (counter.dev doesn't) are ways to try to circumvent the blockers (for example by suggesting proxies: https://plausible.io/docs/proxy/introduction). This leads to what I would call an unhealthy ecosystem.

That being said, log files are not the ultimate solution either, as non-techies would have more difficulty handling them.

So I actually agree the current situation is suboptimal. Spinning this thought further, I see "honest" analytics players being pressured to circumvent the blockers to stay relevant, but ultimately the blockers have the upper hand and it's a game I don't want to play, so yeahhh...

A consensus between blockers and web analytics providers that tracking really just the simplest metrics is important for, e.g., a yoga studio with its website might actually be difficult to reach, because... users might just choose the most aggressive blockers, since more is always better.

I am eager to see how it will play out. But generally it all goes slow.


"So I actually agree the current situation is suboptimal. Spinning this thought more I see "Honest" analytics players being pressured to circumvent the blockers to stay relevant but ultimately the blockers have the upper hand and its a game I don't want to play so yeahhh...."

This isn't entirely true. Blockers only work if you use a JavaScript trigger, which isn't necessary for a self-hosted solution.


Self-hosted web analytics solutions that don't use JavaScript are not plug and play and not as easy to integrate. If you have log files, maybe it's easier. If not, you need middleware for your specific web framework, or some kind of proxy to route your entire traffic through, which is not good for performance. I don't know, how would you do it?


"I don't know, how would you do it?"

Capture and process HTTP headers on-the-fly. Either at the server level or the (web) framework level. If done efficiently, it outperforms JavaScript.
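
Roughly, something like this at the framework level, as a sketch only (assuming Express here; recordHit is a made-up placeholder, not any real product's API):

    // Sketch of framework-level header capture, assuming Express.
    // recordHit() is a placeholder sink, not any particular product's API.
    import express from "express";

    const app = express();

    function recordHit(hit: { path: string; referrer: string; ua: string; lang: string }) {
      // Placeholder: append to a log, push to a queue, or write to a database.
      console.log(JSON.stringify(hit));
    }

    app.use((req, res, next) => {
      recordHit({
        path: req.path,
        referrer: req.get("referer") ?? "",
        ua: req.get("user-agent") ?? "",
        lang: req.get("accept-language") ?? "",
      });
      next(); // never block the visitor's request on analytics
    });

    app.listen(3000);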


Yep, I believe technically you can get better results with such an approach. It's just much harder to do, and the tooling for this approach is not as mature as for client-side tracking.

The main hurdle is that every code base would need a different kind of integration, or with a proxy you lose overall performance. And complexity would also increase. But as said, more accurate tracking could be possible. It has its pros and cons :-)


I've built this solution from scratch and am currently pairing it with a custom/prototype OODB.

These are the tradeoffs as I see them and based on my experience building:

In a self-hosted configuration, advanced analytics can be captured without JavaScript. It's unblockable, transparent, and incredibly fast. This is a stack-specific solution.

In a cloud configuration, it requires a JavaScript trigger. With JavaScript, each capture (on a dev machine) takes 70ms including storage (cloud db). Performance is good but blocking is now possible. This is a stack-agnostic solution.

One of the above suits me as a user; the other as an entrepreneur.


I think mostly we are actually on the same page. One thing I would add is that starting the whole web analytics endeavour from scratch needs a lot of work, testing, and iteration. I am actually thinking it would make sense for counter.dev to offer an API so that you could use it from backend middleware or in some other way.
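
Purely hypothetical, since no such API exists today, but a server-side call from middleware might look roughly like this (the endpoint URL and field names are invented for illustration):

    // Hypothetical sketch only: counter.dev does not currently offer this API.
    // The endpoint URL and field names are invented for illustration.
    async function reportPageView(path: string, referrer: string, userAgent: string): Promise<void> {
      await fetch("https://counter.dev/api/track", { // invented endpoint
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ site: "example.com", path, referrer, userAgent }),
      });
    }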

> One of the above suits me as a user; [...]

That might be true. But roughly speaking, let's say that for over 99% of people, just reading the term "tracking script" already switches off their brain. If you start with "code-base-specific middleware addition" or "deploy your own tracking HTTP proxy" or something like that, you are left with only knowledgeable techies, which, to be fair, could suit a more specialised product.

> [...] the other as an entrepreneur.

With that thinking, it wouldn't make much sense for me to offer and maintain the service basically for free, as I have for some years already. At least I can cover the hosting costs. So yes, but also a little bit no :-)


"I think mostly we are actually on the same page."

Agreed.

I think the problem facing analytics is that the ideal solution and ideal business require fundamentally different products.


Wasn’t part of the drive to client-side analytics an effort to improve data quality, in particular to differentiate bots from humans, and to measure actual human analytics without getting caught by caches along the way?

If you use something like Cloudflare you can also get some of that server-side logging back.

And Netlify and Vercel both have first-class analytics features.


> Wasn’t part of the drive to client-side analytics an effort to improve data quality, [...]

Interesting, I did not know that narrative. But what I can tell from subjective experience is that bots aren't so much of a problem with client-side analytics. counter.dev filters them out by not logging very short page views, by the way. For me the bigger challenge with client-side analytics is not being able to track clients that are using a tracking blocker. Which I guess is the end user's right to use (I even use uBlock Origin myself). But if you start missing roughly 50% of page visits it starts becoming an issue for website owners. The data does not need to be detailed, just accurate enough.
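
As a rough illustration of that kind of filter (not counter.dev's actual code; the 3-second threshold is an arbitrary assumption):

    // Rough illustration of dropping very short page views on the client.
    // Not counter.dev's actual code; the 3-second cutoff is an arbitrary assumption.
    const MIN_VIEW_MS = 3000;
    const start = Date.now();

    window.addEventListener("pagehide", () => {
      const duration = Date.now() - start;
      if (duration < MIN_VIEW_MS) return; // likely a bot or an instant bounce, skip it
      navigator.sendBeacon("/collect", JSON.stringify({ path: location.pathname, duration }));
    });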

Web analytics from hosting providers... yeah, if it fits your use case then great, but for me that is vendor lock-in and I would avoid it if possible; web analytics is more or less a topic of its own that I'd prefer to leave to a specialised solution. But obviously I am biased, haha.


Since then most bots would have abandoned libcurl and moved on to using something like headless Chrome to get around bot-mitigation techniques, so the playing field has evened significantly.


And ubiquitous HTTPS has dramatically cut down on caches that sit in the middle, so you only really have to worry about the impact of the browser cache on your analytics.


I've thought about using Cloudflare Workers to build a proxy that would do user tracking. Not sure if that is something people would want to do, but it effectively gives you a JavaScript-free way to track your visitors while still using whatever CDN host (like Netlify) they want. The challenge would be getting users to change their website's DNS records to point at a visitor tracking service.
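
A minimal sketch of that proxy idea, assuming Cloudflare Workers module syntax (ORIGIN and logHit are placeholders, not a finished product):

    // Minimal sketch of a tracking proxy on Cloudflare Workers (module syntax).
    // ORIGIN and logHit() are placeholders, not a finished product.
    const ORIGIN = "https://example.netlify.app"; // the real CDN host sits behind the worker

    async function logHit(request: Request): Promise<void> {
      const hit = {
        path: new URL(request.url).pathname,
        ua: request.headers.get("user-agent") ?? "",
        referrer: request.headers.get("referer") ?? "",
      };
      console.log(JSON.stringify(hit)); // or write to KV, Analytics Engine, an external API...
    }

    export default {
      async fetch(request: Request, env: unknown, ctx: ExecutionContext): Promise<Response> {
        ctx.waitUntil(logHit(request)); // record the visit without delaying the response
        const url = new URL(request.url);
        return fetch(new Request(ORIGIN + url.pathname + url.search, request)); // pass through to the origin
      },
    };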


I guess at some point most of us will have to yell at clouds. Whether that changes anything or not remains debatable.

I think the biggest challenge we have today is that building on SaaS and huge dependency charts is very attractive to somebody eager to get an idea off the ground. It typically has a low barrier to entry (as long as you have a PayPal account) and you don’t have to deal with “low level stuff”. Unfortunately it typically also comes with vendor lock-in issues, but by the time you realise that, you are already way too far down the road.

The fun thing is, anybody with a little bit of git and Docker know-how can have a better developer experience hosting their web projects on a VM, so maybe this is an issue of fundamentals?


For those still hosting websites themselves, is there any modern web stats analyzer you could recommend? AWStats and Webalizer look so dated.


I've been using GoAccess because of this exact line of thinking (logs over a JS pixel tracker). GoAccess comes with a really nice TUI, a built-in web server, and can export to CSV and other formats. It's pretty robust. You just pipe logs right into it and it starts crunching.


I had a manager ask to include Google Analytics on a government-funded site a couple of weeks ago.

Instead I sent nice HTML reports generated by GoAccess from access logs.

Highly recommend.


Somebody mentioned GoAccess. I haven't tried it but it looks good imo.


What’s the difference between this and my Grandad’s nostalgia for 1950s tractors, or his Grandad’s nostalgia for 1880s ploughs?


None whatsoever. In fact that's a great analogy. Modern tractors are incredibly unfriendly to consumers: they're closed source, not open to repairs, expensive, and massively over-engineered. They offer some superficial UX improvements but nothing you can't live without. John Deere's shenanigans get posted to HN regularly. As a company they're one of the reasons the US government is considering laws to protect consumers' access to the things they've bought. I don't think anyone would find it controversial that someone might want a simpler, easier, repairable, accessible tractor in light of that.

A little nostalgia for a time when things were actually better for the end user is exactly where my post is coming from. Going back to analysing traffic using server logs without stuffing more JS into every website seems reasonable to me.



