Aren't there self-hosted analytics anyway? Piwik[1] comes to mind first, but I'm sure there are many.

1. https://piwik.org/



Piwik is incredible. But it should be noted that it poses a scaling challenge for high-traffic use cases (more than a hundred million actions per month), and hosting your own analytics is expensive.

I bring this up because people had been slamming moot for using GA on 4chan instead of Piwik without understanding why.


We have much lower traffic than that, and our Piwik servers, even with paid support from the Piwik team, often struggle to generate reports. Not convinced Piwik is that easy to scale.


People have scaled it to over a billion actions per month. No clue how much of that involves customizations though... It sounds way past the out-of-the-box limit.

Look at the comments from sandfox and afterlastangel in this thread. afterlastangel is pushing a billion, sandfox is around 300 MM per month.

http://forum.piwik.org/t/high-traffic-piwik-servers-database...
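For a sense of scale, the monthly figures quoted in this thread can be turned into average request rates. This is only a back-of-the-envelope sketch: it assumes a 30-day month and perfectly even traffic, which no real site has, so peaks will sit well above these averages. The function name is made up for illustration.

```python
# Back-of-the-envelope conversion of "actions per month" to an average rate.
# Assumption: a 30-day month and evenly spread traffic (real traffic is bursty).
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def average_actions_per_second(actions_per_month: int) -> float:
    """Return the mean number of tracked actions per second."""
    return actions_per_month / SECONDS_PER_MONTH


if __name__ == "__main__":
    # Figures mentioned in the thread: 100 million, 300 million, 1 billion.
    for monthly in (100_000_000, 300_000_000, 1_000_000_000):
        rate = average_actions_per_second(monthly)
        print(f"{monthly:>13,} actions/month is about {rate:.1f} actions/second")
```

That works out to roughly 39, 116 and 386 actions per second on average, before accounting for daily peaks or the cost of generating reports.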


I'm looking into replacing GA Premium ever since Easylist blocked GA tracking for Adblocked users and self-hosted Piwik seems like the best solution. I'd be well into the billions.


With that kind of traffic hopefully you have the resources to pull it off. Good luck!


Do any of the Google Analytics alternatives scale to that size?


Free alternatives? Not really. Paid? Yes, SiteCatalyst and Webtrekk come to mind.

People seem to ignore that the tracking JavaScript is not what you're paying for. It's the backend + servers.


People have taken piwik to 300MM up to over 1 billion actions per month. But it certainly isn't "set it and forget it."

http://forum.piwik.org/t/high-traffic-piwik-servers-database...


See https://news.ycombinator.com/item?id=10697045

Piwik is still using (unsalted) MD5 for passwords in 2015, and probably will still be using unsalted MD5 in 2016.


This is pretty bad. Piwik could be a high value target depending on the nature of the site it is used to analyze.

I can't believe unsalted MD5 is "by design" (https://github.com/piwik/piwik/issues/8753).
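To illustrate why unsalted MD5 is a problem, here is a minimal sketch in Python comparing it with a salted, deliberately slow key-derivation function from the standard library. It is not Piwik's code. The function names and the iteration count are assumptions picked for the example.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumption: illustrative work factor, tune for your hardware


def unsalted_md5(password: str) -> str:
    """What the thread complains about: one fast, unsalted hash."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a digest with PBKDF2-HMAC-SHA256 and a random per-user salt."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


if __name__ == "__main__":
    # Two users with the same password get the same unsalted MD5 hash,
    # so a single precomputed table cracks both at once.
    print(unsalted_md5("changeMe") == unsalted_md5("changeMe"))

    # With random salts the stored digests differ even for equal passwords.
    salt_a, digest_a = hash_password("changeMe")
    salt_b, digest_b = hash_password("changeMe")
    print(digest_a != digest_b)
    print(verify_password("changeMe", salt_a, digest_a))
```

The salt defeats precomputed lookup tables, and the iteration count makes each guess expensive. A dedicated password hash such as bcrypt, scrypt or Argon2 would be a reasonable choice too.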


https://theintercept.com/2014/02/18/snowden-docs-reveal-cove...

Considering Piwik is used by the GCHQ, I find it hilarious.


They're using an open source analytics software package to analyse the very data it was designed to analyse.

I don't see how poorly implemented hashing in the administrative interface is at all relevant to what they're doing, or a reason why they shouldn't be using it.


Information on who visits WikiLeaks - and what they read and upload - is an incredibly high value target. I don't see how you can argue otherwise, when Britain's top intel agency has an expensive line item in their budget just to get at that info.

Given these known security flaws, it's not a stretch to assume anyone who can see the GCHQ's Piwik server can have that data too, regardless of whether they are authorized.

See below for a small preview of what an attacker could exfiltrate (dissident IPs redacted for a reason):

https://firstlook.org/wp-uploads/sites/1/2014/02/piwik2.png

While we're talking about poor security practices: the privileged username in the screenshot is apparently still the default ("admin"), so I hope the password isn't still "changeMe" ... http://piwik.org/faq/how-to/faq_191/



Wikipedia's love of lists is absolutely amazing: https://en.wikipedia.org/wiki/List_of_lists_of_lists



Strangely Microsoft's one is missing: Application Insights.

Pretty much works like Google Analytics but utilises both client JavaScript and embedded runtime code to generate a richer picture of what is going on.

Too bad the interface on the Azure Portal is terrible. They spent too much time making it look fancy, and not enough time getting the 101s of usability right (which is a criticism I'd lay at the feet of the new Azure portal in general).


Who makes these lists?!



Good question!

Probably the vendors of the software concerned. Perhaps it started out as a list of three with a major bias towards a particular product. Then the competitors responded, moderators did their thing, and eventually an accurate list evolved.


Do Adblock, uBlock, etc. block this as well?

Am looking to use this in lieu of Google Analytics.


Self-hosted means that it will be served from your own servers, and thereby your own domain. So unless your domain is on a block list, it will be loaded.

EDIT: Sorry, I've been dealing with uBlock Matrix for too long, and forgot how advanced the other blockers' pattern matching is. See the many responses to this for better information.


(my apologies for the tone - I have edited the post to try to keep it purely fact based)

From EasyPrivacy[1]

    /piwik-$domain=~piwik.org
    /piwik.$script,domain=~piwik.org
    /piwik.php
    /piwik/js/*$domain=~piwik.org
    /piwik1.
    /piwik2.js
    /piwik_
    /piwikapi.js
    /piwikC_
    /piwikTracker.

This doesn't include any renamed versions, nor does it include the numerous domain-specific variations.

[1] https://easylist-downloads.adblockplus.org/easyprivacy.txt
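A rough sketch of how rules like these end up matching a self-hosted install, written in Python. It is a simplification, not the real Adblock Plus matching engine: it drops everything after `$` (so the `domain=~piwik.org` exception and the `script` type option are ignored) and treats `*` as a wildcard inside a substring match. The example.com URLs are hypothetical.

```python
import re

# Rules copied from the EasyPrivacy excerpt quoted in this thread.
FILTERS = [
    "/piwik-$domain=~piwik.org",
    "/piwik.$script,domain=~piwik.org",
    "/piwik.php",
    "/piwik/js/*$domain=~piwik.org",
    "/piwik1.",
    "/piwik2.js",
    "/piwik_",
    "/piwikapi.js",
    "/piwikC_",
    "/piwikTracker.",
]


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Simplified conversion: ignore $options, treat '*' as 'anything'."""
    pattern = rule.split("$", 1)[0]
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(parts))


COMPILED = [rule_to_regex(rule) for rule in FILTERS]


def is_blocked(url: str) -> bool:
    """Return True if any simplified rule matches somewhere in the URL."""
    return any(regex.search(url) for regex in COMPILED)


if __name__ == "__main__":
    for url in (
        "https://example.com/piwik.js",       # default tracker script name
        "https://example.com/piwik.php",      # default tracking endpoint
        "https://example.com/js/stats.js",    # hypothetical renamed script
    ):
        print(url, "blocked" if is_blocked(url) else "allowed")
```

Under these simplified rules the default file names are caught even on your own domain, while a renamed script and endpoint slip through until someone adds a site-specific rule.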


Slow down there guy, it was a simple mistake. I've been using uBlock Matrix for too long is all.


The EasyPrivacy block list contains an entry that will block the piwik.js file. Of course, when you're self-hosting, it's trivial to serve that file with a non-default name.


That's an interesting choice. I mean, it's not like you can hide from the web server that you are making the request. But then again, I'm assuming -- by the sheer necessity of having a JS file -- that they are collecting some additional metrics not available to the server in the request.


Those filters could be in place to block the Piwik cloud service: https://piwik.pro/cloud/?pk_source=Piwik.org&pk_medium=Cloud...


It will probably take a while, but trackers will move to aggregating log files, and blockers will move to Tor. And the arms race continues...


Piwik, for example, can already import log files as an alternative to JavaScript tracking: http://piwik.org/log-analytics/
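For anyone wondering what log-based tracking has to work with, here is a minimal Python sketch that counts page views from access log lines in the common "combined" format. It is not Piwik's importer, just an illustration. The sample lines, IPs and paths are made up, and the regex assumes well-formed lines.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# Assumption: Apache/nginx "combined" log format; referrer and agent are optional.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def count_page_views(lines: Iterable[str]) -> Counter:
    """Count successful GET requests per path."""
    counts: Counter = Counter()
    for line in lines:
        hit = parse_line(line)
        if hit and hit["method"] == "GET" and hit["status"] == "200":
            counts[hit["path"]] += 1
    return counts


# Hypothetical sample data using documentation-reserved IP ranges.
SAMPLE_LINES = [
    '203.0.113.7 - - [10/Dec/2015:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Dec/2015:13:55:40 +0000] "GET /about.html HTTP/1.1" 200 1200 "http://example.com/index.html" "Mozilla/5.0"',
    '198.51.100.2 - - [10/Dec/2015:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '198.51.100.2 - - [10/Dec/2015:13:56:02 +0000] "GET /missing HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for path, views in count_page_views(SAMPLE_LINES).most_common():
        print(path, views)
```

The server log only carries what is in the request (IP, path, referrer, user agent), which is why JavaScript trackers exist: things like screen size or in-page events never reach the log.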


No, it parses web server logs; however, as mentioned above, it doesn't work well for very high-traffic sites.


Piwik relies on client-side JavaScript for tracking, not log analysis.



They have both; most users use the client-side JavaScript. I'm not familiar with how well the log analysis works.


Sorry, I was not aware of that feature.



