Cloudflare Radar (cloudflare.com)
226 points by manigandham on Sept 30, 2020 | 90 comments


The trending/popular domains list is interesting [0]. Since Cloudflare effectively serves as part of the backbone for so many sites and services, I'm assuming their ranking is more able to count usage of sites that are mostly accessed by smartphone app, e.g. Tiktok #3, Facebook #4. Unlike Comscore [1], which seems to overly represent media sites.

Too bad Radar omits pornography sites, I was curious whether they'd actually be among the top sites. FWIW, SimilarWeb [2] (which has Tiktok at #46), does include porn sites (xvideos and pornhub are #8 and #9, respectively).

Though I don't quite understand how mozilla.org could rank as #10 on Radar's sites, just below Instagram and above Youtube, Whatsapp, and Twitter.

edit: It's also interesting to compare its Top Browser rankings with analytics.usa.gov [3]. It appears that Radar's measured users are significantly more on Chrome – 55% (desktop+mobile) vs 47.6% on usa.gov sites; whereas usa.gov users are significantly more on Safari (iOS+desktop): 35.2% vs 13% on Radar. I suppose this reflects the assumption that iOS has a bigger proportion of U.S. users than it does worldwide users.

[0] https://radar.cloudflare.com/#trending-domains

[1] https://www.comscore.com/Insights/Rankings

[2] https://www.similarweb.com/top-websites/

[3] https://analytics.usa.gov/


Some domains are up there because they're being used by applications. For example, no one goes directly to microsoft.com, but Windows contacts it for many things, like updates, monitoring, telemetry, etc.

*.mozilla.org is probably resolved when you use Firefox.

Cloudflare can't know if the resolve was made by a user or by internal application logic.


I imagine microsoft.com received quite a bit of direct traffic last Tuesday when preorders opened for the next Xbox consoles. Also, .NET Framework, .NET Core, Entity Framework, Entity Framework Core, and Azure documentation is all hosted at microsoft.com (to name just a few).

I don't mean to discredit your main point, that a lot of domain use can be credited to application or OS logic, but to say no one goes to microsoft.com is a bit weird.


Edge and IE are still a decent chunk of desktop traffic as well. Opening them with the default home page will probably kick off traffic to microsoft.com too. Not sure how much that will be, but it probably factors in.

Edit: Although their metrics have the browser share lower than others. They have Edge at <4% and IE doesn't register in the top 10, whereas something like https://netmarketshare.com/ has Edge at 7% and IE at 5%. So there is seemingly a bias towards other browsers in their methods. That makes sense: it seems unlikely that someone uses both 1.1.1.1 and IE.


Edge and IE are probably loading msn.com way more than they are pulling things from microsoft.com.


You should compare usa.gov data to data on https://radar.cloudflare.com/US, in which case it would be much closer.

Btw, one piece of feedback: the interface for selecting a certain country is rather confusing. A (non-tiny) dropdown selector at the top would be much more intuitive.


I think the fact that this list has microsoft in the #2 position and mozilla at #10 casts serious doubt upon the method and conclusion.


Wouldn't this include all traffic? As long as it's enabled, Windows and Firefox are constantly sending statistics in the background. If this is measured by number of requests, then this would make sense.


Does this mean that Firefox is sending statistics in far greater frequency than Windows? Either these stats are wrong or Firefox is sending telemetry far too frequently to rival these sites (as Firefox usage is simply not high enough otherwise).


No, it's that Windows doesn't just use *.microsoft.com for telemetry, but also Bing.com, some .ms domains, live.net, windowsupdate.com, akadns.net, and many more: https://docs.microsoft.com/en-us/windows/privacy/manage-wind...


There's no chance that telemetry on the browser with 4% market share would get you anywhere near the top 10 busiest sites.


The popular domains metric is probably measured through the 1.1.1.1 DNS service, and Firefox recently defaulted to resolving domains through Cloudflare's DoH service, so popular domains may be biased toward Firefox users. Browser popularity is probably less biased, as it would have to be measured by HTTP requests to sites using Cloudflare.


Microsoft being high up wouldn't surprise me if it included Microsoft 365 and Outlook traffic. But I just can't imagine that surpassing Google search+Drive+GMail+Maps


(I didn't have coffee and forgot that Google is indeed the #1 domain)


There's a lot of data here to digest, but one thing that stood out to me is:

> Bot traffic: 41% of total

So of all the traffic that CF sees, it attributes 41% of it to automated bots. This is way higher than I would have guessed!


> So of all the traffic that CF sees, it attributes 41% of it to automated bots. This is way higher than I would have guessed!

Bear in mind that that's what CF attributes to bots. As a human who gets flagged as a bot sometimes, I'm a little skeptical of their numbers.


Around ten years ago I built the analytics system for an Alexa 100 company. The company had a policy of supporting IE6 until its share of traffic was below 1% and it was sitting at around 4% at the time IIRC. This was the figure that was reported industry-wide and this was also what we were seeing internally in our stats. However, partway through the project we decided to switch from an image pixel to javascript tracking. At that point we saw the share of IE6 traffic drop immediately to below 1% and it dawned on me what was happening: A lot of bot creators had been using an IE6 user agent to disguise their activity. This was in the days before headless browsing so the vast majority of them weren't downloading / processing javascript. I thought 4% was high then for bot activity but it wouldn't surprise me if the share has exploded since then.


Oh, I don't doubt that bot activity is rampant. The question is what Cloudflare's false-positive and false-negative rates are; I know from personal experience that they sometimes are convinced that humans are bots, and I assume sometimes bots get by them and therefore get counted as humans. But between those, my personal conclusion is that CF's claims about bot activity should be read with error bars.


Yup. Now imagine how many bot frameworks are smart enough to keep up to date with user agent strings.

Pro tip: in Google Analytics you can filter by device color depth = 0 bits and find what I assume are all bots that way.
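For anyone who wants to try the same heuristic on their own exported session data, here is a minimal sketch. The `color_depth_bits` field name and the record layout are assumptions made for illustration, not a Google Analytics export format, and a zero color depth is only a hint that no real display was attached.

```python
def split_by_color_depth(sessions):
    """Split session records into (suspected_bots, others).

    Each record is a dict with a hypothetical 'color_depth_bits' key.
    A reported depth of exactly 0 is treated as a bot signal; anything
    else (including a missing value) is left in the 'others' bucket.
    """
    suspected_bots, others = [], []
    for session in sessions:
        if session.get("color_depth_bits") == 0:
            suspected_bots.append(session)
        else:
            others.append(session)
    return suspected_bots, others


# Made-up sample records for illustration.
sessions = [
    {"id": 1, "color_depth_bits": 24},
    {"id": 2, "color_depth_bits": 0},
    {"id": 3, "color_depth_bits": 0},
]
suspected_bots, others = split_by_color_depth(sessions)
print(len(suspected_bots), "suspected bots out of", len(sessions), "sessions")
```

This will miss any bot that drives a real or headless browser reporting a normal color depth, so it gives a lower bound at best.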


Also keep in mind it was probably a subset of bots using that user agent, so the total bot traffic could have been higher than 4%.


Sure and no doubt some bots pass as human.

But keep in mind CF has a lot of options for filtering traffic. Just because you got a CAPTCHA on a CF site doesn't necessarily mean it considered you a bot.


> Just because you got a CAPTCHA on a CF site doesn't necessarily mean it considered you a bot.

What else does it think I am?


I used to work at a webhosting company, and crawlers were an enormous share of our traffic. For some of our users, crawler traffic was their only traffic.


Were these crawlers for big well-known things like google/bing/archive.org? I'm wondering if there are other types of crawlers that I don't even know about.


A site for one of my clients was, at one point, getting absolutely hammered by a bot that crawls for plagiarised content. It was ignoring our robots.txt so I ended up blocking it via CloudFlare. There are bots for pretty much anything you can think of out there.

edit: replaced "scrapes" with "crawls" for accuracy.


Traders crawl for financial information. Businesses (e.g. airlines) crawl for competitor pricing. Etc. etc.


Why do bots need to crawl so much? Any idea what kind of bots they are, is it mainly search engines?


> Why do bots need to crawl so much?

The Earth has a (mostly) free society. Therefore, many programmers find a web crawler to be an advantageous endeavor. From search engines, to social media companies, to ad networks, to government agencies, to financial institutions, to students learning about programming, to nefarious actors, and even shadow versions of everything above. Bots are neat. One can do a lot with a freely accessible public database of information, only limit is one's creativity.

(edited: for formatting)


> only limit is one's creativity

On the consumer side. There is a hard limit on bandwidth, though, on the provider side that can be exhausted by an overwhelming amount of bot traffic. In effect, that's what a DDoS is, even if unintentional.


That's actually lower than I would have guessed, and lower than what I see on my servers. It's more than 1/2 for sure, and it's hard to guess how much of the other (less than half) part is real people and not just bots pretending to be people. My half-educated guess is that about 40% of traffic to the sites I run is real people on any given day. These are all smallish websites for non-profits, so maybe not an "average" website, whatever that might mean.


Filter by your country. The US for example has 57% bot activity.


Aha! That sounds about right. Interesting, I didn't notice you can filter!


How did you filter by country?


There is a search box at top of center graph.


Hadn't noticed that but that's a good point. "Way higher": I would have guessed the same as you but, upon further thought, I guess I'm not surprised: creating a crawler is 100 lines of Python so I would guess that there are actually tens of thousands of bots running around grabbing web pages. Humans are an ever diminishing portion of the "web", especially given the (old) focus on "semantic web" and machine-friendly-ish formats...
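To make the "100 lines of Python" point concrete, here is a rough sketch of a same-host, breadth-first crawler using only the standard library. The function names (`crawl`, `fetch`, `LinkParser`) are made up for this example. It deliberately has no robots.txt handling, rate limiting, or retry logic, which is part of why careless bots generate so much traffic.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its body as text (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=50):
    """Breadth-first crawl limited to the host of start_url.

    fetch_page is injectable so the crawl logic can be exercised
    without touching the network. Returns URLs in visit order.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that fail to load
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

Calling `crawl("https://example.com/")` would fetch up to 50 pages from that one host. A well-behaved crawler would also check robots.txt (for instance with `urllib.robotparser`) and pause between requests.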


Really interesting numbers. I just wish it was possible to see all the other statistics in relation to legit vs bot traffic.

eg if mobile traffic is 35% of total, what percentage of legit vs bot traffic is mobile? TLS 1.2 vs 1.3 is a 38%/61% split, but how much bot traffic is encrypted?


This is just the beginning. We are going to be building out more and more functionality. My hope is that we deploy jupyter and let people crunch numbers in the ways they choose.


A single bot can generate thousands of times more traffic than a person. Even a small number of bots could easily generate a huge percentage of traffic.
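A quick back-of-the-envelope calculation shows how lopsided this can get. The numbers below are invented purely for illustration and are not Cloudflare figures.

```python
def bot_traffic_share(humans, bots, requests_per_human, requests_per_bot):
    """Return the fraction of all requests that come from bots."""
    human_requests = humans * requests_per_human
    bot_requests = bots * requests_per_bot
    return bot_requests / (human_requests + bot_requests)


# Hypothetical day: 10,000 people making 100 requests each,
# and 10 bots making 100,000 requests each.
share = bot_traffic_share(
    humans=10_000,
    bots=10,
    requests_per_human=100,
    requests_per_bot=100_000,
)
print(f"{share:.0%}")  # prints "50%"
```

In this made-up scenario the bots are about 0.1% of the clients but half of the requests, so a traffic-weighted bot percentage says little about how many bots there are.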


I wonder if there is a correlation between Cloudflare customers and internet properties who'd be targeted by bots.


They probably flag a lot of Tor users as bots.



Out of curiosity, what's the tech stack (eg backend, frontend) used for radar.cloudflare.com?


Frontend: React

Serverless: Cloudflare Workers and Workers KV

Data: ClickHouse


I love how you're using workers there. I've been on workers for years (only for static content mind you) and it's super cool.


Cool, thanks. :)


Wow, tiktok.com steadily ranking at #3, more popular than facebook.com, amazon.com, apple.com and netflix.com, only behind google.com and microsoft.com. Really curious to see detailed stats.


I can see tiktok being more popular than apple, people watch videos much more often than they buy phones. But microsoft being more popular than tiktok? What's so popular on that site?


Every single Windows PC calling home to see what ad to play next or report on its user's metrics. Multiplied by whatever frequency that occurs.


If they're just using DNS lookups, probably windows update, telemetry, crash reports and other automated traffic.


Also, verizonmedia.com at 8th place? What's on that site? Some type of CDN?


Cloudflare says they exclude "content servers"[1], I assume that means CDNs. https://www.verizonmedia.com/ says it is an ad network, including video ads, with brands like Yahoo, HuffPost and TechCrunch.

[1] https://radar.cloudflare.com/glossary#trending-domains


Which is why I was curious - it seems that a domain mostly serving videos and images would be classified as a content server.


Verizon Media afaik is EdgeCast or related, but I’m not 100% sure.


XBox + related properties?


Maybe I am reading it wrong, but I'm not sure the data is correct (at least for some non-Cloudflare sites). Example:

https://radar.cloudflare.com/domain/just-eat.co.uk

This is the leading UK food delivery company. For popularity, I'm not sure if a higher or lower number means more popular, but to me it looks like the site has more traffic on a weekday than on a weekend. Also, most traffic comes from Australia, where this site wouldn't even work.

Is this just bots, or bad data?


They are at least consistent; just-eat's AU site is mostly UK traffic https://radar.cloudflare.com/domain/menulog.com.au


That's odd. I'll get the team to investigate.


I just checked a project of mine, which uses CloudFlare and gets reasonable traffic, it seems Australia is disproportionately represented for it too.


Really weird results, I have a popular site on Cloudflare and the domain popularity by visitor country shown on this public dashboard doesn't match traffic statistics I see in my account at all.


This is a direct replacement for Alexa Ranking. IMHO the traffic is way more accurate.


How much internet traffic goes through Cloudflare?

Last I heard it was somewhere around 5%. Does anyone know what % it is today?


According to Cloudflare's website:

> Internet requests for ~15% of the Fortune 1,000 run through Cloudflare's network

A 2018 Wired article said "roughly 5 to 10 percent of web traffic." It is likely higher now, perhaps closer to 15%.

https://www.cloudflare.com/insights/ and https://www.wired.com/story/cloudflare-spectrum-iot-protecti...


In 2018 it was 10%, iirc now it is 13-14%


More surprising for me is Mobile vs Desktop at 35% to 65%. I thought it was closer, or more like 50/50.


I'm guessing that it depends on the kind of traffic taken into account? If we count all requests to and from computers and not only web requests, you end up with a whole lot of computers calling home all day


How is this data collected? I can see how they would be able to get statistics from clients hitting CloudFlare servers, but how can they report on the traffic hitting google.com or microsoft.com?


I think that might just be based on their public dns service (https://1.1.1.1)


They can tell bots and DoS traffic from the DNS request?


I'm guessing the different metrics are collected through different means. Their blog mentions HTTP requests to sites that use Cloudflare as well as DNS [0]. It would make sense that domain popularity is measured through DNS, and that bots and DoS are measured by the systems built for handling those things.

[0]: https://blog.cloudflare.com/introducing-cloudflare-radar/
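If domain popularity really is derived from resolver queries, the simplest version of the idea might look like the sketch below. This is a guess at the general approach, not Cloudflare's actual method. The `base_domain` helper naively keeps the last two labels, which is an assumption that breaks for suffixes like `.co.uk` (a real implementation would consult the Public Suffix List), and the sample hostnames are invented.

```python
from collections import Counter


def base_domain(hostname):
    """Naive base domain: the last two dot-separated labels.

    Assumption for illustration only; wrong for multi-label public
    suffixes such as 'co.uk'.
    """
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def rank_domains(queried_hostnames, top=10):
    """Count queries per base domain and return the most queried ones."""
    counts = Counter(base_domain(name) for name in queried_hostnames)
    return counts.most_common(top)


# Invented query log: every lookup counts the same, whether a person
# typed the address or an application made the request on its own.
queries = [
    "www.google.com",
    "mail.google.com",
    "update.microsoft.com",
    "docs.microsoft.com",
    "www.microsoft.com",
    "telemetry.mozilla.org",
]
print(rank_domains(queries, top=2))
```

A counter like this cannot tell user visits apart from background lookups by an OS or browser, which would be consistent with microsoft.com and mozilla.org ranking so high.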


Any idea why the "worldwide change in internet traffic" graph is not flatter and has daily dips ? Is this graph normalized on the actual time, disregarding time-zones?


I would say likely due to the uneven population distribution around the world, combined with day-of-week trends due to people being off work/school.

The more interesting datapoint to me is the hour-by-hour breakdowns of HTTP vs HTTPS, likely due to particular countries that discourage (and have ISPs that block) encrypted traffic.


It would be great if the site rankings had a longer time period than 30 days, such as 1 year. Though totally understand if they only started tracking such a ranking recently.


Yes. We only started doing these calculations recently. We'll make available more data as we have it.


What is the source of the data?

Is the ranking of domains by number of requests or by traffic volume? Why don't we see any major CDN domains in the top ten list?


They don't really say how they are ranking things, it appears to be multiple metrics. They also removed porn, and I would not be surprised if they removed something like amazonaws.com since that just sits in front of s3 bucket.

https://radar.cloudflare.com/glossary#trending-domains


Smells artificial/trimmed


Anyone know why there appears to be a steady daily +-5% wave on the HTTP/1.x vs HTTP/2 chart?


It seems to reflect the cycle of robot traffic.


Does anyone else find it odd that Zoom is not in the top 15? Is it only because it's not web traffic?


That sounds right to me. I doubt a DNS request is made for each packet.


Developers: The page must be fast! Google found an extra .5 seconds in search page generation time dropped traffic by 20%. Also developers: Let's use Cloudflare - 'Checking your browser before accessing content - please allow up to 5 seconds'.


I believe most of the time the protection is invisible. It only activates when it thinks you might be a bot, or if the site is under heavy load


There is a not-insignificant, if not huge, group of site admins that take an "aha! more security therefore better!" approach and will just enable under attack mode (the 5s JS challenge) for everyone all the time.

Some are a bit more savvy and will take a still rather ham-fisted approach of doing so for most countries. That's not uncommon with retailer websites that realistically only serve a single country, and therefore have good enough reason to challenge all others to deter automated traffic from compromised machines outside their market.


I see it ALL of the time. I am not sure if it is due to Safari and ITP blocking whatever protection tracking stuff Cloudflare does, or because my DNS is routed through AWS vs some other provider (as I am on a split tunnel VPN, and need to resolve internal DNS entries).

It's frustrating, especially when I open a couple of tabs on the same site after Googling a particular problem.


It could be the ITP blocking or your DNS setup. If I had to guess it'd be the AWS setup you have as that is not seen as a client IP for a home network.

The interstitial page you're seeing attempts to validate IP location, HTTP headers, and the query string at the very least. So the fact that your requests look like they are coming directly from an AWS IP would, based on my gut feeling having troubleshot tons of these issues at CF, cause this issue.

If you're ever curious you might want to try and hit a site you know serves you these pages regularly without the VPN setup just to isolate the problem.


My traffic hitting Cloudflare does NOT go through an AWS IP. It is still my home IP address.

My DNS request to resolve the site goes through the AWS DNS in my VPC.

When I hit the same sites without VPN, and thus use my local DNS resolver on my network, no issue.

The only thing going over AWS is the DNS resolution.


What is a Facebook web browser?


This is the in app browser.


Had the same question. Is it the in app web view? Wouldn't that be safari/chrome depending on platform?


Well, yes and no. They have their own user agent, so they can get their own classification. e.g.

Mozilla/5.0 (iPhone; CPU iPhone OS 13_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 [FBAN/FBIOS;FBDV/iPhone11,8;FBMD/iPhone;FBSN/iOS;FBSV/13.3.1;FBSS/2;FBID/phone;FBLC/en_US;FBOP/5;FBCR/]

Identifies as "Facebook Browser".

See: https://developers.whatismybrowser.com/useragents/explore/so...

I think Google Analytics picks this up as "Safari (In-App)"

If this classification was not possible, Edge would identify as Chrome. (since it uses Chromium engine)
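A rough sketch of how that kind of classification can work: check for the most specific tokens first, because in-app browsers and Chromium-based browsers also carry the generic WebKit/Chrome tokens. The function name and the returned labels are made up for this example, and real classifiers use far larger rule sets.

```python
def classify_browser(user_agent):
    """Classify a User-Agent string by checking specific tokens first."""
    # Facebook's in-app browser adds FBAN/... or FBAV/... fields.
    if "FBAN/" in user_agent or "FBAV/" in user_agent:
        return "Facebook"
    # Chromium-based Edge adds "Edg/"; legacy Edge used "Edge/".
    # Both also contain "Chrome/", so this check must come first.
    if "Edg/" in user_agent or "Edge/" in user_agent:
        return "Edge"
    # Chrome also contains "Safari/", so it must be checked before Safari.
    if "Chrome/" in user_agent:
        return "Chrome"
    if "Safari/" in user_agent:
        return "Safari"
    return "Other"


facebook_ua = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 13_3_1 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 "
    "[FBAN/FBIOS;FBDV/iPhone11,8;FBMD/iPhone;FBSN/iOS;FBSV/13.3.1;"
    "FBSS/2;FBID/phone;FBLC/en_US;FBOP/5;FBCR/]"
)
# Illustrative Chromium Edge string; the version numbers are examples.
edge_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36 Edg/85.0.564.63"
)
print(classify_browser(facebook_ua))  # Facebook
print(classify_browser(edge_ua))      # Edge
```

Swapping the order of the Edge and Chrome checks would report Edge as Chrome, which is the situation described above.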



