
I wonder where you are getting your data. According to Wikipedia, Russian is #7: https://en.wikipedia.org/wiki/Languages_used_on_the_Internet

The only place where Russian is in the top 5 is Wikipedia views. The Russian part of the internet is steadily shrinking, as Russian imperialism crumbles.



> The largest portion of all languages in Common Crawl

https://commoncrawl.github.io/cc-crawl-statistics/plots/lang...


Thanks!

I wonder where this discrepancy comes from.


Probably under-indexing of non-English sources by these crawlers.

It would be interesting if Yandex opened some data sets!


And lots of people write on the web using English as a second language, which both reduces the presence of their native language and increases the presence of English.


Yep, not a native English speaker here, and yet my online footprint is mostly English because software pushed me to learn it.


My guess is that reference counting at depth=1 only captures non-$LANG content whose text parts don't matter much, e.g. photo galleries.



