
JKCalhoun 7 months ago \| on: ETH Zurich and EPFL to release a LLM developed on ...

Is there not yet a source where the web has already been scraped and souped down to just the text? It would seem someone would have created such a thing to save LLM training from having to reinvent the wheel. I understand the web is a dynamic thing, but it would still seem useful on some level.

Common Crawl, maybe?