There is no reason to believe GPT-4 had more (or higher-quality) data than Google etc. have now. GPT-4 was trained entirely before the Microsoft deal. If OpenAI could pay to acquire data in 2023, more than 10 companies could have acquired data of similar quality by now, and yet no one has produced a model of similar quality in a year.
The key questions are around "fair use". One factor in the US fair-use doctrine is "the effect of the use upon the potential market for or value of the copyrighted work" - so one big question here is whether a model has a negative impact on the market for the copyrighted works it was trained on.
I don’t think the New York Times case is so much about training as it is about the fact that ChatGPT can use Bing, and Bing has access to New York Times articles for search purposes.
Having used both Google's and OpenAI's models, the kinds of issues they have are different. Google's models are superior, or at least on par, in knowledge. It's instruction following and understanding where OpenAI is significantly better. I don't think pretraining data is the reason for this.