1) By any source I can find, this is incorrect. Twitter had ~8,000 employees when Musk bought it. After layoffs that was trimmed to a low of around 1,500 employees (about 19% of the original headcount), and today it has around 2,800 employees.
Also worth mentioning that a lot of Twitter's products are built on X.ai which has 1,200 core employees on Grok with 3,000+ on the Datacenter build-out side.
Reading such obvious LLM-isms in the announcement just makes me cringe a bit too, e.g.:
> We optimize for speed users actually feel: responsiveness in the moments users experience — p95 latency under high concurrency, consistent turn-to-turn behavior, and stable throughput when systems get busy.
To think I used to log in to Facebook every day, scroll friends' posts until it said "You're caught up!" then leave.
That's almost unimaginable now, but I deeply wish I could return to that experience. Unfortunately as the suggested content got turned up, friends stopped posting, so even with all the browser extensions in the world I can't get that same experience back.
If you're heating water, the heating is just "talk" while boiling is "action" -- but boiling takes a long time even once you've reached the boiling point!
Strange that you say that because the general consensus (and my experience) seems to be the opposite, as well as the AA-Omniscience Hallucination Rate Benchmark which puts 3.0 Pro among the higher hallucinating models. 3.1 seems to be a noticeable improvement though.
Google actually has the BEST ratings in the AA-Omniscience Index:
AA-Omniscience Index (higher is better) measures knowledge reliability and hallucination. It rewards correct answers, penalizes hallucinations, and has no penalty for refusing to answer.
Gemini 3.1 holds the top spot, followed by 3.0 and then Opus 4.6 max.
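To make the "rewards correct, penalizes hallucinations, no penalty for refusing" idea concrete, here is a rough sketch of an index of that shape. The +1 / -1 / 0 weights, the scaling to [-100, 100], and the function name are my assumptions for illustration; the real benchmark may weight things differently.

```python
def omniscience_style_index(correct: int, wrong: int, abstained: int) -> float:
    """Illustrative score: +1 per correct answer, -1 per wrong answer,
    0 per refusal, scaled to the range [-100, 100].

    The weights are assumed for this sketch, not taken from the benchmark.
    """
    total = correct + wrong + abstained
    if total == 0:
        raise ValueError("need at least one question")
    return 100 * (correct - wrong) / total


# Hypothetical counts out of 100 questions (made up for illustration).
always_guesses = omniscience_style_index(correct=60, wrong=36, abstained=4)
knows_its_limits = omniscience_style_index(correct=60, wrong=0, abstained=40)

print(always_guesses)     # 24.0
print(knows_its_limits)   # 60.0
```

Under a scheme like this, two models with the same knowledge can land far apart purely because one refuses where the other guesses.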
Yes and no. The hallucination rate shown there is the percentage of time the model answers incorrectly when it should have instead admitted to not knowing the answer. Most models score very poorly on this, with a few exceptions, because they nearly always try to answer. It's true that 3.0 is no better than others on this. But given that it does know the correct answers much more often than e.g. GPT 5.2, it does in fact give hallucinated answers much less often.
In short, its hallucination rate as a percentage of unknown answers is no better than most models, but its hallucination rate as a percentage of total answers is indeed better.
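A quick worked example of the two denominators, with entirely made-up counts (these are not benchmark figures, and the function name is just for illustration). I'm assuming "unknown answers" means every question the model did not get right, i.e. wrong answers plus refusals.

```python
def hallucination_rates(correct: int, wrong: int, abstained: int) -> tuple[float, float]:
    """Return (rate among unknowns, rate among all questions).

    "Unknowns" here means questions the model did not answer correctly:
    wrong answers plus refusals. That reading is an assumption of this sketch.
    """
    total = correct + wrong + abstained
    unknown = wrong + abstained
    rate_of_unknown = wrong / unknown if unknown else 0.0
    rate_of_total = wrong / total if total else 0.0
    return rate_of_unknown, rate_of_total


# Hypothetical models, 100 questions each.
# Model A knows more answers; both guess on 90% of what they don't know.
model_a = hallucination_rates(correct=60, wrong=36, abstained=4)
model_b = hallucination_rates(correct=40, wrong=54, abstained=6)

print(model_a)  # 90% of unknowns hallucinated, 36% of all answers
print(model_b)  # 90% of unknowns hallucinated, 54% of all answers
```

Both hypothetical models look equally bad on the first number, yet the one that knows more gives a wrong answer far less often overall.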
I can only speak to my own experience, but for the past couple of months I've been duplicating prompts across both for high value tasks, and that has been my consistent finding.
> the AA-Omniscience Hallucination Rate Benchmark which puts 3.0 Pro among the higher hallucinating models. 3.1 seems to be a noticeable improvement though.
As the sibling comment says, the AA-Omniscience Hallucination Rate Benchmark puts Gemini 3.0 as the best performing aside from Gemini 3.1 preview.