It's like claims of room-temperature superconductors or Millennium Prize solutions. Earth-shattering if true. It'd be such a black swan. Terrible for Nvidia.
> this is where the Taylor expansion would fail to represent the values well.
"In practice, we find that four Taylor terms (P = 4) suffice for recovering conventional attention with elementwise errors of approximately the same magnitude as Float16 resolution"
I read that too, but I wondered whether elementwise error is the right metric. Surely the actual error metric should be to evaluate model performance for a conventional transformer model and then the same model with the attention mechanism replaced by this 4th order Taylor approximation?
To spell it out for myself and others: approaching equivalent calculations for each individual attention block means we also approach equivalent performance for the combination of them. And with an error bar approaching floating point accuracy, the performance should be practically identical to regular attention. Elementwise errors of this magnitude can't lead to any noteworthy changes in the overall result, especially given how robust LLM networks seem to be to small deviations.
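To make the error claim concrete, here's a toy sketch (my own construction, not the paper's code): replace exp(x) in softmax with its truncated Taylor series and measure the elementwise error against true softmax. The score range and dimensions are assumptions for illustration; a real implementation would have to handle larger score magnitudes, where a low-order truncation degrades.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
# Toy attention scores, kept small in magnitude (assumption for illustration).
scores = rng.uniform(-1.0, 1.0, size=(8, 16))

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def taylor_softmax(s, P):
    # Replace exp(x) with its truncated Taylor series sum_{p=0}^{P} x^p / p!.
    # An even P keeps the truncated series strictly positive for all real x,
    # so the normalized weights remain a valid distribution.
    approx = sum(s**p / factorial(p) for p in range(P + 1))
    return approx / approx.sum(axis=-1, keepdims=True)

def max_err(P):
    """Largest elementwise deviation from true softmax weights."""
    return np.abs(softmax(scores) - taylor_softmax(scores, P)).max()

for P in (2, 4, 8):
    print(P, max_err(P))  # error shrinks as P grows
```

On this small-score regime the P = 4 error already lands in the 1e-3 ballpark, i.e. comparable to Float16 resolution, which is consistent with the quoted claim.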
There was some credible analysis (I don't have the link) estimating 50% gross margins for OpenAI, largely eaten up by operating expenses. So not awful unit economics, but not good either.
Assuming that's even true, the big asterisk is uncertainty around future efficiency gains. The intelligence-per-cost ratio is changing very quickly. It is hard to make confident predictions more than 3 months out.
In Table 1, the cost of cooling of a terrestrial data centre is listed as $7M. The cost of cooling in space is assigned a value of $0 with the claim:
"More efficient cooling architecture taking advantage of higher ΔT in space"
My bold claim: The cost of cooling will not be $0. The cost of launching that cooling into space will also not be $0. The cost of maintaining that mechanically complex cooling in space will not be $0.
They then throw in enough unrealistic calculations later in the "paper" to show that they thought about the actual cost at least a little bit. Apparently just enough to conclude that it's so massive there's no way they're going to list it in the table. Table 1 is pure fantasy.
I will not re-read them, but from what I recall of those threads, the numbers don't make sense. Something like:
- radiators multiple square kilometers in size, in space;
- lifting the necessary payloads to space would take orders of magnitude more launch capacity than the whole world has today;
- a maintenance nightmare: yes, you can have redundancy, but there's no feasible way to service it;
- compare how much effort/energy/maintenance is required to keep the ISS or Tiangong space stations running - these space datacenters sound ridiculous;
NB: I would be happy to be proven wrong. Many things are possible if we invest effort (and money), akin to JFK's "We choose to go to the Moon" speech. It sounded incredible, but it went from nearly zero to a Moon landing in ~7 years. Though as far as I understand, napkin math for space data centers at this scale suggests an effort orders of magnitude beyond the Apollo program, i.e. launching a Saturn V multiple times per day for years. Even with booster-reuse technology this seems literally incredible (not to mention fuel/material costs).
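To put rough numbers on the launch-cadence point (every figure here is my own assumption, not from the paper): take a Starship-class payload to LEO and an assumed total mass for a gigawatt-scale station, and count flights.

```python
# Napkin math, all inputs assumed:
payload_per_launch_t = 100     # optimistic Starship-class payload to LEO, tonnes
iss_mass_t = 420               # ISS mass, for scale; ~40 assembly flights over a decade
datacenter_mass_t = 50_000     # assumed mass of GW-scale compute + radiators + panels

launches = datacenter_mass_t // payload_per_launch_t
print(launches)  # 500 flights just for the initial build-out, before spares
```

Even under these generous assumptions that's hundreds of heavy-lift flights for one station, which is why the comparison to Apollo-era cadence comes up.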
A giant space datacenter with square kilometers of solar panels doesn't make sense. A cluster of Starlink-sized satellites, which orbit near each other (1) and which are connected via laser links, might make sense.
(1) There are orbital arrangements that allow satellites to stay close together with minimal orbital corrections. Scott Manley mentioned this in one of his videos.
They do not at any point outline how the cooling will be done; they simply say "it will be more efficient than chillers due to the larger delta T". That's the wrong model: in vacuum there is no convective medium, so heat rejection isn't driven by a temperature difference to the surroundings at all - it's pure radiation, scaling with the radiator's absolute temperature (roughly T⁴).
I've seen Pinker's arguments dismantled too. The blog whose post we're commenting on even has a piece dismantling the totally made up GDP numbers coming out of Africa.
I want to see regulation of the algorithm. Something like forcing a chronological feed, or somehow nerfing the recommendation engine. Figure out a way to make it boring, bypass the whole censorship debate.