I have to play devil's advocate here, because for every one of these cases there are probably a dozen similar stories where the ambitious new guy actually did nuke the system with a risky Friday release and then logged off for the weekend xD
Let me hit you with the extra spicy take - since the department doesn't really produce business value anyway, taking the system offline for a week would have dropped the bill to 0 and been the greatest possible cost saving.
Yes, management would have killed me, but I would have been absolved at the Pearly Gates.
Another devil's advocate take: part of being a senior engineer is writing good public-facing one-pagers that communicate the problem and the solution.
Sure, you shouldn't have to do that for every single change. But when something significant comes along, it's wise to document it and socialize it.
If OP had done this, they may not have been asked to make PowerPoints, dread conversations, or hide anything. They could just point at the public doc that explains it all in technical, non-political language and also educates other teams on how they can improve. This is a leadership and mentorship opportunity.
It's also largely dependent on the embedding model, as others have mentioned. Even if Wikipedia doesn't have any words specifically describing that monkey as "weird", the model itself would know to correlate the monkey's embedding with the "weird" concept. The main issue with this particular implementation is the model used (all-MiniLM-L6-v2), which is designed for speed and efficiency over accuracy.
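To make the "correlate with the concept" point concrete, here's a toy sketch of how similarity search over embeddings works. The 3-d vectors below are made up for illustration; real all-MiniLM-L6-v2 embeddings are 384-dimensional and come from the model, not hand-written:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings: the model places the monkey's article near the
# "weird" concept even if the word "weird" never appears in the text.
weird_concept    = np.array([0.9, 0.1, 0.2])
monkey_article   = np.array([0.8, 0.2, 0.3])  # close in embedding space
neutral_article  = np.array([0.1, 0.9, 0.1])  # far from "weird"

assert cosine_similarity(weird_concept, monkey_article) > \
       cosine_similarity(weird_concept, neutral_article)
```

A stronger embedding model mostly just places these points more accurately; the retrieval math stays the same.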
So they actually made the intentional decision to change it back to the much more accusatory "you broke it". I'm not sure exactly when that occurred, but I suspect it was roughly around the time spez took over.
How feasible would it be to crowdsource the training? I.e. thousands of individual MacBooks each training a small part of the model and contributing to the collective goal.
Currently, not at all. You need low-latency, high-bandwidth links between the GPUs to be able to shard the model usefully. There is no way you can fit a 1T (or whatever) parameter model on a MacBook, or any current device, so sharding is a requirement.
Even if that problem disappeared, propagating the model weight updates between training steps poses a problem in itself. At this size, it's a lot of data.
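A back-of-envelope calculation shows why the weight-update traffic alone kills this. The link speed and gradient precision below are assumptions (an optimistic 1 Gbit/s home uplink, fp16 gradients):

```python
# Rough cost of syncing gradients for a 1T-parameter model over a home link.
params = 1e12            # 1 trillion parameters
bytes_per_param = 2      # assume fp16 gradients
gradient_bytes = params * bytes_per_param      # 2 TB of gradient data per step

link_bits_per_s = 1e9                          # optimistic 1 Gbit/s uplink
link_bytes_per_s = link_bits_per_s / 8

seconds_per_sync = gradient_bytes / link_bytes_per_s
print(f"{seconds_per_sync / 3600:.1f} hours per training step")  # ~4.4 hours
```

That's hours per step just moving gradients, before any computation happens; datacenter interconnects do the same exchange in milliseconds.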
You could easily fit a 1T parameter model on a MacBook if you radically altered the architecture of the AI system.
Consider something like a spiking neural network with weights and state stored on an SSD, using lazy evaluation as action potentials propagate. A 4 TB SSD holds ~1 trillion 32-bit FP weights and potentials, and there are MacBook options that support up to 8 TB. The other advantage of an SNN: training and using are basically the same thing. You don't have to move any bytes around; they just get mutated in place over time.
The trick is to reorganize this damn thing so you don't have to access all of the parameters at the same time... You may also find the GPU becomes a problem in an approach that uses a latency-sensitive time domain and/or event-based execution. It gets pretty difficult to process hundreds of millions of serialized action potentials per second when your hot loop has to go outside of L1 and screw with GPU memory. The GPU isn't that far away, but ~2 nanoseconds is a hell of a lot closer than 30-100+ nanoseconds.
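The "weights on SSD, touched lazily, mutated in place" idea can be sketched with a memory-mapped file. This is a toy illustration (tiny matrix, no real spiking dynamics), just showing that an event only pages in the rows it touches and that updates write straight back to disk:

```python
import os
import tempfile
import numpy as np

N = 1024  # toy neuron count, nothing like 1T
path = os.path.join(tempfile.mkdtemp(), "weights.bin")

# Create the weight matrix on disk once.
w = np.memmap(path, dtype=np.float32, mode="w+", shape=(N, N))
w[:] = np.random.default_rng(0).standard_normal((N, N)).astype(np.float32)
w.flush()
del w

# Later: a spike from neuron 42 propagates lazily. Indexing one row of the
# memmap makes the OS page in only that row, not the whole matrix.
weights = np.memmap(path, dtype=np.float32, mode="r+", shape=(N, N))
potentials = np.zeros(N, dtype=np.float32)
potentials += weights[42]        # one row read; the rest stays on disk

# "Training and using are the same thing": nudge the touched row in place,
# written back through the mmap with no separate checkpointing step.
weights[42] *= 0.99
weights.flush()
```

The hard part the comment alludes to is doing billions of these tiny, latency-sensitive touches per second, which is where SSD and GPU round-trip latencies start to dominate.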
What if you split the training down to the literal vector math and treated every MacBook like a thread in a GPU, with one big computer acting as the orchestrator?
You would need each MacBook to have an internet connection capable of multiple terabytes per second, with sub-millisecond latency to every other MacBook.
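Rough numbers for why the "MacBook as a GPU thread" model dies on bandwidth even before latency. Every figure below is an assumption for a hypothetical 1T-parameter transformer (hidden size, layer count, machine count, link speed):

```python
# Each layer, the orchestrator must ship the activation vector to every
# machine, since each one needs the full input to compute its slice of
# the matmul, then gather the partial results back.
hidden = 16384           # assumed hidden dimension
layers = 100             # assumed layer count
machines = 10_000        # MacBooks acting as "threads"
bytes_per_act = 2        # fp16 activations

per_token_bytes = hidden * bytes_per_act * layers * machines  # ~32.8 GB

orchestrator_bytes_per_s = 10e9 / 8   # generous 10 Gbit/s pipe
seconds_per_token = per_token_bytes / orchestrator_bytes_per_s
print(f"{seconds_per_token:.0f} s per token")  # ~26 s for ONE token
```

Inside a GPU the same fan-out happens over on-chip buses at terabytes per second, which is the gap the comment is pointing at.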
FWIW, there are current devices that could fit a model of that size. We had servers that supported TBs of RAM a decade ago (and today they're pretty cheap, although that much RAM is still a significant expense).
I once used a crowdsourcing system called CrowdFlower for a pretty basic task, and the results were pretty bad.
It seems like, with minimal oversight, the human workers like to just say they did the requested task and make up an answer rather than actually do it. (The task involved entering an address in Google Maps, looking at the Street View, and confirming, insofar as possible, whether a given business actually resided at the address in question; nothing complicated.)
Edit: whoops, mixed up the query with another reply that mentioned the human element XD
Hey HN! I recently faced a challenge of generating vector embeddings within my Swift app, so I built this library as a "good enough" solution for on-device similarity search without relying on an API. I've got some exciting plans for future development, but I'd love to hear your thoughts and any interesting use cases you can think of!