I have to play devil's advocate here, because for every one of these cases there are probably a dozen similar stories where the ambitious new guy actually did nuke the system with a risky Friday release and then logged off for the weekend xD
Let me hit you with the extra spicy take - since the department doesn't really produce business value anyway, taking the system offline for a week would have dropped the bill to 0 and been the greatest possible cost saving.
Yes, management would have killed me, but I would have been absolved at the Pearly Gates.
Another devil's advocate take: part of being a senior engineer is writing good public-facing one-pagers that communicate the problem and the solution.
Sure, you shouldn't have to do that for every single change. But when something significant comes along, it's wise to document it and socialize it.
If OP had done this, they may not have been asked to make PowerPoints, dread conversations, or hide anything. They could just point at the public doc that explains it all in technical, non-political language and also educates other teams on how they can improve. This is a leadership and mentorship opportunity.
It's also largely dependent on the embedding model, as others have mentioned. Even if Wikipedia doesn't have any words specifically describing that monkey as "weird", the model itself would know to correlate the monkey's embedding with the "weird" concept. The main issue with this particular implementation is the model used (all-MiniLM-L6-v2), which is designed for speed and efficiency over accuracy.
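To make the "correlate with the concept" point concrete, here's a toy sketch of how similarity search over embeddings works. The 3-d vectors below are made up for illustration; real all-MiniLM-L6-v2 embeddings are 384-dimensional and come from the model, not hand-written:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings: the model places the monkey's article near the
# "weird" concept even if the word "weird" never appears in the text.
weird_concept    = np.array([0.9, 0.1, 0.2])
monkey_article   = np.array([0.8, 0.2, 0.3])  # close in embedding space
neutral_article  = np.array([0.1, 0.9, 0.1])  # far from "weird"

assert cosine_similarity(weird_concept, monkey_article) > \
       cosine_similarity(weird_concept, neutral_article)
```

A stronger embedding model mostly just places these points more accurately; the retrieval math stays the same.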
So they actually made the intentional decision to change it back to the much more accusatory "you broke it". I'm not sure exactly when that occurred, but I suspect it was roughly around the time spez took over.
How feasible would it be to crowdsource the training? I.e. thousands of individual MacBooks each training a small part of the model and contributing to the collective goal.
Currently, not at all. You need low-latency, high-bandwidth links between the GPUs to be able to shard the model usefully. There is no way you can fit a 1T (or whatever) parameter model on a MacBook, or any current device, so sharding is a requirement.
Even if that problem disappeared, propagating the model weight updates between training steps poses a problem in itself. At this size, it's a lot of data.
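A back-of-envelope calculation shows why the weight-update traffic alone kills this. The link speed and gradient precision below are assumptions (an optimistic 1 Gbit/s home uplink, fp16 gradients):

```python
# Rough cost of syncing gradients for a 1T-parameter model over a home link.
params = 1e12            # 1 trillion parameters
bytes_per_param = 2      # assume fp16 gradients
gradient_bytes = params * bytes_per_param      # 2 TB of gradient data per step

link_bits_per_s = 1e9                          # optimistic 1 Gbit/s uplink
link_bytes_per_s = link_bits_per_s / 8

seconds_per_sync = gradient_bytes / link_bytes_per_s
print(f"{seconds_per_sync / 3600:.1f} hours per training step")  # ~4.4 hours
```

That's hours per step just moving gradients, before any computation happens; datacenter interconnects do the same exchange in milliseconds.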
You could easily fit a 1T parameter model on a MacBook if you radically altered the architecture of the AI system.
Consider something like a spiking neural network with weights and state stored on an SSD, using lazy evaluation as action potentials propagate. A 4 TB SSD holds ~1 trillion 32-bit FP weights and potentials, and there are MacBook options that support up to 8 TB. The other advantage of an SNN: training and using are basically the same thing. You don't have to move any bytes around; they just get mutated in place over time.
The trick is to reorganize this damn thing so you don't have to access all of the parameters at the same time... You may also find the GPU becomes a problem in an approach that uses a latency-sensitive time domain and/or event-based execution. It gets pretty difficult to process hundreds of millions of serialized action potentials per second when your hot loop has to go outside of L1 and screw with GPU memory. The GPU isn't that far away, but ~2 nanoseconds is a hell of a lot closer than 30-100+ nanoseconds.
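The "weights on SSD, touched lazily, mutated in place" idea can be sketched with a memory-mapped file. This is a toy illustration (tiny matrix, no real spiking dynamics), just showing that an event only pages in the rows it touches and that updates write straight back to disk:

```python
import os
import tempfile
import numpy as np

N = 1024  # toy neuron count, nothing like 1T
path = os.path.join(tempfile.mkdtemp(), "weights.bin")

# Create the weight matrix on disk once.
w = np.memmap(path, dtype=np.float32, mode="w+", shape=(N, N))
w[:] = np.random.default_rng(0).standard_normal((N, N)).astype(np.float32)
w.flush()
del w

# Later: a spike from neuron 42 propagates lazily. Indexing one row of the
# memmap makes the OS page in only that row, not the whole matrix.
weights = np.memmap(path, dtype=np.float32, mode="r+", shape=(N, N))
potentials = np.zeros(N, dtype=np.float32)
potentials += weights[42]        # one row read; the rest stays on disk

# "Training and using are the same thing": nudge the touched row in place,
# written back through the mmap with no separate checkpointing step.
weights[42] *= 0.99
weights.flush()
```

The hard part the comment alludes to is doing billions of these tiny, latency-sensitive touches per second, which is where SSD and GPU round-trip latencies start to dominate.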
What if you split the training down to the literal vector math and treated every MacBook like a thread in a GPU, with one big computer acting as the orchestrator?
You would need each MacBook to have an internet connection capable of multiple terabytes per second, with sub-millisecond latency to every other MacBook.
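Rough numbers for why the "MacBook as a GPU thread" model dies on bandwidth even before latency. Every figure below is an assumption for a hypothetical 1T-parameter transformer (hidden size, layer count, machine count, link speed):

```python
# Each layer, the orchestrator must ship the activation vector to every
# machine, since each one needs the full input to compute its slice of
# the matmul, then gather the partial results back.
hidden = 16384           # assumed hidden dimension
layers = 100             # assumed layer count
machines = 10_000        # MacBooks acting as "threads"
bytes_per_act = 2        # fp16 activations

per_token_bytes = hidden * bytes_per_act * layers * machines  # ~32.8 GB

orchestrator_bytes_per_s = 10e9 / 8   # generous 10 Gbit/s pipe
seconds_per_token = per_token_bytes / orchestrator_bytes_per_s
print(f"{seconds_per_token:.0f} s per token")  # ~26 s for ONE token
```

Inside a GPU the same fan-out happens over on-chip buses at terabytes per second, which is the gap the comment is pointing at.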
FWIW, there are current devices that could fit a model of that size. We had servers that supported TBs of RAM a decade ago (and today they're pretty cheap, although that much RAM is still a significant expense).
I once used a crowdsourcing system called CrowdFlower for a pretty basic task, and the results were pretty bad.
It seems like, with minimal oversight, the human workers like to just say they did the requested task and make up an answer rather than actually do it. (The task involved entering an address in Google Maps, looking at the Street View, and confirming, insofar as possible, whether a given business actually resided at the address in question; nothing complicated.)
Edit: whoops, mixed up the query with another reply that mentioned the human element XD
Hey HN! I recently faced a challenge of generating vector embeddings within my Swift app, so I built this library as a "good enough" solution for on-device similarity search without relying on an API. I've got some exciting plans for future development, but I'd love to hear your thoughts and any interesting use cases you can think of!