Hacker News | gwern's comments


Linux isn't a strong brand at all. Even nerds would struggle to tell you what a 'Linux' is (or 'GNU/Linux', as I prefer to call it, since that's every non-smartphone Linux I've ever used). And I've never even heard of 'Portuguese tarts', though I've heard plenty about Portugal and briefly considered going there last year for tourism.

Pebble went out of business, incidentally.


Some of the typos and poor typography might be accidental simply because that is so common, but the rest of it is surely deliberate.

Remember that _American Psycho_ is repeatedly hinting to you throughout the movie that most, or even all, of what you see is just Bateman's own delusions and hallucinations, which are inferior imitations of reality due to his own impoverished mind and lack of substance. So there are a lot of little tricks all throughout it to subtly destabilize you: https://www.moviemaker.com/american-psycho-anniversary-oral-...


It's definitely very valuable, but for what AI model? How does any of that lead to AGI, or even just a good coding agent?

It doesn't need to lead to AGI or a good coding agent. Some of the only people actually profitable in the LLM industry are the ones making actual chatbots. There are several bootstrapped startups that run open-weight models behind a $10 or $20 monthly sub and make millions in profit on inference from people just talking to the things, usually for character roleplay / "AI boyfriend/girlfriend" stuff etc. Some of them have even reinvested those profits in training their own bespoke models from scratch, usually on the smaller side, although finetunes/retrains of Llama 70B, GLM, and DeepSeek 671B have also been done. Grok could probably be profitable if it targeted this space as the most "intelligent" conversational/uncensored model.

This already presupposes that profit even matters, though. Musk burned some $50 billion to control messaging in political discourse with his acquisition of Twitter. It was not about money but power. Once you already have effectively infinite money, the only thing left to spend it on is acquiring more power, which is achieved through influencing politics. LLMs represent a potentially even better propaganda tool than social media platforms: they give you unprecedented access to thoughts people would probably not share online otherwise, and they let you influence people more subtly with deeply personalised narratives.


> but for what AI model?

Sentiment analysis. Working out what words lead to what outcomes, and then being able to predict on new data is super useful.

For coding or "AGI", no, it's not useful. For building a text-based (possibly image-based) recategorisation system, it's top class.
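As a toy illustration of that word-to-outcome mapping (everything below is invented for the sketch: the messages, the labels, and the scoring rule), a bag-of-words classifier is the degenerate case of what such conversation logs let you train:

```python
from collections import Counter

# Toy illustration: learn which words correlate with an outcome label.
# All example messages and labels here are made up for this sketch.
train = [
    ("i love this model it is great", 1),
    ("this is wonderful thank you", 1),
    ("terrible answer i hate it", 0),
    ("awful response do not like", 0),
]

pos_counts, neg_counts = Counter(), Counter()
for text, label in train:
    (pos_counts if label else neg_counts).update(text.split())

def score(text):
    # +1 for each word seen more often in positive messages, -1 for negative
    return sum(
        1 if pos_counts[w] > neg_counts[w]
        else -1 if neg_counts[w] > pos_counts[w]
        else 0
        for w in text.split()
    )

print(score("i love this great model"))       # positive score
print(score("what a terrible awful answer"))  # negative score
```

A real pipeline would use a proper model rather than raw counts, but the data requirement is the same: pairs of text and outcome, which chat logs supply for free.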



List of examples: https://gwern.net/turing-complete

It was probably unintentional, yeah; I don't recall any mentions of early printf being overloaded to do computation, nor is it clear why you would do that, since you're already using it from a much more convenient Turing-complete language (C).


The solution here seems to be to impose some constraint or requirement that makes literal copying impossible (remember, copyright governs copies; it doesn't govern ideas or algorithms - those would be 'patents', which essentially no open-source software has), or to ensure that any 'copying' from vaguely remembered pretraining code happens at such an abstract, indirect level that it is 'transformative' and thus safe.

For example, the Anthropic Rust C compiler could hardly have copied GCC or any of the many C compilers it surely trained on, because then it wouldn't have spat out reasonably idiomatic and natural looking Rust in a differently organized codebase.

Good news for Rust and Lean, I guess, as it seems like everyone these days is looking for an excuse to rewrite everything into those for either speed or safety or both.


> copyright governs copies, it doesn't govern ideas or algorithms

The second part is true. The first is a little trickier. Copyright applies to a work fixed in some medium (text in this case) rather than the idea expressed, but the protections extend well beyond literal copies. For example, in fiction, the narrative arc and "arrangement" is also protected, as are adaptations and translations.

If you were to try to write The Catcher in the Rye in Italian entirely from memory (however well you remember it), I believe that would still be covered by Salinger's copyright even if not a single sentence were copied verbatim.


Also just differing levels of relevance. You don't talk with a businessman or investor or famous people in general because of their writing; if you made a list of relevant skills, 'proper spelling when quickly texting from a phone' surely doesn't crack even the top ten thousand skills. In academia, on the other hand, writing a formal application properly is a core skill.

If you were applying to YC, would you capitalize the answers to their questions?

I would have to consider carefully if I thought I was a high-enough quality candidate that it would be interpreted as a countersignal rather than a signal.

If I, gwern, specifically, were to apply, I might; because I know I am widely read on HN and I've talked with any number of YC partners etc, and they all know I take care in writing, and so me not capitalizing is a deliberate message rather than laziness or incompetence. They may or may not appreciate the message, but they won't infer the usual things, at least.

If I were anyone else and my application just one of thousands in the flood? You'd better believe I'd capitalize and spellcheck my YC application: https://gwern.net/blog/2023/good-writing


Adding, swapping, or duplicating layers has a long history (e.g. StyleGAN, upcycling), and it was pointed out at least as far back as He et al 2015 (ResNets) that you could ablate layers or add more, because they functioned more as incremental, iterative computation, and many of them were optional. (Or consider Universal Transformers, or heck, just how BPTT works.) So this idea is not far out of distribution, if at all, especially for an LLM that knows the literature and past approaches (which most humans would not, because they only got into this area post-ChatGPT).
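The ResNet observation can be sketched numerically: with residual blocks of the form x + g(x), each block is a small perturbation of the identity, so dropping or repeating one moves the output only incrementally rather than destroying it. A toy NumPy version (random untrained weights, purely illustrative, not any real architecture):

```python
import numpy as np

# Toy residual "network": each block computes x + g(x) with a small g,
# mimicking the observation that residual layers act like incremental
# refinements, so ablating or duplicating one is survivable.
rng = np.random.default_rng(0)
d = 8
weights = [rng.normal(scale=0.1, size=(d, d)) for _ in range(6)]

def run(ws, x):
    for w in ws:
        x = x + np.tanh(x @ w)   # residual block: identity + small update
    return x

x = rng.normal(size=d)
base = run(weights, x)
ablated = run(weights[:3] + weights[4:], x)                     # drop one block
duplicated = run(weights[:3] + [weights[3]] + weights[3:], x)   # repeat one block

# Both stay close to the baseline because each block is near-identity.
print(np.linalg.norm(base - ablated))
print(np.linalg.norm(base - duplicated))
```

A plain (non-residual) stack has no such property: there, removing a layer hands the next layer an input from the wrong distribution entirely.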

I don’t disagree, but it’s worth having a look at the changes the LLM did apply.

https://github.com/karpathy/autoresearch/blob/master/progres...

My opinion is you'd have to go pretty far down the x-axis to get to anything beyond tinkering with batch size, learning rate, or positional encodings. There are so many hyperparameter knobs already exposed that duplicating layers is unlikely to be proposed for a long time.

I also just noticed that the last change it applied was changing the random seed. Lol.


My understanding was that Autoresearch was defined as training from scratch (since it's based on the nanogpt speedrun), not using any pretrained models. So it couldn't do anything like upcycling a pretrained model or the Frankenmerge, because it's not given any access to such a thing in the first place. (If it could, the speedrun would be pointless as it would mostly benchmark what is the fastest fileserver you can download a highly compressed pretrained model checkpoint from...) It can increase the number of layers for a new architecture+run, but that's not the same thing.

> They almost certainly have never seen regular conversations in Base64 in their training set, so its weird that it 'just works'.

People use Base64 to store payloads of many arbitrary things, including web pages or screenshots, both deliberately and erroneously, and so they have almost certainly seen regular conversations in Base64 in their 10TB+ text training sets scraped from billions of web pages, files, mangled emails, etc.
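The reason such text survives in scraped payloads is that Base64 is a deterministic, lossless byte-level re-encoding, so any chat text round-trips exactly; a one-liner shows the mapping (the sample message is made up):

```python
import base64

# Base64 is a fixed 3-bytes-to-4-characters transform: deterministic,
# reversible, and applied uniformly to whatever bytes it is given.
msg = "Hello! How are you today?"
encoded = base64.b64encode(msg.encode("utf-8")).decode("ascii")
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)           # the same input always yields the same output
print(decoded == msg)    # round-trips losslessly
```

So from a model's perspective, Base64-encoded English is just English under a consistent substitution at the byte level, with plenty of parallel examples in the wild to learn the mapping from.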


Yes, that's true.

But that points again to the main idea: The model has learnt to transform Base64 into a form it can already use in the 'regular' thinking structures.

The alternative is that there is an entire parallel structure just for Base64, which, based on my 'chats' with LLMs in that format, seems implausible; it acts like the regular model.

If there is a 'translation' organ in the model, why not math or emotion-processing organs? That's what I set out to find, and that's what the heatmaps illustrate.

Also, any writing tips from the Master blogger himself? Huge fan (squeal!)

