asb's comments | Hacker News


The situation is very confusing, but the tweet that went out with the announcement indicates it's not full 131k context yet and that is coming "soon": https://xcancel.com/CerebrasSystems/status/19437653011094202...


Note the announcement at the end: they're moving away from the non-commercial-only license used for some of their models in favour of Apache:

We’re renewing our commitment to using Apache 2.0 license for our general purpose models, as we progressively move away from MRL-licensed models


Note that this seems to be about the weights themselves; AFAIK the actual training code and datasets (for example) aren't publicly available.

It's a bit like developing a binary application and slapping a FOSS license on the binary while keeping the code proprietary. Not saying that's wrong or anything, but people reading these announcements tend to misunderstand what actually got FOSS licensed when the companies write stuff like this.


It's not the same as slapping an open source license on a binary, because unencumbered weights are so much more generally useful than your typical program binary. Weights are fine-tunable and embeddable into a wide range of software.

To consider just the power of fine-tuning: all of the press DeepSeek have received is over their R1 model, a relatively tiny fine-tune on their open source V3 model. The vast majority of the compute and data pipeline work to build R1 was already done for V3, while that final fine-tuning step to R1 is possible even for an enthusiastic dedicated individual. (And there are many interesting ways of doing it.)

The insistence every time open sourced model weights come up that it is not "truly" open source is tiring. There is enormous value in open source weights compared to closed APIs. Let us call them open source weights. What you want can be "open source data" or somesuch.


> The insistence every time open sourced model weights come up that it is not "truly" open source is tiring. There is enormous value in open source weights compared to closed APIs. Let us call them open source weights. What you want can be "open source data" or somesuch.

Agree that there is more value in open source weights than closed APIs, but what I really want to enable is people learning how to create their own models from scratch. FOSS to me means being able to learn from other projects how to build the thing yourself, and I wrote about why this is important to me here: https://news.ycombinator.com/item?id=42878817

It's not a puritan view but a purely practical one. Many companies started using FOSS as a marketing label (like what Meta does), and as someone who probably wouldn't be a software developer without having been able to learn from FOSS, I think it fucking sucks that the ML/AI ecosystem is seemingly OK with the term being hijacked.


It's not just a marketing label. The term is not being hijacked. Open source models, open source weights, the license chosen: these are all extremely valuable concepts.

The thing you want, open source model data pipelines, is a different thing. Its existence in no way invalidates the concept of an open source model. Nothing has been hijacked.


We call software FOSS when you can compile (if needed) and build the project yourself, locally, granted you have the resources available. If you have parts that aren't FOSS attached to the project somehow, we call it "Open Core" or similar. You wouldn't call a software project FOSS if the only thing under a FOSS license is the binary itself, or some other output, we require at least the code to be FOSS for it to be considered FOSS.

Meta/Llama probably started the trend, and they still today say "The open-source AI models" and "Llama is the leading open source model family" which is grossly misleading.

You cannot download the Llama models or weights without signing a license agreement; you're not allowed to use them for anything you want; and you need to add a disclaimer to anything that uses Llama (a requirement almost the entire ecosystem breaks, as they seemingly missed it when they signed the agreement), and so on. To me, this goes directly against what FOSS means.

If you cannot reproduce the artifact yourself (again, granted you have the resources), you'd have a really hard time convincing me that that is FOSS.


The data pipeline to build the weights is the source. The weights are a binary. The term is being hijacked. Just call it open weights, not open source models. The source for the models is not available. The weights are openly available.


Meta’s LLaMa 2 license is not Open Source – Open Source Initiative: https://opensource.org/blog/metas-llama-2-license-is-not-ope...

If the term were not being hijacked, such articles would not exist.

Meta is carefully, but falsely and deceptively, presenting Llama as Open Source.

The Open Source Definition – Open Source Initiative https://opensource.org/osd

What is Free Software? - GNU Project - Free Software Foundation https://www.gnu.org/philosophy/free-sw.html

Word "Open" as in "Open Source" - Words to Avoid (or Use with Care) Because They Are Loaded or Confusing https://www.gnu.org/philosophy/words-to-avoid.html#Open

Please refrain from using "open" or "open source" as a synonym for "free software." These terms originate from different perspectives and values. The free software movement advocates for your freedom in computing, grounded in principles of justice. The open source approach, on the other hand, does not promote a set of values in the same way. When discussing open source views, it's appropriate to use that term. However, when referring to our views, our software, or our movement, please use "free software" or "free (libre) software" instead. Using "open source" in this context can lead to misunderstandings, as it implies our views are similar to those of the open source movement.


Your concern about Meta's license is fair, I have no useful opinion on that. I certainly wish they would use a freer license, though I am loath to look a gift horse in the mouth.

My concern in this thread is people rejecting the concept of open source model weights as not "true" open source, because there is more that could be open sourced. It discounts a huge amount of value model developers provide when they open source weights. You are doing a variant of that here by trying to claim a narrow definition of "free software". I don't have any interest in the FSF definition.


I'm in favor of FOSS, and I'd like to see more truly open models for ideological reasons, but I don't see a lot of practical value for individuals in open-sourcing the process. You still can't build one yourself. How does it help to know the steps when creating a base model still costs >tens of millions of dollars?

It seems to me that open source weights enable everything the FOSS community is practically capable of doing.


> How does it help to know the steps when creating a base model still costs >tens of millions of dollars?

You can still learn web development even though you don't have tens of thousands of users on a large fleet of distributed servers. Thanks to FOSS, it's trivial to go through GitHub and find projects you can learn a bunch from, which is exactly what I did when I started out.

With LLMs, you don't have a lot of options. Sure, you can download and fine-tune the weights, but what if you're interested in how the weights are created in the first place? Some companies are doing a good job (like the folks building OLMo) of creating those resources, but others seem to just want to use FOSS because it's good marketing vs. OpenAI et al.


Learning resources are nice, but I don't think it's analogous to web dev. I can download nginx and make a useful website right now, no fleet of servers needed. I can even get it hosted for free. Making a useful LLM absolutely, 100% requires huge GPU clusters. There is no entry level, or rather that is the entry level. Because of the scale requirements, FOSS model training frameworks (see GPT-NeoX) are only helpful for large, well-funded labs. It's also difficult to open-source training data, because of copyright.

Finetuning weights and building infrastructure around that involves almost all the same things as building a model, except it's actually possible. That's where I've seen most small-scale FOSS development take place over the last few years.


This isn't true. Learning how to train a 124M-parameter model is just as useful as learning to train a 700B one, and is possible on a laptop. https://github.com/karpathy/nanoGPT
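
To make that concrete, here's a minimal sketch in the spirit of nanoGPT's intro (not nanoGPT's actual code; "input.txt" is a placeholder for any plain-text corpus). Even a bigram-level training loop like this, runnable on a laptop CPU, teaches the core mechanics:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    text = open("input.txt").read()  # any plain-text corpus
    chars = sorted(set(text))
    stoi = {c: i for i, c in enumerate(chars)}
    data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

    class Bigram(nn.Module):
        def __init__(self, vocab_size):
            super().__init__()
            # each token's embedding row directly holds logits for the next token
            self.table = nn.Embedding(vocab_size, vocab_size)
        def forward(self, idx):
            return self.table(idx)

    model = Bigram(len(chars))
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

    for step in range(1000):
        ix = torch.randint(len(data) - 1, (32,))  # random batch of positions
        x, y = data[ix], data[ix + 1]             # predict the next character
        loss = F.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()

Scaling this up to a 124M-parameter transformer is mostly a matter of swapping the model class and adding a GPU; the training mechanics stay recognisably the same.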


To clarify my point:

Learning how to make a small website is useful, and so is the website.

Learning how to finetune a large GPT is useful, and so is the finetuned model.

Learning how to train a 124M GPT is useful, but the resulting model is useless.


> Finetuning weights and building infrastructure around that involves almost all the same things as building a model

Those are two completely different roles? One is mostly around infrastructure and the other is actual ML. There are people who know both, I'll give you that, but I don't think that's the default or even common. Fine-tuning is trivial compared to building your own model and deployments/infrastructure is something else entirely.


It wouldn't cost tens of millions of dollars to create a 500M or 1B model, and the learning process is transferable to larger model weights.


It's not the exact same, since you can still fine-tune it, you can modify the weights, serve it with different engines, etc.

This kind of purity test mindset doesn't help anyone. They are shipping the most modifiable form of their model.
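
To show what "most modifiable form" buys you in practice, here's roughly what a LoRA fine-tune of open weights looks like with the Hugging Face peft library (a sketch; the model name and hyperparameters are illustrative placeholders, not a recommendation):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    name = "mistralai/Mistral-7B-v0.1"  # any open-weights causal LM
    model = AutoModelForCausalLM.from_pretrained(name)
    tok = AutoTokenizer.from_pretrained(name)

    # Inject small trainable low-rank adapters; the base weights stay frozen.
    config = LoraConfig(r=8, lora_alpha=16,
                        target_modules=["q_proj", "v_proj"],
                        task_type="CAUSAL_LM")
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # typically well under 1% of the total

    # ...train on your own data with transformers' Trainer or a plain
    # PyTorch loop, then: model.save_pretrained("my-adapter")

None of that is possible with a closed API, which is exactly the gap between open weights and no weights.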


Agree that it's not exactly the same; all analogies have holes, they're simplifications after all.

I guess I'm wary of the messaging because I'm a developer 99% thanks to FOSS, and to being able to learn from FOSS projects how to build similar stuff myself. Without FOSS, I probably wouldn't have been able to "escape" the working class my family was "stuck in" when I grew up.

I want to do whatever I can to make sure others have the same opportunity, and it doesn't matter whether the weights themselves are FOSS or not; others cannot learn how to create their own models by just looking at the weights. You need to be able to learn the model architecture, the training process, and what datasets the models use too, otherwise you won't get very far.

> This kind of purity test mindset doesn't help anyone. They are shipping the most modifiable form of their model.

It does help others who might be stuck in the same situation I was; that's not nothing, nor is it about "purity". They're not shipping the most open model they can; they could have done something like OLMo (https://github.com/allenai/OLMo), which can teach people how to build their own models from scratch.


Keep fighting the good fight. Saying Llama is open source is straight up lying. It's open weights.


Thank you, sometimes it feels weird to argue against people who are generally pro-FOSS but somehow, for LLMs, are fine with misleading statements. I'm glad at least one other person can see through it; it's encouraging that I'm on the right track :)

I'm not sure I'd even call Llama "open weights". For me that would mean I can download the weights freely (you cannot download Llama weights without signing a license agreement) and use them freely (you cannot), plus you need to add a notice from Meta/Llama on everything that uses Llama saying:

> prominently display “Built with Llama” on a related website, user interface, blogpost, about page, or product documentation.

https://www.llama.com/llama3_2/license/

Not sure what the correct label is, but it's not open source nor open weights, as far as I can tell.


> Note that this seems to be about the weights themselves, AFAIK, the actual training code and datasets (for example) aren't actually publicly available.

Like every other open source / source available LLM?


Like every other Open Source LLM's weights, yes. But looking around, there are models that are 100% FOSS, like OLMo (https://github.com/allenai/OLMo).

Also, I don't buy the argument that because many in the ecosystem mislabel the licensing and mislead people about it, it's ethically OK for everyone else to do so too.



While I hope HuggingFace is successful here, a plan for building a model is a long way from releasing a model. Mistral has models out there, and they allow you to modify them. Yeah, it's not like what we're used to. It probably needs something else, but people are doing some great things with them.


Binaries can do arbitrary things, like report home to a central server. Weights cannot.


Why is that relevant to the FOSS aspects of weights/binaries? If I run a binary within a VM and only consider its output, preventing any side effects on the host, just like I could consider only the output of an LLM, my binary is still not any closer to being FOSS, is it?


Depending on format, they might.


Virtually all models are now distributed as Safetensors/gguf/etc. (which are just metadata + data), not pickled Python classes. Many libraries also don't even load pickled checkpoints anymore unless you add an argument explicitly stating that you want to load an unsafe checkpoint.
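
For example, the two common "safe" loading paths look like this (a sketch; file names are placeholders):

    import torch
    from safetensors.torch import load_file

    # Safetensors: plain tensors plus metadata, no code execution on load.
    state = load_file("model.safetensors")

    # PyTorch checkpoints: weights_only=True refuses arbitrary pickled
    # objects (and recent PyTorch releases make it the default).
    state = torch.load("model.pt", weights_only=True)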


But the weights can be modified. Also, the real key is that you can host it yourself, fine-tune it, and make money from it without restriction. That's what it's really about. No one (well, few) cares about recreating it, because if they could, they'd simply have made one from scratch themselves.


The same is true for FOSS in general. You're arguing that because (almost) no one builds their own X from scratch, there is therefore no value in having resources available for how to build your own X from scratch.

For someone who basically couldn't have become a developer without FOSS, this way of thinking is so backwards, especially on Hacker News. I thought we were pro-FOSS in general, but somehow LLMs get a pass because "they're too complicated and no one would build one from scratch".


They get a pass because we know what these companies train on (proprietary or private data) even though they can't admit it, yet they're still giving away multi-million-dollar models for free.

Yes, it'd be nice if it was open and reproducible from start to finish. But let's not let perfect be the enemy of good.


> Yes, it'd be nice if it was open and reproducible from start to finish. But let's not let perfect be the enemy of good.

"Let's not let companies exploit well-known definitions for their own gain" is what I'm going for, regardless if we personally gain from it or not.


I still don't understand why they have to mix definitions to confuse developers, and why, on top of that, we apparently have to give up on the true meaning of FOSS. What's so hard about using the term "open weights" or some new term instead of trying to reuse FOSS terms they don't abide by?


The binary comparison is a bit off, since a binary can be copyrighted. Weights cannot.


Has that actually been tried in court, or is that your guess? Because you seem confident, but I don't think this has been tried (yet)


It is a guess (I'm not the same author), but it'd make sense: weights are machine output. If the output of AI is not under copyright because it is machine output (which seems to be pretty much universally agreed upon), then the same would apply to the weights themselves.

I'm not sure how someone would argue (in good faith) that training on copyrighted materials does not make the weights a derivative of those materials, and that the output of their AI is not protected under copyright, but that the part in the middle, the weights, does fall under copyright.

Note that this would be about the weights (i.e. the numbers), not their container.


Photographs are machine output too and are famously subject to copyright. The movie Toy Story is also machine output, but I'm confident Disney is enforcing copyright on that.

The opinion that AI output isn't copyrightable derives from the opinion of the US Copyright Office, which argues that AI output is more like commissioning an artist than like taking a picture. And since the artist isn't human they can't claim copyright for their work.

It's not at all obvious to me that the same argument would hold for the output of AI training. Never mind that the above argument about AI output is just the opinion of some US agency and hasn't been tested in court anywhere in the world.


As mentioned by the other comment, the difference lies in how much human effort was put in, a question that by its nature cannot be answered in a black-and-white manner that applies everywhere. But in cases close to the extremes it is easy to answer: even though the Toy Story renders are machine output, those renders are the result of a lot of human effort by the artists who made the 3D scenes, materials, models, animation sequences, etc. for the purpose of being used in those renders. So Disney can claim copyright on that sort of "machine output".

Similarly, claiming copyright on AI output is like claiming copyright on the output of something like `init_state(42, &s); for (int i=0; i < count; i++) output[i] = next_random(&s);`. While there is a bit of (theoretical) effort involved in choosing 42 as a starting input, ultimately you can't really claim copyright on a bunch of random numbers just because you chose the initial seed value.

Of course you can claim copyright in the code, but doing the same on the output makes no sense: even if the idea of owning random numbers isn't absurd enough, consider what would happen if, say, 10,000 people did the same thing (and, to make things even clearer, what if `init_state` used only 8 bits of the given number, ensuring that a lot of people would end up with the same numbers).

AI is essentially `init_state` and `next_random`, just with more involved algorithms than a random number generator.
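
A runnable version of that analogy, in Python rather than the C-style pseudocode above: two independent "authors" who happen to pick the same seed produce byte-identical "works".

    import random

    def generate(seed: int, count: int = 10) -> list[int]:
        rng = random.Random(seed)  # init_state(seed, &s)
        return [rng.randrange(256) for _ in range(count)]  # next_random(&s)

    # 10,000 people choosing seed 42 all end up with the same numbers.
    assert generate(42) == generate(42)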


A photograph is often subject to copyright - but there's actually some nuance here; some countries also require a certain level of creative input by a human.

https://en.wikipedia.org/wiki/Threshold_of_originality

Areas of dispute include photographs of famous paintings (is it more in the character of a photocopy?), photographs taken by animals (does the human get copyright if they deliberately created the situation where the animal would take a photograph?), and videos taken automatically (can a CCTV video have an author?)

Historically, the results are all over the place.


The copyright office is actively figuring it out. From yesterday: https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...


If you make them copyrightable, then it means they are a derivative of the training dataset.

The only defense these AI companies have is making the weights machine output and thus not copyrightable.

But then again, that's the theory; the copyright system follows money, and it wouldn't be surprising to see contradictory positions being allowed.


I guess since they're not ahead anymore, they decided to go back to open source.


They must have realized they were becoming irrelevant... I know I forgot about them and have been using other models locally. Openness is a huge win, even if I am using Mistral's hosting service I want to know I can always host it myself too, to protect my business against rug pulls and the like.

No one's going to pay for an inferior closed model...


Happy to see them back to releasing OSS models. We used a lot of their OSS models early last year before they were eclipsed by better models, and never bothered to try any of their large models, which IMO weren't great value.


I wonder if that's a consequence of the DeepSeek distill release: fine-tuned Qwen and Llama models were both released by DeepSeek, but not Mistral, and that was a missed PR opportunity for them for no good reason.


What does an Apache licence even mean in this context? It's not software. Is it even copyrightable?


I think this is a really interesting area. I wrote a command line tool for web reading with some similar motivations. In my case, you queue up the articles to read the next day.

https://muxup.com/pwr


Nice!

P.S. I love the colour choices and the colour stripe interactions on your website. So cool!


If you're playing on the Steam Deck, mapping 'touch' on the left and right trackpads to the left and right bongo, and a touch on either joystick (they have capacitive touch) as a clap works rather well.


I used this trick for a quick favicon on my site https://muxup.com/ - though unfortunately Google/Bing/DuckDuckGo don't like SVG favicons and so it's not displayed in search results. I should really add a proper favicon...


Favicons have always been weirdly special. I don’t know why.

SVG support feels like it should be pretty straightforward to implement.

Last time I looked at how to implement favicons "correctly", I had to make some weird XML file with the word apple in it that lists all the sizes of icons I have, and then use a favicon generator to create all the "correct" sizes.

What a mess.


At least for Google, it does render my SVG favicon in their search results page; Bing and DuckDuckGo didn't.


Oh that's interesting, I wonder what's different about your SVG favicon vs mine.


I think they should probably set LoopMicroOpBufferSize to a non-zero value even if it's not microarchitecturally accurate. This value is used in LLVM to control whether partial and runtime loop unrolling are enabled (actually only for that). Although some targets override this default behaviour, AArch64 only overrides it to enable partial and runtime unrolling for in-order models. I've left a review comment https://github.com/llvm/llvm-project/pull/91022/files#r16026... and, as I note there, the setting seems to have become very divorced from microarchitectural reality if you look at how and why different scheduling models set it in-tree (e.g. all the Neoverse cores set it to 16, with a comment that they just copied it from the A57).


I somehow hadn't come across this library, but have a whole blog post on the various ways people store data in pointers (and when+why it's safe) https://muxup.com/2023q4/storing-data-in-pointers
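
For anyone who hasn't seen the trick: because allocations are typically aligned, the low bits of a valid pointer are always zero and can carry a small tag. A sketch of the idea, using plain Python integers to stand in for pointer values (the 8-byte-alignment assumption is noted in the comments):

    ALIGN_BITS = 3                    # 8-byte-aligned allocations leave
    TAG_MASK = (1 << ALIGN_BITS) - 1  # the bottom 3 bits free for a tag

    def tag_ptr(addr: int, tag: int) -> int:
        assert addr & TAG_MASK == 0, "address must be 8-byte aligned"
        assert 0 <= tag <= TAG_MASK
        return addr | tag

    def untag_ptr(word: int) -> tuple[int, int]:
        return word & ~TAG_MASK, word & TAG_MASK

    addr, tag = untag_ptr(tag_ptr(0x7F00DEADB000, 0b101))
    assert (addr, tag) == (0x7F00DEADB000, 0b101)

In C or Rust the same masks apply to uintptr_t/usize values; the blog post covers when and why this is actually safe to do.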


Awesome, been looking for this information for ages already.


One source of stats is the 2023 Annual Rust Survey. The question on text editors allowed multiple responses (so percentages naturally don't add up to 100%), but 5.5% of respondents reported using Emacs for Rust vs ~30% for Vim/Neovim, vs ~61% for VS Code. I was shocked that Emacs and Vim weren't closer. https://blog.rust-lang.org/2024/02/19/2023-Rust-Annual-Surve...

Obviously, it's possible this is a quirk of the Rust community. Though the Go survey shows similarly small Emacs usage numbers https://go.dev/blog/survey2023-h2-results (3% Emacs vs 16% Vim/Neovim).


Why would Emacs be larger? It's anecdotal, but I've gone through my entire life (I'm 25) without ever seeing it installed on a machine I've used, or mentioned in any learning materials; when I was younger I only heard of it as a flame war topic.


> I was shocked that Emacs and Vim weren't closer.

I don't really find this surprising. VS Code is just very easy to use and full-featured without spending much time on configuration. It took me a while, but I had to admit I was just making my life harder by using vim instead of VS Code.


I read the quoted section as surprise that they aren’t closer to each other.


Yes, that's what I meant. Just an assumption I'd had; perhaps the joking about Emacs vs Vim preference among programmers made me assume the groups were of similar size.


Ah, my mistake, I think you're correct


It's been stable like this for a long, long time. 5% emacs, 20% vim, the rest to the editors of the day.


Let's see next year, with all the new Rust-based text editors :) Lapce, Zed, and similar.


(Article author here). I'm typically executing the same compile command from shell history (or via an alias) so missing off the `'; bell` isn't really a concern, but I agree that automatically triggering it after commands of a certain duration is a nicer way of avoiding that mental overhead if you're executing a wider variety of commands.

