Hacker News | jspisak's comments

Partner integrations will follow. For now we just have the weights available.

But don't worry, this community moves fast!


Probably superseded (by y’all) within a week!


It would be interesting to understand if a ~30B Llama-2 model would be interesting and for what reasons.


Better reasoning and general performance than 13B by far (if Llama 1 was any indication). Like the other user said, it can fit on a single 24GB-VRAM gaming card, and it can be PEFT fine-tuned with 2x 24GB cards.


Llama-1-33B was trained on 40% more tokens than Llama-1-13B; this explained some of the disparity. This time around they both have the same data scale (2T pretraining + 500B code finetune), but 34B is also using GQA, which is slightly noisier than MHA. Furthermore, there have been some weird indications in the original Llama-2 paper that the 34B base model is something even more special: it was trained on a separate internal cluster with undervolted/underclocked GPUs (though this in itself can't hurt training results), its scores are below expectations, and it's been less "aligned". Here, Code-Llama-Instruct-13B is superior to 34B on HumanEval pass@1. So yes, it's desirable, but I wouldn't get my hopes up.
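For reference, the pass@k metric mentioned above has a standard unbiased estimator from the original HumanEval/Codex work: given n samples per problem with c correct, pass@k = 1 - C(n-c, k) / C(n, k). A minimal sketch (the sample counts in the usage lines are made up for illustration):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k for one problem:
    probability that at least one of k samples drawn
    from n (of which c are correct) passes."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With n=1 sample per problem, pass@1 reduces to the raw pass rate:
print(pass_at_k(1, 1, 1))   # 1.0
print(pass_at_k(10, 3, 1))  # 0.3
```

The final model-level score is just this value averaged over all problems in the benchmark.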


Llama 34B is just big enough to fit on a 24GB consumer (or affordable server) GPU.

It's also just the right size for llama.cpp inference on machines with 32GB RAM, or 16GB RAM plus an 8GB+ GPU.

Basically, it's the most desirable size for AI finetuning hobbyists, and the quality jump from Llama v1 13B to Llama v1 33B is huge.


It would fit on the 24GB top-end consumer graphics cards with quantization.
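A quick back-of-envelope sketch of why quantization is what makes a ~34B model fit on a 24GB card (weights only; the KV cache, activations, and framework overhead add more on top):

```python
# Approximate weight-storage footprint of a 34B-parameter model
# at different precisions. Illustrative arithmetic, not a measurement.

def model_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Weight storage in GiB for a model with n_params parameters."""
    return n_params * bits_per_weight / 8 / 1024**3

N = 34e9  # ~34B parameters

for bits, label in [(16, "fp16"), (8, "int8"), (4, "4-bit")]:
    print(f"{label:>5}: {model_size_gib(N, bits):5.1f} GiB")
# fp16 is ~63 GiB and int8 ~32 GiB, both over budget;
# only the ~16 GiB 4-bit variant leaves headroom on a 24GB GPU.
```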


If you watch the Connect talks, I'll be speaking about this.


I wish that Meta would release models like SeamlessM4T[0] under the same license as Llama 2, or an even better one. I don't understand the rationale for keeping it under a completely non-commercial license, but I agree it's better than not releasing anything at all.

There seem to be opportunities for people to use technology like SeamlessM4T to improve lives, if it were licensed correctly, and I don't see how any commercial offering from smaller companies would compete with anything that Meta does. Last I checked, Meta has never offered any kind of translation or transcription API that third parties can use.

Whisper is licensed more permissively and does a great job with speech to text in some languages, and it can translate to English only. However, it can't translate between a large number of languages, and it doesn't have any kind of text to speech or speech to speech capabilities. SeamlessM4T seems like it would be an all-around upgrade.

[0]: https://github.com/facebookresearch/seamless_communication


Yeah - different projects have different goals, and licenses aren't one-size-fits-all. Depending on the project, type of technology, goals, etc., we will select or even develop the right license that aligns with those goals. Hope this helps :)


Excited! I hope your talks are just as informative as this comment. Keep rocking!


Sorry—who are you and what are the Connect talks? I haven't heard of them and you don't have a bio.



That'll be Joseph Spisak, Head of Generative AI Open Source at Meta AI.


what is that?


Facebook Connect is what used to be called Oculus Connect. Kinda their equivalent of Apple's WWDC, I guess. It's when and where the Quest 3 will be officially unveiled in full, for example.


Yep - here is the site: https://www.metaconnect.com/en/home


Thanks for posting this - you can learn more here: https://ai.facebook.com/blog/protein-folding-esmfold-metagen...

Cheers!


It's really awesome to finally see this landed. Congrats to the team and to the PyTorch community!


PyTorch Mobile is a start and is available for iOS and Android. Given that folks like PFN and Microsoft are (or will be) heavy contributors, I would expect support for more devices to broaden. Have you tried it out yet? No need for a separate set of op semantics or framework. :) https://pytorch.org/mobile/home/


Anything that can't use mobile GPU (or DSP/TPU for quantized inference) is pretty useless IMO, because it's just not energy efficient enough to be practical in a battery powered device, even if it's fast enough.


Once PyTorch is updated to use XNNPACK (being worked on right now), I think it should be fine to use. That plus QNNPACK makes inference quite low on power usage in my (admittedly limited, I just integrated XNNPACK) experience.


As a rule, CPU burns at least 5x the energy per FLOP. So no, CPU is not a viable option on mobile if you need to do inference constantly. For "every now and then" cases, sure.
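To make the 5x rule of thumb concrete, here's a rough battery-drain estimate for continuous on-device inference. The per-GFLOP energy figure, workload rate, and battery capacity are illustrative assumptions, not measurements; only the 5x CPU-vs-GPU ratio comes from the comment above:

```python
# Rough battery-drain comparison for continuous mobile inference,
# assuming CPU costs ~5x the energy per FLOP of a mobile GPU/DSP.

GPU_J_PER_GFLOP = 0.05                  # assumed mobile-GPU energy (joules/GFLOP)
CPU_J_PER_GFLOP = 5 * GPU_J_PER_GFLOP   # the "at least 5x" rule of thumb

def battery_drain_pct(gflops_per_s: float, j_per_gflop: float,
                      hours: float, battery_wh: float = 15.0) -> float:
    """Percent of a phone battery consumed by continuous inference."""
    joules_used = gflops_per_s * j_per_gflop * 3600 * hours
    return 100 * joules_used / (battery_wh * 3600)

# Continuous 10 GFLOP/s workload for one hour:
print(f"GPU: {battery_drain_pct(10, GPU_J_PER_GFLOP, 1):.1f}% battery")
print(f"CPU: {battery_drain_pct(10, CPU_J_PER_GFLOP, 1):.1f}% battery")
```

Whatever the absolute numbers, the CPU path drains the battery 5x faster by construction, which is the difference between "every now and then" and "constantly" being viable.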


Interesting, thanks.


We have been working with G on TPU support for PyTorch. Have you tried it out? https://github.com/pytorch/xla


Finally the solution to all of your PyTorch citation problems! :)



It is surprising to me that they don't put this in the README.

(I do know it is in the root directory[0], it is just common practice to have it in the README. To be stupidly obvious)

[0] https://github.com/pytorch/pytorch/blob/master/CITATION


Yeah, generally a cloud-based approach to training models is better than training on your Mac. AMD folks have been working on ROCm for some time, and it works reasonably well for common models, but AMD GPU hardware isn't pervasive like Nvidia GPUs in places like AWS.

If you are looking at using Colab for prototyping, you can also try TPUs, which are now supported for PyTorch. Here is a link with some additional info, including some Colab notebooks: https://github.com/pytorch/xla


Amazon backing MXNet is pretty awesome, and having it be a 'real' open project is great for the community.

