Far better reasoning and general performance than 13B (if Llama-1 was any indication), and, like the other user said, it can fit on a single 24GB-VRAM gaming card and can be PEFT fine-tuned with 2x 24GB cards.
Llama-1-33B was trained on 40% more tokens than Llama-1-13B; this explained some of the disparity. This time around they both have the same data scale (2T pretraining + 500B code finetune), but 34B also uses GQA, which is slightly noisier than MHA. Furthermore, there were some weird indications in the original Llama-2 paper that the 34B base model is something… even more special: it was trained on a separate internal cluster with undervolted/underclocked GPUs (though this in itself can't hurt training results), its scores are below expectations, and it was less "aligned". Here, Code-Llama-Instruct-13B is superior to 34B on HumanEval@1. So yes, it's desirable, but I wouldn't get my hopes up.
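For context on the GQA point: grouped-query attention shares each key/value head across a group of query heads, which shrinks the KV cache at a small cost in attention capacity (the "noise" mentioned above). A minimal numpy sketch, where the shapes and head counts are illustrative and not Llama-2's actual configuration:

```python
import numpy as np

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """Single-batch GQA: n_q_heads query heads share n_kv_heads K/V heads.

    MHA is the special case n_kv_heads == n_q_heads.
    """
    seq, d_model = x.shape
    head_dim = d_model // n_q_heads
    group = n_q_heads // n_kv_heads  # query heads per shared K/V head

    q = (x @ wq).reshape(seq, n_q_heads, head_dim)
    k = (x @ wk).reshape(seq, n_kv_heads, head_dim)
    v = (x @ wv).reshape(seq, n_kv_heads, head_dim)

    # Repeat each K/V head for its group of query heads; only the
    # small (seq, n_kv_heads, head_dim) tensors need to be cached.
    k = np.repeat(k, group, axis=1)
    v = np.repeat(v, group, axis=1)

    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = np.einsum("hqk,khd->qhd", weights, v)
    return out.reshape(seq, d_model)

# Toy usage: 8 query heads sharing 2 K/V heads.
rng = np.random.default_rng(0)
d, seq, hq, hkv = 64, 8, 8, 2
hd = d // hq
x = rng.standard_normal((seq, d))
wq = rng.standard_normal((d, d))
wk = rng.standard_normal((d, hkv * hd))
wv = rng.standard_normal((d, hkv * hd))
out = grouped_query_attention(x, wq, wk, wv, hq, hkv)
```

Note the K/V projections output only `n_kv_heads * head_dim` features, which is where the memory/bandwidth savings come from.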
I wish that Meta would release models like SeamlessM4T[0] under the same license as Llama-2, or an even better one. I don't understand the rationale for keeping it under a completely non-commercial license, but I agree that it's better than not releasing anything at all.
There seem to be opportunities for people to use technology like SeamlessM4T to improve lives, if it were licensed correctly, and I don't see how any commercial offering from smaller companies would compete with anything that Meta does. Last I checked, Meta has never offered any kind of translation or transcription API that third parties can use.
Whisper is licensed more permissively and does a great job with speech to text in some languages, and it can translate to English only. However, it can't translate between a large number of languages, and it doesn't have any kind of text to speech or speech to speech capabilities. SeamlessM4T seems like it would be an all-around upgrade.
Yeah - different projects have different goals, and licenses aren't one-size-fits-all. Depending on the project, the type of technology, the goals, etc., we will select or even develop the right license that aligns with those goals. Hope this helps :)
Facebook Connect is what used to be called Oculus Connect. Kinda their equivalent of Apple's WWDC, I guess. It's when and where the Quest 3 will be officially unveiled in full, for example.
PyTorch Mobile is a start and is available for iOS and Android. Given that folks like PFN and Microsoft are (or will be) heavy contributors, I would expect support to broaden to more devices. Have you tried it out yet? No need for a separate set of op semantics or a separate framework. :) https://pytorch.org/mobile/home/
Anything that can't use the mobile GPU (or DSP/TPU for quantized inference) is pretty useless IMO, because it's just not energy-efficient enough to be practical in a battery-powered device, even if it's fast enough.
Once PyTorch is updated to use XNNPACK (being worked on right now), I think it should be fine to use. That plus QNNPACK makes inference quite low on power usage in my (admittedly limited; I just integrated XNNPACK) experience.
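For the curious, the power and bandwidth win from QNNPACK-style quantized inference comes from doing the matmul in int8 with int32 accumulation and rescaling to float once at the end. A toy numpy sketch of symmetric per-tensor quantization (not QNNPACK's actual kernels, and real implementations typically use per-channel scales and zero points):

```python
import numpy as np

def quantize(x):
    """Symmetric int8 quantization: x is approximated by q * scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 16)).astype(np.float32)  # layer weights
x = rng.standard_normal((4, 16)).astype(np.float32)   # activations

qw, sw = quantize(w)
qx, sx = quantize(x)

# Accumulate in int32 (as int8 kernels do), then rescale to float once.
y_int8 = (qx.astype(np.int32) @ qw.astype(np.int32).T) * (sx * sw)
y_fp32 = x @ w.T

# Quantization introduces a small relative error vs. the fp32 result.
rel_err = np.abs(y_int8 - y_fp32).max() / np.abs(y_fp32).max()
```

The weights move through memory at a quarter of the fp32 size, which matters as much for power as the cheaper arithmetic does.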
As a rule, a CPU burns at least 5x the energy per FLOP compared to a mobile GPU or DSP. So no, CPU is not a viable option on mobile if you need to do inference constantly. For "every now and then" cases, sure.
Yeah, generally a cloud-based approach to training models is better than training on your Mac. AMD folks have been working on ROCm for some time and it works reasonably well for common models, but AMD GPU hardware isn't as pervasive as Nvidia GPUs in places like AWS.
If you are looking at using Colab for prototyping, you can also try TPUs, which are now supported in PyTorch. Here is a link with some additional info, including some Colab notebooks: https://github.com/pytorch/xla
But don't worry, this community moves fast!