
You can really fine-tune those models ... look at Vicuna 13B: if you know how to prompt it well, you can get it to work almost as "well" as ChatGPT, running on local hardware. I just got Vicuna 13B on Gradio[1] to act as a Japanese kanji personal trainer, and all it took was a simple prompt: "I want you to act as a Japanese kanji quiz machine. Each time I ask you for the next question, you are to provide one random Japanese kanji from the JLPT N5 kanji list and ask for its meaning. You will generate four options, one correct, three wrong. The options will be labeled from A to D. I will reply to you with one letter, corresponding to one of these labels. You will evaluate each of my answers based on your last question and tell me if I chose the right option. If I chose the right label, you will congratulate me. Otherwise you will tell me the right answer. Then you will ask me the next question. Avoid simple kanji, let's go."

[1] https://chat.lmsys.org/



Sure, a 13B model can be fine-tuned to be pretty decent, which is quite remarkable compared to GPT-3's 175B parameters. But a 3B model has roughly a quarter as many parameters as Vicuna-13B, or about twice as many as GPT-2. Can you really fine-tune that to do anything useful that wouldn't be better handled by a more specialized open-source model?


How can someone get into using these models? How does 'tuning' work? How might I go about using these models for tasks like, say, summarizing news articles or video transcriptions? When someone tunes a model for a task, what exactly are they doing, and how does this 'change' the model?


(I'm not an expert)

> How can someone get into using these models

You can use Gradio (online), or download the weights from https://huggingface.co/lmsys/vicuna-13b-delta-v1.1/tree/main (a plain git clone won't fetch files this big; download them manually), then load the model in PyTorch and try inference (text generation). But you'll need either a lot of RAM (16 GB, 32 GB+) or VRAM (a big GPU).
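A minimal sketch of what "load the model and try inference" looks like with the Hugging Face transformers library. One caveat: that repo holds delta weights that have to be merged with the original LLaMA weights first, so the local path below assumes you've already done the merge (and device_map="auto" needs the accelerate package):

    # minimal inference sketch with Hugging Face transformers;
    # assumes the Vicuna deltas are already merged into full weights
    # at ./vicuna-13b (the delta repo alone won't load directly)
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "./vicuna-13b"  # hypothetical local path to merged weights
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype=torch.float16,  # halves memory; still ~26 GB for 13B
        device_map="auto",          # spreads layers across GPU/CPU as available
    )

    prompt = "USER: What does the kanji 水 mean? ASSISTANT:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.decode(output[0], skip_special_tokens=True))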

> How might I go about using these models for doing things like say summarizing news articles or video transcriptions

Again, you might try it online, or set up a Python/bash/PowerShell script that loads the model for you so you can use it. If you can pay, I'd recommend RunPod for shared GPUs.
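For summarization specifically, the script is mostly prompt plumbing around the same generate call. A rough sketch, reusing the model/tokenizer loaded above (prompt wording and the three-sentence limit are just my choices):

    # hypothetical summarizer built on the model/tokenizer from above
    def summarize(article_text, max_tokens=150):
        prompt = (
            "USER: Summarize the following article in three sentences.\n\n"
            f"{article_text}\nASSISTANT:"
        )
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output = model.generate(**inputs, max_new_tokens=max_tokens)
        text = tokenizer.decode(output[0], skip_special_tokens=True)
        return text.split("ASSISTANT:")[-1].strip()  # keep only the reply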

> When someone tunes a model for a task, what exactly are they doing and how does this 'change' the model?

From my view ... not much. "Fine-tuning" means continuing training (tuning) on a specific dataset ("fine" as in fine-grained). As I understand it (I'm not sure), they just run more epochs over the model with the new data you've provided until they reach a good loss (i.e., the model works); that's why quality data is important.
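Mechanically it's the same training loop as pretraining, just pointed at your dataset. Something like this sketch with the transformers Trainer, where my_dataset stands in for your tokenized instruction data (real Vicuna-style recipes add multi-GPU setups and memory tricks like LoRA on top of this):

    # schematic fine-tuning loop, not a real 13B recipe
    from transformers import Trainer, TrainingArguments

    args = TrainingArguments(
        output_dir="out",
        num_train_epochs=3,          # "more epochs on the new data"
        per_device_train_batch_size=1,
        learning_rate=2e-5,
    )
    trainer = Trainer(
        model=model,                 # the pretrained weights from above
        args=args,
        train_dataset=my_dataset,    # hypothetical: your tokenized examples
    )
    trainer.train()                  # keeps updating weights until loss looks good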

You might try https://github.com/oobabooga/text-generation-webui - it has a pretty easy setup. Again, you'll need a lot of RAM and a good CPU for CPU inference, or a GPU.



A newer and much better approach actually shrinks the model by narrowing its functionality - similar to training a NN for one very specific task (as was typical several years ago), except it now takes far less data: https://arxiv.org/pdf/2305.02301.pdf ("Distilling Step-by-Step"). This paper is quite fantastic, and generating training data for small distilled models will likely shape up to be one of the important glue tasks for big LLMs.
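The core trick, roughly: have the big LLM generate step-by-step rationales, then train the small model on both the labels and those rationales as a multi-task objective. Schematically (the example data, prefixes, and t5-small student are illustrative, not the authors' code):

    # schematic of the paper's multi-task objective, not their code:
    # the student learns to predict both the answer and the
    # LLM-generated rationale for each input, via task prefixes
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("t5-small")
    student = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

    question = "Is a sparrow a mammal?"
    answer = "no"                                                 # label
    rationale = "Sparrows are birds, and birds are not mammals."  # from the big LLM

    def loss_for(prefix, target):
        x = tok(prefix + question, return_tensors="pt")
        y = tok(target, return_tensors="pt").input_ids
        return student(**x, labels=y).loss

    lam = 0.5  # weight on the rationale task
    loss = loss_for("[label] ", answer) + lam * loss_for("[rationale] ", rationale)
    loss.backward()  # one step of the joint objective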


While I recognize that this is only one example of what you can do, you could just ask ChatGPT to write you a traditional program that does something like this, and not have to run a (pretty big / power-intensive / slow-on-most-hardware) 3B/7B parameter model for simple tasks like these.

Yeah, it wouldn't be as flexible as an LLM (for example, synonyms won't work), but I doubt that's a big problem for this particular task, and you can ask it to tweak the program in various ways (for example, adding crude spaced repetition), arguably making it better than the AI solution, which takes some time to prompt-engineer and will never be "perfect".
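For concreteness, here's roughly the kind of thing ChatGPT would spit out (kanji list abbreviated to a handful of entries; a real version would load the full JLPT N5 set):

    # plain-Python version of the kanji quiz, no LLM needed
    import random

    N5 = {"水": "water", "火": "fire", "山": "mountain",
          "川": "river", "人": "person", "日": "day/sun"}

    while True:
        kanji, meaning = random.choice(list(N5.items()))
        # three wrong options plus the right one, shuffled and labeled A-D
        wrong = random.sample([m for m in N5.values() if m != meaning], 3)
        options = wrong + [meaning]
        random.shuffle(options)
        labels = dict(zip("ABCD", options))
        print(f"\nWhat does {kanji} mean?")
        for letter, opt in labels.items():
            print(f"  {letter}) {opt}")
        guess = input("> ").strip().upper()
        if labels.get(guess) == meaning:
            print("Correct, congratulations!")
        else:
            print(f"Nope, the answer was: {meaning}")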

I don't really know how much better fine-tuning makes these models, so I can't think of anything they can actually be used for where they aren't worse than traditional programs. Maybe as an AI in games? For example, making them role-play as a historical figure in Civilization 6.


My example here was silly, I admit. But the point was that this simple task can become more "nuanced" (aside from ChatRWKV's Raven models, no other model quite "works" like Vicuna or tuned LLaMA): given the right prompt, it can act as a character in a fictional world, which might help you learn the language better by increasing conversation time (the most important metric - I'm talking comprehensible input here) by virtue of being more enjoyable.

Overall I like the progress: LLaMA releases -> LLaMA fine-tuned on larger models' outputs gets similar performance to ChatGPT with fewer parameters (more efficient) -> people can replicate LLaMA-style models without anything special, effectively making LLMs a "commodity" -> you are here.



