
This is super cool! I wish there were an easy-to-follow guide on how to make your own embedding, for llama2 for example. All I can find are guides that already assume you know everything about training an embedding.

I just want to make an embedding of a conversation between me and my friend and simulate talking to them. Is this a hard thing to train to begin with?

If anyone knows or could help me with this, I would be very grateful!



I will butcher this, so if any experts see this please don't flame me. I think you might be conflating ideas? You could definitely fine-tune an existing embedding model or train your own from scratch, but the goal of an embedding model is different from that of an LLM conversation. Embedding models are used for things like classification, search, image captioning... maybe, at a high level, anything where you have high-dimensional data that you need to condense?

What you are asking for sounds like fine-tuning an existing LLM... where the data will be tokenized but the outcomes are different? There are a lot of write-ups on how people have done it. You should especially follow some of the work on Hugging Face. To replicate talking to your friend, though, I would think you will need a very large dataset to train on, and it's unclear to me whether you can just fine-tune or would need to train a model from scratch. So: a dataset with tens of thousands of examples, and then you need to train it on a GPU.

https://www.anyscale.com/blog/fine-tuning-llama-2-a-comprehe...
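
For what it's worth, here is a rough sketch of how a chat log might be turned into fine-tuning examples. Everything here is an assumption for illustration: the "prompt"/"response" field names, the `to_training_pairs` and `to_jsonl` helpers, and the sample messages are all made up, and the actual format depends on whichever training tool you end up using.

```python
import json

def to_training_pairs(turns, me="me", friend="friend"):
    """Pair each of my messages with the friend's reply that directly follows it.

    `turns` is a list of (speaker, text) tuples in chronological order.
    The "prompt"/"response" keys are hypothetical; check your trainer's format.
    """
    pairs = []
    for (speaker_a, text_a), (speaker_b, text_b) in zip(turns, turns[1:]):
        if speaker_a == me and speaker_b == friend:
            pairs.append({"prompt": text_a, "response": text_b})
    return pairs

def to_jsonl(pairs):
    """Serialize the pairs as one JSON object per line."""
    return "\n".join(json.dumps(p) for p in pairs)

# Made-up sample conversation.
turns = [
    ("me", "Want to grab lunch?"),
    ("friend", "Sure, tacos again?"),
    ("friend", "Or ramen, I'm easy."),
    ("me", "Ramen it is."),
    ("friend", "See you at noon."),
]
pairs = to_training_pairs(turns)
print(to_jsonl(pairs))
```

This only keeps my-message/friend-reply pairs, so consecutive messages from the same person get dropped. A real dataset would probably want to merge those and include more context per example.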


Thank you for sending this. It's still quite puzzling to me whether it's actually possible or not. Maybe what I want to train is a style? But then again, it should also remember other important things related to the friend.


Parent comment is on the right track. It sounds like you want to fine-tune an LLM to mimic the conversation style between you and your friend. Then you can use a general embedding model to implement RAG (retrieval-augmented generation) so that the application can "recall" pieces of your conversation.
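
A minimal sketch of the "recall" part, to make it concrete. Big caveat: `embed` here is a toy bag-of-words stand-in that I made up so the example runs with only the standard library; a real setup would call an actual embedding model instead. The `retrieve` and `build_prompt` helpers, the prompt layout, and the sample messages are likewise assumptions, not any particular library's API.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, messages, k=2):
    """Return the k past messages most similar to the query."""
    q = embed(query)
    ranked = sorted(messages, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:k]

def build_prompt(query, messages, k=2):
    """Put the retrieved messages in front of the new message for the LLM."""
    context = "\n".join(retrieve(query, messages, k))
    return f"Relevant past messages:\n{context}\n\nMe: {query}\nFriend:"

# Made-up chat history.
chat_log = [
    "Friend: I adopted a cat named Miso last spring.",
    "Me: Are you still training for the marathon?",
    "Friend: The marathon is in October, my knee hurts though.",
    "Friend: Pizza on Friday?",
]

print(build_prompt("how is your cat Miso", chat_log, k=1))
```

The resulting prompt would then go to the fine-tuned LLM, which supplies the style while the retrieved messages supply the facts it is supposed to "remember".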



