Thank you so much — this was the only way I could get LLaMA running on my desktop's GPU! Everything else I tried was plagued by compile errors, version mismatches, miscompiled wheels, or other weird contradictions. I'm so happy this works. I can finally use an LLM to my heart's content without relying on OpenAI, their server load, and their phone number requirement.