I work on compilers. A friend of mine works on webapps. I've seen Cursor give him lots of useful code, but it's never been particularly useful on any of the code of mine that I've tried it on.
It seems very logical to me that there'd be orders of magnitude more training data for some domains than others, and that existing models' skill is not evenly distributed cross-domain.
This. Also across languages: I'd guess there's a lot more training content in Python and JavaScript than in AppleScript, for example. (And to be fair, not many of the Python suggestions I get are actually mind-blowingly good.)
I'm still patiently waiting for an easy way to point a model at some documentation and make it actually use it.
My use case is GDScript for Godot games, and all the models I've tried so far produce Godot 2-era code that just isn't around anymore. Even if you tell them to use Godot 4, they give way too much wrong output to be useful.
I wish I could just point them at the latest Godot docs and get up-to-date answers, but since that's still not a thing, I guess it's more complicated than I expect.
There's llms.txt [0], but it's not gaining much popularity.
My web framework of choice provides these [1], but getting them into the LLM context still takes a fair amount of fuss. It would be a game changer if more LLM tools supported them out of the box.
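For now I do it by hand, roughly like this (the URL is a made-up placeholder, swap in your framework's real llms.txt):

    # Minimal sketch: fetch a framework's llms.txt and prepend it to the prompt.
    # The URL below is a placeholder, not a real endpoint.
    import urllib.request

    LLMS_TXT_URL = "https://example-framework.dev/llms.txt"  # placeholder

    def build_prompt(question: str) -> str:
        with urllib.request.urlopen(LLMS_TXT_URL) as resp:
            docs = resp.read().decode("utf-8")
        # Stuff the docs into the context ahead of the actual question.
        return f"Use only the following documentation:\n\n{docs}\n\nQuestion: {question}"

    print(build_prompt("How do I define a route?"))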
It's definitely a thing already. Look up "RAG" (Retrieval-Augmented Generation). Most of the popular closed-source companies offer RAG services via their APIs, and you can also do it with local LLMs using open-webui and probably many other local UIs.
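In a nutshell: chunk the docs, pull out the chunks most relevant to your question, and paste them into the prompt. A toy sketch with just the standard library, real setups use embeddings and a vector store, but the shape is the same:

    # Toy RAG sketch: rank doc chunks by word overlap with the query,
    # then build a prompt from the top hits.
    import re
    from collections import Counter

    def chunk(text: str, size: int = 500) -> list[str]:
        return [text[i:i + size] for i in range(0, len(text), size)]

    def score(query: str, passage: str) -> int:
        q = Counter(re.findall(r"\w+", query.lower()))
        p = Counter(re.findall(r"\w+", passage.lower()))
        return sum(min(q[w], p[w]) for w in q)

    def retrieve(query: str, docs: str, k: int = 3) -> list[str]:
        chunks = chunk(docs)
        return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

    def build_prompt(query: str, docs: str) -> str:
        context = "\n---\n".join(retrieve(query, docs))
        return f"Answer using this documentation:\n{context}\n\nQuestion: {query}"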