A noob question: do you all intend to work on LLMs, or are you watching the content out of curiosity? I am asking how someone like me, a software generalist, can make use of this amazing content. Does anyone have insights on how to transition from a generalist backend engineer to an AI engineer? Or is it a niche where the only path is the PhD route …
Speaking for myself, and apart from just being curious, it's mostly for the same reason you'd want to read, for example, CLRS, even though you'll probably never implement an algorithm like that in a real production environment yourself. It's not so much about learning how, but rather why, because it'll help you answer your whys in the future (not that the how can't also be important, of course).
I was not really interested in LLMs till a month back. I had an earlier product where I wanted a no-code app for business insights on any data source. Plug in MySQL, PostgreSQL, APIs like Stripe, Salesforce, Shopify, even CSV files, and it would generate queries from the user's GUI interactions. Like Airtable, but for your own data sources. I was generating SQL queries, including JOINs, or HTTPS API calls.
Then I abandoned it in 2021. This year, it struck me that LLMs would be great to infer business insights from the schema. I could create reports and dashboards automatically, surface critical action points straight from the schema/data and users chatting with the app.
So for the last couple of weeks, I have been building it and running tests on LLMs (CodeLlama, Zephyr, Mistral, Llama 2, Claude and ChatGPT). The results are quite good. There is a lot of tech that I need to handle: schema analysis, SQL or API calls, and the whole UI. But without LLMs, there was no clear way for me to infer business insights from schema + user chats.
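For anyone curious what the "schema analysis" step might look like in its simplest form, here is a rough sketch, not the actual product code. It assumes SQLite (standard library only) as a stand-in for MySQL/PostgreSQL; `describe_schema`, `build_prompt`, and the example tables are hypothetical names made up for illustration. It only builds the prompt text and leaves the model call out, since that depends on which LLM you use.

```python
import sqlite3


def describe_schema(conn):
    """Return a plain-text description of every table and its columns."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        col_text = ", ".join(f"{c[1]} {c[2]}" for c in cols)
        lines.append(f"{table}({col_text})")
    return "\n".join(lines)


def build_prompt(schema_text, question):
    """Combine the schema and the user's question into one prompt string."""
    return (
        "You are given this database schema:\n"
        f"{schema_text}\n\n"
        f"Write one SQL query that answers: {question}\n"
    )


# Hypothetical example database, created in memory for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    """
)

schema_text = describe_schema(conn)
prompt = build_prompt(schema_text, "Which countries bring in the most revenue?")
print(prompt)
```

The printed prompt is what you would send to whichever model you are testing; the generated SQL should still be validated before it is run against real data.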
To me, this is not a niche anymore, now that I have found a use for it in a problem I already wanted to tackle.
I would compare it to when I was taught how to build my own compiler. Taking away the magic was empowering. Later on I saw many opportunities to use some of the basic compiler techniques, even though I'm not out there writing the JDK.
If you had to pick, building a project using off-the-shelf tech would better prepare you for your first AI engineering job. However, the knowledge in these videos could help you land that first job, and it is a useful base for concepts that aren't going away any time soon.
Also, please let us know if you figure out the secret. I would love to also switch from generalist backend to ML/AI.
There’s a market (or soon will be) for software people who can evaluate a use case and apply an LLM if warranted. You don’t need a PhD, but you do need a good working knowledge of the nuts and bolts to speak truth to hype. Karpathy has a YouTube video titled something like “a busy person’s guide to LLM”, and in it he describes the model as an operating system kernel with tools and utilities surrounding it. You can build and understand those valuable tools and utilities without having a PhD in AI. I think that’s the way to break into the AI market as a traditional developer.
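To make the "kernel with tools around it" picture a bit more concrete, here is a toy sketch of the surrounding plumbing, under heavy assumptions. `fake_model` is a hypothetical stand-in for a real LLM call, and the JSON request format and tool names are invented for illustration rather than taken from any particular vendor's API. The point is that the dispatch code around the model is ordinary backend work.

```python
import json
from datetime import date


def word_count(text):
    """Tool: count whitespace-separated words."""
    return len(text.split())


def days_between(start, end):
    """Tool: number of days from one ISO date to another."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


# Registry of tools the "kernel" is allowed to call.
TOOLS = {"word_count": word_count, "days_between": days_between}


def fake_model(user_message):
    """Hypothetical stand-in for an LLM: returns a JSON tool request."""
    if "days" in user_message:
        return json.dumps(
            {"tool": "days_between",
             "args": {"start": "2024-01-01", "end": "2024-03-01"}}
        )
    return json.dumps({"tool": "word_count", "args": {"text": user_message}})


def run_turn(user_message, model=fake_model):
    """Ask the model what to do, then execute the requested tool."""
    request = json.loads(model(user_message))
    tool = TOOLS.get(request["tool"])
    if tool is None:
        # Never execute something the registry does not know about.
        raise ValueError(f"unknown tool: {request['tool']}")
    return tool(**request["args"])


print(run_turn("how many days until March?"))
print(run_turn("count these four words"))
```

Swapping `fake_model` for a real model client changes one function; the registry, validation, and error handling stay the same, and none of it needs a PhD.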
Just a guess, but understanding how LLMs are built may also help you if you want to fine-tune a model. Someone who knows more may confirm or contradict this.