I’m working on a language learning tool that uses LLMs to generate stories at your ideal level. The idea is that you get stories that are 95% comprehensible, with the other 5% being a mix of brand new words and words you are still learning. As you read a story, you click on the words you still don’t fully understand. I am only working on Spanish right now, since I want to optimize for each language individually. It’s been fun designing my databases, coming up with calculation ideas, designing story validation, and creating a system that estimates a user’s knowledge during onboarding. I know there is some debate about LLMs in language learning, and I don’t think they should be trusted to explain grammar, but if you validate their output they can be such a great tool for learning at your perfect level.
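For anyone curious what the 95% check might look like, here is a rough sketch in Python. It is not the actual validation code from the tool: the function names, the tokenizer regex, and the tolerance value are all assumptions for illustration, and a real version would probably lemmatize Spanish verb forms instead of matching raw tokens.

```python
import re

def known_coverage(story: str, known_words: set[str]) -> float:
    """Return the fraction of word tokens in `story` found in `known_words`.

    Assumption: words are runs of Unicode letters, compared in lowercase.
    """
    tokens = re.findall(r"[^\W\d_]+", story.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)

def is_at_level(story: str, known_words: set[str],
                target: float = 0.95, tolerance: float = 0.02) -> bool:
    """Accept a generated story only if its coverage is close to the target.

    `target` and `tolerance` are hypothetical defaults, not values from the tool.
    """
    return abs(known_coverage(story, known_words) - target) <= tolerance
```

A story that fails the check could be sent back to the LLM for another attempt, which is one way to validate the output instead of trusting it.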
That sounds really interesting. I have a similar project, albeit a pretty small one. I want to generate comprehensible input stories for the user with, say, 98% known words and 2% unknown words. Instead of rewriting stories though, I thought of having a compiled list of books, each with, say, the book's top 1000 common but unique words. Then you can add a book's list to your deck and have those words show up in generated stories. That way, once you complete the deck, it will be a lot easier to read your target book. I was looking into using NumPy for that. Not sure if you are using Python, but it might be worth looking into.
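A minimal sketch of the per-book deck idea, reading "top 1000 common but unique words" as the most frequent distinct words in the book that the reader does not already know. That reading is an assumption, as are the function name and the tokenizer. This version uses `collections.Counter` from the standard library rather than NumPy, since plain counting is enough at this scale.

```python
from collections import Counter
import re

def build_deck(book_text: str, known_words: set[str], size: int = 1000) -> list[str]:
    """Return up to `size` of the book's most frequent words not in `known_words`.

    Assumption: words are runs of Unicode letters, compared in lowercase,
    with no lemmatization (so "come" and "comen" count as different words).
    """
    tokens = re.findall(r"[^\W\d_]+", book_text.lower())
    counts = Counter(token for token in tokens if token not in known_words)
    return [word for word, _ in counts.most_common(size)]
```

For example, `build_deck(text, known_words, size=1000)` would give the word list to feed into story generation for that book.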