idk what it is about them, but every "tech bro" type guy around me follows them. I never followed them myself, so I was surprised to learn they only have 300k followers on Twitter.
https://talimio.com/ Generate fully personalized courses from a prompt. Fully interactive.
New features shipped last month:
- Adaptive practice: LLM generates and grades questions in real-time, then uses Item Response Theory (IRT) to estimate your ability and schedule the optimal next question. Replaces flashcards, especially for math and topics where each question needs to be fresh even when covering the same concept.
- Interactive math graphs (JSXGraph) that are gradable
- Single-image Docker deployment for easy self-hosting
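For anyone unfamiliar with the IRT part: the textbook one-parameter (Rasch) model estimates a learner ability `theta` from their responses and can then pick questions near the ~50% difficulty sweet spot. This is just a hedged sketch of that idea, not necessarily the exact model or estimator Talimio uses:

```python
import math

def p_correct(theta: float, difficulty: float) -> float:
    # Rasch / 1PL model: P(correct) = 1 / (1 + exp(-(theta - b)))
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def update_ability(theta: float, difficulty: float, correct: bool, lr: float = 0.5) -> float:
    # One gradient step on the log-likelihood: nudge ability by
    # (observed - predicted). A crude online estimator for illustration.
    return theta + lr * ((1.0 if correct else 0.0) - p_correct(theta, difficulty))

theta = 0.0
for difficulty, correct in [(0.0, True), (0.5, True), (1.0, False)]:
    theta = update_ability(theta, difficulty, correct)
# theta rises after correct answers and drops after a miss; the scheduler
# can then pick the next question where p_correct(theta, b) is near 0.5.
```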
i was delighted to see your comment at the top... I'm working on the exact same thing: generating concept DAGs from books and letting a tutor agent use them for structure and textbook reference.
can we discuss this somewhere else?
The IRT angle is interesting — most adaptive learning tools just do basic spaced repetition, but using Item Response Theory to estimate ability level in real-time is a much more honest approach to "personalized." The JSXGraph integration for gradable math graphs is a nice touch too, that's a hard problem. Quick question: how do you handle subjects where the "right answer" is more ambiguous? Does the LLM grading struggle with open-ended questions outside of math?
yeah we use an LLM for the grading .. (for the free-form questions)
the flow is basically:
When practice questions are generated, the model produces the question + the reference answer together, but the user only sees the question. Then on submit, a smaller model grades the learner's answer against that reference answer + the grading criteria.
I benchmarked a bunch of judge models for this on a small multi-subject set, and `gpt-oss-20b` ended up being a very solid sweet spot for quality/speed/structured-output reliability. on one of the internal benchmarks it got ~98.3% accuracy over 60 grading cases, with ~1.6s p50 latency, so it feels fast enough to use live.
for math, it’s not just LLM grading though:
- `SymPy` for latex/math expressions, so if the learner writes an equivalent answer in a different form, it still gets marked correct; so `(x+2)(x+3)` and `x^2 + 5x + 6` can both pass. (but I might remove that one since it could be easily replaced by an LLM? and it's a niche feature that adds some maintenance cost)
- tolerance-based checks for the JSXGraph board state stuff; so on the graph if you plotted x = 5.2 instead of 5.3 it will be within the margin of error to pass but will give you a message about it
I also tried embedding/similarity checking early on, but it was noticeably worse on tricky answers, so I didn’t use that as the main path.
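The two non-LLM checks above can be sketched like this. The sketch uses plain expression strings (the real system parses LaTeX first), and the function names and tolerance value are illustrative assumptions:

```python
import sympy as sp

def math_equivalent(learner: str, reference: str) -> bool:
    # Symbolic equivalence: if the difference simplifies to zero,
    # equivalent forms like "(x+2)*(x+3)" and "x**2 + 5*x + 6" both pass.
    diff = sp.simplify(sp.sympify(learner) - sp.sympify(reference))
    return diff == 0

def within_tolerance(plotted: float, expected: float, tol: float = 0.15) -> bool:
    # Tolerance check for graph board state: plotting x = 5.2 when the
    # answer is 5.3 still passes, and the UI can note the small miss.
    return abs(plotted - expected) <= tol

math_equivalent("(x+2)*(x+3)", "x**2 + 5*x + 6")  # True
within_tolerance(5.2, 5.3)                        # True
```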
Google Workspace API keys and roles have always been confusing to me on so many levels .. and they just seem to keep topping that confusion; no one is addressing the core issue (honestly not sure if that's even possible at this point)
I switched from YouTube to Invidious mainly because it doesn't support Shorts, and I blocked YouTube at the DNS level. It's a bit slower, but I know I won't get sucked into doom-scrolling.
VSCode/GH copilot -> windsurf -> Zed/Claude code -> Zed/codex -> Zed/opencode -> Antigravity/opencode
I'm only using Antigravity because they have good limits for now .. (but it's only a matter of time before that goes away, and then I'll go back to Zed)