Hacker News | nicoloboschi's comments

If you've been following agent memory evaluation, you know LoComo and LongMemEval. They're solid datasets. The problem isn't their quality; it's when they were designed.

Both come from an era of 32K context windows. Back then, you physically couldn't fit a long conversation into a single model call, so the premise was that you needed a memory system to selectively retrieve the right facts. That premise made those benchmarks meaningful.

That era is over.

State-of-the-art models now have million-token context windows. On most LoComo and LongMemEval instances today, a naive "dump everything into context" approach scores competitively, not because it's a good architecture, but because the window is large enough to hold the whole dataset. These benchmarks can no longer distinguish a real memory system from a context stuffer. A score on them no longer tells you much.


I hate copying the same Dockerfile across all my projects. Also, creating a production-optimized Dockerfile is hard.

So I created a plugin. No config in pyproject.toml is needed for most use cases.

>poetry self add poetry-dockerize-plugin

>poetry dockerize
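For context, this is a sketch of the kind of hand-written, production-optimized multi-stage Dockerfile the plugin aims to spare you from maintaining. It is illustrative only, not the plugin's actual output; the entrypoint module name is hypothetical.

```dockerfile
# Build stage: install dependencies into an in-project virtualenv.
FROM python:3.12-slim AS builder
WORKDIR /app
RUN pip install poetry && poetry config virtualenvs.in-project true
COPY pyproject.toml poetry.lock ./
RUN poetry install --only main --no-root
COPY . .
RUN poetry install --only main

# Runtime stage: copy only the app and its virtualenv, no build tooling.
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /app /app
ENV PATH="/app/.venv/bin:$PATH"
CMD ["python", "-m", "myapp"]  # "myapp" is a placeholder module name
```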

