Noddy, what’s “fair game” for this benchmark? E.g., do you want to give frontier models a text goal and tooling info and leave it at that? Or do you want full agent architectures to compete? It seems to me that goal setting, layout, and implementation are separate tasks that would each benefit from a different agent.
The idea is for us to track all frontier models using the basic agent (goal plus tooling info), and then offer a separate leaderboard for custom agent architectures (with retrieval, etc.).