shash42's comments

For multi-agent jailbreak propagation research only


New blogpost: Why I think automated research is the means, not just the end, for training superintelligent AI systems.

In pointing models at scientific discovery, we will have to achieve the capabilities today's LLMs lack:
- long-horizon planning
- continual adaptation
- reasoning about uncertainty
- information-efficient learning
- creative exploration

Some of these capabilities may emerge from large-scale training. Others will require changes in how we implement and train AI systems. I don't yet know exactly what such a training loop would look like, so consider this post a conjecture.

But science offers a few unique properties at its foundation:
- large open data
- verifiability
- truth-seeking (instead of power-seeking) incentives

And thus I think scientific discovery is the ideal successor to internet-scale pretraining. It's not just an application; it may be the means to building what we're missing. Maybe that's why we have @openai @GoogleDeepMind @periodiclabs @futurehouse etc. all focusing on it.


Where do learning signals come from when there is no ground truth in post-training?

New paper shows how to convert inference-time compute into high-quality supervision for RL training.

Up to 30% relative improvement on a realistic non-verifiable task (HealthBench), with the model's own self-synthesised rubrics!
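As a rough illustration of the idea, a self-synthesised rubric can turn an unverifiable free-form answer into a scalar reward by scoring it against weighted criteria. This is a minimal sketch only: the rubric items, weights, and toy judge below are hypothetical, not the paper's actual pipeline.

```python
# Hypothetical sketch of rubric-based scoring for a non-verifiable task.
# The rubric, weights, and judge are illustrative assumptions, not the
# paper's method.
def score_response(response: str, rubric: list[tuple[str, float]], judge) -> float:
    """Weighted fraction of rubric criteria the judge says are satisfied."""
    total = sum(w for _, w in rubric)
    earned = sum(w for criterion, w in rubric if judge(response, criterion))
    return earned / total if total else 0.0

# Toy judge: checks whether the criterion's final keyword appears verbatim.
rubric = [("mentions dosage", 1.0), ("advises consulting a doctor", 2.0)]
judge = lambda resp, crit: crit.split()[-1] in resp.lower()

print(score_response("Take 200mg and consult your doctor.", rubric, judge))
```

In an RL loop, this scalar would stand in for a ground-truth reward on tasks where no verifier exists.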


Does continued scaling of large language models (LLMs) yield diminishing returns? Real-world value often stems from the length of task an agent can complete. We start this work by observing the simple but counterintuitive fact that marginal gains in single-step accuracy can compound into exponential improvements in the length of a task a model can successfully complete. Then, we argue that failures of LLMs when simple tasks are made longer arise from mistakes in execution, rather than an inability to reason. We propose isolating execution capability by explicitly providing the knowledge and plan needed to solve a long-horizon task. We find that larger models can correctly execute significantly more turns even when small models have 100% single-turn accuracy. We observe that the per-step accuracy of models degrades as the number of steps increases. This is not just due to long-context limitations -- curiously, we observe a self-conditioning effect -- models become more likely to make mistakes when the context contains their errors from prior turns. Self-conditioning is not mitigated by simply scaling model size. In contrast, recent thinking models do not self-condition, and can also execute much longer tasks in a single turn. We conclude by benchmarking frontier thinking models on the length of task they can execute in a single turn. Overall, by focusing on the ability to execute, we hope to reconcile debates on how LLMs can solve complex reasoning problems yet fail at simple tasks when made longer, and highlight the massive benefits of scaling model size and sequential test-time compute for long-horizon tasks.
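The compounding claim at the start of the abstract can be made concrete with a toy model: if each step succeeds independently with probability p, the longest task completed with overall success rate at least s has length about log(s)/log(p). This is an illustrative sketch under an independence assumption, not code from the paper.

```python
import math

def max_horizon(p: float, s: float = 0.5) -> int:
    """Longest task length H with overall success p**H >= s,
    assuming (simplistically) independent per-step success with probability p."""
    return math.floor(math.log(s) / math.log(p))

# A small gain in per-step accuracy compounds into a much longer horizon:
for p in (0.90, 0.95, 0.99):
    print(f"per-step accuracy {p:.2f} -> horizon {max_horizon(p)}")
```

Going from 90% to 99% per-step accuracy raises the 50%-success horizon from 6 steps to 68, which is the sense in which marginal single-step gains yield outsized long-horizon returns.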


This is a living document where I'll track my evolving thoughts on what remains on the path to building generally-intelligent agents. Why does this matter? Three compelling reasons:

Top-down view: AI research papers (and product releases) move bottom-up, starting from what we have right now and incrementally improving, in the hope that we eventually converge to the end-goal. This is good; that's how concrete progress happens. At the same time, to direct our efforts, it is important to have a top-down view of what we have achieved and what the remaining bottlenecks toward the end-goal are. Besides, known unknowns are better than unknown unknowns.

Research prioritisation: I want this post to serve as a personal compass, reminding me which capabilities I believe are most critical for achieving generally intelligent agents—capabilities we haven't yet figured out. I suspect companies have internal roadmaps for this, but it’s good to also discuss this in the open.

Forecasting AI Progress: Recently, there has been much debate about the pace of AI advancement, and for good reason: this question deserves deep consideration. Generally-intelligent agents will be transformative, requiring both policymakers and society to prepare accordingly. Unfortunately, I think AI progress is NOT a smooth exponential that we can extrapolate to make predictions. Instead, the field moves by shattering one (or more) wall(s) every time a new capability gets unlocked. These breakthroughs present themselves as large increases in benchmark performance in a short period of time, but the absolute performance jump on a benchmark provides little information about when the next breakthrough will occur. This is because, for any given capability, it is hard to predict when we will know how to make a model learn it. But it's still useful to know what capabilities are important and what kinds of breakthroughs are needed to achieve them, so we can form our own views about when to expect a capability. This is why this post is structured as a countdown of capabilities which, as we build them out, will get us to "AGI" as I think about it.

*Framework* To be able to work backwards from the end-goal, I think it’s important to use accurate nomenclature to intuitively define the end-goal. This is why I’m using the term generally-intelligent agents. I think it encapsulates the three qualities we want from “AGI”:

Generality: Be useful for as many tasks and fields as possible.

Intelligence: Learn new skills from as few experiences as possible.

Agency: Plan and perform long chains of actions.

Click and read the blog for:

Introduction

…. Framework

…. AI 2024 - Generality of Knowledge

Part I on The Frontier: General Agents

…. Reasoning: Algorithmic vs Bayesian

…. Information Seeking

…. Tool-use

…. Towards year-long action horizons

…. …. Long-horizon Input: The Need for Memory

…. …. Long-horizon Output

…. Multi-agent systems

Part II on The Future: Generally-Intelligent Agents [TBA]


New paper finds that:
1) Model mistakes are becoming more similar as capabilities increase, pointing to a risk of correlated failures.
2) LLM-as-a-judge is biased towards more similar models.
3) Training using another LLM benefits from complementary knowledge.
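One simple way to probe the correlated-failures point is to ask: when two models both get an item wrong, how often do they make the *same* wrong prediction? The sketch below is an illustrative overlap measure of my own, not the paper's exact similarity metric; the toy predictions are made up.

```python
# Illustrative error-overlap measure (an assumption, not the paper's metric):
# among items both models get wrong, the fraction where they agree on the
# same wrong answer. High overlap suggests correlated failure modes.
def error_overlap(preds_a, preds_b, labels):
    both_wrong = [(a, b) for a, b, y in zip(preds_a, preds_b, labels)
                  if a != y and b != y]
    if not both_wrong:
        return 0.0
    return sum(a == b for a, b in both_wrong) / len(both_wrong)

labels = ["A", "B", "C", "D"]
pred_a = ["A", "C", "A", "A"]  # wrong on items 2, 3, 4
pred_b = ["A", "C", "B", "A"]  # wrong on items 2, 3, 4

print(error_overlap(pred_a, pred_b, labels))  # 2 of 3 shared errors match
```

If overlap like this rises as models get more capable, ensembling and LLM-as-a-judge setups lose much of their intended independence.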

