Launch HN: Sentrial (YC W26) – Catch AI agent failures before your users do (sentrial.com)
31 points by anayrshukla 7 days ago | 14 comments
Hey HN! We're Neel and Anay, and we’re building Sentrial (https://sentrial.com). It’s production monitoring for AI products. We automatically detect failure patterns: loops, hallucinations, tool misuse, and user frustrations the moment they happen. When issues surface, Sentrial diagnoses the root cause by analyzing conversation patterns, model outputs, and tool interactions, then recommends specific fixes.

Here's a demo if you're interested: https://www.youtube.com/watch?v=cc4DWrJF7hk. When agents fail, choose the wrong tools, or blow cost budgets, there's often no way to know why: usually just logs and guesswork. As agents move from demos to production with real SLAs and real users, this is not sustainable.

Neel and I lived this, building agents at SenseHQ and Accenture where we found that debugging agents was often harder than actually building them. Agents are untrustworthy in prod because there’s no good infrastructure to verify what they’re actually doing.

In practice this looks like:

- A support agent that began misclassifying refund requests as product questions, which meant customers never reached the refund flow.

- A document drafting agent that would occasionally hallucinate missing sections when parsing long specs, producing confident but incorrect outputs.

There's no stack trace or 500 error, and you only figure this out when a customer is angry.

We both realized teams were flying blind in production, and that agent-native monitoring was going to be foundational infrastructure for every serious AI product. We started Sentrial as a verification layer designed to take care of this.

How it works: you wrap your client with our SDK in a couple of lines. From there, we detect drift for you:

- Wrong tool invocations

- Misunderstood intents

- Hallucinations

- Quality regressions over time

You see it on our platform before a customer files a ticket.
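For a rough sense of what "wrap your client and detect drift" can mean, here is a toy sketch (the decorator, event names, and loop heuristic are hypothetical illustrations, not Sentrial's actual SDK):

```python
import functools
from collections import deque

class AgentMonitor:
    """Toy monitor: records tool calls and flags simple loops."""

    def __init__(self, loop_window=3):
        self.events = []
        self.loop_window = loop_window
        self.recent = deque(maxlen=loop_window)

    def wrap_tool(self, fn):
        @functools.wraps(fn)
        def wrapped(*args, **kwargs):
            call = (fn.__name__, args, tuple(sorted(kwargs.items())))
            self.recent.append(call)
            # Flag a loop when the same call repeats loop_window times in a row.
            if len(self.recent) == self.loop_window and len(set(self.recent)) == 1:
                self.events.append(("loop_detected", call))
            result = fn(*args, **kwargs)
            self.events.append(("tool_call", call, result))
            return result
        return wrapped

monitor = AgentMonitor()

@monitor.wrap_tool
def search_docs(query):
    return f"results for {query!r}"

for _ in range(3):
    search_docs("refund policy")

print(any(e[0] == "loop_detected" for e in monitor.events))  # True
```

Real systems would ship these events to a backend and score them there; the point is only that a thin wrapper around tool calls is enough surface area to spot repeating patterns.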

There's a quick MCP setup; just give Claude Code:

  claude mcp add --transport http Sentrial https://www.sentrial.com/docs/mcp

We have a free tier (14 days, no credit card required). We'd love feedback from anyone running agents, whether for personal use or in a professional setting.

We’ll be around in the comments!




Interesting gap to explore: Sentrial catches drift and anomalies -- failures that happen by accident. What's the defense against failures that happen by design?

Prompt injection is the clearest example: an attacker embeds instructions in content your agent processes. The agent does exactly what it's told. No wrong tool invocations, no hallucinations in the traditional sense -- just an agent successfully executing injected instructions. From a monitoring perspective it looks like normal operation.

Same with adversarial inputs crafted to stay inside your learned "correct" patterns: tool calls are right, arguments are plausible, outputs pass quality checks. The manipulation is in what the agent was pointed at, not in how it behaved.
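To make that concrete, here is a toy baseline check (entirely hypothetical, not any vendor's detector) that scores tool calls by their (tool, argument-shape) pattern. A benign call and an injection-driven call with the same shape score identically, which is the whole problem:

```python
# Toy anomaly check: a call is "normal" if its (tool, sorted arg keys)
# shape was seen during a learned baseline. An injected instruction that
# reuses a known shape passes unchanged.
baseline = {("send_email", ("body", "to"))}

def shape(tool, args):
    """Reduce a tool call to its structural pattern."""
    return (tool, tuple(sorted(args)))

def is_anomalous(tool, args):
    return shape(tool, args) not in baseline

benign = {"to": "user@example.com", "body": "Your refund is processed."}
injected = {"to": "attacker@evil.example", "body": "Forwarding internal notes."}

print(is_anomalous("send_email", benign))    # False
print(is_anomalous("send_email", injected))  # False: same shape, looks normal
```

Catching the injected case requires looking at provenance or content (where the instruction came from), not just call structure.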

Curious whether your anomaly detection has a layer for adversarial intent vs. operational drift, or whether that's explicitly out of scope for now.


Congrats on the launch! The production monitoring angle is genuinely underserved. Most teams only realize AI agent failures exist once users are complaining.

The most common failure mode we see: AI agents write code that passes all existing tests and looks fine in review, but has subtle IDOR issues, hardcoded secrets, or hallucinated package imports with vulnerable versions. Those don't surface at runtime until conditions are just right.


Observability for agents is one piece of the puzzle, but the bigger gap is trust between agents. When agent A delegates work to agent B, how does A know B's track record? Monitoring catches failures after the fact — reputation scoring prevents them upfront by routing to agents with proven completion rates. Both layers needed.

This is an AI agent.

How do you identify "wrong tool" invocations (how is the "wrong tool" defined)?

Good question. We don’t define “wrong tool” in some universal way, because that really depends on the workflow.

What we do in practice is let the team mark a few tool calls as right or wrong in context, then use that to learn the pattern for that agent. From there, we can flag similar cases automatically by looking at the convo state, the tool chosen, the arguments, and what happened next.

So we’re learning what “correct” looks like for your workflow and then catching repeats of the same kind of mistake.
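A minimal sketch of that learn-from-labels idea (the data shapes and names here are my own illustration, not Sentrial's implementation): record which tool the team marked as correct for each intent, then flag new calls that diverge.

```python
from collections import Counter, defaultdict

# A few team-labeled tool calls: (intent, tool chosen, marked correct?)
labeled = [
    ("refund_request", "refund_tool", True),
    ("refund_request", "refund_tool", True),
    ("refund_request", "faq_tool", False),   # marked wrong by the team
    ("product_question", "faq_tool", True),
]

# Learn the expected tool per intent from the positively-labeled calls.
expected = defaultdict(Counter)
for intent, tool, correct in labeled:
    if correct:
        expected[intent][tool] += 1

def flag_wrong_tool(intent, tool):
    known = expected.get(intent)
    if not known:
        return False  # no baseline for this intent yet; don't flag
    return tool != known.most_common(1)[0][0]

print(flag_wrong_tool("refund_request", "faq_tool"))     # True
print(flag_wrong_tool("refund_request", "refund_tool"))  # False
```

A production version would also weigh conversation state, arguments, and what happened after the call, as described above, but the majority-vote baseline shows the basic mechanic.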


That sounds like a critical challenge—identifying failures early can save a lot of headaches. I’ve seen teams get stuck when issues pop up, unsure of the root cause. Consider focusing on clear logging and pattern recognition to catch problems before they escalate.

That sounds like an AI written response. I’ve seen your last two posts follow the same pattern. Consider stopping your astroturf campaign.

The landing page design reminds me of Perplexity's ad campaigns. It's a clean look. I'd find your product more enticing if you framed your offerings more around evaluation + automatic optimization of production agents. There's real value there. The current selling points — trace sessions, track tool calls, measure token usage, and calculate costs — seem easily implementable at home with a bit of vibe coding.

I know your homepage isn't your business, but I bet Claude could fix the janky horizontal overflow on mobile in a prompt. Makes for a very distracting read.

Will fix ASAP.

There's some serious irony in this thread.

The GitHub link also goes to a 404.

I built a tool to check for these issues and was curious whether it would catch them all; it did.

https://pagewatch.ai/s-bm6jq1qs6y1x/b560hmfx/dashboard/previ...


Agreed, fix it fast. It's hard to take seriously a tool that's all about taking care of production while it has such a blatant production issue of its own.



