Hacker News | pkiv's comments

I'm working on rebuilding Playwright from the ground up, but focused on automation and self-healing using LLMs.

It's called Stagehand (https://github.com/browserbase/stagehand) and we just released v3, which is a total rewrite.


Interesting.

I rewrote Playwright to run completely in a Chrome Extension without CDP or chrome.devtools for no practical reason at all. I started to do it like Forrest Gump started running. It can't get past bot protection, so it's pretty worthless from a browser automation point of view. [0]

What I don't understand is why you need to rewrite Playwright instead of just patching it. Playwright (or Puppeteer) has addressed every edge case that has come up over the years, especially race conditions, which are a monster to deal with, and by the time you do the same you will have Playwright.

Why is rewriting or rebuilding Playwright from the ground up needed?

[0] https://github.com/adam-s/cordyceps/tree/main/pages/side-pan...


Very cool. I make a consulting business out of packaging Selenium scripts into Windows apps for small businesses. Do you have any desire to turn this into a saleable product?


Stagehand is our open source project, but the company behind it is called Browserbase - https://browserbase.com/ where we run headless browser infrastructure as a service. So no interest at this point; Browserbase drives the revenue that funds Stagehand!


What type of stuff do people pay you to do?


It's almost 100% secretarial workflow automation. Someone is currently being paid to click, drag, etc. I collapse that into a single button and charge for the button.


To explain better, there are basically 3 jobs available in this kind of work right now, all of which I've interviewed for and not been selected for, with varying degrees of frustration.

(1) build a bunch of automation scripts that we can reliably run as part of our product. This is closest to my day-to-day: you're (mis?)using Selenium or Playwright to make a known set of "click here, fill that, click there" scripts, and then exposing some interface that calls them. At a company this will usually be a FastAPI microservice, and your set of automation scripts is part of what makes the product possible. In my case, I'm currently working on registering my Tkinter/PyInstaller app with Windows so people can run my .exe and click the "do the script" button. There's also a slightly different approach to the Selenium/Playwright jam: sending curl requests that mimic the network requests made when performing some action. You can see what I mean by opening the "Network" tab in your browser's devtools and clicking around the web. Imagine having a library of known API calls: you save a user's cookie and run a script to achieve the result.
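A minimal Python sketch of that "library of known API calls" idea, using only the standard library. The action names, URLs, and JSON payload below are hypothetical placeholders (in practice you would copy them from the devtools Network tab), and the cookie string is assumed to have been saved from a logged-in browser session.

```python
import json
import urllib.request

# Hypothetical library of known API calls, copied from the devtools Network tab.
# The URLs and action names are placeholders, not a real site's API.
KNOWN_CALLS = {
    "submit_timesheet": {"method": "POST", "url": "https://example.com/api/timesheets"},
    "list_invoices": {"method": "GET", "url": "https://example.com/api/invoices"},
}


def build_request(action, cookie, payload=None):
    """Build (but do not send) a request that mimics what the browser sent."""
    call = KNOWN_CALLS[action]
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(call["url"], data=data, method=call["method"])
    # Reuse the user's saved session cookie so the server treats us as logged in.
    req.add_header("Cookie", cookie)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def run(action, cookie, payload=None, timeout=30):
    """Send the replayed request and return the decoded response body."""
    req = build_request(action, cookie, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Sites that check more than the cookie (CSRF tokens, custom headers) would need those copied into the request as well.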

(2) build a browser agent that we can use to run all of our (1) scripts. This is stuff that I iterate on as needed, and in my experience it's best done by having a bunch of "helper functions" that spawn/kill/season different Selenium webdrivers. It will depend on whether your (1) scripts are dealing with JS-heavy websites, bot detection, or just constantly changing UIs, but the deep rabbit holes here end up in places where you're building your own browser agent on top of WebKit/Chromium, implementing some kind of captcha solver, or trying to automagically discover what buttons exist by fuzzing the API or DOM.
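The "helper functions" for spawning and killing drivers might look something like the pool below. This is a hypothetical sketch, not a Selenium API: `DriverPool` and `max_uses` are made-up names, the only assumption about a driver is that it has a `quit()` method (as Selenium's WebDriver does), and it doesn't attempt the "season" part.

```python
from contextlib import contextmanager


class DriverPool:
    """Hypothetical helper that spawns, reuses, and kills browser drivers.

    `factory` is any zero-argument callable returning an object with a
    `quit()` method, e.g. `lambda: webdriver.Chrome()` with Selenium.
    """

    def __init__(self, factory, max_uses=20):
        self.factory = factory
        self.max_uses = max_uses  # recycle a driver after this many runs
        self._idle = []           # entries are [driver, uses_so_far]

    @contextmanager
    def driver(self):
        # Reuse an idle driver if there is one, otherwise spawn a fresh one.
        entry = self._idle.pop() if self._idle else [self.factory(), 0]
        try:
            yield entry[0]
        except Exception:
            # A crashed script may leave the browser in a bad state: kill it.
            entry[0].quit()
            raise
        else:
            entry[1] += 1
            if entry[1] >= self.max_uses:
                entry[0].quit()  # worn out: kill instead of returning to the pool
            else:
                self._idle.append(entry)

    def close(self):
        """Kill every idle driver."""
        while self._idle:
            self._idle.pop()[0].quit()
```

With Selenium this might be used as `pool = DriverPool(lambda: webdriver.Chrome())` followed by `with pool.driver() as d: d.get(url)`. Recycling after a fixed number of runs is one simple way to keep a long-lived browser's state from drifting.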

(3) use a (2) to get us every PDF we need for our upcoming RAG chatbot. This is something I do by executing on (1), and I only bother to note the difference because it's a great example of the kind of actual end-product goal that all of this leads to.

Academically, the problems in web automation broadly fall into discovery and reproducibility. Discovery means API fuzzing: how do we get that library of known API calls, or the equivalent buttons in a DOM? Reproducibility means running without errors: how do we wait for the target's server to be ready before sending the next command? How do we avoid getting blocked? How do we detect and recover when the target updates their website? The most interesting opportunity IMO is inserting an LLM to build a self-healing scraper, where the edges of your test/prod environments are defined by your product's tolerance for wrong or nondeterministic behavior. I've got a great blog post draft about a "railroad model of software development", where an LLM is a hammer nondeterministically pounding in railway ties, and the end product is a deterministic piece of code that can have trains run over it all day long. (Effectively, it's the LLM-as-test-environment-devtool thesis; I don't think I'm saying anything that hasn't been said before.)
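One way to read the "railroad model" in code: keep a file of plain selectors that run deterministically, call an LLM only when a selector stops matching, and pin the LLM's answer back into the file once it has been verified. This is a hypothetical Python sketch; `find` and `ask_llm` are caller-supplied stand-ins, and no particular driver or LLM API is assumed.

```python
import json
from pathlib import Path


class SelfHealingLocator:
    """Deterministic selectors first; an LLM only when one breaks.

    `find(selector)` returns an element or None (e.g. a thin wrapper around a
    driver lookup), and `ask_llm(name, page_html)` returns a replacement
    selector string. Both are hypothetical callables supplied by the caller.
    """

    def __init__(self, cache_path, find, ask_llm):
        self.cache_path = Path(cache_path)
        self.find = find
        self.ask_llm = ask_llm
        self.selectors = (
            json.loads(self.cache_path.read_text()) if self.cache_path.exists() else {}
        )

    def locate(self, name, page_html):
        selector = self.selectors.get(name)
        if selector is not None:
            element = self.find(selector)
            if element is not None:
                return element  # the common, deterministic path
        # The cached selector is missing or stale: let the LLM propose a new one.
        candidate = self.ask_llm(name, page_html)
        element = self.find(candidate)
        if element is None:
            raise LookupError(f"could not heal selector for {name!r}")
        # Persist the verified selector so later runs stay deterministic.
        self.selectors[name] = candidate
        self.cache_path.write_text(json.dumps(self.selectors, indent=2))
        return element
```

The LLM is the hammer here: it only runs on the failure path, and what ships is the JSON file of selectors that later runs replay without it.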

Practically, the problems facing me as an engineer are in packaging these tools for sale and distribution. My current state of the art is to wrap up a .exe with PyInstaller, build a GUI with Tkinter, and register with Windows so I'm not showing scary "This program is made of evil" messages when people try to run it. From there the plan is to give away free trials, and after a month of people clicking their magic buttons, the buttons all disable and demand that you purchase a license to re-enable them (like if WinRAR were evil, but sorry y'all, I gotta eat). I'm also trying to sell building these tools as a service, but that's very word of mouth; I haven't found a viable web/storefront model for that yet.
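A rough sketch of the month-long trial check, assuming the first-run date is stored in a local JSON file (the file name and layout are hypothetical). It is trivially defeated by deleting the file, so treat it as an illustration of the logic rather than real license enforcement.

```python
import json
from datetime import date, timedelta
from pathlib import Path

TRIAL_DAYS = 30  # "after a month" of free use


def trial_days_left(installed_on, today, trial_days=TRIAL_DAYS):
    """Days remaining in the trial (0 once it has expired)."""
    expires_on = installed_on + timedelta(days=trial_days)
    return max(0, (expires_on - today).days)


def button_enabled(state_file, today=None, has_license=False):
    """Record the first-run date in a hypothetical JSON state file and decide
    whether the automation button should still be clickable."""
    if today is None:
        today = date.today()
    path = Path(state_file)
    if path.exists():
        installed_on = date.fromisoformat(json.loads(path.read_text())["installed_on"])
    else:
        installed_on = today
        path.write_text(json.dumps({"installed_on": today.isoformat()}))
    return has_license or trial_days_left(installed_on, today) > 0
```

In a Tkinter GUI the result could drive the button's `state` option (`"normal"` or `"disabled"`).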

IM-Practically, most companies with a browser automation component are struggling with the same HR/onboarding issues as everyone else. In the past ~2 months I've interviewed at ~4 serious companies that profit from their browser automation, and every process has been unique: 1) firecrawl.dev: a Black Mirror hour-long AI-chatbot Zoom interview, followed by a human call scheduled 3 weeks out, only to then be told that they really want someone fully specialized in breaking captchas, with a vaguely condescending suggestion that I'm "customer facing" and no follow-up when I lean into that; 2) atomic.financial: actual-human Zoom calls with engineers I get along with great, only to have no idea why I'm getting a rejection email the next week; 3) sheer.health: a very contentious first call that demands I name a salary, followed by a trivially easy take-home test, followed by my 3rd-round, half-hour call with the CEO being cancelled the morning of because they filled the role; 4) Mozilla: a principal/staff engineer cold-DM'd me, leading to a call with an HR rep who told me they're paying 350k base / 420k total and another call with a staff engineer who's leaving to start a startup, only to then be told I'm not senior enough but could maybe come on as a contractor, and then be told they're using internal resources for the contractors.

Overall, I think the best opportunity in the space is going to look something like https://ui.vision/, which is an open-source tool!


Hey there! Founder of Browserbase here. If you're seeing any abuse from one of our customers we'd love to hear more. Mind sending more info to [email protected]?


Congrats on the launch guys!


Really love the decoupling of the logic and the runtime for the actual tool calls.


Browserbase | Multiple Roles | San Francisco | ONSITE

We're building infrastructure that enables developers and LLMs to programmatically interact with the web using our hosted headless browsers.

A headless browser is just like the browser you're using right now, but running on a server. Running a single one isn't too bad, but running many of them becomes a complex exercise in stateful, distributed systems. We handle that, as well as provide great observability and other helpful features (like cookie management) to make developers' lives easier.

While the infrastructure product is how we make money, we also maintain Stagehand (https://github.com/browserbase/stagehand), the AI-powered successor to Playwright. We built it to show how you can use LLMs to build dynamic web automations that don't depend on deterministic code.

We're looking for developers who are interested in working on products that cater to other developers. You're an especially good fit if you're experienced in cloud infrastructure or distributed systems. We're a team of 10, full-time and in person in SF. You can learn more about our work culture here: https://x.com/pk_iv/status/1860762063490158642. We have product-market fit, and have raised $27M from Kleiner Perkins, CRV, and Okta Ventures.

I've hired several people from HN and I'm excited to continue meeting great people like yourself!

Open roles are here: https://browserbase.com/careers. You can also email us at [email protected]


I’d recommend checking out Stagehand if you want to use something that’s more AI first! It’s like the AI powered successor to playwright: https://github.com/browserbase/stagehand

(I am one of the authors!)


If you're open to it, I'd love to hear what you think of what we're building at https://browserbase.com/ - you can run a chrome extension on a headless browser so you can do the semantic markdown within the browser, before pulling anything off.

We even have an iFrame-able live view of the browser, so your users can get real-time feedback on the XPaths they're generating: https://docs.browserbase.com/features/session-live-view#give...

Happy to answer any questions!


This is super neat and I think I've seen your site before :)

Do you handle authentication? We have lots of users that want to automate some part of their daily workflow but the pages are often behind a login and/or require a few clicks to reach the desired content.

Happy to chat: [email protected]


You must get a lot of test emails to that FANTASTIC gmail address. Funny how it might even be worth some decent money.


That's not literally his e-mail :D. He means that you have to replace it with his HN username. It would have been better to write it like this: [HN username]@gmail.com


Personally I thought it was an LLM reply to an LLM marketing post to fake engagement. Lol


Instructions unclear, here's a haiku about faking engagement:

Beneath the deep waves,

False likes in shadows do dance,

Submarine ploys drift.


Hahaha okay I feel dumb now.


Well if you're dumb then we're dumb.


I'm also curious about this! I've been learning about scraping, but I've had a hard time finding good info about how to deal with user auth effectively.


You log in, grab the session, and save it. Then you mount the session onto your requests.
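With Python's standard library, "save the session, then mount it" can be sketched with `http.cookiejar.MozillaCookieJar`. Assumptions: the login is a single request you can replay (`login_request` is hypothetical; many sites need a real browser login first, after which you would export the cookies instead), and `session_cookies.txt` is a made-up path.

```python
import http.cookiejar
import urllib.request

COOKIE_FILE = "session_cookies.txt"  # hypothetical path


def login_and_save(login_request, path=COOKIE_FILE):
    """Perform the login once and persist whatever cookies the server set."""
    jar = http.cookiejar.MozillaCookieJar(path)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.open(login_request).close()
    # ignore_discard=True keeps session cookies, which usually hold the login.
    jar.save(ignore_discard=True)
    return jar


def session_opener(path=COOKIE_FILE):
    """Load the saved cookies and attach them to every later request."""
    jar = http.cookiejar.MozillaCookieJar()
    jar.load(path, ignore_discard=True)
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
```

After that, `session_opener().open(url)` sends the saved cookies automatically until the server expires the session.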


Am I correct that the use cases for this are 1. scale and 2. defeating Cloudflare et al.?

I do scraping, but I struggle to see what these tools offer; maybe I'm just not the target audience. If the websites don't have much anti-scraping protection to speak of, and I only do a few pages per day, is there still something I can get out of using a tool like Browserbase? I ask because of this talk about semantic markdown and LLMs: what's the benefit over writing (or even having an AI write) standard fetching and parsing code using playwright/beautifulsoup/cheerio?
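For comparison, the "standard fetching and parsing code" alternative can be this small. This is a stdlib-only Python sketch that swaps `html.parser` in for beautifulsoup/cheerio and pulls every link out of one page; extracting links is just an illustrative choice of target.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect (text, href) pairs for every <a href> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def parse_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch_links(url, timeout=30):
    """Fetch one page and parse it; no JS rendering, no bot-protection handling."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return parse_links(resp.read().decode(charset, errors="replace"))
```

The gap a hosted browser fills shows up when a page only renders its content with JavaScript or blocks plain HTTP clients, which this sketch does not handle.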


Awesome product!

I was just a bit confused that the sign-up buttons for the Hobby and Scale plans are grey; I thought they were disabled until I happened to hover over them.


Good feedback! We'll take a look.


I don't see any difference from browserless?


The price and the dashboard are a great start :)


Romania is missing from the list of phone number countries on signup, not sure if on purpose or not.


Congrats on the launch!! Collaborative browsing is something I've been looking for, for a few use cases of mine. Excited to try it out.


The supabase team always delivers. Excited to give this a try!


If you want to build it yourself, you could try using https://browserbase.com/. We offer managed headless browsers that work everywhere, every time. It costs $0.10 per browser session-hour (billed by the minute). Feel free to shoot me an email if you want access! [email protected]
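A quick worked example of the quoted price. The rounding rule is an assumption: "billed by the minute" is read here as each started minute being charged, at 10 cents / 60, or 1/6 of a cent per minute.

```python
import math
from fractions import Fraction

PRICE_PER_HOUR_CENTS = 10  # $0.10 per browser session-hour, as quoted


def session_cost_cents(seconds):
    """Cost of one session in cents, assuming every started minute is billed."""
    minutes = math.ceil(seconds / 60)
    return minutes * Fraction(PRICE_PER_HOUR_CENTS, 60)
```

Under that reading a 61-second session costs 1/3 of a cent, a 30-minute session costs 5 cents, and 100 sessions of one hour each come to $10.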

