Hacker News | gen220's comments

This is genuinely an incredible proof of concept; the business implications of this demo for the AI labs, and for all the companies that derive a ton of profit from inference, are difficult to overstate, really.

I think this is how I'm going to get my dream of Opus 3.7 running locally, quickly and cheaply on my mid-tier MacBook in 2030. Amazing. Anthropic et al will be able to make marginal revenue from licensing the weights of their frontier-minus-minus models to these folks.


I do like the idea of an aftermarket of ancient LLM chips that still have tons of useful life on text processing tasks etc. They don't talk about their architecture much, I wonder how well power can scale down. 200W for such a small model is not something I see happening in a laptop any time soon. Pretty hilarious implications for moat-building of the big providers too.

Yea I mean this is the first publishable draft of a startup cooking on this.

I'm confident there are at least 1-2 OOMs of improvement to come here in terms of the (intelligence : wattage) ratio.

I really thought we were going to need to see a couple of dramatic OOM-improvement changes to the model composition / software layer, in order to get models of Opus 3.7's capability running on our laptops.

This release tells me that eventual breakthrough won't even be strictly necessary, imo.


The way I imagine it, in 2-4 years we're going to be hit with a triple glut: better architectures, a massive oversupply of hardware, and potentially one or two hardware efforts like this really taking off. It's pretty crazy that we're already 4 years in and, outside of very niche / low-availability solutions, it's still either GPU or bust.

That's interesting! How do you see "oversupply of hardware" playing out?

Is it because we stop doing ~2024-style, large-scale training (marginal returns aren't worth it)? Or because supply way outpaces the training+inference demand?

AFAIU if the trend lines /S-curves keep chugging along as they are, we won't hit hardware oversupply for a long, long time without some sort of AI training winter.


I totally agree! Interacting with LLMs at work for the past 8 months has really shaped how I communicate with them (and people! in a weird way).

The solution I've found for "un-loading" questions is similar to the one that works for people: build out more context where it's missing. Wax about specifically where the feature will sit and how it'll work, force it to enumerate and research specific libraries and put these explorations into distinct documents. Synthesize and analyze those documents. Fill in any still-extant knowledge gaps. Only then make a judgement call.

As human engineers, we all had to do this at some point in our careers (building up context, memory, points of reference and experience) so we can now mostly rely on instinct. The models don't have the same kind of advantage, so you have to help them simulate that growth in a single context window.

Their snap/low-context judgements are really variable, generalizing, and often poor. But their "concretely-informed" (even when that concrete information is obtained by prompting) judgements are actually impressively-solid. Sometimes I'll ask an inversely-loaded question after loading up all the concrete evidence just to pressure-test their reasoning, and it will usually push back and defend the "right" solution, which is pretty impressive!


Yes, it's great you're sharing this in a bit of detail! I think I've been using a similar approach to getting solid decisions.

I opened up a claude code session using opus-4.6 medium thinking.

I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

Drive — you need the car at the car wash.

but it's so close

It is close, but you still need the car there to wash it! Drive it over, and enjoy the short 50-meter walk back if you want to stretch your legs while it's being cleaned.

I tried the "upside-down" cup question brought up in another comment in this thread, and it also nailed it:

Flip it upside down. The sealed top becomes the bottom (holding your drink), and the open bottom becomes the top you drink from.

IDK, maybe the web versions are not as good at logical reasoning as whatever they're using to power Claude code, or you were unlucky and I was lucky?


Same. Claude nailed both questions, with the slightest hint of "... u serious?"

I pay for the $100 Opus 4.6 plan... maybe that makes a difference?


For people trying to understand the product (so far), it seems the entire thing is essentially an implementation of the idea documented at http://agent-trace.dev.


Oaxaca and its surroundings are still identifiably Zapotec! The idea of an urban landscape that’s still culturally and aesthetically indigenous to such an extent is super mindbending to this gringo.

Mexico’s historical relationship to indigenous groups is incredibly complicated and problematic in its own ways, but it’s completely and frankly unimaginably different from the analogous relationships in the U.S. or Canada.


In my (admittedly conflict-of-interest, I work for graphite/cursor) opinion, asking CC to stack changes, and then having an automated reviewer agent take a first pass, helps a lot with digesting and building conviction in otherwise-large changesets.

My "first pass" of review is usually me reading the PR stack in graphite. I might iterate on the stack a few times with CC before publishing it for review. I have agents generate much of my code, but this workflow has allowed me to retain ownership/understanding of the systems I'm shipping.


If you wanted a better version of GitHub Actions/CI (the orchestrator, the job definition interface, or the programming of scripts to execute inside those jobs), it would presumably need to be more opinionated and have more constraints?

Who here has been thinking about this problem? Have you come up with any interesting ideas? What's the state of the art in this space?

GHA was designed in ~2018. What would it look like if you designed it today, with all we know now?


I've been working for the last 5 years on an alternative called Dagger. Ended up building a company around it.

We started from the ideal state of CI, and set out to build the platform that would support that.

For us this ideal state of CI boils down to 4 things:

- local-first (local execution should be a first-class citizen, with no exception)

- repeatable (the same inputs should yield the same output, with affordances for handling inevitable side effects in an explicit and pragmatic way)

- programmable. my workflows are software. I want all the convenience of a modern software development experience: types, IDE support, a rich system API, debugging tools, an ecosystem of reusable components, etc.

- observable. I want all the information in one place about everything that happened in my workflow, with good tooling to get the information I need quickly, and interop with the existing observability ecosystem (eg. open telemetry)

So Dagger is our best effort at a CI platform focused on those 4 things.

Sorry if this comes across as a sales pitch. When what you're building solves a problem you're obsessed with, it's hard to discuss the problem without also mentioning the solution that seems the most obvious to you :)


I've been watching Dagger with great interest, although have not moved production workloads to it (nor, admittedly, even committed an afternoon to setting up any workflows/graphs).

Passive comment readers should be aware that ^shykes here cofounded Docker (my gratitude), so it's really worth a look.

Can anyone comment on the ergonomics of Dagger after using it for a while?

I was just looking at the docs earlier this week to consider a migration but got confused by the AI sections...


> got confused by the AI sections...

You're not the only one... At some point last year, we discovered that CI/CD workflows and so-called "AI agent workflows" have a lot in common, and Dagger can in theory be used as an execution engine for both. We attempted to explain this - "great for CI/CD and for Agents!". But the feedback was mostly negative - it came across as confusing and lacking focus. So, we are rolling back this messaging and refocusing on CI/CD again. If you haven't checked our docs in the last 12 hours, it's worth checking again: you'll see clear signs of this refocusing (although we are not done).

If in doubt, I recommend joining our public Discord server (https://discord.com/invite/dagger-io); it is basically a support group for CI/CD nerds who believe that a better way is possible, and want to discuss it with like-minded people.

Thanks for the kind words!


Do you have to use discord? All that information is locked away in a vendors system. Why not choose an open source chat app?


What alternatives would you recommend?


I've been working on this problem for the past couple of years. State of the art:

- local CLI instead of git push to run

- graph-based task definitions with automatic distributed execution, instead of the job/step abstraction

- automatic content-based caching to skip unnecessary executions (happens a lot in CI pipelines)

- container-based runtime (instead of proprietary base images) without using docker directly (too slow)

There are a lot of other ways to improve the developer experience. Happy to chat with anybody interested, I'm [email protected]
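For instance, the content-based caching point — skip a task entirely when its inputs haven't changed — can be sketched in a few lines. This is a generic illustration (hypothetical function names, not any particular product's implementation):

```python
import hashlib
import json
import os
import pickle

CACHE_DIR = ".ci_cache"

def task_key(name, input_files, args):
    """Hash the task name, its arguments, and its input file contents.
    Identical inputs produce an identical key, so the cached result
    can be reused instead of re-running the task."""
    h = hashlib.sha256()
    h.update(name.encode())
    h.update(json.dumps(args, sort_keys=True).encode())
    for path in sorted(input_files):
        with open(path, "rb") as f:
            h.update(f.read())
    return h.hexdigest()

def run_cached(name, input_files, args, fn):
    os.makedirs(CACHE_DIR, exist_ok=True)
    cache_path = os.path.join(CACHE_DIR, task_key(name, input_files, args))
    if os.path.exists(cache_path):       # cache hit: skip execution
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = fn()                        # cache miss: actually run the task
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result
```

In a real graph-based runner the key would also fold in the keys of upstream tasks, so a change anywhere invalidates exactly the downstream subgraph and nothing else.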


This is a core part of systemantics [0]! People are going to do what they’re going to do, as a manager the most you can do to help is to put people in the right teams and to get distractions out of their way.

It’s a difficult idea to accept but once you accept it, it’s kind of liberating. It follows that hiring and then work-assignments during roadmapping are the two points of highest leverage in making a mutually-successful employee-manager relationship.

The problem you’re solving there is a search problem. You’re trying to discover if the employee’s motivation landscape peaks in any dimensions that align with the roadmap. They can be the most skilled person in the world, but if the peaks don’t overlap, the project will never run smoothly. It also follows that in extreme cases where you have a tenured employee that you want to retain for future work, you should absolutely let them drive and shape the roadmap.

[0]: https://en.wikipedia.org/wiki/Systemantics


I read that book when I was 8 or 10! It must still be in my head!


Churchill was right!

We’ll try everything except for a land value tax, so that we can eventually prove once and for all that LVT is the right thing to do! :)

But actually, it’s good to see movement on the underlying problem (affordability of home ownership). This is The Domestic American Problem of our times, and it deserves to be closer to the center of the Overton window of our politics and policy-making.

Even if we think this step is kind of meaningless, it draws more attention to the problem, which is a good thing.


Can someone please explain to me a practical way to apply the LVT? Vancouver used to have an LVT; it was too low, and there was a housing speculation bubble in the early 1900s, since property was appreciating much faster than the tax rate. And if the LVT is too high, then you will have very little new development. This isn't even mentioning how you determine the value of the land.

Denmark has an LVT and copenhagen affordability is... not good.


As far as I can tell, LVT only achieves what it sets out to do if it’s equivalent to market rent.

As in, you never really “own” your land, you’re just renting it from the sovereign. If you can’t make good enough use out of it to afford that rent, you should move on. You can find comments on this thread that make this argument explicitly in terms of “maximizing land use efficiency”.

This was the economic structure of feudalism. It … wasn’t great. Private ownership of land has its own tradeoffs but a few centuries of historical experimentation in both directions has been fairly decisive.


How is that LVT "rent" different from any other traditional property tax being "rent"?

As near as I can tell, it is just a different way of deciding how the property tax burden is levied.

Downtown property gets taxed much more. Undeveloped speculation property that doesn't contribute to the community (and derives value from other people's contributions) gets taxed at the same rate as nearby developed property.


Property taxes have to be set high enough to fund services: if voters want more services, they pay more property taxes. The policy goal is delivering services the voters want to households and businesses.

LVT is designed to achieve a different policy goal: Maximize the efficiency of land use. So its rates have to be set to achieve that goal and, for example, force grandma to move out of that condo in a newly revitalized downtown so a young tech kid who can pay more & benefit from it more can move in.


LVT is a tax on the value of the land specifically, not a traditional property tax. This encourages development on valuable land that is currently being put to unproductive uses.

For example, if you own a lot in a downtown metro that is a parking lot, you pay low property taxes because parking lots have low property values. You are disincentivized from developing it because your property tax would go up. The incentives flip with an LVT.
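A toy calculation (all numbers and rates hypothetical) makes the incentive flip concrete:

```python
# Two adjacent downtown lots on identically valued land; the only
# difference is what's built on them. Numbers are made up for illustration.
land_value = 1_000_000           # value of the land itself
parking_improvements = 50_000    # a surface parking lot
tower_improvements = 9_000_000   # an apartment building

property_tax_rate = 0.01   # 1% on land + improvements
lvt_rate = 0.05            # 5% on land value only

def property_tax(land, improvements, rate=property_tax_rate):
    return (land + improvements) * rate

def land_value_tax(land, rate=lvt_rate):
    return land * rate

# Under a property tax, developing multiplies your bill roughly tenfold
# (about $10.5k/yr for the parking lot vs. about $100k/yr for the tower).
# Under an LVT, both owners pay the same ~$50k/yr, so keeping the lot
# as parking no longer lowers the tax bill.
```

The open question in the parent comment remains, though: the LVT rate and the assessed land value are the hard part, and this sketch simply assumes both.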


I understand that, but what should the actual rate of the LVT be? If the LVT rate is too high, nobody will want to develop that parking lot at all because the taxes outweigh the possible profit. And if they are lower than land appreciation, speculation is encouraged.


FYI, there's a .gov-maintained portal where healthcare companies in the U.S. are legally obliged to publish data breaches. It's an interesting dataset!

https://ocrportal.hhs.gov/ocr/breach/breach_report.jsf


This is a suboptimal characterization of this site.

I think it would be less wrong to say this is where covered entities that discover reportable breaches of PHI (whether their own or that of a BA) that trigger the immediate reporting obligation report them.

This is a narrower scope of coverage and shallower depth of epistemic obligation than you implied.


One of my favorite HIPAA stories is about a doctor who utilized his patient list when sending out campaign-related information when he was running for local office. Over 2 decades of schooling and still didn't understand how stupid this was.

