This is most likely because getting SaaS software to conform to federal regulations and to promise the security needed by the US military is difficult and expensive. FedRAMP is onerous.
And LLM products are new-ish. It suggests that Anthropic made federal government contracts a priority while OpenAI, Alphabet, and AWS didn't.
They always focused on safety (their own safety). They only backed off from the US military once they got bad press. As usual, they are not an ethical company. I can't single them out as bad, since all corporations are the same. Just don't fall for the illusion they create.
If you look at my post history you can see I’m always calling them out about how sketchy they are.
It's a little weird, too, because Claude definitely isn't the only one approved for use on classified systems in general; both Grok and OpenAI have models approved, at the very least.
The US is no longer a reliable partner. Once the current administration is gone, the likelihood of US support is less than guaranteed; even with this administration in place it is uncertain. All it takes is the right moment to set off a tantrum and friends become enemies. Israel really doesn't have allies so much as accomplices, and that type of friend only sticks around when it helps them.
This is why gifts to government are problematic. They are never gifts; they are end-runs around accountability and should receive exceptionally high scrutiny. It is hard to say they should be outright illegal, since participating in government often blurs the line between gifting government and just normal participation. This one, though, is clearly just an end-run around democracy.
This particular gift is also problematic because Horowitz is an investor in Flock, and Horowitz's family foundation is spending money on Flock that will in turn increase its value, which benefits him as an investor. Of course lawyers will have looked at all this to make sure it doesn't run afoul of self-dealing rules, but that just means the rules verge on uselessly weak.
I'm a dreamer, and a bit of a sucker for the underdog story. This checks all the boxes for me, so I am rooting for them for many reasons. If this is real, then the world just jumped 10 years ahead in battery development overnight. If not, then we will probably remember it the same way we remember cold fusion and superconductor claims. There is a place for hope tempered with skepticism, and right now this story sits squarely in it!
I think the key here is that engagement is based largely on content quantity, not quality. If your feed doesn't have a lot of natural quantity associated with it, then FB will find something to stuff in there. The reality is that most people don't have a lot of quantity on their feeds from their friends, so they get the AI slop to fill the void. At least, that is my complete guess at a root cause of (some of) the FB slop. I hadn't logged in for 6 months, and I now check it 1-2 times a year, because the last few times I logged on it was pushing hate content at me.
You will never agree 100% with someone else when it comes to decisions like this, but clearly there is a lot of history behind these decisions and they are a great starting point for conversations internally I think.
The more intelligent something is, the harder it is to control. Are we at AGI yet? No. Are we getting closer? Yes. Every inch closer means we have less control. We need to start thinking about these things less like function calls that have bounds and more like intelligences we collaborate with. How would you set up an office to get things done? Who would you hire? Would you hire the person spouting crazy Musk tweets as reality? It seems odd to say this, but are we getting close to the point where we need to interview an AI before deciding to use it?
My argument is that we are getting closer, not that we know exactly what AGI will be. That is clearly part of it, right? If we had some boolean definition, I suspect we would already be there. Figuring it out is a big part of getting there. I think my points still stand on that basis. We aren't there yet, but it is hard to deny that these things are growing from a complexity/capability standpoint. On a spectrum from rock to human-level intelligence, these are getting closer to human and further from rock every day.
I fled SF and I know a bunch of similar people. Startups are still founded there for the address, not the local talent pool, and the address matters because of inertia, not inherent advantage. If I were to create a startup, I wouldn't even consider doing it in SF now. It is a waste of money that could be put towards the idea. The US is clearly on an anti-intellectual path. People default to here because of inertia, but every attack on immigrants, every high-level decision based on quack science and personal gain, and every attack on the institutions supporting the development of the next generation is pushing that inertia elsewhere. It is clear as day that the US is only keeping any kind of advantage right now due to inertia and threat, not innovation and effort.
Once a wise man, twice a fool... I'm a fool :) My most humbling experience was making it to the top, exhausted but happy, only to see a jogger reach the top, glance at his smart watch, and start back down again. Jerk.
> only to see a jogger reach the top, look at his smart watch briefly and start back down again
Oh hey, I met that guy. He stopped his run down to point me back to the trail after I wandered off, having lost the markers somewhere on my evening ascent between the 7th and 8th stations on the Subashiri trail. I made a joking comment about his not staying for the sunrise -- he had already caught it on his first run up earlier that day...
A good windbreaker and glasses/face covering are pretty nice to have. Even a little wind accelerates as it hits the mountain, picks up the tephra, and turns it into a sandblaster. I just took water and yen personally, since the numerous huts along the way will sell you food (and burn your stick for you). Both times I started in shorts and a t-shirt, and by 8? 8.5? I switched into pants with wind/rain gear over them. There isn't anywhere to change; I just put things on over my shorts (no bad American moments, I hope!). I ended up blowing out my sneakers on the way down one time, though. That tephra is seriously like sandpaper, and it ripped the tread off one shoe. I was lucky the rest survived long enough to make it down. Honestly, down was in many ways harder than up: no huts, you are tired, and it is still very steep. Totally worth it though!
I think I was wearing Vibram FiveFingers (it was 2012ish, they were cool!) when I did it, from 0 meters, as I started at the ocean. I had a little hip-bag with some rice balls and water. It took about 23 hours to reach the top. My accompanying friend did the whole thing barefoot.
This reminds me of the old brain-teaser/joke that goes something like "An airplane crashes on the border of x/y; where do they bury the survivors?" The point being that this exact style of question has real examples where actual people fail to answer it correctly. We mostly learn as kids, through things like brain teasers, to avoid these linguistic traps, but that doesn't mean we don't still fall for them every once in a while too.
That’s less a brain teaser than running into the error correction people use with language. This is useful when you simply can’t hear someone very well or when the speaker makes a mistake, but fails when language is intentionally misused.
> This is useful when you simply can’t hear someone very well or when the speaker makes a mistake
I have a few friends with pretty heavy accents and broken English. Even my partner, a non-native English speaker, makes frequent mistakes. It's made me much better at communicating, but it's also more work, and miscommunication happens more easily. I think a lot of people don't realize this also happens with variation in culture, even among people speaking the same language. It's just that an accent serves as a flag to "pay closer attention". I suspect this is a subtle but contributing factor in online miscommunication and why fights are so frequent.
I'm actually having a hard time interpreting your meaning.
Are you criticizing LLMs? Highlighting the importance of this training and why we're trained that way even as children? That it is an important part of what we call reasoning?
Or are you giving LLMs the benefit of the doubt, saying that even humans have these failure modes?[0]
Though my point is more that natural language is far more ambiguous than I think people give it credit for. I'm personally always surprised that a bunch of programmers don't understand why programming languages were developed in the first place. The reason they're hard to use is explicitly their lack of ambiguity, at least compared to natural languages. And we can see clear trade-offs in how high-level a language is. Duck typing is incredibly helpful while also being a major nuisance. It's the same reason even a technical manager often has a hard time communicating instructions. Compression of ideas isn't easy.
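To make the duck-typing trade-off concrete, here's a minimal Python sketch (the class and function names are invented for illustration): anything with a `quack()` method is accepted, which is flexible right up until something almost fits.

```python
class Duck:
    def quack(self):
        return "quack"

class Robot:
    # Also has quack(), so it passes the "duck test"...
    def quack(self):
        return 404  # ...but returns the wrong type

def make_noise(thing):
    # No isinstance() check: we trust any object with a quack() method.
    return thing.quack().upper()

print(make_noise(Duck()))   # works fine

try:
    make_noise(Robot())     # accepted by the interface, fails at runtime
except AttributeError as e:
    print("broke deep inside make_noise:", e)
```

The flexibility (no type declarations, any conforming object works) and the nuisance (errors surface far from their cause, at runtime) are two faces of the same ambiguity.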
[0] I've never fully understood that argument. Wouldn't we call a person stupid for giving a similar answer? How does the existence of stupid mean we can't call LLMs stupid? It's simultaneously anthropomorphising while being mechanistic.
I was pointing out that humans and LLMs share this failure mode, so in a lot of ways it is no big deal, not some smoking gun that LLMs are useless and dangerous. Or at least, no more useless and dangerous than humans.
I personally would stay away from calling someone, or an LLM, 'stupid' for making this mistake, for several reasons. First, objectively intelligent, high-functioning people can and do make mistakes similar to this, so a blanket judgement of 'stupid' is pretty premature based on a common mistake. Second, everything is a probability, even in people. That is why scams work on security professionals as well as on your grandparents. The odds against a professional may be 1 in 10k while against your grandparents they may be 1 in 100, but that just means the professional needs a lot more phishing attempts thrown at them before they accidentally bite. Someone/something isn't stupid for making a mistake, or even for systematically making a mistake; everyone has blind spots that are unique to them. The bar for 'stupid' needs to be higher.
There are a lot of 'gotcha' articles like this one that point out some big mistake an LLM made, or a systemic blind spot in current LLMs, and then conclude, or at least heavily imply, that LLMs are dangerous and broken. If the whole world put me under a microscope and all of my mistakes made the front page of HN, there would be no room left for anything other than documentation of my daily failures (the front page would really need to grow just to keep up with the last hour's worth of mistakes, more than likely).
I totally agree with the language ambiguity point. I think that is a feature and not a bug. It allows creativity to jump in. You say something ambiguous and it helps you find alternative paths to go down. It helps the people you are talking to also discover alternative paths more easily. This is really important in conflicts since it can help smooth over ill intentions since both sides can try to find ways of saying things that bridge their internal feelings with the external reality of dialogue. Finally, we often really don't know enough but we still need to say something and like gradient descent, an ambiguous statement may take us a step closer to a useful answer.
> I personally would stay away from calling someone, or an LLM, 'stupid' for making this mistake because of several reasons.
I wouldn't. Because there's a difference between calling someone's action stupid and saying that someone is stupid. These are entirely dependent upon the context of the claim. Smart people frequently do stupid stuff. I have a PhD and by some metric that makes me "smart" but you'll also see me do plenty of stupid stuff every single day. Language is fuzzy...
But I think responses like yours are entirely dismissive of what's being shown, which is how easily these systems are fooled. Another popular example right now is the cup with a sealed top and open bottom (lol, "world model"?).
> There are a lot of 'gotcha' articles
The point isn't about getting some gotcha, it is about a clear and concise example of how these systems fail.
What would not be a clear and concise example is showing something that requires domain subject expertise. That's absolutely useless as an example to everyone that isn't a subject matter expert.
The point of these types of experiments is to make people think "if they're making these types of errors that I can easily tell are foolish then how often are they making errors where I am unable to vet or evaluate the accuracy of its outputs?" This is literally the Gell-Mann Amnesia Effect in action[0].
> I totally agree with the language ambiguity point. I think that is a feature and not a bug.
So does everybody. But there are limits to natural language and we've been discussing them for quite a long time[1]. There is in fact a reason we invented math and programming languages.
> Finally, we often really don't know enough but we still need to say something and like gradient descent, an ambiguous statement may take us a step closer to a useful answer.
Was this sentence an illustrative example?
Sometimes I think we don't need to say something. I think we all (myself included) could benefit from spending a bit longer before we open our mouths, or even from not opening them as often. There are times when it is important to speak out, but there are also times when it is important not to speak. It is okay to not know things, and it is okay to not be an expert on everything.
> This is literally the Gell-Mann Amnesia Effect in action.
Absolutely! But there is some nuance, here. The failure mode is for an ambiguous question, which is an open research topic. There is no objectively correct answer to "Should I walk or drive?" given the provided constraints.
Because handling ambiguity is a problem that researchers are actively working on, I have confidence that models will improve in these situations. The improvements may asymptotically approach zero, leading to ever more absurd examples of the failure mode. But that's OK, too. It means the models will increase in accuracy without becoming perfect. (I think I agree with Stephen Wolfram's take on computational irreducibility [1]: that handling ambiguity is a computationally irreducible problem.)
EWD was right, of course, and you are too for pointing out rigorous languages. But the interactivity with an LLM is different. A programming language cannot ask clarifying questions. It can only produce broken code or throw a compiler error. We prefer the compiler errors because broken code does not work, by definition. (Ignoring the "feature not a bug" gag.)
Most of the current models are fine-tuned to "produce broken code" rather than "compiler error" in these situations. They have the capability of asking clarifying questions, they just tend not to, because the RL schedule doesn't reward it.
Producing fewer "compiler errors" and more "broken code" is a fundamental failure. The cost of detecting compiler errors is lower than the cost of detecting broken code. If the cost of detecting and fixing broken code increases at the same rate as LLMs "improve", then their net benefit will remain fixed. I asked my five-year-old the above "brain teaser" and he got it right. I did a follow-up asking what he should wash at a car wash if he walked there; he said, "my hands." Chat answered with more gibberish.
I agree it is a fundamental failure of the current state of models. I believe it is solvable. The nuance is just that "solving" the problem might not look like what we think of as a solution. Hence the asymptote.
I hadn't realized. This does make me consider using alternatives more.