In case you're not familiar, I will point you to the classical program synthesis literature. There the task is to take a spec written in, say, first-order logic, and output a program that satisfies this spec.
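To make that concrete, here is a rough sketch of what a spec and a candidate program look like on the checking side, using z3's Python bindings (the absolute-value spec is a toy example of mine, not from any particular paper):

```python
# Toy first-order spec for absolute value: the output is non-negative
# and equals either the input or its negation.
from z3 import Int, And, Or, Not, If, Solver, unsat

x = Int('x')
candidate = If(x < 0, -x, x)   # a candidate program a synthesizer might emit

spec = And(candidate >= 0, Or(candidate == x, candidate == -x))

s = Solver()
s.add(Not(spec))               # search for an input where the candidate violates the spec
if s.check() == unsat:
    print("candidate satisfies the spec for all inputs")
else:
    print("counterexample:", s.model())
```

The synthesis task is the harder inverse problem: given only the spec, search for a candidate that makes this check pass.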
I think the biggest barrier to adoption of program synthesis is writing the spec/maintaining it as the project matures. Sometimes we don’t even know what we want as the spec until we have a first draft of the program. But as you’re pointing out, LLMs could help address all of these problems.
The 2.5D rendering gives a lot of opportunities to visually obscure the insight that's necessary to solve the maze. It makes me think that a good maze should have an "oh, duh" moment. There were a few times where the false assumption I had to resolve was very close to the starting point. Tricky stuff.
And you designed all of these by hand? That is very impressive. How many mazes are there? The way I would start automating some of this would be to build a catalog of these visually tricky blocks that require the player to resolve a false assumption. Then I think it would be a matter of stitching these together in novel ways. Maybe the stitching procedure can be implemented by expressing a constraint system and solving for a stitching that has the properties you want.
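If it helps, here is a very rough sketch of what the stitching-as-constraint-solving could look like with z3's Python bindings; the catalog, difficulty scores, and slot count are all made up for illustration:

```python
from z3 import Int, If, Solver, Distinct, Sum, sat

# Hypothetical catalog of hand-made "tricky blocks", each with a difficulty score.
difficulty = [1, 2, 3, 5]

slots = [Int(f"slot_{i}") for i in range(3)]   # which catalog block goes in each of 3 slots
s = Solver()
for v in slots:
    s.add(0 <= v, v < len(difficulty))
s.add(Distinct(*slots))                        # don't repeat the same trick within one maze
cost = Sum([Sum([If(v == k, difficulty[k], 0) for k in range(len(difficulty))]) for v in slots])
s.add(6 <= cost, cost <= 8)                    # aim for an overall difficulty band
if s.check() == sat:
    print(s.model())                           # one valid stitching; adjacency constraints would go here too
```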
I’m on a touch screen and I would say that the movement is a little bit sensitive. Concretely this means I found myself… going in a direction I didn’t intend to. Maybe this is a skill issue on my part. I don’t have alternative controls to suggest and I probably don’t understand the mechanics of the movement enough to even suggest which parameters to tweak.
Thanks! I keep creating new mazes by hand, right now I'm at 45 and the goal is 60, plan is that then the game is finished :)
Initially, I wanted to create a maze generator. But I had no idea how to write it. Creating mazes by hand, I'm slowly learning how to make them work (I've gotten waaay better than I was at the beginning). So, perhaps in the future, a maze generator that generates a daily maze seeded by the current date. Idea is to start with a simple maze every Monday and gradually increase the difficulty over the week. But I'm getting a little carried away: the more I think about it, the more I don't know how to create the maze generator.
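The date-seeding part I can at least picture; something like this rough sketch (generate_maze being the hard part I don't know how to write yet):

```python
import random
from datetime import date

def generate_maze(rng, difficulty):
    # Placeholder for the hard part: e.g. pick `difficulty` tricky blocks
    # from a hand-built catalog and stitch them together.
    ...

def daily_maze(today: date):
    rng = random.Random(today.toordinal())   # same date -> same maze for every player
    difficulty = today.weekday() + 1         # Monday = 1 (easy) ... Sunday = 7 (hard)
    return generate_maze(rng, difficulty)
```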
Wrt sensitive controls: not sure which version you played, but I just reworked the controls a whole lot to automove to the next junction when you stop touching. It took my LLM and me a lot of time, mostly because I was searching in the dark, not entirely sure what I wanted. The controls are by far the least legible part of the codebase for me!
For me the danger of AI is that it enables the surveillance state through facial recognition and the instantaneous aggregation of all my data. For "national security" reasons, I may be detained and denied my rights if Palantir hallucinates. Who do I sue if Palantir decides I am an illegal?
The thing is a government never needed technology to be authoritarian. The government today already has all the tools to ruin your life. It had them in 1940. It had them in 1840 and it had them in the year 40 as well. And that tool is known as the monopoly on violence. It can be wielded in many ways good and bad.
> The thing is a government never needed technology to be authoritarian. The government today already has all the tools to ruin your life. It had them in 1940. It had them in 1840 and it had them in the year 40 as well. And that tool is known as the monopoly on violence. It can be wielded in many ways good and bad.
Not to the same extent. An army of humans is obedient up to a point, but there is a limit to what orders you can give them. When the officers are algorithms that limitation is a lot weaker.
It's more that in the past, widespread surveillance required a lot of people, many of whom have a conscience that ends up disrupting your surveillance.
The movie Das Leben der Anderen makes this point very cogently.
Nowadays you can run a huge surveillance program with far fewer people, all of whom can be conscience free.
I'm not sure how the next Stasi will crumble, but it'll be a lot harder to wrest power from them given the tools they have at their fingertips.
You're confusing autocratic with authoritarian. Total war reached its most recent zenith in the 20th century. If governments have always been able to control people to the same degree, why was it not until Napoleon that we saw the beginnings of nationalism? I say this rhetorically, as it is quite obvious that it was technology and industrialization. When we look at ancient empires and see their territory on a map, it would be much more accurate to highlight only the population centers, not the entirety of the land. Illiterate farmers, who made up the majority of the world, resided in small towns and villages, and their daily lives were largely unaffected by conquerors.
There was nationalism pre-Napoleon. Arguably East Asia is a better example than European history, IMO. I would say there is a strong sense of nationalism among Han Chinese both now and historically. Likewise for Japan and Korea. Pre-Islam Persia as well. I guess the source of this was consistent centralized authority over a large region rather than any technological change. You had that in East Asia. You didn't have that in Europe after Roman times. Even larger empires like the Kingdom of Spain were not really seen as "Spain" as we know it, but as a unified monarchy over the kingdoms of Castile, León, Aragon, Sicily, and Naples. Interestingly, you didn't really have that in India either; no one controlled the subcontinent until Mughal times, and by then the religious and cultural regional differences were pretty set in stone.
India is a great example of how relatively recent technology was required to finally unite and control a people. One can also just observe urbanization, capitalism, and communication media. China, meanwhile, is unique for its cycles of unity and disunity. Those kingdoms were also dynastic and worshipped the emperor as a god; such a form of government is a justification for ruling that supersedes the need for a national identity.
> The government today already has all the tools to ruin your life. It had them in 1940. It had them in 1840 and it had them in the year 40 as well. And that tool is known as the monopoly on violence.
There are a couple of problems with this:
1. As a matter of raw empirical fact, a government around the year 40 wasn't too likely to possess a monopoly on violence.
2. A monopoly on violence isn't necessary to ruin your life. A simple nonexclusive license, which governments of the period did have, is sufficient.
Yea, and they were way more successful at it in 1940 than 1840. Are you accounting for all the times they tried to enforce their authority but ultimately failed?
> And that tool is known as the monopoly on violence.
No one has a monopoly on violence. What they really have is called "qualified immunity."
In this particular instance, though, their violence is particularly enabled by cheap technology and computing power.
Or worse, because it didn't hallucinate, and they are coming for you as a free-thinking "radical". They can tell from a long-deleted blog post you made in 2005 about green energy.
Why bother with all that though? Just ask them to do their job for the party. If they don't, or you suspect they don't align with the party, you just execute them. Don't need tech for this. The tech is just for some people to get rich, not to really enable any new evil that can't already be achieved today with pen and paper and bullet (as modeled extensively in the last century).
Put it this way, if Hitler had grok, would it really get any worse for the Jews? I don't think so. I think they would be screwed no matter what.
> if Hitler had grok, would it really get any worse for the Jews?
Not grok specifically, but yes.
The Holocaust in the Netherlands was remarkably bad in large part because the Dutch administration was so well organized and had kept a registry of Jews.
Bad guys are going to use this technology to evil ends if given the chance.
BTW, there's a chilling alternate-history novel called NSA by German author Andreas Eschbach about precisely that kind of idea. The premise is that computer science progressed a lot more quickly. The book opens with German data scientists in the 1930s combining census and financial-transaction data (i.e., food purchases using electronic cash) to identify households that are hiding Jews and other "undesirables".
Because you can't do the Nazi Germany thing these days. I mean... disgust aside, it kinda failed. But you can spy on people under "national security" while keeping them feeling happy enough. And that arrangement can last 1000 years.
Still not convinced that AI is offering anything new here, especially when the statistics you'd reach for are often 100 years old or more. Bayes' theorem is older than the United States. I think among lay people there is a lot of conflation between AI and statistics, and also a lack of understanding of the state of that field and how mature it is. Nazi Germany of course heavily used statistical modeling and even contracted with IBM to quantify Jewish populations.
This point of view is kind of silly when you think about it. They used the modeling to go after Jews, but going after the people who were German but hid Jews was much more difficult. With modern AI/statistical modeling they'd take all those people too.
Much of how we remember Nazi Germany is after-the-fact revision. They were popular around the world in the 1930s for their plan to deal with the Jews. It was only after they went to war that we decided they were bad for that plan as well. (Some people were opposed to the plan all along, but there were plenty who were in favor of it.)
Even the Allies hated the Jews. They just had a different plan to get rid of them: instead of gassing them, the Allies expelled them to Palestine, so they'd be someone else's problem.
> Because you can't do the Nazi Germany thing these days. I mean... disgust aside, it kinda failed.
It failed because Nazi Germany was not militarily superior to the combination of nations it turned against it externally, not because of any internal failure of control. While it's nice to think that Nazi Germany "failing" somehow disproves the viability of the same broad kind of one-party, massacre-the-opposition totalitarianism, it isn't really justified.
The last line of GP's comment is key here: "Who do I sue if Palantir decides I am an illegal?"
This shouldn't make as much of a difference as it does, but due to how our legal system works, it's much harder to get meaningful legal satisfaction when an algorithm (or other inhuman distributed system) commits a crime against a person than when a person does so.
I think you're confused about the mechanism involved. It's hard to get satisfaction due to e.g. qualified immunity. The fact they use technology is largely irrelevant. You couldn't sue the NSA for spying on you before AI either.
If we assume they are on quotas then what difference does technology make? They had quotas before the technology, qualified immunity too. 100 false arrests with no recourse are 100 false arrests with no recourse.
If anything I would expect technology favors the victim of false arrest because it gives the cop a face-saving get out. Previously, a cop who false-arrested you would have been incentivized to take it all the way, because you getting justice for it is intrinsically tied to impugning their word and/or reputation.
We define determinism as a model behaving predictably, while also producing useful supporting metadata, like confidence scores from specialized DNNs/CNNs, not just text tokens generated as "scores".
So for the same kind of task, you can expect the same kind of output every time, without randomly breaking structured output or having to constantly change generation hyperparams.
Let's solidify definitions. A procedure is deterministic iff, for all inputs, it always produces the same output on that input.
Now, I am going to be pedantic because words matter here. I agree with the author that LLMs have downsides that can be addressed with _symbolic_ tools. But _determinism_ has very little to do with this.
> LLMs by nature are non-deterministic
This is false. LLMs are functions. All appearances otherwise are an artifact of how we use them.
This fact already suggests that determinism isn’t (entirely) what you want. Because even if you _could_ use LLMs as functions (I admit you can’t always do this with frontier models), that wouldn’t make you happy.
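To make "artifact of how we use them" concrete, here is a toy sketch (next_token_distribution is a stand-in for a real model, not an actual API):

```python
import random

def next_token_distribution(prompt):
    # The "LLM" itself: a pure function, same prompt -> same distribution every time.
    return {"4": 0.9, "5": 0.1}

def greedy(prompt):
    # Deterministic use of the same function.
    dist = next_token_distribution(prompt)
    return max(dist, key=dist.get)

def sample(prompt):
    # The nondeterminism lives here, in the decoding step, not in the model.
    dist = next_token_distribution(prompt)
    return random.choices(list(dist), weights=list(dist.values()))[0]
```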
> I want the output to always be predictable based on the behavior of the program and provided configuration.
Here, I will argue that predictability is divorced from determinism. You want an output that has a certain semantic relationship with the input. E.g., if you give a spec as input, you want a program that satisfies this spec.
Here it should be obvious that getting the same output on the same input is not very important. Who cares if the arguments to the function are renamed? Who cares if the function is implemented differently but essentially does what the spec asks?
I argue the _only_ thing that matters is that the output satisfies the intended relationship with the input. And this is orthogonal to determinism.
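Concretely, the check I care about looks something like this sketch (generated_sort stands for whatever implementation the LLM handed back; the sorting spec is just an example):

```python
import random

def satisfies_sort_spec(xs, ys):
    # The relation that matters: the output is the sorted version of the input.
    return ys == sorted(xs)

def check(generated_sort, trials=1000):
    for _ in range(trials):
        xs = [random.randint(-100, 100) for _ in range(random.randint(0, 20))]
        if not satisfies_sort_spec(xs, generated_sort(list(xs))):
            return False
    return True

# Two differently written (even differently named) implementations both pass;
# whether the LLM that produced them was deterministic never enters into it.
```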
Edit:
It looks like the author has the same realization:
> And while even this example shows how differently an LLM responds to the same query, it ends up producing a more reliable output.
Hey, @derrak! Thanks for the feedback! I appreciate your constructive criticism of the post.
> > LLMs by nature are non-deterministic
> This is false. LLMs are functions. All appearances otherwise are an artifact of how we use them.
I completely agree that LLMs are functions, and that they are essentially programs like any other program. What I am trying to illustrate is that these functions are not pure. They are full of side effects, and thereby produce different results based on a large number of factors.
> This fact already suggests that determinism isn’t (entirely) what you want. Because even if you _could_ use LLMs as functions (I admit you can’t always do this with frontier models), that wouldn’t make you happy.
What do you mean here by "even if you could use LLMs as functions, that wouldn't make you happy"? What do you imply by using them as functions, and by making one happy?
> Here it should be obvious that getting the same output on the same input is not very important. Who cares if the arguments to the function are renamed? Who cares if the function is implemented differently but essentially does what the spec asks?
I would disagree here. Renaming arguments or implementing the function differently doesn't negate the fact that I still want "pure" output from a function. I don't want a function that once in a while produces 2+2=5 instead of 4. I want it to always be the same: two arguments (2, 2) should always result in the same output (4).
> I argue the _only_ thing that matters is that the output satisfies the intended relationship with the input.
I agree with you here, but with one addition - it "always" satisfies the relationship with the input. As long as it always does that, that is deterministic behavior.
The definition of deterministic algorithm is as follows: "A deterministic algorithm is an algorithm that, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states." This is what I am essentially talking about and seeking in my interactions with AI systems.
> You’re going to stop naturally so carry on smoking as usual until then.
> Avoid cutting down beforehand because reducing cigarettes often makes them seem more valuable rather than less. Setting a quit date helps you prepare mentally and physically to stop smoking on that day.
> As a countermeasure to key disclosure laws, some personal privacy products such as BestCrypt, FreeOTFE, and TrueCrypt have begun incorporating deniable encryption technology, which enable a single piece of encrypted data to be decrypted in two or more different ways, creating plausible deniability.
Doesn’t this make the whole idea of legislating this a non-starter?
On the issue of “are LLMs good at lisp” I have a bit of a tangential response/observation.
I saw this [paper](https://ai.meta.com/research/publications/logic-py-bridging-...) a while ago. Long story short, they made a Python-looking DSL for LLMs to convert natural-language logic puzzles into. Then they converted the DSL expression to something a SAT/SMT solver could munch on.
My initial reaction was "why don't they just have the LLM write smtlib2?" And I guess the answer is that LLMs are probably better at writing something Python-looking than raw smtlib2. Probably an oversimplification of their work on my part. But I didn't see any comparison between their work and a direct encoding into smtlib.
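For a sense of the gap, here is the same trivial constraint written both ways (a toy example of mine, not from the paper):

```python
# Raw SMT-LIB would be roughly:
#   (declare-const a Int)
#   (declare-const b Int)
#   (assert (and (> a 0) (> b a) (= (+ a b) 10)))
#   (check-sat) (get-model)
# versus the z3 Python bindings:
from z3 import Int, Solver, sat

a, b = Int('a'), Int('b')
s = Solver()
s.add(a > 0, b > a, a + b == 10)
if s.check() == sat:
    print(s.model())
```

The Python-ish surface is plausibly much closer to what the models have seen in training.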
Makes me wonder if your idea could work along similar lines. Instead of using lisp directly, could you use a DSL that looks like more traditional languages? Would that help?
Interesting, I'm not familiar with that paper, but I guess the performance gain comes from raising the abstraction level (hiding solver boilerplate).
PTC-Lisp could achieve the same thing by defining constraint-building functions as tools, so the LLM writes high-level PTC-Lisp that calls a solver. In fact, Lisp has a natural advantage here: instead of building a separate DSL with its own compiler (as I guess Logic.py did), Lisp can extend itself with macros (code is data, data is code); however, this is not implemented yet in PTC-Lisp.
I think the translation from L0 to L1 is going to become more and more important. There have been a lot of discussions here on HN about how natural language specs “aren’t code” and how LLMs provide no formal relationship between their inputs and outputs.
One way to sidestep this is to have the LLM translate the NL into a formal language and persuade the human that the formal language captures their intent. This reduces the burden because the user only has to look at and understand the formal-language spec, rather than all the code produced by the LLM.
Also once a formal spec is obtained, you can do lots of interesting computation on it. Property based testing comes to mind. Or even full-blown verification of the formal spec. Or, LLMs might be good at recognizing ambiguity. An LLM could generate two formal specs, use an SMT solver to find an input where the specs differ, and help the user use this diff to ask clarifying questions and resolve the ambiguity.
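The diff-the-specs step is cheap with existing tooling; a minimal sketch with z3's Python bindings (the two "specs" are toy readings of "round to a multiple of ten" that I invented for illustration):

```python
from z3 import Int, Solver, sat

x = Int('x')
spec_a = (x / 10) * 10            # reading 1: round down to a multiple of ten
spec_b = ((x + 5) / 10) * 10      # reading 2: round to the nearest multiple of ten

s = Solver()
s.add(spec_a != spec_b)           # ask the solver for an input where the two readings disagree
if s.check() == sat:
    print(s.model())              # a concrete input (e.g. x = 5) to take back to the user
```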
One comment I have is that layers L1 and L2 _might_ be reinventing the wheel slightly. Your ensure statements remind me of Dafny or Verus, for instance, which have a lot of tooling behind them.
Indeed, both Verus and Dafny are very close to home here — thanks for the pointer. I did feel at times that I might be reinventing the wheel a bit.
I also thought about pushing more toward Lean and theorem proving instead of a lighter SMT-style direction, but programming and mathematics diverge quickly unless that abstraction boundary is handled carefully. To me, though, that mostly concerns the L1 -> L2 step.
L2 is still the least settled layer and the one I am actively exploring. The idea there is to let relations acquire semantic interpretations: this relation may be realized as a class, an embedding, a parser, etc. Since there is often more than one valid implementation route, I want that choice to be explicit.
And yes, part of the point is exactly to constrain and verify that the end result still matches the original intent at some IR/formal level.
So I am beginning to think the layering does offer something beyond plain Markdown specs. L1, at least as I currently see it, is intentionally very small: just types and relations, in a logic-oriented style somewhat closer to Prolog, or even SQL, where the world is modeled relationally and constraints are expressed over that. Essentially, having more than one target at the L2 level (at the same time, in one module) will help generate both the code and the semantic verification. That's actually a very fruitful observation - big thanks!
> I also thought about pushing more toward Lean and theorem proving instead of a lighter SMT-style direction
I think your intuition here is good. For these settings I think you want formal methods that are highly automated.
Question about your calculator example: is the intended use case that the user would write these L0-L2 files? Or is it expected that an LLM would write them with user intervention? And the Go program: how is that obtained from L2? Is that a symbolic transformation or is the LLM doing it?
Apologies if this is written somewhere and my skimming missed it.
> The proofs stop at the language boundary. The bugs don’t.
In formal verification, you have to model everything you care about. I suspect we’ll see large fragments of popular languages being more thoroughly modeled in languages like Dafny and Lean.
An alternative that side steps all of this is to not use an external language at all. ACL2 might be a better fit in this regime than Dafny or Lean because of how close it sits to SBCL.
I suspect the same but I wonder how effective it will be in reality. I mention a group that has been trying to verify SQL for 12 years, and they haven't fully formalised all basic queries.
I think LLMs will speed things up but will it result in popular libraries becoming verified or an entirely new class of libraries being built and verified from the ground up?
I do use ACL2, although I don’t do many proofs. When I do, the proofs usually go through automatically or require me to state only a few lemmas or tell the tool how to do the induction.
This is partially a commentary on how good the automation is in ACL2 and partially a commentary on the fact that I don’t use it for hard problems :)
I use it more for property based testing. The counterexample generator is very powerful because it leverages the knowledge base of the theorem prover (all the accumulated lemmas, etc.) among other things.
> The counterexample generator is very powerful because it leverages the knowledge base of the theorem prover (all the accumulated lemmas, etc.) among other things.
That's a really interesting application. Could be very powerful for what I'm doing. Thank you!
Happy to help. In general, I think counterexample generation is more important than proofs when it comes to software (most of our software is “trivially” wrong). The world might not be ready for full-blown formal verification, but I think it is ready for formal specification and counterexample generation.
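As a taste of how far a counterexample generator gets you even without proofs (a z3 sketch rather than ACL2, but the same idea): the textbook 32-bit absolute value looks obviously correct until the solver hands you the one input that breaks it.

```python
from z3 import BitVec, If, Solver, sat

x = BitVec('x', 32)
abs_x = If(x < 0, -x, x)          # the "obvious" absolute value on 32-bit ints

s = Solver()
s.add(abs_x < 0)                  # look for an input violating the property abs(x) >= 0
if s.check() == sat:
    print(s.model())              # x = -2147483648: negating INT_MIN overflows back to INT_MIN
```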