The 20b, set to high reasoning, solved the wolf, goat, cabbage river-crossing puzzle for me without needing a system prompt that encourages critical thinking. It managed it across several of the recommended settings, from temperatures of 0.6 up to 1.0, and so on.
Other models have generally failed that without a system prompt that encourages rigorous thinking. Each of the reasoning settings may well have thinking guidance baked in that does something similar, though.
I'm not sure it says that much that it can solve this, since it's public and could well be in the training data. It does say something if it can't solve it, though. So, for what it's worth, it solves it reliably for me.
Think this is the smallest model I've seen solve it.
Maybe both? I tried different animals, different scenarios, solvable versions, unsolvable versions, and it gave me the correct answer with high reasoning in LM Studio. It does tell me it's in the training data, but it reasons through things fairly well. It doesn't feel like it's just reciting the solution, and it picks up on nuances in the variations.
If I switch from LM Studio to Ollama and run it from the CLI without changing anything, it will fail, and it's harder to set the reasoning amount there. If I use the Ollama UI, it seems to do a lot less reasoning, and I'm not sure the UI has an option anywhere to adjust the system prompt so I can set the reasoning to high. In LM Studio, even with the Unsloth GGUF, I can set the reasoning to high in the system prompt, even though LM Studio won't show the reasoning-level button for that version.
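If the UI won't give you the option, you can also talk to the local server directly. Below is a minimal sketch against LM Studio's OpenAI-compatible endpoint (default http://localhost:1234/v1); the model identifier and the exact "Reasoning: high" wording are assumptions on my part, so check what your build and GGUF actually expect:

```python
# Minimal sketch: query the 20b through LM Studio's OpenAI-compatible local
# server and request high reasoning via the system prompt. The model name and
# the "Reasoning: high" phrasing are assumptions, not guaranteed to match your setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",  # hypothetical identifier; use whatever name LM Studio shows
    messages=[
        # The reasoning level is set in the system prompt, so a plain text line
        # works even when the UI offers no selector for it.
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "A farmer must ferry a wolf, a goat and a "
                                    "cabbage across a river..."},
    ],
    temperature=0.6,
)
print(response.choices[0].message.content)
```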
In the case of the river puzzle there is a huge difference between repeating an answer you read somewhere and figuring it out on your own; one requires reasoning, the other does not. If you swap out the animals involved, you need some reasoning to recognize the identical structure of the two puzzles and map between the two sets of animals, but you are still very far from the amount of reasoning required to solve the puzzle without already knowing the answer.
You can also do it by brute force, which again requires more reasoning than mapping between structurally identical puzzles. And finally you can solve it systematically, which requires the most reasoning of all. In all of those cases there is a crucial difference between blindly repeating the steps of a solution you have seen before and coming up with that solution on your own, even if you cannot tell the two cases apart by looking at the output, which would be identical.
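To make the brute-force case concrete, here is a rough sketch of what solving it by search looks like: breadth-first search over which bank the farmer and each item are on, with the unsafe pairings checked at every step. The function and parameter names are mine, purely for illustration; the point is that the item names are opaque labels, so swapping in fox/hen/seeds changes nothing about the search.

```python
from collections import deque

# Brute-force breadth-first search over river-crossing states.
# A state records which bank ("L" or "R") the farmer and each item are on.
# The forbidden pairs encode "a eats b whenever the farmer is on the other bank".
def solve(items=("wolf", "goat", "cabbage"),
          forbidden=(("wolf", "goat"), ("goat", "cabbage"))):
    start = ("L",) * (len(items) + 1)   # farmer first, then the items, all on the left bank
    goal = ("R",) * (len(items) + 1)

    def safe(state):
        farmer, sides = state[0], dict(zip(items, state[1:]))
        return all(not (sides[a] == sides[b] != farmer) for a, b in forbidden)

    def moves(state):
        farmer, rest = state[0], list(state[1:])
        other = "R" if farmer == "L" else "L"
        yield (other, *rest)                 # the farmer crosses alone
        for i, side in enumerate(rest):
            if side == farmer:               # the farmer takes item i along
                taken = list(rest)
                taken[i] = other
                yield (other, *taken)

    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for nxt in moves(state):
            if nxt not in seen and safe(nxt):
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None  # no safe sequence of crossings exists for this variant

# Renaming the items does not change the search at all:
if __name__ == "__main__":
    for step in solve(items=("fox", "hen", "seeds"),
                      forbidden=(("fox", "hen"), ("hen", "seeds"))):
        print(step)
```

A systematic solution, by contrast, would start from the observation that one item (the goat, or here the hen) appears in both conflicts and so has to be ferried first and brought back mid-plan, which is a different and larger kind of reasoning than enumerating states.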
Taking up mgoetzke's challenge: change the names of the items to something different, but keep the same puzzle. If it fails with "fox, hen, seeds" instead of "wolf, goat, cabbage", then it wasn't reasoning or applying something learned to another case; it was just regurgitating from the training data.
> Can you read this sentence, since it's in Base-64 encoded German? Did you deduce the answer from scratch, or did you just recognize Base 64 and then enter the result into Google Translate? What is "reasoning" anyway, if you don't apply what you've learned from one case to another?
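The Base-64 analogy is easy to make concrete: recognizing the encoding is pattern matching, and undoing it is a mechanical table lookup that requires no understanding of the German underneath. A toy sketch (the example sentence is mine):

```python
import base64

# Encode a German sentence the way the quoted comment imagines it being posted...
encoded = base64.b64encode("Können Sie diesen Satz lesen?".encode("utf-8")).decode("ascii")
print(encoded)                                    # what the reader is shown

# ...and recover it purely by rote: no reasoning about the content is involved.
print(base64.b64decode(encoded).decode("utf-8"))  # "Können Sie diesen Satz lesen?"
```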