vessenes · 2025-12-10T23:44:12 1765410252

Ooh, I like Public Sans! I hadn't seen it before.

vessenes · 2025-12-10T23:41:52 1765410112

If you read the article, Calibri usage was instituted during the Biden administration. So, there's probably a diversity of government styles that get involved with typefaces.

watwut · 2025-12-10T23:55:16 1765410916

Calibri is designed for screen use and Times New Roman for printing. As usual, there is a practical option and a conservative option.

But the stakes are quite low here. Some bureaucrats will have a nearly undetectably harder time reading Trump speeches.

vessenes · 2025-12-10T22:23:43 1765405423

And the remaster! https://mbleigh.dev/posts/broccoli-man-remastered/

CamperBob2 · 2025-12-11T03:30:41 1765423841

Hey man, nice slop! (No, really, that's great.)

vessenes · 2025-12-10T20:00:17 1765396817

This is really awesome, actually! It looks great, very diverse, and clearly is scalable to extremely large maps. Props for testing the generation out on Minecraft - where terrain generation really matters.

strongbond · 2025-12-10T20:46:05 1765399565

No it's not. The whole presentation of it is confusing. Sorry.

vessenes · 2025-12-11T00:18:14 1765412294

This is a low quality, extremely low quality comment. What in particular did you find confusing?

vessenes · 2025-12-10T19:54:11 1765396451

I'm not convinced it's end-to-end multimodal - if it isn't, you'll have a separate speech synthesis section and this will be some of the result. You could test by having it sing or do some accents, or have it talk back to you in an accent you give it.

vessenes · 2025-12-10T19:52:29 1765396349

Interesting - when I asked the omni model at qwen.com what version it was, I got a testy "I don't have a version" and then was told my chat was blocked for inappropriate content. A second try asking for knowledge cutoff got me the more equivocal "2024, but I know stuff after that date, too".

No idea how to check if this is actually deployed on qwen.com right now.

zamadatix · 2025-12-10T20:01:18 1765396878

> No idea how to check if this is actually deployed on qwen.com right now.

Assuming you mean qwen.ai, when you run a query it should take you to chat.qwen.ai with the list of models in the top left. None of the options appear to be the -Omni variant (at least when anonymously accessing it).

vessenes · 2025-12-10T20:03:11 1765396991

Thanks - yes - I did. The blog post suggests clicking the 'voice' icon on the bottom right - that's what I did.

mh- · 2025-12-10T21:49:46 1765403386

For what it's worth, that's not a reliable way to check what model you're interacting with.

vessenes · 2025-12-11T18:08:34 1765476514

It’s a good positive signal, but not a good negative one.

It would be convincing if it said “I’m qwen-2025-12-whatever”. I agree it’s not dispositive if it refuses or claims to be, say, Llama 3. Generally, most models I talk to do not hallucinate future versions of themselves; in fact, it can be quite difficult to get them to use recent model designations, and they will often silently autocorrect to older models.

vessenes · 2025-12-10T19:20:29 1765394429

Thanks for open sourcing this.

I'm skeptical of the value of this benchmark, and I'm curious for your thoughts - self play / reinforcement tasks can be useful in a variety of arenas, but I'm not a priori convinced they are useful when the intent is to help humans in situations where theories of mind matter.

That is, we're using the same underlying model(s) to simulate both a patient and a judgment as to how patient-like that patient is -- this seems like an area where I'd really want to feel confident that my judge LLM is accurate; otherwise the training data I'm generating is at risk of converging on a theory of mind / patients that's completely untethered from, you know, patients.

Any thoughts on this? Feel like we want a human in the loop somewhere here, probably on scoring the judge LLM's determinations until we feel that the judge LLM is human-level or superhuman. Until then, this risks building up a self-consistent, but ultimately just totally wrong, set of data that will be used in future RL tasks.
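One way to put a number on "scoring the judge" is to have humans label a sample of the simulated patients and measure chance-corrected agreement with the judge LLM. The sketch below uses Cohen's kappa for that; it is an illustration only, not anything from the benchmark itself, and the function name and the "realistic"/"unrealistic" labels are made up for the example.

```python
from collections import Counter


def cohens_kappa(human_labels, judge_labels):
    """Chance-corrected agreement between human and judge-LLM labels.

    Both arguments are equal-length sequences of categorical labels
    (hypothetical labels here, e.g. "realistic" / "unrealistic").
    """
    if len(human_labels) != len(judge_labels) or not human_labels:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(human_labels)

    # Fraction of items where the human and the judge gave the same label.
    observed = sum(h == j for h, j in zip(human_labels, judge_labels)) / n

    # Agreement expected by chance, from each rater's label frequencies.
    human_counts = Counter(human_labels)
    judge_counts = Counter(judge_labels)
    categories = set(human_counts) | set(judge_counts)
    expected = sum(human_counts[c] * judge_counts[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    human = ["realistic", "realistic", "unrealistic", "unrealistic"]
    judge = ["realistic", "unrealistic", "realistic", "unrealistic"]
    print(cohens_kappa(human, judge))
```

A kappa near 1 would suggest the judge tracks human raters on that sample; a value near 0 means it agrees no better than chance, which is the failure mode described above.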

vessenes · 2025-12-08T17:48:29 1765216109

I used Tailscale on my reMarkable tablet for a while; synchronizing documents over ssh is a lot easier with a static IP. It's fairly hard to get stuff to start on boot on the RM, or at least it was at the time, so I eventually moved off that plan. But it was pretty awesome to be able to ssh in from anywhere in the world.

svat · 2025-12-08T20:42:40 1765226560

Oh that sounds cool! What do you do now instead?

vessenes · 2025-12-08T23:39:39 1765237179

rmapi calls to sync. My use case is updating an annual calendar PDF, which is inked on the tablet but shows calendar updates day to day, so I run it on a cron.

vessenes · 2025-12-06T16:51:43 1765039903

https://modelenginenews.org/techniques/minid.html mentions that this engine needs a spring start and 50% ether, and that it runs up to 40k rpm(!). Seems like it’s finicky, which makes sense given its size.

antonvs · 2025-12-06T20:53:06 1765054386

The spring start makes a lot of sense, thanks.

vessenes · 2025-12-05T02:36:19 1764902179

Sahara desert average air humidity is .. I think 25%?

As far as I know most air trap type humidity stuff works in the desert, just not as quickly as in, say, the jungle.