ssokolow · 2026-03-12T06:43:36

By "game tutorials", I think they mean modern successors to the role GameFAQs used to play.

There is a combining character that, by its description, sounds like it should be implemented to do the desired thing (U+20DD Combining Enclosing Circle), but my fonts don't render it very well when I stuff geometric characters matching the PlayStation buttons into it.

Without spaces: △⃝□⃝×⃝○⃝

With two spaces between each one so you can see how "enclosing" is getting interpreted: △⃝ □⃝ ×⃝ ○⃝
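
For anyone who wants to reproduce those strings, here's a minimal Python sketch. It assumes the base characters are U+25B3, U+25A1, U+00D7 and U+25CB (a guess at the code points, not something confirmed above), and `enclose` is just a helper name made up for illustration.

```python
import unicodedata

ENCLOSING_CIRCLE = "\u20DD"  # COMBINING ENCLOSING CIRCLE

# Assumed base characters for the four PlayStation face buttons.
BUTTONS = {
    "triangle": "\u25B3",  # WHITE UP-POINTING TRIANGLE
    "square": "\u25A1",    # WHITE SQUARE
    "cross": "\u00D7",     # MULTIPLICATION SIGN
    "circle": "\u25CB",    # WHITE CIRCLE
}


def enclose(base):
    """Append the enclosing mark; it applies to the character before it."""
    return base + ENCLOSING_CIRCLE


glyphs = [enclose(ch) for ch in BUTTONS.values()]
print("".join(glyphs))    # without spaces
print("  ".join(glyphs))  # two spaces between each one

# General category "Me" means "Mark, enclosing".
print(unicodedata.category(ENCLOSING_CIRCLE))
```

How well the circle actually wraps the base character is entirely up to the font and the text shaper, which is the problem described above.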

For the Markdown renderer I'm working on to replace WordPress for my blog, I resorted to shortcodes. They resolve to CSS-styled `<kbd>` tags with `title` attributes to clarify, plus the occasional bit of inline SVG for things like PlayStation button glyphs, where I didn't want to specify a fixed font to get sufficient consistency.

https://imgur.com/a/1EPm7QV

(In all fairness, it's a nerd-snipe made based on the idea that I'll be more willing to blog about things I have nice tools for. I don't currently typeset button presses in any form.)

FeepingCreature · 2026-03-12T15:02:09

I was thinking of in-game tutorials, but now that you mention it, GameFAQs and forums would be a great use case.

ssokolow · 2026-03-13T08:07:20

*nod* As-is, we're stuck with hacks like custom shortcodes and emoji.

...though, given the inconsistent naming of consistently laid-out buttons, I think anything that makes its way into Unicode should include something that follows the lead of what Batocera Linux does on their Wiki and with custom emojis in their Discord.

See https://wiki.batocera.org/configure_a_controller for an example of how they look inline. The gist is that it's an outline of the SNES-originated diamond of action buttons, which pretty much everyone but Nintendo uses these days and which is embodied in XInput and the SDL Gamepad API, with one of the circles filled in to represent the button in question.

ssokolow · on Sept 4, 2024

I know Sylvain Kerkour is a perennial "Rust should be more like Go. I don't care that they're trying to meet different needs" person and has been for many years now, but I do wish we could at least get a little acknowledgement that Rust's design took a great deal of influence from Python, both on what worked and what didn't, and that this was a direct response to how, as Amber Brown put it, Python has batteries included, but they're leaking.

Python is the most infamous example of how putting something in the standard library doesn't automatically mean everyone will use it.

For example, as of the end of the Python 2.x cycle, Python had urllib and urllib2 in the standard library and everyone said to ignore them and use Requests... which bundles a urllib3 whose maintainers refuse to ever add it to the standard library.

Python had/has a bunch of "use Twisted instead" network protocol implementations. Python's standard library XML implementations carry a big warning to use the third-party `defusedxml` package if you are processing untrusted data. etc. etc. etc.

I have next to no Java experience, but I vaguely remember it also having some similar cases of common wisdom being to ignore the standard library-provided solution.

ssokolow · on Feb 9, 2024

*nod* Give https://blog.readyset.io/bounds-checks/ a read.

They tried doing a comparison between ReadySet compiled normally and ReadySet with bounds checking removed so thoroughly that they needed to use a patched toolchain to achieve it and found the difference to be within the noise threshold.

Their conclusion was:

> At the end of the day, it seems like at least for this kind of large-scale, complex application, the cost of pervasive runtime bounds checking is negligible. It’s tough to say precisely why this is, but my intuition is that CPU branch prediction is simply good enough in practice that the cost of the extra couple of instructions and a branch effectively ends up being zero - and compilers like LLVM are good enough at local optimizations to optimize most bounds checks away entirely. Not to mention, it’s likely that quite a few (if not the majority) of the bounds checks we removed are actually necessary, in that they’re validating some kind of user input or other edge conditions where we want to panic on an out of bounds access.

vlovich123 · on Feb 9, 2024

As I noted originally, that’s not the reason. It’s because no one is trying to write code that can do hundreds of millions of high level operations per second. There’s so much “inefficiency” in the software stack (or a problem domain where more work is being done in the hot path) that bounds checking is in the noise typically. Readyset’s experiment is flawed because any existence of bounds check on some critical hot path would have already been noticed and fixed as low hanging fruit.

So the remaining bounds check is off the hot path where it doesn’t matter. But in the hot path where you should be bounded by HW limits, it can be a significant slowdown. Prediction can help but it’s not free.

So for most people, they only need to care about bounds checking when they’re doing something in the hot path and even then only when their hot path is running into HW limits. If their hot path is some complicated CPU computation, bounds checking should be but a blip unless you do something stupid like check the bounds too frequently.

So the general advice to not worry too much about bounds checks when writing Rust is directionally correct for the vast majority of people, but recognize it’s incorrect in places and it’s hard to notice because it’s such a small thing hidden in code gen without an easy flag to test the impact.

ssokolow · on Feb 14, 2024

That's fair.

I still think the "because it's not in the fast path" part of "Most software will not see a bottleneck because of bounds checking because it's not in the fast path" is a bit too much of a blanket statement and could detract from the admonition to benchmark very carefully before optimizing but, otherwise, I agree.

ssokolow · on Oct 3, 2023

> Before comparing strings or searching for a substring, normalize!

...and learn about the TR39 Skeleton Algorithm for Unicode Confusables. Far too few people writing spam-handling code know about that thing.

(Basically, it generates matching keys from arbitrary strings so that visually similar characters compare identical, so those Disqus/Facebook/etc. spam messages promoting things like BITCO1N pump-and-dumps or using esoteric Unicode characters to advertise work-from-home scams will be wasting their time trying to disguise their words.)

...and since it's based on a tabular plaintext definition file, you can write a simple parser and algorithm to work it in reverse and generate sample spam exploiting that approach if you want.

https://www.unicode.org/Public/security/latest/confusables.t...
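
The skeleton idea described above can be sketched roughly like this in Python. Big caveats: the four table lines are a tiny sample typed in by hand in the style of the data file, not the real thing, and `parse_confusables`, `skeleton`, and `disguise` are names made up for this sketch. The real algorithm in UTS #39 has more to it, so treat this as the idea, not a conforming implementation.

```python
import unicodedata

# Hand-written sample in a "source ; target ; type # comment" layout.
# Assumption: this mimics the data file's format; it is NOT the real file.
SAMPLE_TABLE = """\
0031 ; 006C ; MA # DIGIT ONE -> LATIN SMALL LETTER L
0049 ; 006C ; MA # LATIN CAPITAL LETTER I -> LATIN SMALL LETTER L
0030 ; 004F ; MA # DIGIT ZERO -> LATIN CAPITAL LETTER O
043E ; 006F ; MA # CYRILLIC SMALL LETTER O -> LATIN SMALL LETTER O
"""


def parse_confusables(text):
    """Map each source character to its prototype string."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        source, target = [field.strip() for field in line.split(";")[:2]]
        prototype = "".join(chr(int(cp, 16)) for cp in target.split())
        table[chr(int(source, 16))] = prototype
    return table


def skeleton(text, table):
    """NFD, swap each character for its prototype, then NFD again."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(table.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def disguise(text, table):
    """Work the table in reverse: swap characters for a look-alike."""
    reverse = {}
    for source, prototype in table.items():
        reverse.setdefault(prototype, []).append(source)
    return "".join(reverse.get(ch, [ch])[0] for ch in text)


TABLE = parse_confusables(SAMPLE_TABLE)

# Both spellings collapse to the same matching key.
print(skeleton("BITCO1N", TABLE) == skeleton("BITCOIN", TABLE))

# A generated look-alike still matches the original's skeleton.
fake = disguise("Bitcoin", TABLE)
print(fake != "Bitcoin", skeleton(fake, TABLE) == skeleton("Bitcoin", TABLE))
```

Note that the skeleton is only a matching key. It isn't meant to be displayed, which is why "BITCOIN" turning into something with lowercase L's in it doesn't matter.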

> and CD-ROM!

I think you mean Microsoft Windows's Joliet extensions to ISO9660 which, by the way, use UCS-2, not UTF-16. (Try generating an ISO on Linux (eg. using K3b) with the Joliet option enabled and watch as filenames with emoji outside the Basic Multilingual Plane cause the process to fail.)

The base ISO9660 filesystem uses bytewise-encoded filenames.
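
To illustrate the UCS-2 vs. UTF-16 difference (this is just the encoding arithmetic, not what any ISO authoring tool actually runs, and both helper names are hypothetical):

```python
def fits_in_ucs2(name):
    """True if every character is inside the Basic Multilingual Plane."""
    return all(ord(ch) <= 0xFFFF for ch in name)


def utf16_code_units(name):
    """Number of 16-bit code units UTF-16 needs for the string."""
    return len(name.encode("utf-16-be")) // 2


plain = "r\u00e9sum\u00e9.txt"
emoji = "party-\U0001F389.txt"  # U+1F389 is outside the BMP

for name in (plain, emoji):
    print(len(name), utf16_code_units(name), fits_in_ucs2(name))
```

The emoji filename needs one more code unit than it has characters because UTF-16 spends a surrogate pair on it, and UCS-2 has no surrogate pairs to spend.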

dystroy · on Oct 3, 2023

But not all normalizations are done to fight spam, not all of them should be interested in visual similarity.

I normalize strings in searches not because of bad intents but because for all user related purposes "Comunicações" and "Comunicações" are the same, their different encodings being more of an accident.
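
A minimal Python sketch of that, assuming the two spellings differ only in precomposed vs. decomposed code points (the escapes below are my reconstruction, and `same_text` is a made-up helper name):

```python
import unicodedata

precomposed = "Comunica\u00e7\u00f5es"   # single code points for the accents
decomposed = "Comunicac\u0327o\u0303es"  # c + cedilla, o + tilde


def same_text(a, b, form="NFC"):
    """Compare two strings after applying the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(precomposed == decomposed)           # raw comparison
print(same_text(precomposed, decomposed))  # normalized comparison
```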

ssokolow · on Oct 4, 2023

*nod* ...and stemming is that taken to a greater extreme.

I was just pointing out that Unicode itself has various forms of normalization and normalization-adjacent functionality that people are far too unaware of.

ssokolow · on Oct 3, 2023

My anglophone Canadian brother's name is André. Even if you're fine with alienating the ~50% of the world population using non-latin writing systems, probably best to at least stick to the stuff covered by the latin1 legacy encoding.

ssokolow · on Oct 3, 2023

Technically, a superset would have to somehow Schrödinger's cat around \ in latin1 and ¥ in Shift-JIS being the same codepoint.

Unicode just took it upon themselves to reliably round-trip legacy text... thus the precomposed forms.

Most of the other complexity and technical debt is in the writing systems themselves.

ssokolow · on Oct 3, 2023

According to a sibling to what you replied to, it's because the shapes of the glyphs are still under copyright by known-litigious rightsholders and the Unicode consortium doesn't want to subject font authors to that.

ssokolow · on Oct 3, 2023

"Extended (Grapheme Cluster)".

The .graphemes() method in Rust's unicode-segmentation crate takes an is_extended boolean as an argument and, if you set it to false, you're iterating legacy grapheme clusters.

ssokolow · on Oct 3, 2023

Cookie consent is only necessary if you're sharing it with others (eg. ad networks, Google Analytics, etc.) or using it for "non-essential" functions (again, stuff like analytics). Sites just don't want the general public to realize that.

As for the mouse cursors, I don't think they qualify as personal information under the GDPR, but IANAL.

extraduder_ire · on Oct 3, 2023

It's needed when you collect the data. Cursor position is being used directly to make the service "function" though, even if the function it enables is pretty novel. This is entirely debatable, and probably matters less than showing where your cursor is in a Google Doc.

Regardless, I don't think it matters since the author is not in the EU.