panpan2's comments

panpan2 · on March 12, 2025

Thanks for catching this, changed to "hyphen".

> And while AI watermarking and fingerprinting is real, using typographically-correct Unicode instead of base ASCII isn't really it (though I guess anything that transforms text in a way which reduces variety like this does will make some of it less effective.)

I disagree. Your "writing signature" changes when you go from never using proper typography to suddenly using it perfectly. If you don't typically follow typography rules, LLM-generated text can make your writing inconsistent and detectable, especially in notes, where some parts follow your natural style while others suddenly have perfect punctuation (e.g., to find something, you now need to search for both your usual punctuation and the LLM's version). Also, if you use an LLM to help rewrite a sentence within a longer piece, the output might include typographic details (like curly quotes or en-dashes) that don't match the rest of your writing.

panpan2 · on March 12, 2025

Just try it :) I've definitely come across random variation selectors now and then. Otherwise, the most common case is typography: em-dashes instead of hyphens, curly apostrophes, etc. And if you're feeding LLM output into a search tool, these subtle differences can cause an exact-match search to miss text that looks identical on screen.
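
If you want to check a piece of text yourself, here is a minimal Python sketch (standard library only, and not the tool's actual code) that lists every non-ASCII character with its code point and Unicode name. The function name `find_non_ascii` and the sample string are made up for illustration.

```python
import unicodedata


def find_non_ascii(text):
    """Return (index, char, Unicode name) for every non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# Illustrative sample: curly apostrophe, em dash, and an emoji followed by
# a variation selector (U+FE0F), written as escapes so they are visible.
sample = "I\u2019ve seen this \u2014 a heart\u2764\ufe0f"

for index, ch, name in find_non_ascii(sample):
    print(index, f"U+{ord(ch):04X}", name)
```

Invisible characters such as variation selectors show up by name here even though they render as nothing on screen.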

panpan2 · on March 12, 2025

That's right! The same goes for en-dashes, em-dashes, and some other punctuation. While these aren't ASCII, you can enable them with `--allow-chars` if you want to keep them. I imagine the average person doesn't know when to use which.

panpan2 · on March 12, 2025

Another viewpoint is that it's about privacy (e.g., unwanted tracking) and security (e.g., homograph attacks). As LLMs are increasingly used everywhere, this provides a way to normalize text as it moves between different systems.
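
To make "normalize" concrete, here is a rough Python sketch of the general idea. It is an assumption-laden illustration, not the tool's implementation: the `REPLACEMENTS` table, the `INVISIBLE` set, and the `allow` parameter (loosely modeled on the `--allow-chars` option mentioned above) are all hypothetical. It also does not attempt homograph detection; look-alike letters from other scripts pass through unchanged.

```python
# Hypothetical mapping from common typographic characters to ASCII.
REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / curly apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
}

# Hypothetical set of invisible characters to drop: zero-width space,
# non-joiner, joiner, word joiner, BOM, and variation selectors 1-16.
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"} | {
    chr(cp) for cp in range(0xFE00, 0xFE10)
}


def normalize(text, allow=""):
    """Map typographic characters to ASCII and drop invisible ones.

    Characters listed in `allow` are kept as-is. Anything else that is
    non-ASCII and not covered by the tables is left untouched.
    """
    out = []
    for ch in text:
        if ch in allow or ord(ch) < 128:
            out.append(ch)
        elif ch in REPLACEMENTS:
            out.append(REPLACEMENTS[ch])
        elif ch in INVISIBLE:
            continue  # drop silently
        else:
            out.append(ch)
    return "".join(out)


print(normalize("\u201cHi\u201d \u2014 it\u2019s\ufe0f"))
print(normalize("a \u2014 b", allow="\u2014"))  # em dash kept on request
```

Dropping the zero-width joiner will break some emoji sequences, so a real tool would need to decide whether that trade-off is acceptable for its use case.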