Hacker News

I'm thinking even bog-standard European umlauts, cedillas, etc. go multi-byte in UTF-8? (Take a string like ÅÄÖåäöÜü, chop it off at various byte limits, and see.)
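A quick way to try that experiment, as a rough sketch: it assumes the string is in precomposed (NFC) form and encoded as UTF-8, and the helper names `splits_character` and `truncate_bytes` are made up for illustration.

```python
# The sample string, written with escapes so the precomposed form is explicit.
text = "\u00c5\u00c4\u00d6\u00e5\u00e4\u00f6\u00dc\u00fc"  # ÅÄÖåäöÜü
data = text.encode("utf-8")  # each of these letters takes 2 bytes in UTF-8


def splits_character(data: bytes, limit: int) -> bool:
    """Return True if cutting at `limit` bytes leaves invalid UTF-8."""
    try:
        data[:limit].decode("utf-8")
        return False
    except UnicodeDecodeError:
        return True


def truncate_bytes(data: bytes, limit: int) -> str:
    """Cut at a byte limit and drop any trailing partial character."""
    return data[:limit].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    for limit in range(1, len(data) + 1):
        status = "splits a character" if splits_character(data, limit) else "ok"
        print(limit, status, repr(truncate_bytes(data, limit)))
```

With this particular string, every odd byte limit lands in the middle of a two-byte sequence.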


This is just the general behavior of truncating strings by code point when they contain decomposed characters (a base letter followed by combining marks). It can also affect accents and other diacritics.
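To illustrate the decomposed case, here is a small sketch using Python's standard `unicodedata` module. The helper `truncate_code_points` is a hypothetical name, and it only backs off over combining marks; it is not a full grapheme-cluster segmenter (it would not handle emoji sequences, for example).

```python
import unicodedata

# Decompose ÅÄÖ so each letter becomes a base letter plus a combining mark.
nfd = unicodedata.normalize("NFD", "\u00c5\u00c4\u00d6")  # 6 code points


def truncate_code_points(s: str, limit: int) -> str:
    """Cut at `limit` code points, backing off so a base letter is never
    separated from the combining marks that follow it."""
    cut = min(limit, len(s))
    # If the first dropped code point is a combining mark, the cut falls
    # inside a base+mark sequence, so move it left to the sequence start.
    while 0 < cut < len(s) and unicodedata.combining(s[cut]):
        cut -= 1
    return s[:cut]


if __name__ == "__main__":
    naive = nfd[:3]  # "A" + ring, then a bare "A" that lost its diaeresis
    print(repr(unicodedata.normalize("NFC", naive)))
    print(repr(unicodedata.normalize("NFC", truncate_code_points(nfd, 3))))
```

The naive slice keeps the second letter but silently strips its accent, which is arguably worse than dropping the letter entirely.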


I don't remember the details, only that it was a bigger deal than with umlauts. I'll see if I can find the talk again.



