
Note to self: on Monday, add a null character check to pre-commit hooks, and add the same check to pipelines.


It's perfectly normal for binary artifacts to contain null bytes, even long runs of them.


Yeah, I'd need to figure it out properly, but for Unicode text files it should be OK. Good point about the binaries though, thank you!
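
For what it's worth, a minimal sketch of what such a check might look like, assuming the hook gets a list of file paths and that only files with a text-like extension should be scanned. The extension allow-list and the function names here are made up for illustration, not taken from any particular hook framework:

```python
import sys
from pathlib import Path

# Hypothetical allow-list: only these extensions are treated as text.
TEXT_SUFFIXES = {".txt", ".md", ".adoc", ".xml", ".json", ".yaml", ".yml", ".py"}


def find_null_offsets(data: bytes, limit: int = 5) -> list[int]:
    """Return the byte offsets of the first `limit` NUL bytes in data."""
    offsets = []
    start = 0
    while len(offsets) < limit:
        pos = data.find(b"\x00", start)
        if pos == -1:
            break
        offsets.append(pos)
        start = pos + 1
    return offsets


def check_paths(paths: list[str]) -> list[str]:
    """Return one report line per allow-listed file that contains NUL bytes."""
    problems = []
    for name in paths:
        path = Path(name)
        if path.suffix.lower() not in TEXT_SUFFIXES:
            continue  # skip binaries and anything else not on the allow-list
        offsets = find_null_offsets(path.read_bytes())
        if offsets:
            problems.append(f"{name}: NUL byte at offset(s) {offsets}")
    return problems


def main(argv: list[str]) -> int:
    """Print a report to stderr; return 1 if any file failed, else 0."""
    report = check_paths(argv)
    for line in report:
        print(line, file=sys.stderr)
    return 1 if report else 0

# In a hook script this would end with: raise SystemExit(main(sys.argv[1:]))
```

An allow-list seemed safer than trying to guess which files are binary, since binaries legitimately contain nulls. Note that a UTF-16 text file would fail this check too.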


You say Unicode, but you mean UTF-8. For 16-bit Unicode (UTF-16) the story is different :)


I mentioned it in a separate parent, but null purge is - for the stuff I work with - completely non-negotiable. Nulls seem to break virtually everything, just by existing. Furthermore, old-timey PDFs are chock full of the things, for God knows what reason, and a huge amount of data I work with are old-timey PDF.


> Furthermore, old-timey PDFs are chock full of the things, for God knows what reason, and a huge amount of data I work with are old-timey PDF.

Probably UCS-2/UTF-16 encoding with ascii data.
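
A quick illustration of why that would produce nulls: in UTF-16 every ASCII character takes two bytes, and one of them is zero. The sample string is arbitrary:

```python
text = "PDF"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")  # big-endian, no byte order mark

print(utf8)                  # b'PDF'
print(utf16)                 # b'\x00P\x00D\x00F'
print(utf16.count(b"\x00"))  # 3
```

So a tool that reads those bytes as an 8-bit encoding sees one null per character.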


It's hard to localize. Early PostScript and PDF software was the wild west, particularly when it comes to the text streams. Something I've noticed is that nulls are used a LOT in things like randlists (bullet lists), tab leaders, and other "space that isn't a space".

I'm reminded of how you have to use `{empty}` attribute references in lightweight markup like AsciiDoc to "hold your place" in a list, in case you need the first element in the list item to be an admonition or block. Like so:

  . {empty}
  +
  WARNING: Don't catch yourself on fire!
  +
  Pour the gasoline.

And the equivalent XML would be something like:

  <procedure>
    <step>
      <warning>Don't catch yourself on fire!</warning>
      <para>Pour the gasoline.</para>
    </step>
  </procedure>

This is one of those rare cases where the XML is more elegant than the lightweight markup. That hack with `{empty}` bugs me.

Anyways, I'm spitballing that these old-timey nulls I'm seeing are being employed in an equivalent way, some sort of internal bespoke workaround to a format restriction.


The problem ISN'T the null character though. The problem is that they tested the system, THEN changed stuff, then uploaded the changed stuff.

Their standard methodology was to deploy untested stuff.



