I would like to know why the ZIP/HTML polyglot format produced by SingleFile [1] and mentioned in the article achieves "static" and "single", but not "efficiency". What's inefficient about it compared to the gwtar format?
'efficiency' is downloading only the assets needed to render the current view. How does it implement range requests and avoid downloading the entire SingleFileZ when a web browser requests the URL?
I haven't looked closely, but I get the impression that this is an implementation detail which is not really related to the format. In this case, a polyglot zip/html file could also interrupt page loading via a window.stop() call and rely on range requests (zip.js supports them) to unzip and display the page. This could also be transparent for the user, depending on whether the file is served via HTTP or not. However, I admit that I haven't implemented this mechanism yet.
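For what it's worth, the bootstrap being discussed could look something like the sketch below. This is purely an illustration of the idea, not code from SingleFileZ or gwtar; `tailRange` and `bootstrapFromRanges` are made-up names. The one concrete fact it relies on is that the zip end-of-central-directory record is at most 22 bytes plus a 64 KB comment, so fetching the last 64 KB + 22 bytes is always enough to locate the central directory.

```javascript
// Hypothetical sketch of the "window.stop() + range request" bootstrap
// discussed above. Illustrative only; not actual SingleFileZ code.

// The EOCD record is at most 22 bytes plus a 64 KB comment, so the last
// 65536 + 22 bytes of the file always contain it.
function tailRange(fileSize, maxTail = 65536 + 22) {
  const start = Math.max(0, fileSize - maxTail);
  return `bytes=${start}-${fileSize - 1}`;
}

async function bootstrapFromRanges(url, fileSize) {
  // Stop the browser from downloading/parsing the rest of the document...
  window.stop();
  // ...then fetch only the tail containing the zip central directory.
  const response = await fetch(url, {
    headers: { Range: tailRange(fileSize) }
  });
  const tail = new Uint8Array(await response.arrayBuffer());
  // From here, parse the central directory and issue further range
  // requests for each entry actually needed (zip.js can do this part).
  return tail;
}

// Only meaningful in a browser, for a file served over HTTP(S).
if (typeof window !== "undefined" && location.protocol.startsWith("http")) {
  // bootstrapFromRanges(location.href, fileSizeFromContentLength);
}
```

The whole thing degrades gracefully: when the file is opened from the filesystem instead, range requests are unavailable and the page has to be parsed in full, which matches the `file:` caveat mentioned elsewhere in the thread.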
> that this is an implementation detail which is not really related to the format. In this case, a polyglot zip/html file could also interrupt page loading via a window.stop() call...However, I admit that I haven't implemented this mechanism yet.
Well, yes. That's why we created Gwtar and I didn't just use SingleFileZ. We would have preferred to not go to all this trouble and use someone else's maintained tool, but if it's not implemented, then I can't use it.
(Also, if it had been obvious to you how to do this window.stop+range-request trick beforehand, and you just hadn't gotten around to implementing it, it would have been nice if you had written it up somewhere more prominent; I was unable to find any prior art or discussion.)
The reason I did not implement the innovative mechanism you describe is that, in my case, all the technical effort was (and is) focused on reading the archive from the filesystem. No one had suggested it either.
The call to window.stop() stops HTML parsing/rendering, which is unnecessary since the script downloads the page via HTTP and decompresses it as-is as a binary file (zip.js supports concatenated payloads before and after the zip data). However, in my case, the call to window.stop() is executed asynchronously once the binary has been downloaded, and may therefore be too late. This is probably less effective than in your case with gwtar.
I implemented this in the simplest way possible because if the zip file is read from the filesystem, window.stop() must not be called immediately because the file must be parsed entirely. In my case, it would require slightly more complex logic to call window.stop() as early as possible.
Edit: Maybe it's totally useless though, as documented here [1]: "Because of how scripts are executed, this method cannot interrupt its parent document's loading, but it will stop its images, new windows, and other still-loading objects." (you mentioned it in the article)
Edit #2: Since I didn't know that window.stop() was most likely useless in my case, I understand your approach much better now. Thank you very much for clarifying that with your question!
Well, it seems easy enough to test if you think you are getting efficiency 'for free'. Dump a 10GB binary into a SingleFileZ, and see if your browser freezes.
I just ran a test on a 10GB HTML page and called window.stop() via a 100ms setTimeout, which, in my opinion, simulates what would happen in a better implementation in SingleFile if the call to window.stop() were made as soon as the HTTP headers of the fetch request are received (i.e. an easy fix). And it actually works: it interrupts the loading at approx. 15MB of data and the rendering of the page, which is partially and smoothly displayed (no freeze). So it's not totally useless, but it at least deserves to be optimized in SingleFile, as I indicated. In the end, the MDN documentation is not very clear...
Edit #2: I've just understood that "parent" in "this method cannot interrupt its *parent* document's loading" from the MDN doc probably means the "parent" of the frame (when the script is running inside it).
OK, so assuming you clean that up a bit and this becomes officially supported in SingleFile/SingleFileZ, what is missing compared to Gwtar? Anything important or just optional features like image recompression and PAR2?
If we were to compare this to the JS world, it seems Python’s async is closer to Babel-style generator-based coroutines [1] than to JavaScript’s async/await execution model.
Anecdotally (not to diminish any bug the parent had), SingleFile is one of my favorite extensions. Been using it for years and it's saved my ass multiple times. Thank you!
Edit: What's the best way to support the project? I see there's an option through the Mozilla store and through GitHub. Is there a preference?
Great idea. It would seem some people have already implemented it for the same type of need (see the list of user agents in the source code). The implementation looks simple.
For implementation in a library, you can use HttpRangeReader [1][2] in zip.js [3] (disclaimer: I am the author). It's a solid feature that has been in the library for about 10 years.
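Under the hood, a range-based reader boils down to translating read(offset, length) calls into HTTP `Range` headers and expecting `206 Partial Content` responses. A minimal sketch of that mechanism (`rangeHeader` and `readRange` are illustrative names I made up, not zip.js's actual API):

```javascript
// Sketch of the core of an HTTP range reader. Illustrative only;
// zip.js's HttpRangeReader is the real, battle-tested implementation.

// HTTP byte ranges are inclusive on both ends.
function rangeHeader(offset, length) {
  return `bytes=${offset}-${offset + length - 1}`;
}

async function readRange(url, offset, length) {
  const response = await fetch(url, {
    headers: { Range: rangeHeader(offset, length) }
  });
  // A server that ignores Range replies 200 with the full body,
  // which defeats the whole point.
  if (response.status !== 206) throw new Error("server ignored Range header");
  return new Uint8Array(await response.arrayBuffer());
}
```

With zip.js itself, the usage is roughly `new ZipReader(new HttpRangeReader(url))` followed by `getEntries()`, which then only fetches the byte ranges it needs.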
Based on your experience, is zip the optimal archive format for long term digital archival in object storage if the use case calls for reading archives via http for scanning and cherry picking? Or is there a more optimal archive format?
Unfortunately, I will have difficulty answering your question because my knowledge is limited to the zip format. In the use case presented in the article, I find that the zip format meets the need well. Generally speaking, in the context of long-term archiving, its big advantage is also that there are thousands of implementations for reading/writing zip files.
ZIP isn't a terrible format, but it has a couple of flaws and limitations which make it a less than ideal format for long-term archiving. The biggest ones I'd call out are:
1) The format has limited and archaic support for file metadata - e.g. file modification times are stored as a MS-DOS timestamp with a 2-second (!) resolution, and there's no standard system for representing other metadata.
2) The single-level central directory can be awkward to work with for archives containing a very large number of members.
3) Support for 64-bit file sizes exists but is a messy hack.
4) Compression operates on each file as a separate stream, reducing its effectiveness for archives containing many small files. The format does support pluggable compression methods, but there's no straightforward way to support "solid" compression.
5) There is technically no way to reliably identify a ZIP file, as the end of central directory record can appear at any location near the end of the file, and the file can contain arbitrary data at its start. Most tools recognize ZIP files by the presence of a local file header signature at the start ("PK\x03\x04"), but that's not reliable.
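The backward scan that a strictly correct reader has to do looks roughly like this (a hedged sketch, not code from any particular tool): search the last 22 + 65535 bytes for the EOCD signature "PK\x05\x06", since the trailing comment can be up to 64 KB long.

```javascript
// Sketch of the backward scan needed to find the zip end-of-central-
// directory (EOCD) record. Illustrative only.
const EOCD_SIGNATURE = [0x50, 0x4b, 0x05, 0x06]; // "PK\x05\x06"
const EOCD_MIN_SIZE = 22;                        // fixed fields, no comment

function findEocdOffset(bytes) {
  // The comment length field is 16 bits, so the EOCD starts at most
  // 22 + 65535 bytes from the end of the file.
  const lowest = Math.max(0, bytes.length - EOCD_MIN_SIZE - 65535);
  for (let i = bytes.length - EOCD_MIN_SIZE; i >= lowest; i--) {
    if (EOCD_SIGNATURE.every((b, j) => bytes[i + j] === b)) return i;
  }
  return -1; // not a zip file (or a corrupted comment)
}

// Tiny example: arbitrary leading data (as in an HTML/ZIP polyglot),
// then a 22-byte EOCD record for an empty archive.
const prefix = new TextEncoder().encode("<!doctype html>...");
const eocd = new Uint8Array(EOCD_MIN_SIZE);
eocd.set(EOCD_SIGNATURE);
const file = new Uint8Array(prefix.length + eocd.length);
file.set(prefix);
file.set(eocd, prefix.length);
```

This is exactly why polyglots work (the prefix is ignored), and also why "is this a zip?" has no cheap, reliable answer.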
> there's no straightforward way to support "solid" compression.
I do it by ignoring ZIP's native compression entirely, using store-only ZIP files and then compressing the whole thing at the filesystem level instead.
Here's an example comparison of the same WWW site rip as a DEFLATE ZIP, as a store-only ZIP with zstd filesystem compression, as a tar with the same zstd filesystem compression (identical size, but less useful for seeking due to the lack of a trailing directory compared to ZIP), and finally the raw size pre-zipping:
982M preserve.mactech.com.deflate.zip
408M preserve.mactech.com.store.zip
410M preserve.mactech.com.tar
3.8G preserve.mactech.com
[Lammy@popola] zfs get compression spinthedisc/Backups/WWW
NAME PROPERTY VALUE SOURCE
spinthedisc/Backups/WWW compression zstd local
This probably wouldn't help GP with their need for HTTP seeking since their HTTP server would incur a decompress+recompress at the filesystem boundary.
It's for when you have a very large number of mostly-identical files, like web pages with consistent header and footer. If 408MiB versus 3.8GiB is a meaningless difference to you then sure don't bother with compression, but why I want it should be very obvious to most people here.
The last example in my list of four file sizes is the raw files in a folder. Filesystem compression works at the file level, so you have to turn many almost-identical files into one file in order to benefit from it. ZFS does have block-level deduplication, but that's its own can of worms that shouldn't be turned on flippantly, due to the resource requirements and `recordsize` tuning needed to really benefit from it.
FYI, zip.js has no issues with 1 (it can be fixed with standard extra fields), 3 (zip64 support), or 5 (the search for the end of central directory record is bounded, since you cannot have more than 64K of comment data at the end of the file).
With regard for the first two - that's good for zip.js, but the problem is that support for those features isn't universal. There's been a lot of fragmentation over the last 36 years (!).
As far as the last point (file type detection) goes, the generally agreed-upon standard is that file formats should be "sniffable" by looking for a signature in the file's header, ideally within the first few bytes. Having to search through the last 64 KB of the file for a signature is a major departure from that pattern.
The warning message you mentioned simply means that the extension can inject "content scripts" into the web pages you visit. This feature is necessary, for example, to remove ads that cannot be blocked via HTTP.
Seems like a reasonable case for disregarding the client preference. If you're able to speak TLS then you're able to load up a public domain (de)compression library.
One doesn't preclude the other, but I have serious doubts about the free speech concerns. There are moderate and extreme movements in Europe; they all express themselves freely within the law.
Of course, we’re probably seeing the normal amount of movement when administrations shift. I work for a global company and have only witnessed the EU to US movement. I’m sure both are happening.
What's normal? You just offended a whole group of people and probably had no intention of doing it. You're proving the point on why the whole thing doesn't work. You should get your comment removed if you seriously believe this to be true and want to avoid being hypocritical.
I'm sorry if you feel offended. My comments will be removed by HN moderators if they think it's necessary. You can flag and downvote them meanwhile. You can also contact the moderators or your local police if you think it is necessary.
EDIT: It might not be possible to flag or downvote comments. So, I recommend you contact the moderators or your local police. You can find my name and address on my GitHub profile.
I have no interest in pressing charges against you or getting you in trouble. That would be hypocritical of me and contrary to my beliefs. I was simply pointing out how easy it is to offend people on the internet and why it simply doesn't work at scale. If no one has the right to say something potentially offensive on the internet, then the whole thing needs to be shut down.
Thank you for explaining your point of view on the value of this conversation. Here's mine: I consider offense to be quite subjective at times. In some cases, it's possible to offend someone without meaning to. The solution to this problem is to apologize and offer to talk about it. If that doesn't work, and the offense is in some way "forbidden", then the offended person can simply defend themselves by going to the “authorities”.
I sincerely apologize if you found my response offensive. It was not my intention and I try to respect the moderation rules on HN. I just wanted to give my point of view, as a person living in the EU.
[1] https://github.com/gildas-lormeau/Polyglot-HTML-ZIP-PNG