
WebRecorder [0] is the best implementation of this that I've tested. It runs as an extension in your browser, intercepting HTTP streams, so any page you open in your browser is captured with the data needed to reproduce it exactly. It outputs WARC files that are (in theory) compatible with the rest of the web archiving ecosystem, and has a WARC explorer interface to browse captured archives.

For pages with dynamic content that can't be trivially reproduced from their HTTP streams (e.g., opening the archive triggers GETs with a mismatched timestamp, even though the file it's looking for is in the WARC under a different URI), there's always SingleFile [1] and Chromium's built-in MHTML Ctrl+S export, both of which "bake" the content into a static page.
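To make the timestamp mismatch concrete, here's a rough sketch of why an exact URI lookup misses at replay time and how a fuzzy match could recover the capture. This is not how WebRecorder works internally; the `archive` dict is a stand-in for a WARC index, and the parameter names in `VOLATILE_PARAMS` are hypothetical (real sites use all sorts of cache-busting keys).

```python
from typing import Dict, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical cache-busting parameter names; adjust per site.
VOLATILE_PARAMS = {"_", "t", "ts", "timestamp"}


def canonical(url: str) -> str:
    """Drop volatile query parameters so equivalent requests compare equal."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in VOLATILE_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def lookup(archive: Dict[str, bytes], url: str) -> Optional[bytes]:
    """Try an exact match first, then fall back to the canonical form."""
    if url in archive:
        return archive[url]
    wanted = canonical(url)
    for captured_url, body in archive.items():
        if canonical(captured_url) == wanted:
            return body
    return None


# Stand-in for a WARC index: URI as requested at capture time -> response body.
archive = {"https://example.com/api/feed?page=1&ts=1700000000": b"captured feed"}

# At replay time the page's script asks again with a fresh timestamp.
replay_url = "https://example.com/api/feed?page=1&ts=1800000000"

exact = archive.get(replay_url)      # exact URI lookup misses
fuzzy = lookup(archive, replay_url)  # canonical lookup finds the capture
```

The tradeoff is that any fuzzy rule can also match the wrong capture, which is part of why baking the page into a static file is the simpler fallback.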

0: https://chromewebstore.google.com/detail/webrecorder-archive...

1: https://github.com/gildas-lormeau/SingleFile


