
If your threat model includes the TLA types, then back up to a physical server you control in a location geographically isolated from your main location. Or to a local set of drives that you physically rotate to remote locations.


Decryption is not usually an issue if you encrypt locally.

Tools like Kopia, Borg and Restic handle this and also include deduplication and other advanced features.

There's really no excuse for large orgs, or even for small businesses and the somewhat tech-literate public.
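
A rough sketch of that workflow, driving restic from a small Deno script (the repo location, paths, and password file are placeholders, not a specific recommendation). restic encrypts client-side, so the key never leaves your machine:

    // Push an encrypted, deduplicated snapshot to an off-site repo you control.
    // Run `restic -r <repo> init` once beforehand to create the repository.
    const backup = new Deno.Command("restic", {
      args: [
        "-r", "sftp:backup@offsite.example.com:/srv/restic-repo", // hypothetical remote
        "backup", "/home/me/important-data",
      ],
      env: { RESTIC_PASSWORD_FILE: "/home/me/.restic-pass" }, // local encryption key
    });
    const { code } = await backup.output();
    if (code !== 0) console.error("backup failed with exit code", code);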


Thanks for posting. Love to see Rust as a strategic direction for MS, and how they're using it in the core OS, Azure, and security areas, with much of it open source.

I still use https://sysinternals.com (though not via their live channel).


The performance improvements are impressive:

> In Automerge 3.0, we've rearchitected the library so that it also uses the compressed representation at runtime. This has achieved huge memory savings. For example, pasting Moby Dick into an Automerge 2 document consumes 700Mb of memory, in Automerge 3 it only consumes 1.3Mb!

> Finally, for documents with large histories load times can be much much faster (we recently had an example of a document which hadn't loaded after 17 hours loading in 9 seconds!).


I wonder if this is accomplished using controlled buffers in AsyncIterators. I recently built a tool for processing massive CSV files and was able to get the memory usage remarkably low, and control/scale it almost linearly because of how the workers (async iterators) are spawned and their workloads are managed. It kind of blew me away that I could get such fine-tuned control that I'd normally expect from Go or Rust (I'm using Deno for this project).

I'm well above 1.3mb, and although I could get it down there, performance would suffer. I'm curious how fast they sync this data with such tiny memory usage. If the resources were available before, despite using 700mb of memory, was it still faster?

These people are definitely smarter than I am, so maybe their solution is a lot more clever than what I'm doing.

edit: Oh, they did this part with Rust. I thought it was written in JS. I still wonder: how'd they get memory usage this low, and did it impact speed much? I'll have to dig into it
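
For reference, a minimal sketch of the bounded-memory pattern I mean (Deno/TypeScript; the file path, handleRow, and the concurrency limit are placeholders, not anyone's actual code):

    import { TextLineStream } from "jsr:@std/streams/text-line-stream";

    // Lazily yield CSV rows; nothing beyond the current chunk is held in memory.
    async function* rows(path: string): AsyncGenerator<string> {
      const file = await Deno.open(path, { read: true });
      const lines = file.readable
        .pipeThrough(new TextDecoderStream())
        .pipeThrough(new TextLineStream());
      for await (const line of lines) yield line; // pull-based: backpressure for free
    }

    // Cap how many rows are processed concurrently, so memory scales with
    // `limit` rather than with file size.
    async function run(path: string, limit = 8) {
      const inFlight = new Set<Promise<void>>();
      for await (const row of rows(path)) {
        const task = handleRow(row).finally(() => inFlight.delete(task));
        inFlight.add(task);
        if (inFlight.size >= limit) await Promise.race(inFlight);
      }
      await Promise.all(inFlight);
    }

    // Placeholder for per-row work (parse, validate, write somewhere).
    async function handleRow(_row: string): Promise<void> {}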


They say: "In Automerge 3.0, we've rearchitected the library so that it also uses the compressed representation at runtime. This has achieved huge memory savings."


Right, this didn't click at first but now I understand. I can actually gain similar benefits with my project by switching to storing the data as parquet/duckdb files; I had no idea the potential gains from compressed representations are so significant, so I'd been holding off on testing that out. Thanks for the nudge on that detail!
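
A toy illustration of why the compact, column-oriented representation matters (plain TypeScript, made-up numbers, not a benchmark of Automerge or DuckDB):

    // Row-oriented: a million tiny objects, each with its own header,
    // property map, and boxed fields.
    const N = 1_000_000;
    const objects = Array.from({ length: N }, (_, i) => ({ id: i, temp: 20 + (i % 10) }));

    // Column-oriented: two flat typed arrays, ~4 bytes per value, and easy to
    // compress further because similar values sit next to each other.
    const ids = new Uint32Array(N);
    const temps = new Float32Array(N);
    for (let i = 0; i < N; i++) {
      ids[i] = i;
      temps[i] = 20 + (i % 10);
    }
    console.log(objects.length, "rows vs", ids.byteLength + temps.byteLength, "bytes of columns");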


> I recently built a tool for processing massive CSV files and was able to get the memory usage remarkably low

is it OSS? i'd like to benchmark it against my csv parser :)


No, it's very specific to some watershed sensing data that comes from a bunch of devices strewn about the coast of British Columbia. I'd love to make it (and most of the work I do) OSS if only to share with other scientific groups doing similar work.

Your parser is almost certainly better and faster :) Mine is tailored to a certain schema with specific expectations about foreign keys (well, the concept and artificial enforcement of them) across the documents. This is actually why I've been thinking about using duckdb for this project; it'll allow me to pack the data into the db under multiple schemas with real keys and some primitive type-level constraints. Analysis after that would be sooo much cleaner and faster.

The parsing itself is done with the streams API and orchestrated by a state chart (XState), and while the memory management and concurrency of the whole system is really nice and I'm happy with it, I'm probably making tons of mistakes and trading program efficiency for developer comforts here and there.

The state chart essentially does some grouping operations to pull event data from multiple CSVs; once it has those events, it stitches them together into smaller portions and ensures the tables map to one another by the event's ID. It's nice because grouping occurs from one enormous file, and it carves out these groups for the state chart to then organize, validate, and store in parallel. You can configure how much it'll do in parallel, but only because we've got some funny practices here and it's a safety precaution to prevent tying up too many resources on a massive kitchen-sink server on AWS. Haha. So, lots of non-parsing-specific design considerations are baked in.
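
Roughly, the chart has this shape (a hand-wavy XState sketch; the state names, events, and npm specifier are illustrative, not the real machine):

    import { createMachine, createActor } from "npm:xstate";

    const ingest = createMachine({
      id: "csvIngest",
      initial: "grouping",
      states: {
        grouping:   { on: { GROUP_READY: "stitching" } },   // carve a group out of the big CSV
        stitching:  { on: { BATCH_READY: "validating" } },  // join tables by event ID
        validating: { on: { VALID: "storing", INVALID: "grouping" } },
        storing:    { on: { STORED: "grouping", ALL_DONE: "done" } },
        done:       { type: "final" },
      },
    });

    const actor = createActor(ingest);
    actor.subscribe((snapshot) => console.log("state:", snapshot.value));
    actor.start();
    actor.send({ type: "GROUP_READY" });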

One day I'll shift this off the giga-server and let it run in isolation with whatever resources it needs, but for now it's baby steps and compromises.


thanks!


- single binary file deployment

- TUI based configuration

- API endpoints


See https://archive.ph/oXYXe for more info about the TeleMessage version of Signal approved for use by government offices.


Are "paid for" and "properly approved for classified information" being conflated here? I may have missed something.


If you don’t mind supervised mode, this can help prevent this bypass: https://www.techlockdown.com/blog/prevent-turning-off-wifi-i...


Congrats on publishing!

It seems like a very polished and better integrated version of https://www.when2meet.com/.

You say you do not collect info. Are you saving the meeting details and availability in a database?


there is no external server! all meeting information is stored as metadata on the message.

this leads to some issues with potential collisions: if two people click the Whenish message at the same time and submit their responses, there is no way to merge both sets of data. while this is an issue, i wanted to err on the side of privacy as much as possible and not rely on a server at all.
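
a tiny illustration of the collision (plain TypeScript, made-up metadata shape): both clients start from the same snapshot, each adds their own availability, and whichever submit lands last simply replaces the other.

    // Hypothetical metadata blob attached to the message.
    type Availability = Record<string, string[]>; // name -> chosen time slots

    let metadata: Availability = {};

    // Both participants start from the same snapshot...
    const aliceView = structuredClone(metadata);
    const bobView = structuredClone(metadata);

    // ...each adds their own availability locally...
    aliceView["alice"] = ["Mon 10:00"];
    bobView["bob"] = ["Tue 14:00"];

    // ...and whoever submits last overwrites the other (last-writer-wins).
    metadata = aliceView;
    metadata = bobView;
    console.log(metadata); // { bob: ["Tue 14:00"] } -- alice's entry is gone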


From the terms of use:

To help with quality and improve our products, human reviewers may read, annotate, and process your API input and output. Google takes steps to protect your privacy as part of this process. This includes disconnecting this data from your Google Account, API key, and Cloud project before reviewers see or annotate it. Do not submit sensitive, confidential, or personal information to the Unpaid Services.

https://ai.google.dev/gemini-api/terms#data-use-unpaid


Thank you for your service!

How many from USDS are left after this group of resignations?

