
Thank you for the clarification - I had heard from a few sources that the block allocator algorithm actually changes at higher utilization, but was previously unable to find anything concrete in the documentation. This helped clear up a longstanding curiosity for me.


Author here - it's difficult to provide a single number to summarize what we've observed re: CPU, but one data point is that average CPU utilization across our cluster increased from ~40% to ~50%. This effect is more pronounced during NA daylight hours.

Worth noting that part of the reason this is relatively low impact for our read queries is that the hot portion of our dataset is usually in Postgres page cache where the data is already decompressed (we see a 95-98% cache hit rate under normal conditions). We've noticed the impact more for operations that involve large scans - in particular, backups and index builds have become more expensive.


Hey thanks for the clarification. That seems like a worthwhile tradeoff in your case.

For backups in particular, are ZFS snapshots alone not suitable to serve as a backup? Is there something else that the pg backup process does that is not covered by a "dumb" snapshot?
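
To make the question concrete, I was picturing something as simple as the sketch below (dataset and host names are made up); since ZFS snapshots are atomic, Postgres would presumably see a restored snapshot as a crash-consistent copy:

    # nightly snapshot plus incremental replication to another pool/host
    zfs snapshot tank/pgdata@nightly-2021-05-20
    zfs send -i tank/pgdata@nightly-2021-05-19 tank/pgdata@nightly-2021-05-20 \
        | ssh backup-host zfs receive backup/pgdata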


We use wal-g and extensively leverage its archive/point-in-time restore capabilities. I think it would be tricky to manage similar functionality with snapshots (and possibly more expensive if archival involved syncing to a remote pool).

That being said, wal-g has worked well enough for us that we haven't put a ton of time into investigating alternatives yet, so I can't say for sure whether snapshots would be a better option.
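
Roughly, the shape of the wal-g setup is the following (paths and exact invocations here are illustrative, not our production config):

    # continuous WAL archiving, driven from postgresql.conf:
    #   archive_command = 'wal-g wal-push %p'
    wal-g backup-push /var/lib/postgresql/data          # periodic base backup
    # point-in-time restore: fetch a base backup, then replay WAL via
    #   restore_command = 'wal-g wal-fetch %f %p'
    wal-g backup-fetch /var/lib/postgresql/data LATEST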


I'd personally recommend pgBackRest as a wal-g replacement. We (Covalent) started with wal-g ourselves, but pgBackRest does full backup and restore so much faster. Besides the parallelism (which is great), pgBackRest's backups are manifests, symbolically mapping to the individual objects in storage that may have come from previous backups. Which means that a differential or incremental backup doesn't need to be "replayed after" a full backup, but instead is just like a git commit, pointing to some newer and some older objects.

Also, auto-expiry of no-longer-needed WAL segments (that we use due to our reliance on async hot standbys) along with previous backups is pretty great.

And we haven't even started taking advantage of pgBackRest's ability to do incremental restore — i.e. to converge a dataset already on disk, that may have fallen out of sync, with as few updates as possible. We're thinking we could use this to allow data science use-cases that would involve writing to a replica, by promoting the replica, allowing the writes, and then converging it back to an up-to-date replica after the fact.
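
For anyone who hasn't used it, a minimal sketch of the workflow looks something like this (the stanza name and retention setting are made up):

    pgbackrest --stanza=main --type=full backup
    pgbackrest --stanza=main --type=incr backup   # manifest re-references unchanged files from prior backups
    pgbackrest --stanza=main --delta restore      # "incremental restore": only out-of-date files are rewritten

    # pgbackrest.conf: expire old backups along with the WAL that only they need
    #   repo1-retention-full=2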


Have you looked at postgres row/column compression? Obviously, compressing the same data twice won't be too helpful, but maybe there are more wins to be had.


How/why did you choose Postgres over MariaDB? I am facing such a decision now.


After working for years with both, I'd say that PostgreSQL is much friendlier to the developer and more pleasant to work with in every area: documentation, features, error messages, available SQL features, available extensions, available books.

One tiny example: I prefer to work with databases using CLI interfaces (mysql and psql).

The psql CLI is pleasant to use, has no bugs in its interface, and even gets improvements from time to time.

The mysql CLI is awful to use (e.g. it doesn't display long lines properly, has difficulties with history editing, etc.) and looks like it hasn't seen a single improvement since 1996 (I'm sure there were some, I just never felt the effect of them).


Actually, there was a significant regression. Many many years ago, Oracle decided to drop support for the GPL-licensed readline altogether, likely because they can't ship it with MySQL Enterprise. To this day, Percona still carries a small patch to add that functionality back, which is great because I wouldn't touch any CLI without readline.


Dropping a link to `rlwrap` in case anyone is not familiar with it:

https://github.com/hanslub42/rlwrap

Note that I've never tried it myself with the mysql/mariadb CLI, but I have used it with other tools, and it's brilliant.
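
Basic usage is just prefixing the command you want wrapped, e.g. (host/user/db below are placeholders):

    rlwrap mysql -h db.example.com -u myuser -p mydb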


Yep Postgres seems to have more features too for sure. Also, a fun little limitation of the MySQL CLI is that it truncates hostnames to 100 characters - not usually a problem, but AWS database VPC hostnames easily hit that limit, and it just silently truncates rather than failing.


"It silently does something unexpected rather than failing" succinctly summarizes all the reasons you shouldn't be using MySQL.


I’ve been using a Postgres foreign data wrapper to interact with a MySQL database and it’s much nicer for interactive use.
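
For anyone curious, the setup with something like mysql_fdw is roughly the following (server, credentials, and schema names are made up):

    psql mydb -c "CREATE EXTENSION IF NOT EXISTS mysql_fdw"
    psql mydb -c "CREATE SERVER mysql_srv FOREIGN DATA WRAPPER mysql_fdw OPTIONS (host '127.0.0.1', port '3306')"
    psql mydb -c "CREATE USER MAPPING FOR CURRENT_USER SERVER mysql_srv OPTIONS (username 'app', password 'secret')"
    psql mydb -c "IMPORT FOREIGN SCHEMA legacy_db FROM SERVER mysql_srv INTO public"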


Not MariaDB but after working with Postgres for six years I'm now using MySQL and... the error messages are useless, and comparing different types works in strange ways rather than failing like in Postgres.


To add, execution plans presented by MariaDB are also nearly useless.


User grants in Maria/MySQL drive me nuts. I hope it is better in Postgres


Postgres always


Good callout - we use a higher blocksize than Postgres page size because it gives us a much higher compression ratio, at the cost of some read/write amplification.

And yes - Postgres will automatically TOAST oversized tuples and compress the relevant data (if you configure it to do so). This is much lower impact for us than filesystem level compression, as it doesn't affect the main relation heap space (or any indexes).
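
For anyone curious, the TOAST side of that is controlled per column; roughly (table/column names below are just illustrative):

    # EXTENDED (the default for most toastable types): compress, then move out of line if still too big
    psql -c "ALTER TABLE events ALTER COLUMN payload SET STORAGE EXTENDED"
    # EXTERNAL: out-of-line storage, but no compression
    psql -c "ALTER TABLE events ALTER COLUMN payload SET STORAGE EXTERNAL"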


What about: https://people.freebsd.org/~seanc/postgresql/scale15x-2017-p...

A 16k record size means 2x amplification and still (?) allows compression with lz4.


We tested this extensively a few years back. We saw a compression ratio of ~1.9 with 8k recordsize/lz4, ~2.7 with 16k/lz4, and now ~5.5 with 64k/zstd.
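
If anyone wants to reproduce that kind of comparison, the knobs and the measurement are roughly as follows (dataset name is a placeholder; note that compression settings only apply to newly written data, so the dataset has to be reloaded after changing them):

    zfs set recordsize=64K tank/pgdata
    zfs set compression=zstd tank/pgdata
    # ...reload the data, then:
    zfs get compressratio tank/pgdata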


There has to be something better than a potential 8-fold write performance reduction with respect to compression.


We aren't using cstore_fdw, though we've looked into it in the past. cstore tables don't support deletes or updates, and we still rely on updates for some key parts of our write pipeline. Additionally, we rely heavily on btree partial indexes, while cstore tables only support skip indexes.
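
For context, the partial indexes in question are ordinary btree indexes with a predicate, e.g. (schema here is purely illustrative):

    psql -c "CREATE INDEX events_signup_idx ON events (user_id, occurred_at) WHERE type = 'signup'"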


Postgres is not designed for OLAP, but you can push it a lot farther than one would expect with the correct schema and indexing strategy. See https://heap.io/blog/running-10-million-postgresql-indexes-i... for a little more detail about how we schematize for distributed OLAP queries on Postgres at scale.


Author here - It's explained in the post, but the primary driver is cost. An 80%+ reduction in storage is massive when you're storing petabytes of data on SSDs.


How important was being able to disable Postgres full page writes under ZFS? That was almost a throwaway line in the article but could be really important for some pg workloads, I'd think. Certainly got my attention.


It was a nontrivial improvement to write throughput at the time. I'd imagine it could have similar impact for other write-intensive workloads.
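
For reference, it's a one-line change, but it's only safe when the filesystem guarantees torn page writes can't happen (as ZFS's copy-on-write model does):

    # postgresql.conf
    #   full_page_writes = off
    # or at runtime (takes effect on reload):
    psql -c "ALTER SYSTEM SET full_page_writes = off"
    psql -c "SELECT pg_reload_conf()"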


We are, though we started out using EBS. As you mentioned, NVMe instance storage performs much better for our workload. We work around the lack of durability through strong automation of point in time restore/swapping in of new nodes in case of hardware failures.

And yes, reservations make a massive difference economically.


Author here - as others have noted, there are a lot of benefits to a company of our size operating infrastructure on AWS vs. managing physical hardware. A couple of the highlights for managing our primary database cluster include:

- Automation - this was noted by another commenter, but with AWS we can fully automate the instance replacement procedure using autoscaling groups. On hardware failure, the relevant database is removed from its autoscaling group and we automatically start restoring a fresh instance from latest backup. This would be much more difficult if we were to manage our own hardware.

- Flexibility - we have the ability to easily change instance classes via selling/buying reservations. Some of our biggest wins historically have come from AWS releasing new instance families - we've been able to swap out the hardware for our entire cluster over a week or so, for negligible cost (often saving money in the process due to the cost per unit of hardware decreasing on new instance classes). While we could leverage the same developments in a self-managed environment, it would be more difficult, and likely more expensive due to how capital-intensive self-hosted is.

Additionally - there's a ton of value in the integration of the AWS ecosystem. We use many AWS managed services, including heavy use of RDS, Kinesis, S3, and others. For a company with a relatively small engineering team managing a large infrastructure footprint, it hasn't made financial sense yet to invest in moving to self-hosted infrastructure.


The things you mention are ostensibly true, yet still they don't make sense to me. It might make sense when you're a startup that's growing, but when your SSD costs are so large you can save millions on them, then the numbers just don't add up.

In my experience, doing things in the cloud is about as expensive per 12-18 months as buying the hardware up front is. That's super interesting for a fast growing startup that could go bust any minute and wants to spend every second of their time on growing, expanding and marketing.

But when you're spending so much on AWS you can save millions just by reducing filesystem overhead by 20%, it should have stopped making sense a while ago. $2 million should get you a team of 10 sysadmins and devops engineers. Sure automation would be more difficult, but you'd have the manpower to achieve it. Isn't that what running a business is about?

Flexibility, when you're growing quickly it's nice that you can provision new hardware instantly, but AWS is so expensive you could continuously over provision your hardware by 50% and still always be ahead of the AWS price curve. And as I said, you could fully swap out your hardware every 18 months and be at the same price basically. You could even hire a merchant to offload your old hardware and recuperate 50% of those costs.

And I'm not saying to throw AWS overboard altogether, that you have your core business outside of AWS's datacenter doesn't preclude you from buying into RDS, Kinesis, S3.

Is AWS just cutting you more financial slack than we're getting as a tiny company? Or am I underestimating the costs of getting that sysadmin team on board?


> In my experience, doing things in the cloud is about as expensive per 12-18 months as buying the hardware up front is.

The fallacy is comparing hardware costs to services cost. The hardware is the cheap part.

When you run your own system, you have to develop the entire system up front and maintain it on the backend. The hardware is cheap by comparison to the salaries and development costs you pay.

> $2 million should get you a team of 10 sysadmins and devops engineers.

Probably double that once you add in fully-loaded costs as well as the compensation for ~2 managers to manage them.


I think he compared hardware + team salary vs AWS. After a certain size that starts tipping over to the side of having things on prem. Running things on prem is nothing scary. You just need people with the skill set needed to do it. But when you’re spending millions in infrastructure, that’s hardly a problem.


>When you run your own system, you have to develop the entire system up front and maintain it on the backend. The hardware is cheap by comparison to the salaries and development costs you pay

The development costs are one-time costs (OTC) that are amortised over the life of the solution, whereas in XaaS they are monthly recurring costs (MRC). The longer your tech refresh cycle, the cheaper it is. It's inherent to the pricing model.

Also, when you get down to the crux of it, these solutions (like OpenStack or vSphere) are software platforms that provide similar features. There's not much development cost; it's just software licensing and professional services.

In terms of operations, it's not like you can get rid of sysadmins, they just morphed into DevOps.

>Probably double that once you add in fully-loaded costs as well as the compensation for ~2 managers to manage them.

You might as well add all sorts of additional costs such as egress charges on exit and cloud consultancy.


There are plenty of financing options (with pretty competitive interest rates) that will help you amortize your upfront costs over the project lifetime if your credit is good (at this kind of scale any startup gets access to this fairly easily), so both options can be monthly recurring if need be.


Yeah, but the financing options usually have a fixed period, unless you go with hardware subscription. Costs are hence capped.


But if you’re paying X in OpEx to AWS, at some point Y in CapEx (hardware) and Z for OpEx (your people) becomes more compelling. What I believe the comment above is arguing is that if you are saving millions for 20% savings on SSD costs, X >> Y + Z.

Yes, you have to manage the hardware, and Y doesn’t automatically go to zero for year two, but the convenience of the cloud isn’t always cost effective. Don’t get caught up with the details. The 2 million figure doesn’t matter as much. It’s finding that inflection point and making the better business decision.


> In my experience, doing things in the cloud is about as expensive per 12-18 months as buying the hardware up front is.

It seems like you are glossing over the other costs: staff to implement and manage, development time and maintenance for automation to re-implement everything that AWS includes, data center costs (not clear if you were thinking of hardware ownership only or data-center also).

I'm not saying you didn't think about those things just saying that they can't be ignored in these types of comparisons.


You don’t need to reimplement everything AWS provides. AWS provides services on demand for a huge customer base. The sysadmin/devops team you set up needs to solve only your particular problem. You can often get a better, easier to use and maintain system this way. The downside is that you have to pay for that extra team, but if only 20% of your data already costs millions, your scale is big enough that you’ll likely save money by hiring a sysadmin team.


A lot of people underestimate how expensive it is just to have a 10G/25G/100G network and maintain it. That alone is extremely expensive, especially when you want it across multiple locations. If you want to connect two datacenters with a low-latency, high-throughput network, you would probably go to AWS since that is cheaper. And that is just the network; you also need to maintain other stuff like storage. Maintaining a storage network (S3/block storage, etc.) is extremely hard.


Sometimes, sometimes not. A lot of common ‘prod bricks’ (S3, Managed kubernetes, etc) get used because they are convenient, and while they could be implemented in some other, more bespoke way, it’s rarer and rarer that it actually pencils out as a net win. You also have to deal with the complexity of managing your own version of it, which is non trivial over the full lifecycle of something.

If it is your core business to provide that thing? Sometimes or even often worth it. Otherwise, often not.


Unless you are literally Amazon or a BigCo, spending tens of millions (if 20% is millions, then at least $5-10 million) on SSDs alone should make it pretty much a core business problem to solve?

My sense is that over the last decade startups have lost the co-location skills they had in the 2000s, and now think it is more complex than it actually is. Co-lo hardware management is hard, yes, but if it weren't worth doing even at $10+ million/year budgets, we would never have had SaaS companies pre-cloud at all.


Which is the core business problem exactly that you think they should be solving?


A core business problem; the core business problem may or may not be the same. [1] At this scale, infrastructure cost is a critical problem that you can afford to have a dedicated team/vertical trying to solve.

Hypothetically, if you are spending $50 million+/year on the cloud, a dedicated team of even 10 senior engineers to set up your co-location with your own hardware, targeting migration of your costliest and least cloud-native components, would maybe cost $2-5M more. With the attractive debt financing that is readily available these days, you can easily amortize your purchase expenses over the 2-3 year hardware lifecycle and realize savings; there is not much justification not to pursue this alongside all the other features you are also pursuing.

The cost is a very low investment compared to your spend, with the potential for a very high-saving ROI, so even if the chance of success is low you should give it a shot. I.e., if you can save, say, $10 million of the $50 million, your $2 million investment needs only a 20% probability of success for the expected value to be in the green.

[1] I don't think there should be only one core (the) business problem for a startup, there are always few critical problems startups have to solve for at any given stage.


The parent was mentioning operational agility, and the ability to quickly, almost ‘hot swap’ in live, fully formed instances - and with a footprint of multiple petabytes of storage.

Their core business problem is providing databases, and apparently they see leveraging the huge VM and storage pools available at AWS as a major advantage here (and I for one can’t blame them), over absolute efficiency of hardware spend.

Being able to provision a couple hundred TB of extra SSD with a config file change (or return it and stop paying for it almost immediately) has real advantages over rolling it yourself, especially if you only have a 10-person ops team or the like.

Considering the apparent business model, I can see their point.

This project being discussed on the thread is likely a couple folks for a few months - low hanging fruit to save millions. What you’re referring to is a major business effort, if not doubling of headcount, for such a company with at best similar payoff. Running their own colos also means a lot of thinking, planning, and lifecycle management when it comes to equipment generations, upgrades, making sure you’ve got the right amount of spare capacity but not too much, etc.

Also, let’s not forget geo/availability zones.

Not saying co-located hardware is not always worth it - rather they seem to be aware of the trade offs, and are making a rational decision based on their business model.

Later, if they have switched from ‘rapid growth and adjustment’ to a more stable state where they can predict things more in advance, maybe they’ll switch. Maybe they won’t.

Like a large energy consumer, running on utility grid at a certain size in a certain area is often much better than rolling your own generation capacity. Sometimes it’s impossible or less cost effective. Sometimes it doesn’t make sense to even try to do the math, and just get hooked up to the grid.


It’s impossible to know the best approach without understanding the specifics. It’s true that adding more resources quickly when you own the hardware takes more time, but given how much cheaper it is, you can seriously over-provision. Using AWS is probably going to be a more efficient use of hardware; that’s why it’s not even more expensive than it is. The thing is that it’s often not the more efficient use of money, especially at that scale.


All these things have tradeoffs, but even with a Kubernetes setup there is a variety of ways you can configure it that will work better or worse for your workload.

Regarding whether it's the core business: that doesn't matter that much. You'll either pay Amazon or hire your own team. In either case you'll be spending money on something that's not your "core" product. If you can replace Amazon with a bespoke system for a fraction of the price and the same resilience, why not?

Someone has said in this thread before that managing those things is not really rocket science. You can have a small, focused team that's able to manage a lot of resources, and that can be (and often is) cheaper than outsourcing. Obviously, it depends on a number of factors; whether or not it's your core business doesn't seem like a decisive one.


Most people (period) will be unable to effectively hire/build/retain a team who can competently build those associated pieces, let alone all of them necessary to operate effectively at this scale. It is why tech is hard, and remains hard, for the majority of companies, governments, etc.

If something is a core part of the business, 1) it’s something they either already have a demonstrated level of competency in, or they wouldn’t be in business, and 2) efficiencies and improvements here should make them more money in a direct and measurable way, and 3) attempting to outsource it exposes them significantly to counterparty risk that can put them out of business, which is generally not considered a good thing.

In some ways it’s like a factory that uses a lot of power. Should they build their own power plant or use the utility? That depends on many factors. Using the utility is often the better choice and works out better, but not always.

If the quality and price of the power is a core part of what makes the company competitive, probably - and that is going to be a key factor in where the factory is located, when it operates, etc.


Where did I gloss over them? I literally suggest spending 2 million a year on staff.


Without specific numbers it is a bit difficult to be as clear as I would like but I read your comment as suggesting that the savings from owning/hosting your own equipment could pay for the team needed to operate that solution -- but then what was the point of switching?

The devil is in the details, and I wouldn't say that it never makes sense to bring operations in-house, but your post didn't make a clear case from my point of view.


Well that's a great point of discussion. My illustration was that you could spend all the same budget and actually have 10 staff and a bunch of real hardware on your balance sheet, instead of just an enormous AWS bill. That's staff that might bring real talent and innovation to your company. And real hardware that will serve you even through financial hard times.

Or you could spend half the budget and in my opinion still be way ahead, but that depends on your execution and the talent pool that's available to you of course.


There's also opportunity cost. What if you instead of hiring a team of 10 people (probably need more, managers, distributed geographically etc), you hire more developers, marketing, sales and increase revenue. So instead of saving $2M, you make $3M more revenue. Making up numbers of course, but companies can estimate this. If that's the case it makes sense to keep paying AWS, until the ratio changes, and then they can cut costs.


> $2 million should get you a team of 10 sysadmins and devops engineers

Managing staff vs managing AWS ... I know what I'd choose (without really knowing the numbers)


It seems shocking to me that you haven't yet migrated; however, you know your cost/benefit ratio better than I do. Have you ever examined a split model, where some parts of the load run on your own or rented dedicated servers, and some run on AWS?

Separately, regarding your comment about LVM: an LVM snapshot requires that a separate part of the volume group be set aside to hold the snapshot data.

If the snapshot volume fills up with changes being made to the volume that holds your data before the snapshot operation completes, then the snapshot will fail.

This does not occur with ZFS as you have noticed.


With the price of egress being what AWS charges, such a split may still not make economic sense if it crosses a data-intensive boundary.

Also, with many customers also using AWS, much of the traffic may not even leave a datacenter, improving speed, reliability, and maybe even cost.


> This does not occur with ZFS as you have noticed

I'm not sure exactly what "This" refers to? Just wanted to note that a ZFS snapshot can be destroyed when the parent pool runs out of space too, but you don't need to allocate a volume.


For LVM you usually have to pre-allocate the space. So perhaps you think that you will need 8GB to hold the changes during the snapshot operation, and it works great for 6 months until more data is added and 1 customer does a lot of small updates during their maintenance window, which overlaps with your backup schedule... and the snapshot operation fails. No data is lost in this case, but the back up doesn't finish.
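
Concretely, the difference looks roughly like this (volume/dataset names and sizes are made up):

    # LVM: copy-on-write space for the snapshot must be reserved up front,
    # and the snapshot is invalidated if that space fills up
    lvcreate --size 8G --snapshot --name pgsnap /dev/vg0/pgdata

    # ZFS: no reservation needed; a snapshot only competes with everything else
    # for free space in the pool
    zfs snapshot tank/pgdata@backup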


One thing to consider is that when running on-prem, you will be able to procure disks that advertise a 512B sector size and actually perform well when doing so. With that, you can use an 8KB record size on ZFS and still obtain the same 5.5x compression ratio, assuming you are currently using a 4KB sector size with a 64KB record size on ZFS.

With 8KB record size, there would be no read/write amplification caused by postgres. More importantly, ZFS fragmentation would be much less of an issue.
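
A rough sketch of what that could look like (device and dataset names are hypothetical):

    # ashift=9 => 2^9 = 512B allocation size; only sensible on true 512B-sector disks
    zpool create -o ashift=9 tank mirror /dev/sda /dev/sdb
    # record size matched to the 8KB Postgres page
    zfs create -o recordsize=8K -o compression=zstd tank/pgdata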

Of course, disk procurement can be a nightmare process, which is why AWS is printing money.


Thank you for writing this article.

I’m curious: The engineers that brought these significant cost savings to your company, did they receive a share of the money saved?


How would you even allocate the money fairly, besides rolling it into the company to keep it alive and successful (and maybe the added profit increases the bonus pool, if that exists)?

The engineers didn't do it in isolation - how much of a share goes to the office receptionist that answered the phones and kept visitors out of the way of the engineers? How much goes to the finance department that kept the engineering paychecks coming while they did the work? How much goes to the salespeople who kept the deals flowing and money coming in that filled the disks in the first place... and so on and so on.

Once a company exits the "a few devs in a garage" stage, many people contribute to the company's success.


Let's face it, increasing your employer's profit margin will only benefit the employees if the company is struggling and they were about to be laid off because costs were starting to eat into the margins. The only other cases where it does are a bonus, or workers owning shares that pay dividends. "Keeping it alive and successful" only matters if that wouldn't have happened otherwise, and even then not much if you have good job mobility.

No need to allocate. Something like "our clever engineers managed to save us $2M per year, so we're giving everyone a $500 bonus this month" seems entirely reasonable.


I don't know about this company or if it's profitable, but in many small startups, saving money decreases the burn rate, which extends the time before the company runs out of money, can become profitable, or needs to close another round of funding.


I have worked in high tech since 1983. Over my career, I have received unsolicited bonuses, stock grants, and other perks, from time to time.

At the same time, when I thought I had made a material contribution to the company's success that was above and beyond my normal role, I would ask for compensation of one kind or another. At times, I was told no, and at other times, my request was granted. I believe Wayne Gretzky said something like "You miss 100 percent of the shots you don't take."

My lived experience is that life is not fair. In the organizations where I was an employee, owner or executive, it has never been the case that everyone, in their "heart of hearts," really believed everything was entirely fair.

I think it's right and proper to recognize and work to correct injustices, to the extent possible, yet I think it's also true that people vary all over the map and on a perhaps uncountable number of dimensions, and trying to achieve complete fairness is simply not possible.

I will also say that my belief that life is not fair does not require us to go through life constantly angry and frustrated.

I could be wrong, and I don't wish to put words in your mouth. Interested in your thoughts, if you want to say more.


[flagged]


I'm not sure how appropriate it is to take a serious comment where the author has a genuinely unpopular opinion and say you laughed really hard at it.


It's clearly not a serious comment, and I say this as someone who thinks there should be more workers' cooperatives in the tech industry.


I was and still am entirely serious. I don’t know you and doubt that you and I have ever said three words to each other. What’s your basis for your claim to clearly know my intent?

The article author works for a company that has 54 open positions. Thousands of capable developers will or have already read his article. Why can he not talk about how great it is to be a developer at his company? Seems like a missed opportunity to me.

Also, if you are an engineer that can save your company millions of dollars in real life, that’s worth something, isn’t it? You won’t get paid for it unless you have the courage to ask hard questions.

Your experience will obviously be different than mine. But please don’t pretend that you know me, or what I’m thinking when I offer a comment. It would be much better to show up with curiosity rather than condescension.

I wish you well.


Condescension is bad, but disingenuously affected curiosity is worse. I put it to you that your tone here makes it very clear that your original comment was not actually sincere and you were far from "curious". You're not doing your cause any favours.


Thanks for this comment, the reply from lmm made me doubt if I was misreading the situation. Now I'm glad I wrote what I did.


I interpreted that as lmm correctly identifying my comment as a joke.

Regardless of how sincere the original post was, my comment was very obviously (to me, at least) a swipe at the morally bankrupt world we inhabit and the fact that a company doing this of their own volition right now feels absurd - not a dig at the idea itself.

To the original point - I tend to agree with people elsewhere in the thread saying this particular idea is infeasible and ignores the behind-the-scenes contributions of other staff. I am however very much in favour of the related concept of making employee ownership mandatory in certain situations (think anything on a major stock exchange). In that scenario, the cost savings would "trickle down" to all employees.


Are you in the gap Oxide Computer is trying to fill?


Curious to know what instance type you are using on AWS? Are they i3 instances?


Author here, happy to answer any questions.


Hey all, author here. This project was a fun demonstration of the versatility of Redis. Happy to answer any questions.


Hey! Maybe I'm missing something, but why do you need to lock Kafka consumers? You can run multiple consumers within the same consumer group and get built-in high-availability (partition rebalancing when one consumer goes down)
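
E.g., two copies of the same consumer started with the same group id split the topic's partitions between them, and the group rebalances automatically if one dies (topic/group names below are made up):

    kafka-console-consumer.sh --bootstrap-server localhost:9092 \
        --topic events --group my-service &
    kafka-console-consumer.sh --bootstrap-server localhost:9092 \
        --topic events --group my-service &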

