Hacker News | tracker1's comments

I think the point is that subdomains are fine, assuming you ARE using a separate IP address for said subdomains.

I'm mostly with you here... it's amazing how many devs lack the baseline knowledge to understand file I/O, let alone thin abstractions for custom data and indexing like TFA. Then again, most devs don't understand the impact of database normalization under load either.
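For a sense of how thin such an abstraction can be, here's a minimal sketch of one common shape: an append-only JSON Lines file plus an in-memory index from key to byte offset. This is not TFA's actual design. `JsonlStore` and its methods are hypothetical, and there's no compaction or recovery from a half-written last line.

```python
import json
import os
import tempfile
from typing import Dict, Optional


class JsonlStore:
    """Hypothetical append-only JSON Lines store with an in-memory key -> byte-offset index."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.index: Dict[str, int] = {}
        # Rebuild the index with one sequential scan; later lines for a key win.
        if os.path.exists(path):
            with open(path, "rb") as f:
                offset = 0
                for line in f:
                    self.index[json.loads(line)["key"]] = offset
                    offset += len(line)

    def put(self, key: str, value: dict) -> None:
        line = (json.dumps({"key": key, "value": value}) + "\n").encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # appends land at the current end of file
            f.write(line)
            f.flush()
            os.fsync(f.fileno())  # durable on disk before the index points at it
        self.index[key] = offset

    def get(self, key: str) -> Optional[dict]:
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            return json.loads(f.readline())["value"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = JsonlStore(os.path.join(d, "data.jsonl"))
        store.put("user:1", {"name": "Ada"})
        store.put("user:1", {"name": "Ada L."})  # overwrite = append + re-point index
        print(store.get("user:1"))
```

An "update" is just another appended line. The index only ever points at the latest one, which is why old lines pile up until something compacts the file.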

FWIW, you can do some things like this on top of S3 Metadata.

Transactions are one thing I want the most, and that's not going to happen on S3. Sure, I can reinvent them by hand, but the point is I want that baked in.

Yeah, the closest thing there is MS-SQL FILESTREAM, but even that has flaws and severe limitations... You can do similar transactional implementations for binary data storage in any given RDBMS, or read/write a file stream while holding a transactional lock on the row that corresponds to the underlying data. But that brings its own complexities.
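As a rough illustration of the "any given RDBMS" route, here's a minimal sketch using Python's built-in `sqlite3`. The blob and its metadata row are written in one transaction, so a failure on either insert rolls back both. This is a generic pattern, not how FILESTREAM works, and the table and function names are hypothetical.

```python
import hashlib
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS blobs (
            id INTEGER PRIMARY KEY,
            data BLOB NOT NULL
        );
        CREATE TABLE IF NOT EXISTS files (
            name TEXT PRIMARY KEY,
            sha256 TEXT NOT NULL,
            size INTEGER NOT NULL,
            blob_id INTEGER NOT NULL REFERENCES blobs(id)
        );
        """
    )
    return conn


def save_file(conn: sqlite3.Connection, name: str, data: bytes) -> None:
    # `with conn:` commits on success and rolls back if anything raises,
    # so the blob row and the metadata row are stored atomically.
    with conn:
        cur = conn.execute("INSERT INTO blobs (data) VALUES (?)", (data,))
        conn.execute(
            "INSERT INTO files (name, sha256, size, blob_id) VALUES (?, ?, ?, ?)",
            (name, hashlib.sha256(data).hexdigest(), len(data), cur.lastrowid),
        )


def load_file(conn: sqlite3.Connection, name: str) -> bytes:
    row = conn.execute(
        "SELECT b.data FROM files f JOIN blobs b ON b.id = f.blob_id WHERE f.name = ?",
        (name,),
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return row[0]
```

If the second insert fails (e.g. a duplicate `name`), the blob written just before it is rolled back too. That's the "baked in" behavior you'd otherwise have to reinvent on top of S3.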

Definitely appreciate the post and the discussion that has come from it... While I'm still inclined to reach for SQLite as a default starting point, it's often worth considering alternatives depending on your needs.

In practice, I almost always separate the auth chain from the service chain(s), so that if auth gets kicked over under a DDoS, already authenticated users still stand a chance of being able to use the apps. I've also designed an auth system abstracted to key/value storage, with adapters for different databases (including SQLite) depending on the deployment...
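A minimal sketch of what "abstracted to key/value storage with adapters" can look like. The auth logic only sees a small interface, and each database gets a thin adapter. `KeyValueStore`, `MemoryStore`, `SqliteStore` and `SessionService` are hypothetical names. A real system would also expire sessions and hash tokens at rest.

```python
import json
import secrets
import sqlite3
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...
    def delete(self, key: str) -> None: ...


class MemoryStore:
    """Adapter for tests or single-process deployments."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class SqliteStore:
    """Adapter backed by a single SQLite table."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")

    def get(self, key: str) -> Optional[str]:
        row = self._conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None

    def set(self, key: str, value: str) -> None:
        with self._conn:
            self._conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))

    def delete(self, key: str) -> None:
        with self._conn:
            self._conn.execute("DELETE FROM kv WHERE k = ?", (key,))


class SessionService:
    """Auth logic talks only to the KeyValueStore interface, never to a specific DB."""

    def __init__(self, store: KeyValueStore) -> None:
        self.store = store

    def login(self, user_id: str) -> str:
        token = secrets.token_urlsafe(32)
        self.store.set(f"session:{token}", json.dumps({"user_id": user_id}))
        return token

    def whoami(self, token: str) -> Optional[str]:
        raw = self.store.get(f"session:{token}")
        return json.loads(raw)["user_id"] if raw else None

    def logout(self, token: str) -> None:
        self.store.delete(f"session:{token}")
```

Swapping databases per deployment is then just a matter of which adapter you pass to `SessionService`.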

Would be interested to see how LevelDB might perform for your testing case, in that it seems to be a decent option for how your example is using data.


I worked in an org where a lot of records were denormalized to be used in a search database... since I went through that level of work anyway, I also fed the exports into S3 records for a "just in case" backup. That backup path became really useful in practice, since we eventually needed a "pending" version of records, separate from the "published" version.

In practice, the records themselves took no fewer than 30 joins to produce the flat view of the data needed for what could/should have been one somewhat denormalized record. In the early 2010s that meant the main database was often kicked over under load, and it took a lot of effort to add appropriate caching and the search DB, which wound up handling most of the load on a smaller server.
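To make the "flat view" idea concrete, here's a minimal sketch using Python's `sqlite3` and a hypothetical three-table schema (far smaller than 30+ joins). The joins run once at export time, and each record comes out as one self-contained JSON line that could feed a search index or an S3 backup. The `status` column stands in for the pending/published split.

```python
import json
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE vendors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT, status TEXT,
                               vendor_id INTEGER REFERENCES vendors(id));
        CREATE TABLE tags (product_id INTEGER REFERENCES products(id), tag TEXT);
        INSERT INTO vendors VALUES (1, 'Acme');
        INSERT INTO products VALUES (10, 'Widget', 'published', 1), (11, 'Widget v2', 'pending', 1);
        INSERT INTO tags VALUES (10, 'tools'), (10, 'metal'), (11, 'tools');
        """
    )
    return conn


def export_denormalized(conn: sqlite3.Connection) -> List[str]:
    """Run the joins once and emit one self-contained JSON document per product."""
    rows = conn.execute(
        """
        SELECT p.id, p.title, p.status, v.name
        FROM products p JOIN vendors v ON v.id = p.vendor_id
        ORDER BY p.id
        """
    ).fetchall()
    lines = []
    for pid, title, status, vendor in rows:
        tags = [t for (t,) in conn.execute(
            "SELECT tag FROM tags WHERE product_id = ? ORDER BY tag", (pid,))]
        doc = {"id": pid, "title": title, "status": status, "vendor": vendor, "tags": tags}
        lines.append(json.dumps(doc))
    return lines
```

Readers (the search DB, the backup) then never pay for the joins again. The cost moves to keeping the export in sync when source rows change.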


For me, I just wish MongoDB had scaling options closer to how Elastic/Cassandra and other horizontally scalable databases work, in that the data is sharded around a ring with built-in redundancy... as opposed to Mongo, which afaik is still limited to either sharding or replication (or layers of them). FWIW, I wish RethinkDB had seen more attention and success, and for that matter I might be more inclined to use CockroachDB over Mongo, where I can get some of the scaling features while still having some level of structured data.
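For anyone unfamiliar with "sharded in a circle with redundancy", here's a minimal sketch of consistent hashing with a replication factor, the general idea behind Dynamo/Cassandra-style placement. It's simplified: real systems add token tuning, rack/zone awareness, and rebalancing. `HashRing` and its parameters are hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple


class HashRing:
    """Hypothetical consistent-hash ring: each key is owned by `replicas` distinct nodes."""

    def __init__(self, nodes: List[str], vnodes: int = 64, replicas: int = 3) -> None:
        if not nodes:
            raise ValueError("need at least one node")
        self.replicas = min(replicas, len(set(nodes)))
        # Each physical node is placed at several points ("virtual nodes") on the ring
        # so that keys spread evenly.
        self._ring: List[Tuple[int, str]] = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def nodes_for(self, key: str) -> List[str]:
        """Walk clockwise from the key's position, collecting distinct nodes."""
        i = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        owners: List[str] = []
        while len(owners) < self.replicas:
            node = self._ring[i][1]
            if node not in owners:
                owners.append(node)
            i = (i + 1) % len(self._ring)
        return owners
```

Every node both owns a slice of the ring and holds replicas for its neighbors, so there's no separate "shard vs. replica set" layer. Adding a fifth node to a four-node ring should move only roughly a fifth of the keys' primary owners, unlike `hash(key) % n`.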

Even then... I'd argue for at least LevelDB over raw jsonl files... and I say this as someone who would regularly do ETL and backups to jsonl file formats in prior jobs.

I'd argue for using LevelDB or similar if I just wanted to store arbitrary data keyed on a single indexable value, as TFA does. That said, I'd probably just default to SQLite myself, since the access, backup, and restore patterns are relatively well known, and you can port/grow your access via service layers such as Turso or Cloudflare D1, etc.

Embedded KV stores like LevelDB are great for what they are, but I’ve often found that I’ll need to add an index to search the data in a different way.

And then another index. And at some point you want to ensure uniqueness or some other constraint.

And then you’re rewriting a half-complete and buggy SQLite. So I’ve come around to defaulting to SQLite/PostgreSQL unless I have a compelling need otherwise. They’re usually the right long-term choice for my needs.
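That progression (a second lookup path, then a uniqueness rule) is a couple of statements in SQLite, which is the argument for starting there. Here's a minimal sketch with Python's built-in `sqlite3`. The `users` schema and `add_user` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Started as a plain key/value table: primary-key lookup only.
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT NOT NULL, team TEXT)")

# "And then another index": look users up by team without a full scan.
conn.execute("CREATE INDEX users_by_team ON users (team)")

# "At some point you want to ensure uniqueness": enforced by the engine, not app code.
conn.execute("CREATE UNIQUE INDEX users_email_unique ON users (email)")

with conn:
    conn.executemany(
        "INSERT INTO users (id, email, team) VALUES (?, ?, ?)",
        [("u1", "ada@example.com", "core"), ("u2", "bob@example.com", "core")],
    )


def add_user(user_id: str, email: str, team: str) -> bool:
    """Returns False instead of raising when the email is already taken."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO users (id, email, team) VALUES (?, ?, ?)",
                (user_id, email, team),
            )
        return True
    except sqlite3.IntegrityError:
        return False


core_members = [r[0] for r in conn.execute(
    "SELECT id FROM users WHERE team = ? ORDER BY id", ("core",))]
```

With a raw KV store, each of those lines becomes a hand-maintained secondary keyspace that has to stay consistent with the primary data on every write.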


Absolutely... I was just bringing it up, as it seems to have in the box support for a lot of what TFA is discussing. I'm generally more inclined to just use SQLite most of the time anyway.

That it's now in the box (node:sqlite) for Deno/TS makes it that much more of an easy button option.


I generally limit Oracle to situations where you have a team dedicated to the design, deployment and management of database operations alone. I'm not really a fan of Oracle in general, but if you're in a position to spend upwards of $1m/yr on dedicated DB staff, then it's probably worth considering.

Even then, PostgreSQL and even MS-SQL are often decent alternatives for most use cases.


That was true years ago but these days there's the autonomous database offering, where DB operations are almost all automated. You can rent them in the cloud and you just get the connection strings/wallet and go. Examples of stuff it automates: backups, scaling up/down, (as mentioned) adding indexes automatically, query plan A/B testing to catch bad replans, you can pin plans if you need to, rolling upgrades without downtime, automated application of security patches (if you want that), etc.

So yeah running a relational DB used to be quite high effort but it got a lot better over time.


At that point, you can say the same for PostgreSQL, which is more broadly supported across all major and minor cloud platforms with similar features, and I'm assuming a lower cost and barrier to entry. This is without signing with Oracle, Inc... which tends to bring a lot of lock-in behaviors along with those feature sets.

TBF, I haven't had to use Oracle in about a decade at this point... so I'm not sure how well it competes... My experiences with the corporate entity itself leave a lot to be desired, and just getting set up with local connectivity has always been what I considered extremely painful compared with common alternatives. MS-SQL was always really nice to get set up, but more recently it has had a lot of difficulties, in particular with Docker/dev instances, and more so on ARM (Mac) than the alternatives.

I'm a pretty big fan of PG, which is, again, very widely available and supported.


Autonomous DB can run on-premises or in any cloud, not just Oracle's cloud. So it's not quite the same.

I think PG doesn't have most of the features I named, I'm pretty sure it doesn't have integrated queues for example (SELECT FOR UPDATE SKIP LOCKED isn't an MQ system), but also, bear in mind the "postgres" cloud vendors sell is often not actually Postgres. They've forked it and are exploiting the weak trademark protection, so people can end up more locked in than they think. In the past one cloud even shipped a transaction isolation bug in something they were calling managed Postgres, that didn't exist upstream! So then you're stuck with both a single DB and a single cloud.

Local dev is the same as other DBs:

    docker run -d --name oracle-db -p 1521:1521 container-registry.oracle.com/database/free:latest
See https://container-registry.oracle.com

Works on Intel and ARM. I develop on an ARM Mac without issue. It starts up in a few seconds.

Cost isn't necessarily much lower. At one point I compared what a managed Postgres would cost for OpenAI's reported workload against an equivalent Oracle setup:

> I knocked up an estimate using Azure's pricing calculator and the numbers they provide, assuming 5TB of data (under-estimate) and HA option. Even with a 1 year reservation @40% discount they'd be paying (list price) around $350k/month. For that amount you can rent a dedicated Oracle/ExaData cluster with 192 cores! That's got all kinds of fancy hardware optimizations like a dedicated intra-cluster replication network, RDMA between nodes, predicate pushdown etc. It's going to perform better, and have way more features that would relieve their operational headache.
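To put the quoted figure on the same yearly footing as the ~$1m/yr staffing number from earlier in the thread, here's a quick back-of-the-envelope calculation. It assumes the ~$350k/month already includes the 40% reservation discount, which is one reading of the quote. The undiscounted figure is derived here, not quoted.

```python
MONTHLY_DISCOUNTED = 350_000  # USD/month, figure from the quoted estimate
DISCOUNT = 0.40               # 1-year reservation discount from the quote

yearly_discounted = MONTHLY_DISCOUNTED * 12
# Assumption: the $350k is after the discount, so list price = discounted / (1 - discount).
monthly_undiscounted = MONTHLY_DISCOUNTED / (1 - DISCOUNT)

print(f"per year with reservation: ${yearly_discounted:,.0f}")
print(f"implied monthly list price without reservation: ${monthly_undiscounted:,.0f}")
```

Under that reading, it's about $4.2M/year, with an implied list price of roughly $583k/month without the reservation.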


In the spirit of helpfulness (not pedantry) FYI "knocked up" means "impregnated". Maybe "put together"?

Ah, this must be a British vs American English thing, thanks for the info.

Yes I meant it in this sense: "If you knock something up, you make it or build it very quickly, using whatever materials are available."

https://www.collinsdictionary.com/dictionary/english/knock-u...


And, again... most of my issues are with Oracle, Inc. So technical advantages are less of a consideration.

I think part of it is how scale has changed over the past decade and a half... The hardware and vertical scale you could get in 2010 is dramatically different from what you can get today.

A lot of the bespoke NoSQL data stores really came to the forefront around 2010 or so. At that time, a server with 8 CPU cores and 10k RPM SAS spinning drives was high end. Today, we have servers with well over 100 cores, TBs of RAM, and PCIe Gen 4/5 NVMe storage (U.x) that is thousands of times faster, at a total cost lower than those 2010-era servers, which your average laptop can outclass today.

You can vertically scale a traditional RDBMS like PostgreSQL to an extreme degree... Not to mention features like JSONB, which let you keep denormalized data within a structured world. This makes it even harder to justify using NoSQL/NewSQL databases. The main bottlenecks are easier to overcome if you relax normalization where necessary.
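Here's a small illustration of mixing structured columns with a denormalized JSON document in the same table. PostgreSQL's JSONB can't run in a self-contained snippet, so this sketch swaps in SQLite's JSON functions (`json_extract`) via Python's `sqlite3`. It assumes the bundled SQLite has JSON support, which is the default in modern builds. In PostgreSQL the equivalent would be a `jsonb` column queried with `->>`, plus a GIN or expression index. The `orders` schema is hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,   -- normal, structured column
        doc TEXT NOT NULL               -- denormalized snapshot stored as JSON
    )
    """
)
# Expression index so lookups on a JSON field don't need a full table scan.
conn.execute("CREATE INDEX orders_by_sku ON orders (json_extract(doc, '$.sku'))")

with conn:
    conn.executemany(
        "INSERT INTO orders (customer_id, doc) VALUES (?, ?)",
        [
            (1, json.dumps({"sku": "A-1", "qty": 2, "ship_to": {"city": "Austin"}})),
            (2, json.dumps({"sku": "B-7", "qty": 1, "ship_to": {"city": "Boston"}})),
            (1, json.dumps({"sku": "A-1", "qty": 5, "ship_to": {"city": "Denver"}})),
        ],
    )

# Relational filter and JSON field access in the same query.
rows = conn.execute(
    """
    SELECT customer_id, json_extract(doc, '$.ship_to.city')
    FROM orders
    WHERE json_extract(doc, '$.sku') = ?
    ORDER BY id
    """,
    ("A-1",),
).fetchall()
```

The nested `ship_to` data would normally be its own table plus a join. Keeping it as a snapshot in the row trades some normalization for one fewer join on the hot path.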

There's also the option of specialized or alternative databases that data is echoed to for logging, metrics or reporting. Not to mention appropriate layers of caching, which can still be less complex than some multi-database approaches.
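For the caching point, a minimal in-process read-through cache with a TTL is often enough to take repeated reads off the primary database. `TTLCache` and its `loader` callback are hypothetical. This version isn't thread-safe, and a shared cache (e.g. Redis) would follow the same shape across processes.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class TTLCache(Generic[K, V]):
    """Read-through cache: on a miss or expired entry, call loader() and remember the result."""

    def __init__(self, loader: Callable[[K], V], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[K, Tuple[float, V]] = {}

    def get(self, key: K) -> V:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = self._loader(key)  # e.g. the expensive query against the primary DB
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: K) -> None:
        """Call after a write so the next read goes back to the database."""
        self._entries.pop(key, None)
```

The `clock` parameter is only there so expiry can be tested without sleeping.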


What about the microservices/serverless functions world? That was another common topic over the years: that using SQL with this type of system was not optimal, I believe because of the sheer number of connections to the SQL database and the like.

I think a lot of the deference to microservices/serverless is for similar reasons... you can work around some of this if you use a connection proxy, which is pretty common for PostgreSQL...
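The core idea behind such a proxy is to put a pool between many short-lived callers and a small, fixed number of database connections. For PostgreSQL that's typically an external proxy such as PgBouncer. Here's a simplified in-process sketch using a bounded queue and `sqlite3` so it runs anywhere. `ConnectionPool` is a hypothetical name, and this is not how PgBouncer is implemented.

```python
import queue
import sqlite3
import threading
from contextlib import contextmanager
from typing import Callable, Iterator


class ConnectionPool:
    """Hands out at most `size` connections; extra callers wait instead of opening new ones."""

    def __init__(self, factory: Callable[[], sqlite3.Connection], size: int) -> None:
        self._idle: "queue.Queue[sqlite3.Connection]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout: float = 5.0) -> Iterator[sqlite3.Connection]:
        conn = self._idle.get(timeout=timeout)  # raises queue.Empty if the pool stays exhausted
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return the connection for reuse


if __name__ == "__main__":
    pool = ConnectionPool(lambda: sqlite3.connect(":memory:", check_same_thread=False), size=2)

    def handle_request(n: int) -> None:
        with pool.connection() as conn:
            conn.execute("SELECT ?", (n,)).fetchone()

    # 20 concurrent "requests" share 2 database connections.
    workers = [threading.Thread(target=handle_request, args=(i,)) for i in range(20)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

Serverless functions can't easily share an in-process pool across invocations, which is why the pooling usually moves out into a separate proxy service.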

That said, I've leaned toward not breaking things up into microservices unless/until you need them... I'm also not opposed to combining CQRS-style workflows if/when you do need microservices. Usually, if you need them, you start by breaking off certain compute/logic workflows where the async/queued nature suits your needs. My limited experience with a heavy microservice application combined with GraphQL was somewhat painful, in that the infrastructure and orchestration weren't backed by dedicated teams, which led to excess complexity and job duties for a project that would have scaled just fine with a more monolithic approach.

YMMV depending on your specific needs, of course. You can also have microservices call natural services that have better connection-sharing heuristics, again depending on your infrastructure and needs... I've got worker pools that mostly operate off a queue, perform heavy compute loads, then interact with the same API service(s) as everything else.
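Here's a minimal sketch of that worker-pool shape: workers pull jobs off a queue, do the heavy compute, then report results through the same API layer everything else uses. `api_client_save` is a hypothetical stand-in for that API call, and the standard-library `queue` stands in for a real message broker. Threads are used only to keep it self-contained. Real CPU-heavy workers would be separate processes or machines.

```python
import hashlib
import queue
import threading
from typing import Dict, List, Optional, Tuple

results: Dict[int, str] = {}
results_lock = threading.Lock()

Job = Optional[Tuple[int, bytes]]  # None is the shutdown sentinel


def api_client_save(job_id: int, digest: str) -> None:
    """Hypothetical stand-in for calling the shared API service."""
    with results_lock:
        results[job_id] = digest


def heavy_compute(payload: bytes) -> str:
    # Placeholder CPU-bound work: repeated hashing.
    digest = payload
    for _ in range(10_000):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


def worker(jobs: "queue.Queue[Job]") -> None:
    while True:
        job = jobs.get()
        try:
            if job is None:  # sentinel: no more work for this worker
                return
            job_id, payload = job
            api_client_save(job_id, heavy_compute(payload))
        finally:
            jobs.task_done()


def run(payloads: List[bytes], n_workers: int = 4) -> None:
    jobs: "queue.Queue[Job]" = queue.Queue()
    threads = [threading.Thread(target=worker, args=(jobs,)) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for job_id, payload in enumerate(payloads):
        jobs.put((job_id, payload))
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
```

Because workers write back through the API rather than holding their own DB connections, the database only sees the API tier's connection pool, no matter how many workers you scale out.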

