[flagged] Simplify: move code into database functions (2015) (sivers.org)
32 points by nsomaru on Oct 27, 2019 | 43 comments


Don't do this.

Don't FUCKING do this.

I've spent the last 15 months untangling a mess of an e-commerce system that was built exactly like this. The problem is that SQL is not a general-purpose programming language, and the only way to accomplish some things directly in the database is to create solutions with O(n^2) or worse running time. Our system has queries with O(n^4) time complexity. Less than two weeks ago, I had to untangle a query that was over 3,100 lines long with 18 sub-queries; I was able to refactor it into about 100 lines of SQL and another 100 lines of code. Making matters worse, changes to stored procedures don't show up in git and can break some regression-testing tools, making it very difficult to track changes and hunt down bugs.

Use the right tool for the job: Don't treat every problem like a nail once you have a hammer. Use SQL for data and your language of choice for logic. Don't create a monstrosity that someone else will have to spend weeks fixing just because you can.


> Making matters worse, changes to stored procedures don't show up in git, and can break some regression-testing tools, making it very difficult to track changes and hunt down bugs.

If stored procedures become part of the system, they belong in the SCM as well, and the deployment process has to include them. Unlike tables and other constructs that have state, stored procedures can be easily migrated.

In general I agree that stored procedures massively underdeliver in many aspects, but not properly treating the database as the application server it has become is part of the problem.
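For example (a minimal sketch; the file path, table, and function are invented), each procedure can live in a plain .sql file in the repo and be re-applied on every deploy:

    -- migrations/0042_order_total.sql (hypothetical path, checked into git)
    CREATE OR REPLACE FUNCTION order_total(p_order_id integer)
    RETURNS numeric AS $$
        SELECT sum(quantity * unit_price)
        FROM order_lines
        WHERE order_id = p_order_id;
    $$ LANGUAGE sql STABLE;

Because CREATE OR REPLACE FUNCTION carries no state, re-running it on deploy is safe in a way that re-running an ALTER TABLE is not.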


Every horror you've just described can happen up in the app layer as well.

    The problem is SQL is not a programming language and the only way to 
    accomplish some things directly in the database is by creating solutions 
    with O(n^2) or worse running-time. Our system has queries with O(n^4) 
    time complexity.
Two observations here.

1. The approach being espoused is not, "Move everything down into the database layer at all costs, no exceptions." It's more like, "Many things are simpler and faster at the database layer."

2. These O(n^2) (and n^3, n^4, whatever) horrors are at least as common at the app layer. I have dealt with my fair share at every possible level of the stack.

    I had to untangle a query that was over 3,100 lines long with 18 sub-queries
I just want to say: ouch. I am sorry!
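To make point 1 concrete (schema invented): an aggregate like lifetime value per customer is one short query at the database layer, versus streaming every row into the app and looping over it:

    -- one round trip; the database can use indexes and its optimizer
    SELECT customer_id, sum(total) AS lifetime_value
    FROM orders
    GROUP BY customer_id;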


You say "could happen"; he says "must happen". SQL won't let you lower the complexity when you need to, but at the app level it's just a matter of being a good programmer or having a good review process.


Datomic solves this tension by letting you upload Clojure code into the database to run as (basically) a stored procedure. So there isn’t a tension between programming language and query language, there’s no network between them, they compose directly and are interchangeable. Do what makes your code simplest. Program your database with the usual data structures, if-statements, function composition, the standard library you’re used to, and share library functions between your backend and database at will (even compile them to ClojureScript and run them in the UI!)


Yup. Same with Neo4j. You can write your stored procedures in Java (or any JVM language). Import any library you want, go crazy.


I wasn't clear enough. In Datomic, you upload your entire application code (web server and all) into the database and run it from the same JVM cluster as the database, in the same memory space. There is no opportunity for a dichotomy between application logic and database logic, because the database is serving the application (or the application is serving the database... this is now just a matter of perspective).


Yes, PostgreSQL does this too, you can use JVM/Python/Node etc. But it's not really geared towards being your app server like Datomic is.


Same with MSSQL. You can write stored procedures in C#.


If you could refactor the code into smaller and faster SQL, that means the code was badly written, not that SQL (and its stored-procedure language) is itself bad.

Also, almost every algorithm can be written in PL/SQL if needed, so you can surely improve on the O(n^4) time complexity.

I agree about the procedure change tracking, which is much harder, but server-side programs can be extremely responsive (low latency) and powerful.


He never said SQL was bad, he said use it to query data because that's what it is good at.


I was using it and PL/pgSQL together to modify data, and although I had to think differently, working in the relational space even when modifying data is awesome (of course, you have to use cursors).
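For anyone curious what that looks like, here is a minimal PL/pgSQL sketch (table and logic invented) that walks rows and modifies them one at a time:

    DO $$
    DECLARE
        r RECORD;
    BEGIN
        -- FOR ... IN <query> opens an implicit cursor under the hood
        FOR r IN SELECT id FROM products WHERE price < 0 LOOP
            UPDATE products SET price = 0 WHERE id = r.id;
        END LOOP;
    END;
    $$;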


It always helps to consider the context. This is a personal website for Derek. You can do all kinds of crazy stuff and get away with it in this context, because you don't have to work with others.

If you're working on a big project which requires a team of people (or may grow into a team), then yeah, don't do this. Keep your systems as idiomatic, unsurprising, and consistent as possible, unless you have a very good reason not to. The less cognitive load, the easier and faster it is for others to help without making mistakes.


I do this and I rarely have a stored procedure that's bigger than 25 LOC. Most are around ~5 LOC and look like this: https://gist.github.com/ericbets/a6a82041a9fd9181ee52ac0d346...
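The gist link above is truncated, but a typical function in that style might look something like this (names invented, not the actual gist):

    CREATE OR REPLACE FUNCTION get_user(p_id integer)
    RETURNS SETOF users AS $$
        SELECT * FROM users WHERE id = p_id;
    $$ LANGUAGE sql STABLE;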


> Making matters worse, changes to stored procedures don't show up in git

This is not necessarily true. It depends on your environment. You can track changes to db with the right tools.

Rest of your points I completely agree. Don't put unnecessary complexity in your db. It's a database, not a code repo.


I don't like migrations; it's like doing version control with dated copies of files.

I wish there were (perhaps I'll be told there is) a migration system that used a single file per table and was git-aware, so that when the live schema didn't match the checked-out schema file, it would determine a migration from the historical version that it does match.

Alas, I don't think it's possible, because such a file would describe a state, not, as migration files do, transitions between states. It's tempting to imagine that it's possible to robustly infer transitions from one to another, but I've used Django.


While what you want is understandable, it's also extremely dangerous: you'd no longer be certain you're executing the same migration in production as on the testing environments.

That's exactly why I think Django's migrations are insanely good. Simple ones are autogenerated, and everything else is possible as well, with either Python code or direct SQL queries.


- Liquibase expects the schema to be modeled in its schema language. This makes it possible to compute changes between the database and the new schema and apply them, or to just create old-fashioned migration scripts.

- The Doctrine and Hibernate ORMs can do similar things.

- Flyway expects developers to write migrations themselves, either in Java or in SQL, and just executes them and records state information. It's not too bad to write migrations yourself, because the data you work with is usually valuable.


Yes, they can do similar things. However, there are several cases which you cannot handle automatically, due to incomplete information:

- The current schema has a column A, and the new version does not have A but has a similar column B. Is this a new column, or should A be renamed to B? (The same goes for renaming tables.)

- You add a new table A which contains columns from B, which are also removed from B. Should this data be copied?

- You change the structure of a table and have some of it split off into different tables. How does the schema tool know how to rearrange the current data?

There are probably some more cases, but these are the ones I could think of quickly.
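The rename case is easy to see in SQL (column names invented): both migrations below end at the same schema, but only one preserves the data, and a diff of the before/after schemas cannot tell them apart:

    -- interpretation 1: a rename, data preserved
    ALTER TABLE users RENAME COLUMN fullname TO display_name;

    -- interpretation 2: a drop and an add, data lost
    ALTER TABLE users DROP COLUMN fullname;
    ALTER TABLE users ADD COLUMN display_name text;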


Stored procedures MUST be stored in the VCS. On every upgrade, the old set of procedures MUST be fully deleted and the new set MUST be installed.

Otherwise, you have just created a huge hole in your build reproducibility. SQL (including PL/SQL) is a programming language, so treat it like one: keep all of it in a versioning system and ensure that you can trace every piece of code that runs inside your database to a piece of code in your VCS.
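In practice that can be as blunt as the following sketch (the dedicated code schema and file path are assumptions, not a standard):

    -- drop the schema reserved for code, then rebuild it from the
    -- versioned source files, so the database matches the VCS exactly
    DROP SCHEMA IF EXISTS api CASCADE;
    CREATE SCHEMA api;
    \i procedures/all_functions.sql  -- psql include of the checked-in source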


> Making matters worse, changes to stored procedures don't show up in git...

They do if you use database migrations.


Isn't PL/SQL a full-fledged programming language?


I sincerely hope this will be downvoted out of sight so we won't be dealing with more dog-slow sites. Moving business logic to an essentially non-scalable layer is a terrible idea; the benefits are tiny compared to the drawbacks.


It's not as simple as all that. Often the most expensive part of a database transaction is all the round trips back and forth while holding locks in an open transaction. Move that logic into a stored procedure and suddenly your business logic can run in less than the time of a couple of round trips to the database. Lock contention drops, the number of concurrent transactions drops, and performance can actually increase. You scale the system vertically as long as you can, and then start partitioning and scaling it horizontally. You can go pretty far like that.

There are lots of other reasons this is a questionable idea at best, but performance is not necessarily one of them.
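As a sketch of the round-trip point (table and business rule invented): the read-check-write below holds its row lock for the duration of one function call instead of several client round trips:

    CREATE OR REPLACE FUNCTION reserve_stock(p_item integer, p_qty integer)
    RETURNS boolean AS $$
    DECLARE
        available integer;
    BEGIN
        -- lock the row, check, and update in a single server-side call
        SELECT stock INTO available FROM inventory
        WHERE item_id = p_item FOR UPDATE;
        IF available IS NULL OR available < p_qty THEN
            RETURN false;
        END IF;
        UPDATE inventory SET stock = stock - p_qty WHERE item_id = p_item;
        RETURN true;
    END;
    $$ LANGUAGE plpgsql;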


    Move that logic into a stored procedure and suddenly 
    your business logic can run in less than the time of 
    a couple round trips to the database. Lock contention 
    drops, number of concurrent transactions drops, and
    performance could actually increase
Yes! A thousand times yes. This is especially important for write-heavy workloads. Releasing those locks and finishing those transactions ASAP is crucial.

    There are lots of other reasons this is a questionable 
    idea at best, but performance is not necessarily one of 
    them. 
When it comes to scalability... yeah, I think the 90/10 rule applies here. Probably the bulk of one's code can comfortably live up at the app layer, and the 10% that's causing 90% of the bottlenecks is a candidate for solutions such as moving it down into the data layer, when feasible.


Performance is actually the primary benefit of adopting the author's approach.

Normally the bottleneck in scaling an application is data access and storage; there typically aren't a lot of problems that are CPU-bound.

Adding additional layers doesn't increase performance; it increases latency and processing time. The closer you can keep your processing logic to the actual data, the better your performance is going to be.

That being said, there are reasons why you may not want to follow that architecture, but generally performance isn't going to be one of them.


If it's a bad idea, it's because it's not well supported by current tooling and practices, not because of scalability. The scalability problem is tractable.


One danger with overuse of stored procs is unit tests: they're hard to write in SQL, and also hard to keep in sync with tests in code. There are also dangers with ninja-patching stored procs without corresponding code deploys.

Using a database-specific IDE to write queries does make query development much easier than munging big strings in code, though.
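On the unit-test point: frameworks like pgTAP at least make testing in the database possible, if not pleasant. A minimal sketch (the function under test is invented), runnable with pg_prove:

    BEGIN;
    SELECT plan(1);
    SELECT is(order_total(42), 99.50::numeric, 'totals the order lines');
    SELECT * FROM finish();
    ROLLBACK;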


> There are also dangers with ninja patching stored procs without corresponding code deploys.

This just means that modifying sprocs has to be treated with the same caution as deployments. Unit testing is truly difficult, though. It doesn't help that many DBMS products (looking at you, Postgres) don't offer an embedded mode.


As someone currently untangling a monster sproc at work: just no.


PL/SQL, T-SQL, and their ilk are poorly suited to complex logic.

One of the best features of pg is its support for writing db logic in your language of choice.

In many cases, the performance of these languages meets or exceeds that of PL/pgSQL.
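For example, with the plpython3u extension enabled, a function body can be plain Python (a trivial sketch):

    CREATE EXTENSION IF NOT EXISTS plpython3u;

    CREATE OR REPLACE FUNCTION py_max(a integer, b integer)
    RETURNS integer AS $$
        return max(a, b)
    $$ LANGUAGE plpython3u;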


It does feel like this whole concept is predicated on chasing the new hotness in languages or frameworks.

In practice, if you’re thinking of changing at that level, the problem probably lies elsewhere. Unless your app is some special snowflake (note: 99.99% aren’t), any of the popular languages/frameworks will do just fine for a very long time.

The only move Sivers arguably needed to make here was away from Perl. To Ruby, PHP, or JavaScript, but not all three.

If those extra moves hadn’t happened, this wouldn’t even be a thing.


If you do this, you will regret it and it will cost you money.


There are a lot of advantages to the approach the author is discussing, but some of the drawbacks are (as he mentions) the poor syntax of PL/SQL, difficulty with source control and versioning, and some others.

I wrote pgproxy[1] in an attempt to overcome some of those issues. So far, it's worked pretty well for me.

[1] https://github.com/claytongulick/pgproxy


A way to develop along those lines, without much of the potential hassle, is described in Dimitri Fontaine's books.

https://tapoueh.org/


The motivation for this idea was to make it easier to rewrite everything else:

> In 1997, I started in Perl. In 1998, I switched to PHP. In 2004, a rewrite in Rails. In 2007, back to PHP. In 2009, minimalist Ruby. In 2012, client-side JavaScript.


Rewriting something that works is a terrible idea in the first place.


Discussion is already ongoing in this thread: https://news.ycombinator.com/item?id=21362190


After looking into solutions like Hasura and Postgraphile, this doesn't seem like such a bad idea.


This is making a simple thing complex in order to make complex things simple.


Usually the recommendation, and the simpler approach, is quite the opposite: you can't version-control, deploy, and generally handle database functions as easily as a piece of code. My experience is that nowadays more and more code is pushed to the ORM or database-handling layer because it greatly simplifies everything.


ORMs have improved considerably over the years, but I've always been a bit hesitant to use them.

I've seen a lot of developers begin to believe in the abstraction, and forget the underlying access.

This has caused performance problems with increased round trips to the server, non-obvious indexing strategy issues, and code tightly coupled to the ORM rather than the data.
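The classic symptom is the N+1 query pattern, where the ORM quietly issues one query per row when a single join would do (tables invented):

    -- what a naive ORM loop emits: one round trip per order
    SELECT * FROM customers WHERE id = 1;
    SELECT * FROM customers WHERE id = 2;
    -- ...and so on, N round trips

    -- what you'd write by hand: one round trip
    SELECT o.*, c.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id;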

I get especially suspicious of framework-type ORMs that can force a massive rewrite between major versions. It's costly and time-consuming, and doesn't provide any application benefit; it's only a new form of abstraction.

ORMs are seductive, because a lot of these issues aren't obvious when you first start using them. The marketing is usually pretty compelling. Unfortunately, by the time you start running into the issues I've described, it's normally too late to change paths.


The author's not advocating anything as drastic as abandoning the application layer entirely, of course.

These aren't mutually exclusive approaches. You can use an ORM like ActiveRecord to manage migrations and move functionality down to the database layer when reasonable to do so.



