
In any sufficiently complex application, something like a query builder that allows different parts of the application to work together to lazily compose queries (vs combining/filtering/sorting the data in memory) will be created at some point.

An ORM is a tool that can make this easier, but it's also a tool that can make it easier to shoot yourself in the foot (e.g. by making it easy to create N+1 queries without knowing). Like all tools, there are tradeoffs that need to be weighed together with the actual use case to make a decision. Operating on raw SQL statement strings is not something I'd recommend in any case.



Different people have wildly different experiences with ORMs as they used them for wildly different levels of integration and tasks, and in wildly different languages with different features that make ORMs more or less useful, yet there's always someone willing to go out of their way and ignore all that and make absolutist statements about how it's good or bad.

We should just learn to recognize it for what it is, someone who's being controversial for attention, and move on. Let's save our attention for the ORM article and discussion that starts out along the lines of "ORMs can provide benefit, but it's important to recognize where, and not let the problems of their use outweigh their benefits. Here's what I've found."


> Different people have wildly different experiences with ORMs as they used them for wildly different levels of integration and tasks, and in wildly different languages with different features that make ORMs more or less useful, yet there's always someone willing to go out of their way and ignore all that and make absolutist statements about how it's good or bad.

I don't have a horse in this race, I couldn't care less if you do or don't use an ORM. But, and maybe this is the cynic in me, there are practices in software development that absolutely, 100%, for certain, have no good reason and are perpetuated in part by this belief that there must have been a good reason for them to exist (Chesterton's fence and all that).

Null terminated C strings are a prime example. There is absolutely no good reason, other than the fact that the authors may have wanted to save 3 bytes, that C strings should be null terminated. Fortran was created in 1954, and passed the length of the string with the string itself. How many countless bugs and CVEs have arisen due to errors in handling null-terminated C strings (one example[0])? And for what? To save 3 bytes, or just because of the authors' whim?

Likewise, decisions made at Javascript's inception have burdened it for its entire life. Decisions that were made on a whim, like implicitly converting numbers to strings sometimes, and sometimes implicitly converting strings to numbers! (Tell me what `10 + "10"` is and what `"10" + 10` is without using the inspector). And the million and one ways to define something that's undefined.
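For the record, the answers are easy to check: `+` coerces to string whenever either operand is a string, while `-` is only defined for numbers:

```javascript
// + with any string operand coerces the other operand to a string,
// so both of these concatenate:
console.log(10 + "10");  // "1010"
console.log("10" + 10);  // "1010"
// - is only defined for numbers, so here the string is coerced instead:
console.log("10" - 10);  // 0
```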

Anyways, when somebody tells me there's absolutely no good reason for a development practice to exist, sometimes, that is the absolute truth. And I would rather have more people throwing away these crummy practices that lead to unnecessary headaches (or at least questioning them) than people continuing to laud the practice and perpetuate it ad infinitum.

[0]: https://defendtheweb.net/article/common-php-attacks-poison-n...


> Null terminated C strings are a prime example. There is absolutely no good reason, other than the fact that the authors may have wanted to save 3 bytes

Or, you know, they wanted to make the language track assembly as closely as possible (which was possible back at that time, when processors were much simpler and instructions weren't constantly reordered), and if you've ever written assembly, you know you're not really working at a string level, you're working at a byte and character level. Sized strings are more complex than null-terminated ones in that you either need to set a max size, or you need to spend multiple bytes per string (which actually mattered on many systems C was used on when it was developed), or you have to use masks on those early bytes to determine whether the next byte is part of the string or a continuation of the size.

And honestly, when you're working on systems where RAM (and maybe storage) is measured in kilobytes or less and speed in kilohertz, the extra code to do that, the extra time to process them, and the extra space to store sized strings make them a lot less of an obvious choice.

Was C a good choice for the time and where it was used? Possibly. Is it a good choice these days? Probably not without a bunch of extra utils and compiler guards to beat back the worst problems. I don't blame C as much for that as I do the people that continue to use it without extra safeguards.

What was that you said about Chesterton's fence? That's one of those terms you should be careful about throwing around when supporting an absolutist position...


I could almost buy your argument, except for the fact that Fortran was created in 1954, when systems like the IBM 650 had a maximum of 35KB of memory[0] (which I'm assuming included program memory), and it still included the size of the string with the string as a convention.

But that's just me guessing. There's no reason for us to do that when Dennis Ritchie wrote down the reason for this:

>> This change was made partially to avoid the limitation on the length of a string caused by holding the count in an 8- or 9-bit slot, and partly because maintaining the count seemed, in our experience, less convenient than using a terminator.[1]

So this was a change made primarily for convenience. And if the limitation of 255 characters was really a huge blocker, they could have easily created a spec like UTF-8 to allow variable-length encoding depending on the size of the string, which, funnily enough, Ken Thompson, who also worked with Ritchie, later did invent. You mentioned that the processing time would have been an issue, but C strings require you to walk the entire string to determine its length, and Ritchie notes that as an additional tradeoff of this convention.

But that wasn't done. And I can't blame Ritchie for that either, because he didn't think this language would become what it is today! Later on in the paper he alludes to this:

>> C has become successful to an extent far surpassing any early expectations[1]

All throughout the paper you can see him referring to decisions that were made out of convenience, and not because he had done extensive analysis to determine whether the tradeoff for the convenience was worth it:

>> Two ideas are most characteristic of C among languages of its class: the relationship between arrays and pointers, and the way in which declaration syntax mimics expression syntax...In both cases, historical accidents or mistakes have exacerbated their difficulty.

>> C treats strings as arrays of characters conventionally terminated by a marker...and as a result the language is simpler to describe and to translate than one incorporating the string as a unique data type.

All that to say, yes, of course there were reasons that decisions were made the way they were. But, and this is what I've noticed more and more in programming communities, these decisions are often made with little to no analysis, and usually out of a subjective preference, or to make the implementor's life a tad easier. So, yea, I think it's right to call out a lot of "best practices" because history has shown that programmers really don't put too much thought into their decisions. And then you end up with gurus proclaiming that a decision made out of convenience was actually the best decision available and we should never change the way we do things because clearly this is the right way.

[0]: https://en.m.wikipedia.org/wiki/IBM_650

[1]: https://www.bell-labs.com/usr/dmr/www/chist.html


> I could almost buy your argument, except for the fact that Fortran

Fortran was not created for the same purpose. Fortran's existence invalidating C's choices is like Java's existence invalidating C++'s choices. There's a reason Fortran and Java are not common choices to write an OS kernel in, while C and C++ are/were. There's a reason why C and C++ aren't often used for web development, but interpreted languages are. Different design choices fit different niches better or worse.

> So this was a change made primarily for convenience.

Convenience can mean a lot of things, and in this context and in the absence of contrary evidence I interpret that statement to be entirely in line with what I said above. It was inconvenient to have a more complex type to deal with, for multiple reasons. I'm not sure why you would think it different; it's not like I said it could not work the other way, just that there were things that went into the reasoning that made it less obvious than in today's world.

> You mentioned that the processing time would have been an issue, but C strings require you to process the entire length of the string to determine the length, and Ritchie notes that as an additional tradeoff for this convention.

Yes, tradeoff. If what you're doing with strings most of the time is parsing them, knowing the length ahead of time may be of little benefit, since you're going to step through them character by character anyway. For many operations, knowing the length ahead of time is irrelevant.

> decisions that were made out of convenience, and not because he had done extensive analysis

I'm not sure anyone is arguing they used extensive analysis. I'm certainly not. But when the reality you live and work in is that you are running up against real hardware constraints routinely, that's bound to affect your ad-hoc reasoning about what choice to make when you don't do extensive analysis.

> All that to say, yes of course there were reasons that decisions were made the way they were.

Given that this started because you wanted to support absolutist statements with "there is absolutely no good reason, other than the fact that the authors may have wanted to save 3 bytes, that C strings should be null terminated." and your prime example has now been walked back to the fact that yes, there were some considerations beyond that, including making it a simple language to describe, I think you've proved my original point.

Who is to say that C's simple description and ease of implementation for additional architectures isn't a major factor in its success and spread? And yes, while we've been paying the price for that for quite a while now, it may also have allowed for a level of software portability that really helped advance computers beyond where they would currently be otherwise.

I don't like C all that much and I don't use it for anything, but I'm also willing to note that it must have done quite a few things right to get to where it did, and I'm not willing to call out any aspect of it as completely without merit while still acknowledging the immense benefit the language brought as a whole, because at that point we're getting into conjecture about alternate histories.


Fair enough :)

Absolute statements are generally bad ideas, and I've just been reminded of why that is the case haha.


> Decisions that were made at a whim's notice, like implicitly converting numbers to strings sometimes, and sometimes implicitly converting strings to numbers!

I see this as a problem that's horribly inflated by those who don't use JS on a daily basis.

Practitioners of the language largely don't care, because in actual code you rarely see cases where it would matter.

Now that we have template strings it's even less relevant.
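For example, with a template literal the conversion to string is explicit, and any arithmetic happens inside the `${}`, so the `+` coercion rules never come into play:

```javascript
const n = 10;
// Inside ${} normal arithmetic applies; the result is then explicitly
// converted to a string, instead of + silently picking concatenation.
const label = `value: ${n + 1}`;
console.log(label); // "value: 11"
```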


I use TypeScript on a daily basis for my job. It was an entire language created to make up for JS's shortcomings. There's no horribly inflated reasoning going on when most of the industry has decided the best idea is to just throw away the language and use a different one that transpiles to it.


> Null terminated C strings are a prime example

It’s funny - it’s almost like C never had a string type and null-termination is more a convention than a primitive data type.

There’s nothing that prevents us from having Pascal-like strings as much as we want, provided we know we’ll need to reimplement everything we need.


Isn't that the whole point of this discussion? We're talking about conventions programmers treat as absolutes that are detrimental to the maintenance and security of the programs.

So I don't see how this has much bearing on the ultimate point I'm making, which is that yes, conventions can be bad haha.


what have you found?


> In any sufficiently complex application, something like a query builder that allows different parts of the application to work together to lazily compose queries

I don't see this as a given and I don't accept it in my own applications. If you need data, go to the data access layer. If it doesn't provide you what you need, build a new repository / provider / whatever your pattern is.

Whether it's an ORM or a home-built query compositor or whatever, one thing I know from experience is that once your application is "sufficiently complex" that you start to (incorrectly) believe you need this, your application has become too complex to use it reliably.

You absolutely will be mistakenly evaluating these builders in the wrong layers, iterating results without realising you're generating N+1 queries, etc.

You don't need an ORM.


> I don't see this as a given and I don't accept it in my own applications. If you need data, go to the data access layer. If it doesn't provide you what you need, build a new repository / provider / whatever your pattern is.

Whichever layer you put it in, you need a way to compose two query fragments. Otherwise you have to write N*M queries manually instead of N+M query fragments.

Either you use an ORM to help you with this or you don't, with all the usual tradeoffs about using a library or not. But you still have to solve the problem. (Or you do a lot of tedious copy-paste work - with all the usual tradeoffs of that)
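To make that concrete, here's a hypothetical sketch (all names and the fragment shape invented for illustration) of what composing fragments can look like: each fragment carries its SQL text and its bound parameters together, so N filters stay N reusable pieces instead of N*M hand-written queries.

```javascript
// Hypothetical sketch: each fragment carries its SQL text and parameters,
// so fragments combine without string surgery or printf-ing values.
const byAuthor = (authorId) => ({ sql: "author_id = ?", params: [authorId] });
const afterDate = (date) => ({ sql: "created_at > ?", params: [date] });

function select(table, fragments) {
  // Join the fragment SQL with AND, and concatenate their parameter lists
  // in the same order, so placeholders and values stay lined up.
  const where = fragments.map((f) => f.sql).join(" AND ");
  return {
    sql: `SELECT * FROM ${table} WHERE ${where}`,
    params: fragments.flatMap((f) => f.params),
  };
}

const q = select("posts", [byAuthor(42), afterDate("2020-01-01")]);
// q.sql    → "SELECT * FROM posts WHERE author_id = ? AND created_at > ?"
// q.params → [42, "2020-01-01"]
```

This is essentially the core of what a query builder does for you; an ORM layers object mapping on top.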


usually, every interaction with your database is its own unique code path and query

it's pretty rare for queries to be dynamically composed from arbitrary sub-queries

if this is a problem you need to solve then ORMs certainly make more sense, but even in this case I find query builders to be more effective

the interface between the application and the DB is actually a string! it's not an abstract data type, it doesn't benefit from being modeled by types


You never built admin interfaces, faceted search or dynamic query filters?


sure, sometimes, rarely -- these are exceptions, not rules

in general, it should not be possible for user input to produce arbitrarily complex queries against your database

each input element in an HTML form should map to a well-defined parameter of a SQL query builder, like, you shouldn't be dynamically composing sub-queries based on the value of a text field, the value should add a where or join or whatever other clause to the single well-defined query

sometimes this isn't possible but these should be super rare exceptions
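roughly something like this (all names invented for illustration): each form field conditionally adds one clause to a single well-defined query, and values only ever travel as bound parameters

```javascript
// hypothetical sketch: form fields map to optional clauses on one
// well-defined query; values are passed as bound parameters, never
// concatenated into the SQL text
function searchUsersQuery({ name, tag } = {}) {
  const clauses = [];
  const params = [];
  if (name) { clauses.push("name LIKE ?"); params.push(`%${name}%`); }
  if (tag)  { clauses.push("tag = ?");     params.push(tag); }
  const where = clauses.length ? ` WHERE ${clauses.join(" AND ")}` : "";
  return { sql: `SELECT id, name FROM users${where}`, params };
}

const q = searchUsersQuery({ name: "ann" });
// q.sql    → "SELECT id, name FROM users WHERE name LIKE ?"
// q.params → ["%ann%"]
```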


I prefer using something like Rails or Django to build 10 fully working CRUD interfaces with well-defined yet dynamic filters in a day instead of spending two weeks needlessly writing the equivalent code by hand.


why would it take you two weeks to write 10 simple SQL queries?


10 simple CRUDs you mean? With dynamic filters, admin UI, auth, tables, and so on? Because these frameworks allow you to do that in a single day.


i'm running out of ways to say that a CRUD endpoint should not have dynamism in the sense that you mean

/users/:id should map to 1 endpoint that's parameterized on userid

/search?userid=:userid&tag=:tag should map to 1 endpoint that's parameterized on userid and tag(s)

endpoints should be simple to write


You’ve never actually implemented a real-world system, have you?

You’re going to have parameters that are compound. You’re going to end up filtering on objects 3 relations removed, or dealing with nasty syncing of normalization. You’ll have endpoints with generic relations, like file uploads, where the parent isn’t a foreign key.

It’s going to be a mess. They will NOT always be simple to write.


Yeah and programs should also be simple, but there would be no value to them that way.


Have you ever heard of APIs?


> it's pretty rare for queries to be dynamically composed from arbitrary sub-queries

I'm talking static, not dynamic. You still need to compose two pieces together into a single query, and you can either use an ORM to help with that or not.

> the interface between the application and the DB is actually a string! it's not an abstract data type, it doesn't benefit from being modeled by types

No it isn't. You can't send an arbitrary string to the database and expect it to work. At the very least you benefit from having an interface that's structured enough to tell you whether your parentheses are balanced and your quotes are matched rather than having to figure that out at runtime.


huh?

when your app queries the db, the query is not composed from several pieces, it is well-defined in the relevant method

    fn search(q string) -> result
        return db.query(`SELECT id, text FROM table WHERE text LIKE $1;`, q)
this is a single query, not multiple

the db accepts a string and parses it to an AST, it does not accept a typed value

this means the interface is the string

unbalanced parens and whatever other invalid syntax is obviously caught by tests


> when your app queries the db, the query is not composed from several pieces, it is well-defined in the relevant method

> this is a single query, not multiple

And when you want to query for multiple related things together, the whole point of having a relational database? For different purposes you need different views on your data, and those views are generally constructed out of a bunch of shared fragments; you can either figure out a way to share them, or copy-paste them everywhere you use them.

> the db accepts a string and parses it to an AST, it does not accept a typed value

> this means the interface is the string

The DB accepts a structured query, not a string. It might be represented as a string on the wire, but if that was what mattered then we'd use byte arrays for all our variables since everything's a byte array at runtime.

> unbalanced parens and whatever other invalid syntax is obviously caught by tests

Tests are a poor substitute for types.


i'm not sure what you're thinking about when you say "multiple related things"

every "view" on your DB should be modeled as a separate function

every possible "thing" that's input to a function which queries the database should be transformed into a part of the query string by that function

> The DB accepts a structured query, not a string. It might be represented as a string on the wire, but if that was what mattered then we'd use byte arrays for all our variables since everything's a byte array at runtime.

...no

the API literally receives a string and passes it directly to the DB's query parser

if the DB accepted structured queries, then the API would rely on something like protobuf to parse raw bytes to native types -- it doesn't

like `echo 'SELECT * FROM whatever;' | psql` does not parse the `SELECT * FROM whatever;` string to a structured type, it sends the string directly to the database


> every "view" on your DB should be modeled as a separate function

OK, and when a significant amount of what those functions do is shared, how do you share it? (E.g. imagine we're building, IDK, some kind of CMS, and in one view we have authors (based on some criteria) and posts by those authors, and in another we have tags and posts under those tags, and in another we have date ranges and posts in that date range. How do you share the "fetching posts" bit of SQL between those three different (parameterized) queries?)

> if the DB accepted structured queries, then the API would rely on something like protobuf to parse raw bytes to native types -- it doesn't

> like `echo 'SELECT * FROM whatever;' | psql` does not parse the `SELECT * FROM whatever;` string to a structured type, it sends the string directly to the database

In both cases the parsing happens on the server, not the client. "echo abcde | psql" and "curl -d abcde http://my-protobuf-service/" are both doing the same kind of thing - passing an unstructured string to a server which will fail to parse it and give some kind of error - and both are equally useless.


> How do you share the "fetching posts" bit of SQL between those three different (parameterized) queries?)

your application has a fetch posts method, that method takes input including (optional) author(s), tag(s), etc., it builds a query that includes WHERE clauses for every provided parameter

the code that converts an author to a WHERE clause can be a function, the point is it outputs a string, or something that is input to a builder and results in a string

i'm not sure what a "fetching posts bit of SQL" is, a query selects specific rows, qualified by where clauses that filter the result set, joins that modify it, etc.


> the code that converts an author to a WHERE clause can be a function, the point is it outputs a string, or something that is input to a builder and results in a string

So you do string->string processing, and you explicitly won't treat the queries you're generating as structured values? Enjoy your rampant SQL injection vulnerabilities.


no?

creating a query string that's parameterized on input usually means you model those input parameters as `?` or `$1` or whatever, and provide them explicitly as part of the query

nobody is doing printfs of values


Ok so now your WHERE clauses are no longer strings, you have to have some structured representation of them that knows which parameters go with which clauses, and something that understands the structure of your queries enough to line up the parameters when you stitch together different WHERE clauses to make your full query - exactly the kind of thing you were saying was unnecessary.


Wait, why does every query fragment need to compose with every other? What are the N and Ms here?


Say you create a nice little utility function func1 that makes a call to the db, and you write func2 that also makes a db query. If you have a func3 that needs both func1 and func2, conceptually your options are:

* Accept that this will just be two trips to the DB.

* Write a new function func12 that writes a query to return all the needed data in one query and use that instead.

* Have your tool be able to automatically compose the queries.

If you go with the second option, you have to do that for every combination of functions that end up being used together, which is multiplicative in general.


Well sure, but that's not necessary for every combination of every call. Most likely those types of combinations are rare.


This is a problem; seems like CTEs and views are a possible ORM-less solution here


CTEs and views are both a bit bigger than the parts that you would normally want to share and reuse, and they're also not very well standardised. Plus no-one really agrees on how dynamic tables should be, so you end up with the Excel maintainability problem of no real separation between code and data.


Follow this advice too closely and your application logic starts leaking into your data layer.

One simple example is when you need to make atomic changes to two different types of entities. In the data layer, this is usually trivial — just run two queries within a transaction — but you need to expose a function that boils down to `updateBothAandB`. Rinse and repeat enough times and your data layer is a soup of business logic.
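A rough sketch of that shape, assuming a node-postgres-style client object with a `query(sql, params)` method (names hypothetical):

```javascript
// Hypothetical sketch: the data layer exposes one atomic operation that
// updates both entity types inside a single transaction, rolling back
// if either update fails.
async function updateBothAandB(client, a, b) {
  await client.query("BEGIN");
  try {
    await client.query("UPDATE a_table SET value = $1 WHERE id = $2", [a.value, a.id]);
    await client.query("UPDATE b_table SET value = $1 WHERE id = $2", [b.value, b.id]);
    await client.query("COMMIT");
  } catch (err) {
    await client.query("ROLLBACK");
    throw err; // surface the failure after undoing partial work
  }
}
```

Multiply functions like this across entity pairs and the "soup of business logic" the parent describes is exactly what you get.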


Exactly, in my opinion, separating your business logic from your data access layer completely is futile. You will need to either have business logic in your data access layer or fine grained query controls (when to add a filter clause, when to execute a query, what should go into a transaction, etc) in your business logic for performance reasons. This is OK, it's incidental complexity.


There are patterns to resolve this. Depending how you structure your DAL, your data access classes can provide transaction handles. It's a data access layer, you don't need to abstract away the fact that you're dealing with a database.


I mean, yeah, but the patterns all look like the thing you said you don’t accept in your own applications: “something like a query builder that allows different parts of the application to work together to lazily compose queries”.


IME young businesses evolve so quickly that too much abstraction is a higher risk than some bleeding between the DAL and business rules.


How is this not just a hand rolled ORM?


An ORM is a specific kind of data layer. It is general-purpose, and as the name suggests, it is an Object-Relational Mapper whose primary purpose is to map relations to objects and vice-versa.

So no, bespoke data access layer code is not an ORM.


Why are a data access layer and an ORM mutually exclusive? An ORM is just an abstraction over your database. You might find use in it as a tool to access your database but contain the use of the ORM to within your data access layer.


I never said they are mutually exclusive. I said you don't need the ORM.


Mind sharing an example of a large(ish) app that doesn't make use of an ORM? Last time this topic came up, I went looking for one (admittedly not too hard) and I came up empty handed.


> In any sufficiently complex application, something like a query builder that allows different parts of the application to work together to lazily compose queries (vs combining/filtering/sorting the data in memory) will be created at some point.

Welcome to LINQ - introduced in 2007 (?).

LINQ isn’t exactly an ORM. You create strongly typed LINQ statements that are turned into expression trees that are then turned into a query by a query provider.


LINQ was/is fantastic.

Being able to run queries over data without having to implement the various boilerplate is excellent.

Unfortunately basic mistakes can result in so many of those N+1 type queries if you're not careful. It can also result in reading entire table(s), possibly inside those nested N+1s.

It gets expensive when it's a remote DB server and/or dealing with a lot of records.

Profiling tools are essential.

It's why I'm not a fan of LINQ (and Entity Framework) for talking to DB servers without very strict controls.

Many times I've been called in to deal with a "slow" server, only to find out the issue is someone chained together a bunch of calls through EF and we have hundreds of thousands of queries that can be replaced by one or two.


> Unfortunately basic mistakes can result in so many of those N+1 type queries if you're not careful. It can also result in reading entire table(s), possibly inside those nested N+1s.

I don't know which "basic mistake" would do this. Maybe when using Lazy loading Proxies?

We use EF Core and it is quite easy to control eager loading or load-on-demand on a per-case basis.


And that also happens with SQL - which is also an abstraction.


SQL is only an abstraction if you consider modifying the physical file to be the purer form of RDBMS manipulation.


Would you consider SQL an abstraction over DynamoDB? ElasticSearch’s native query language? Apache Presto to query files? Mongo?

All of those can use SQL as a worse query language than their native counterparts.

You can also create very bad SQL if you don’t know the underlying engine. For instance if you try to write SQL for a columnar database like you would for a traditional database, you are in for a world of hurt.


You seem to be talking about databases whose native language is not SQL. In those cases, yes, SQL is definitely an abstraction. There's some layer that's transforming it into (say) Elastic queries or Dynamo queries.

I would argue that when talking about writing SQL, most folks are talking about databases whose query planner speaks some dialect of SQL: MS SQL Server, MariaDB, MySQL, Oracle, SQLite.

In those cases, it's not really an abstraction any more than writing assembly is an abstraction over (say) Intel's CPU microcode.

The query planner takes your SQL and turns it into something else, sure, but you can't generally do that yourself. It's the lowest layer of abstraction that's reasonably available.


IIRC, LINQ is exactly a query builder and not an ORM. IMO what truly makes an ORM is when object fields have data binding against the database (updating a field triggers an update in the DB).

It's kinda gross because the developer needs to decide what the authoritative source of truth is; the programming model makes you feel like you can trust your code (probably the wrong choice), and all the footguns around distributed state kick in (possibly even worse if you have a frontend with two-way data binding and data structures that outlive an HTTP request in your backend).


LINQ is a query builder and the .Net runtime creates expression trees. Entity Framework translates the expression tree to SQL and maps data back to objects.


And IIRC you can use databases in .NET with LINQ only and pass on Entity Framework, if you so choose.


Linq is really just a high level abstraction for querying any data that can be queried. EF adds some more abstractions mostly related to mapping C# objects to database tables. Then it adds database specific query providers (implementations) that combine the (abstract) linq query and the mapping to produce sql.

You can really use linq to query anything (APIs, for example), including databases without using EF. It's just not very common because building a custom query provider is a lot of work.


Linq2SQL was a thing. I’m not even sure if it is still supported.


Clarifying: ORMs are gross, query builders are not.


TypeORM thinks it’s an ORM but it doesn’t do what you say. It’s pretty good at writing basic migrations from annotations, but that’s about it.


LINQ is just an interface, you need some sort of an implementation below it (be it the unmaintained LINQ-to-SQL that AFAIK nobody should be using anymore, or the latest EF rewrite).

It's also the first thing I recommend to anyone claiming that concatenating SQL strings together is a good use of your time in the 21st century. Makes writing things like complex HTML tables with configurable columns and tons of optional filters so much faster and more maintainable.


In my experience, across a wide variety of applications built by a number of different teams, LINQ has almost always been an anti-pattern. Whenever you inject a leaky abstraction (as Spolsky would say), things start to go awry. I've seen crimes of grotesque inefficiency committed because the magical LINQ is there to shield devs from proper data tiers.


Absolutely, there are times where programmers try to do way too much in LINQ and don't realize that the query generated to back their chain of LINQ operations is a monstrosity of thousands of lines of SQL that will grind the server to a halt. Or the abstraction leaks badly because the programmer used something EF doesn't know how to translate to SQL, so it ends up loading a much larger dataset and trying to complete the rest of the filtering in memory.

As developers get more acclimated to the .NET ecosystem they usually pick up a good spidey sense for what query "shapes" are appropriate to express in LINQ and which are asking for trouble.

However, for every one of those awful LINQ queries in a codebase, I think there are 40 to 50 run-of-the-mill queries that make it worthwhile. The biggest win that it delivers over a simpler, "stringier" ORM is the ability to return structured data. I have no problem writing SQL, but processing the results back out of flat rows into objects gets extremely annoying.

Say I want to load some Foos with their Bars. If I use straight SQL with a row mapper, I define a type to represent each FooBar row, load a List<FooBarRow> into memory from my simple left join, then I'll have to do a GroupBy in memory on that List to get the actual structure that I want: a list of Foos where each one has its own list of Bars. Notice that I also have to handle the empty-list case specially.

  var fooBarRows = await connection.QueryAsync<FooBarRow>("select f.FooId, f.Name as FooName, b.BarId, b.Name as BarName from Foos f left join Bars b on b.FooId = f.FooId where whatever");
  // re-shape result set manually from flat query results
  var foos = fooBarRows
      .GroupBy(fbr => fbr.FooId)
      .Select(g => new Foo
         {
            FooId = g.Key,
            FooName = g.First().FooName,
            Bars = g.First().BarId == null ? new List<Bar>() : g.Select(b => new Bar { BarId = b.BarId.Value, BarName = b.BarName }).ToList()
         });
In EF it's just:

  var foos = await db.Foos.Where(whatever).Include(f => f.Bars).ToListAsync();
Saving that clutter -- both the code and the otherwise useless intermediate FooBarRow type -- really adds up because the apps I write have a zillion queries very much like this. So I'll take the occasional shitty query that has to be tracked down and optimized in exchange.


>I have no problem writing SQL, but processing the results back out of flat rows into objects gets extremely annoying.

I'm pretty sure Dapper lets you do this fairly easily. You've got a couple options - you can either provide a mapping function, or you can have your query/sprocs return multiple result sets. Then you can assemble the result sets into the object structure.

It's not quite as magical as LINQ, but it's also not quite as annoying and fraught with so much marshalling code as your example.
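The Dapper specifics are C#, but the multiple-result-set approach is language-neutral: one flat query per entity, stitched together by key. A sketch of that assembly step using Python's sqlite3 with a toy Foo/Bar schema (not Dapper's actual API):

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Foos (FooId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Bars (BarId INTEGER PRIMARY KEY, FooId INTEGER, Name TEXT);
    INSERT INTO Foos VALUES (1, 'a'), (2, 'b');
    INSERT INTO Bars VALUES (10, 1, 'x'), (11, 1, 'y');
""")

# Two flat result sets instead of one joined one...
foos = conn.execute("SELECT FooId, Name FROM Foos").fetchall()
bars = conn.execute("SELECT BarId, FooId, Name FROM Bars").fetchall()

# ...assembled into the nested shape by key. Note the empty-list case
# (Foo 2 has no Bars) falls out naturally instead of needing a special check.
bars_by_foo = defaultdict(list)
for bar_id, foo_id, name in bars:
    bars_by_foo[foo_id].append({"BarId": bar_id, "Name": name})
result = [{"FooId": fid, "Name": name, "Bars": bars_by_foo[fid]}
          for fid, name in foos]
```

Still more ceremony than an `Include`, but no row type or GroupBy dance either.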


Was waiting for Dapper to get brought up. Fantastic micro ORM.


You made a strong argument, up until you forced client side execution of the query by calling `ToListAsync()`

https://learn.microsoft.com/en-us/ef/core/querying/client-ev...


When they say it forces client evaluation, what they mean is that any LINQ operations you run after the ToListAsync() will run on the client side - since it's just a normal list in memory at that point and no longer has any knowledge of the database. So if you want to make sure some code will execute client-side, you should ToList or AsEnumerable your query first, then do your additional client-side operations on that list.

The IQueryable operations composed before the ToListAsync() will run as SQL on the server. In the case of my example it will look pretty similar in structure to the string SQL I wrote before it: a straightforward left join with a where clause. Performance will be similar too. It will not load the entire universe of Foos and Bars and then filter them down on the client.

One of the habits I have developed from working with EF for the past 10 years is to be explicit and as local as possible about forcing the query to evaluate. It lets me pretty reliably predict what the SQL is going to look like.

You can return IQueryables from methods and pass them around through many layers of your program, adding complexity to them as you go, and lazily getting the results at the last possible moment, five layers up the call stack from where the query first started.

It seems at first like that would be good, because that way the maximum amount of logic will run on the DB server, and letting the server do stuff is better, right? But that's also where you open yourself up to being very surprised about what your SQL looks like by the time it executes, with multiple layers of your program each tacking complexity onto the query. It can get ugly. Also you can have problems if you're hanging onto IQueryables that you still haven't executed after you've Dispose()d your DbContext. For this reason I try not to let IQueryables travel very far through my program before forcing evaluation.
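The deferred-pipeline trade-off isn't unique to IQueryable. Python generators have the same shape, and they show why forcing evaluation at a known, local point makes behavior predictable:

```python
calls = []

def source():
    # Stand-in for a data source; records when work actually happens.
    for i in range(5):
        calls.append(i)
        yield i

# Compose lazily: nothing has run yet, like an unevaluated IQueryable.
pipeline = (i * 10 for i in source() if i % 2 == 0)
assert calls == []  # composition alone did no work

# Forcing evaluation (list() playing the role of ToList) runs everything,
# at a point you chose explicitly.
results = list(pipeline)
assert results == [0, 20, 40]
assert calls == [0, 1, 2, 3, 4]

# After materializing, further steps operate on plain in-memory data only.
doubled = [r * 2 for r in results]
```

Hand the unevaluated `pipeline` up through five layers instead, and each layer can quietly change what "executing the query" ends up meaning — which is exactly the surprise described above.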


Really really well said.

It is weird that people jump into strange extremas when you can just combine best of both worlds.


It's sad that in programming world, something is either anti-pattern and you must not use it at all, or something is so awesome that you need to use it everywhere.


>LINQ has almost always been an anti-pattern

You mean "LINQ" as an input for ORM? or LINQ in general?

either way this is crazy claim.

There's shitton of projects where queries aren't complex and LINQ will make your life saner instead of this stringly typed hell called SQL.

Just check what ORM is generating for your queries and if it is a mess then write it manually and you're good to go.

This way you get both benefits: sane type system and performance.


LINQ in general is a very bad indicator if it isn't very restricted. I don't think it's a crazy claim whatsoever: when LINQ appears littered through code, it usually means there has been perilously little data access planning, so instead of planning out and centralizing concise, planned, secure, optimized data access avenues, people just pretend that LINQ being available everywhere is a solution. Every LINQ-heavy solution I've had the misfortune of using has always, with zero exceptions, been a performance nightmare. And instead of one big function you can just optimize, it is death by a trillion cuts.

I get that there are big LINQ advocates. It's convenient. I'm sure there are projects where it is used spectacularly. I've never seen such a project.


LINQ is just functional list comprehensions.

Do you think that LINQ is only used for EF/LINQ2SQL?


First time I’ve heard that claim was when I had to start working with Go. I was telling someone how much I miss LINQ/functional programming, and he said exactly the same thing: “ugh, LINQ is why .NET is so slow”. I was perplexed. Do you even know what LINQ is? How does it make .NET slow? And to the best of my knowledge they were comparing JIT startup time to Go’s AOT startup time. Smh


Yeah. The amount of FUD and ignorance out there in the technical world WRT the .NET ecosystem is astounding.


This comment makes me question whether you really have any experience with LINQ. If you had, you would not make a comment where you seem to think that LINQ is used only for data access.

LINQ is (at the most basic level) list comprehensions done better. It is functional programming for the imperative C# programmers. It has the potential to remove most/all loops and make the code more readable in the process.


>This comment make me question if you really have any experience with LINQ

The classic fallback.

>If you had, you would not make a comment where you seem to think that LINQ is used only for data access.

List comprehension is data access. Accessing sets in memory is data access.

>It has the potential to remove most/all loops and make the code more readable in the process.

It has the potential, and almost the certainty, of letting one thoroughly shoot themselves in the foot. Without the "magic" of LINQ, the grotesqueness of many patterns of data access (which, as previously mentioned, includes in-memory structures; it's pretty bizarre that anyone actually in this field thinks this only applies to databases, or that only DBs are "data") would lead one to rethink them.

LINQ is almost always a bad indicator. It is actually a fantastic thing in one off scripts and hack type code, but when it appears in production code, I would say 90%+ of the time it is absolutely gross, but it hides how gross it actually is.


> LINQ is almost always a bad indicator. It is actually a fantastic thing in one off scripts and hack type code, but when it appears in production code, I would say 90%+ of the time it is absolutely gross, but it hides how gross it actually is.

Yeah, you definitely have no idea what LINQ, or a list, even is.


How is this gross then?

It is just slightly slower than e.g. for loops, so unless this is a hot path it is basically not relevant, meanwhile it improves the readability of the code.


Good points! As someone who uses both ORM and raw SQL (I actually really like SQL as language) I sympathize with both sides of this debate. I prefer the readability of ORM in my models when developing, but closely monitor and optimize for N+1 and other inefficiencies, when needed. "When needed" is never cut-and-dry, but I try not to optimize too early. I actually tend to start new projects from the database and will scaffold with factories, seeders, and raw SQL queries to get a sense of the data model prior to coding.

ORMs such as Eloquent in Laravel also have some nice methods to resolve N+1 and perform lazy loading, but it's always a tradeoff.


> As someone who uses both ORM and raw SQL

There is no other way — people who don’t know one should not touch the other. Otherwise they are either juniors, or crazies who just like to complain that “the plane is a bad vehicle because I can’t just sit inside and land it properly without years of training”.


There are libraries that flip the concept of an ORM on its head. Instead of a library that allows lazy query composition, you write your queries and the library generates the code for that query at compile time. It’s a much better model in my opinion.

E.g. you could write a query like this:

    getUser:
    SELECT * FROM users WHERE id = ?
And the library would generate a class like:

    class GetUserQuery {
        static getUser(id: String): GetUserQueryResult
    }
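As a toy sketch of that compile-time idea (in Python rather than a real code-generation tool; all names are illustrative), the generator just fixes the SQL string and parameter arity up front, so the application only ever calls a dedicated function:

```python
import sqlite3

# A named query definition, as it might appear in a queries file.
QUERY_DEFS = {
    "get_user": "SELECT id, name FROM users WHERE id = ?",
}

def generate(name, sql):
    # "Generation" step: bake the SQL and its parameter count into a
    # dedicated function. Real tools emit typed source code instead.
    n_params = sql.count("?")
    def query_fn(conn, *args):
        assert len(args) == n_params, "wrong number of parameters"
        return conn.execute(sql, args).fetchone()
    query_fn.__name__ = name
    query_fn.sql = sql  # the query is fixed before the app ever runs
    return query_fn

get_user = generate("get_user", QUERY_DEFS["get_user"])

# Application code calls the generated function; it cannot compose or
# mutate the query, only supply parameters.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada')")
row = get_user(conn, 1)
```

The query stays plain SQL you can read, explain, and hand to a DBA, while callers get a typed, injection-proof entry point.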


Some examples for anyone else reading:

Go: https://github.com/kyleconroy/sqlc

Rust: https://github.com/cornucopia-rs/cornucopia

This is my preferred method of interacting with databases now, if available.

Very flexible.


We had a query builder in our app and ended up stripping it out in favour of raw SQL and string interpolation (for dynamic queries, not for passing in data). We found the raw SQL was much more readable.

We still used the library for inserts and updates.


IME, the accidental inefficiencies aren't a huge problem. What I hate about some ORMs is that they want to be considered the foundation of a system or, worse, multiple systems, rather than just a tool for implementing pesky infrastructure details. LINQ-to-SQL and Entity Framework feel like they want to be at the center of everything, though they don't have to be used that way. Something like Dapper, on the other hand, seems like it just wants to get your app the data it requires, through classes the application defines, and then get out of your way.

In short, I don't think ORMs should generate class libraries to be referenced by applications. They should help you get exactly what you need, when you need it, nothing more, nothing less, and no one except the person that maintains that code should ever need to care how the data got there. If a column gets added to a table that has nothing to do with a specific application, that application shouldn't need to be updated.


I've seen them help with things like dirty checking, that's a lot of work to take on properly. But in the end I think relying on them for significant load and scale is naive.


> In any sufficiently complex application, something like a query builder that allows different parts of the application to work together to lazily compose queries (vs combining/filtering/sorting the data in memory) will be created at some point.

Right, but is an ORM a good way to achieve this? I would posit no. The problem, correct me if I’m wrong, is one of an in memory representation of working data and a data access layer (how to get/update the data). An ORM provides a framework for both of these, but in my opinion it’s very inefficient to build, maintain, and optimise. You don’t need to turn to raw SQL as the alternative data access layer, but having one that is easier to customise is better in my opinion.

Not at all an expert, probably nonsense, please correct me.


It depends on what exactly the ORM offers as a feature set. ActiveRecord scopes in Ruby for example do exactly that.


SQL is an industry standard well understood by legions of senior developers and DBAs alike. SQL skills learned are repeatable, widely useful and build on past experiences.

Debugging and optimizing SQL queries is a well understood art, barring some rare DB specific optimizations (which ORMs don't even touch)

Getting expert SQL advice and support when you need is very practical and the results show.

Putting an ORM as the middle man generating your queries, negates all those benefits for some pithy premature laziness.

Operating on raw SQL is something I'd strongly recommend.


I would challenge your assertion that "any sufficiently complex application" will require lazily composed queries. I have worked on quite complex applications that had no such need. Well encapsulated string construction in a Data Access Object pattern worked just fine.

If you find yourself needing to lazily compose queries, then by all means, reach for a query builder. But I would encourage you to first examine whether you've split code into services that really should live together and whether that code should live closer to the data layer before you build a whole query builder.

And don't reach for a query builder until you need it.


In a sufficiently complex application, the database lives in another service that exposes a fixed set of queries through an API and certainly does not let you compose arbitrary queries.


How do you do anything with that? Like do you just accept that if some part of your app needs the data from query1 and query2 you make two trips to the database?


You file a ticket with the database team so they can add your fixed query to the service API. After a few rounds of exchanging messages and meetings it might be added.

If it's a third-party API (say, a weather forecast or market data) the answers will range from "no" to "yes, and it will cost you X".


Why use an ORM if all you need is a query builder? They exist as off-the-shelf solutions; some are even crazy good.


the line between an ORM and a query builder is a very blurry one

your query builder isn't just string based, to avoid a lot of potential bugs which are easy to introduce and miss in tests? It also has a simple way to (de)serialize rows from/to POD structs? Now you already have a thin ORM.
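As a sketch of how thin that layer can be, here are the two conveniences combined in a few lines of Python (sqlite3 plus a plain dataclass; names are illustrative):

```python
import sqlite3
from dataclasses import dataclass, fields

@dataclass
class User:
    id: int
    name: str

def fetch_as(conn, cls, sql, params=()):
    # Parameterized query (no string splicing) + row -> dataclass mapping:
    # together these two conveniences are already most of a "thin ORM".
    cols = ", ".join(f.name for f in fields(cls))
    cur = conn.execute(sql.format(cols=cols), params)
    return [cls(*row) for row in cur.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada'), (2, 'bob')")
users = fetch_as(conn, User, "SELECT {cols} FROM users WHERE id > ?", (1,))
```

No change tracking, no lazy loading, no identity map — just the column list derived from the struct and rows deserialized into it, which is roughly where "query builder" shades into "ORM".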


Care to mention one or two? All the ones I’ve seen have been dismissed as ORMs by ORM-haters.


The N+1 problem feels a bit like y2k in my experience; would definitely be a problem if smart folks didn’t come up with clever solutions, but since they did it’s not nearly the issue it’s made out to be for the average implementer.
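For anyone who hasn't hit it: the N+1 shape, and the standard eager-loading fix that ORMs apply, can be sketched with Python's sqlite3 and a toy schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ada'), (2, 'bob'), (3, 'eve');
    INSERT INTO posts VALUES (1, 1, 'p1'), (2, 1, 'p2'), (3, 2, 'p3');
""")

# N+1: one query for the parents, then one more per parent.
authors = conn.execute("SELECT id, name FROM authors").fetchall()
queries = 1
for author_id, _ in authors:
    conn.execute("SELECT title FROM posts WHERE author_id = ?",
                 (author_id,)).fetchall()
    queries += 1
# queries is now 1 + N (here 4), growing with the number of authors

# Eager loading: batch all the children in one IN (...) query instead.
ids = [a[0] for a in authors]
placeholders = ", ".join("?" * len(ids))
posts = conn.execute(
    f"SELECT author_id, title FROM posts WHERE author_id IN ({placeholders})",
    ids).fetchall()
# two queries total, no matter how many authors there are
```

The catch is that the lazy version compiles and returns correct results, so nothing forces you to notice until the table grows.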


That has not been my experience. Then again, I’ve written tens of thousands of sql queries over the decades, so sql is as easy as breathing for me. No need for any sort of query builder.


Query builders are generally not intended for developers to build "static" queries, but to build dynamic queries programmatically while being safe (eg. from SQL injection) based on dynamic data, often with user inputs, the likes from ASTs from parsed user input in query fields.
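Concretely, the safety comes from whitelisting the structural parts (fields, operators) and binding the values. A toy Python sketch of turning a parsed user filter into parameterized SQL (illustrative, not any particular library):

```python
# User-controlled structure is checked against whitelists; user-controlled
# values only ever travel as bound parameters, never spliced into the SQL.
ALLOWED_FIELDS = {"name", "age", "city"}
ALLOWED_OPS = {"eq": "=", "lt": "<", "gt": ">", "like": "LIKE"}

def build_where(filters):
    clauses, params = [], []
    for field, op, value in filters:
        if field not in ALLOWED_FIELDS or op not in ALLOWED_OPS:
            raise ValueError(f"rejected filter: {field} {op}")
        clauses.append(f"{field} {ALLOWED_OPS[op]} ?")
        params.append(value)
    return " AND ".join(clauses), params

# e.g. produced from the AST of a user-typed query field:
where, params = build_where([("age", "gt", 30), ("name", "like", "a%")])
# where  == "age > ? AND name LIKE ?"
# params == [30, "a%"]
```

Anything outside the whitelist is rejected outright rather than escaped, which is the property hand-concatenated strings tend to lose over time.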


Getting inner joins and denormalized data correct are a PITA that should be delegated to a semantic API that already is flexible and proven.


Query Builders are a subject I am thinking about at the moment. The question is whether or not OO is the right model.

jOOQ in Java is a DSL for writing SQL in Java, and it works amazingly well. It’s not guaranteed every jOOQ statement compiles to valid SQL, but I have pretty good experiences writing crazy complex queries using things like string_agg in pgsql. Here the relational model is primary and the representation as Java objects is secondary; given that the SQL is persistent and the Java objects ephemeral, this makes sense.

My RSS reader uses Arangodb and Python and I’d like to stop a bit and develop a query builder inspired by OWL class axioms which if you look at them the right way look like building blocks for a Scratch-like SQL query builder. I want both ordinary queries (browse my favorites, say an article with the keyword “niantic” that appeared on Tildes) and rules (don’t want read articles from the Guardian that have “- live” in the title)


> In any sufficiently complex application, something like a query builder that allows different parts of the application to work together to lazily compose queries (vs combining/filtering/sorting the data in memory) will be created at some point.

Then your devs have way too much time on their hands. Find them some actual problems to solve.



