In practice, product requirements can get in the way of technically ideal solutions. One example: analytics products let users pass in and analyze an arbitrary number of user properties - more than even a columnar database can handle as individual columns. The current solution of storing JSON in a single column does carry a very significant performance trade-off, but it's also needed to power the queries users need to run. This is also why we're really excited about the new Object data type that landed in 22.3, as it handles these cases gracefully by creating dynamic subcolumns.
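For anyone curious, here's a minimal sketch of what that looks like (the type is experimental in 22.3 and has to be enabled explicitly; the table and column names here are illustrative, not our actual schema):

```
-- The Object/JSON type is experimental in 22.3:
SET allow_experimental_object_type = 1;

CREATE TABLE events
(
    timestamp  DateTime,
    event      String,
    -- Arbitrary user properties land here; ClickHouse creates a dynamic
    -- subcolumn for each JSON path it encounters during ingestion.
    properties JSON
)
ENGINE = MergeTree
ORDER BY (event, timestamp);

-- Subcolumns can then be read directly, without JSONExtract* calls:
SELECT properties.plan, count()
FROM events
GROUP BY properties.plan;
```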
On JOINs - again, requirements bite us in different ways. Product analytics ingestion pipelines can get quite complicated, since they need to handle merging anonymous and signed-in users, plus user properties changing over time. Handling that via JOINs as a go-to-market strategy avoids that upfront cost by centralising the logic in SQL, but it does indeed come with a significant cost in scalability. Delaying that work in turn lets you focus on building the tools users need. That said, every loan needs to be paid back at some point, and we're currently knee-deep in re-architecting everything to avoid these joins.
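To make that concrete, the query-time approach looks roughly like this (a simplified sketch - the table and column names are illustrative):

```
-- Resolve each event's (possibly anonymous) distinct_id to a canonical
-- person at query time, instead of rewriting events at ingestion time.
SELECT p.person_id, count() AS event_count
FROM events AS e
INNER JOIN person_distinct_id AS p
    ON e.distinct_id = p.distinct_id
GROUP BY p.person_id;
```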
Also note that, in our experience, JOINs don't work the way you described - rather, the right-hand side of the join gets loaded into memory. With a good ORDER BY on the table, the bottleneck there is memory pressure rather than I/O.
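In practice that means keeping the smaller table on the right-hand side and, when that isn't enough, telling ClickHouse to trade speed for bounded memory (a sketch, not a tuning recommendation):

```
SELECT e.event, p.person_id
FROM events AS e
INNER JOIN person_distinct_id AS p  -- right side is built into an in-memory hash table
    ON e.distinct_id = p.distinct_id
-- 'partial_merge' sorts and merges instead of building the hash table
-- in RAM, bounding memory at the cost of speed:
SETTINGS join_algorithm = 'partial_merge';
```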
All that said, what a great summary of all the different things to keep an eye on. Thanks for reading and sharing your thoughts!
I talked about memory latency with respect to joins - that the random access into the hash table is much slower per "row" than vectorized operations on columns. I didn't say I/O would be slow. The fact that I said "hash table" implies that one side is fully loaded into RAM.
> I'm convinced the only people who have good things to say about using linux (or BSD for that matter, been there done that, no thanks) on a laptop are the kind of people who keep their "laptops" on the same desk, plugged in to ethernet, and are effectively using a desktop with poor thermals
Good for you for making your own decisions, but don't be a condescending asshole. Personally, I prefer Linux because it works fine, and I consider Apple an overpriced piece of spyware and many of their users smug idiot hipsters.
It's the problem with star systems, which is that we'll always have different definitions. Your 3s are living next to the other poster's 3s and mean very different things.
That said, I think the world has also suffered from ratings inflation. I tend to assume anything under a 4 means "bad" or "meh" myself.
This is exactly how the current Goodreads rating is supposed to work (and I'm personally OK with it). But my guesstimate is that for 95% of Goodreads users, everything below 4 stars means the book sucks.
I absolutely love Fish, but it's worth warning anyone who's taking a first look and is excited to try it that some syntax differences from more familiar shells like Bash can cause frustration, both in terms of muscle memory and any time you need to copy-paste. It hasn't been enough of a problem for me to switch back (or to ZSH), since the benefits far outweigh the frustration, but I do think it's worth a small warning.
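For a sense of what I mean, a few of the Bash-isms that don't carry over (a small illustrative sample, not an exhaustive list):

```
# Exporting a variable: bash's `export EDITOR=vim` becomes:
set -x EDITOR vim

# The exit status of the last command is $status, not $?:
false; echo $status

# Command substitution is plain parentheses:
echo (date)        # bash: echo $(date)

# Arithmetic goes through the math builtin, not $(( )):
math 2 + 2         # bash: echo $((2 + 2))
```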
This one's tricky. I've driven Fish daily for years, and it's definitely snappier than my old oh-my-zsh setup (I much prefer this style of history), but you can get bitten by certain tools not using shebangs properly.
In Europe, https://eagronom.com/ is taking an interesting approach - rather than throwing all existing farming practices out with the bathwater, they build on top of them and modernize toward sustainable farming.
There's an infinite amount of detail that's impossible to capture in a comment and which invariably changes over time and doesn't hold in the future.
For my team, the solution has been writing longer commit messages detailing not only what has changed, but also the why, other considerations, potential pitfalls, and so forth.
So in this case, a good commit message might read like:
```
Created square root approximation function

This is needed for rendering new polygons in renderer Foo
in an efficient way, as those don't need a high degree of accuracy.

The algorithm used is Newton-Raphson approximation; the accuracy was
chosen by initial testing:

[[Test code here showing why a thing was chosen]]

Potential pitfalls here include foo and bar. X and Y were also
considered, but left out due to unclear benefit over the simpler
algorithm.
```
With an editor with good `git blame` support (or using GitHub to dig through the layers), this gives me a lot of confidence when reading code, as I can go back in time and read what the author was originally thinking. This way I can properly evaluate whether conditions have changed, rather than worry about the next Cthulhu comment that no longer applies.
How so? The code still documents what is happening; the commits, however, lay out the whys.
The point is that these two are separate questions, and trying to religiously join them with comments as a crutch is a headache. It's impossible to keep everything in sync, and I don't want to read needless, or worse, misleading information.
What's worse, in comments we often omit the important details, such as why the change was made, what other choices were considered, how the thing was benchmarked, etc.
That said, comments still have a place. Just not everywhere for everything and especially not for documenting history.
I disagree. I think the "whys" belong in the comments - in fact, that's the most important part of the comment if the code is cleanly written. I don't want to be happily coding along, get to a glob of code, and have to go to the repo pane, hunt for the commit that explains this particular thing, then read a commit message. Put it in a comment in the code. Pretty please.
You need a queue in front of your database regardless of write latency. Otherwise you tie your availability to database availability, and downtime (even for upgrades) is often unavoidable while network problems are common.
Note that working remotely takes some getting used to just as with working at an office. Yes, you can dump someone into unfamiliar conditions and draw conclusions from that, but that's just confirming your prior assumptions without actually trying.