zjaffee's comments

The iOS version of most social media apps is better. iOS simply has better API integration with its hardware, whereas on Android, many OEMs (hell, this was even the case to some extent with older Pixel phones) do a number of things that make the hardware harder for apps to access quickly through the OS APIs.

This is especially relevant for the camera, but also various other sensors and hardware modules that exist inside these phones.

That said, in recent years there are a number of other areas where Android is much better, such as deeper AI integration, which goes back even prior to the current LLM craze.


What are those things?

I'm originally from the US, but where I live now, WhatsApp has functionally replaced email for many different types of communication (things that would be an email in the US). Recruiters message me on WhatsApp about jobs, I can ask for a prescription renewal through it, and I get support through it from everyone ranging from government agencies to businesses' customer support, etc.

One thing that is repeatedly underdiscussed about open source is that every time a major open source project becomes successful, be it anything from Linux to Apache Spark, private companies come in and build something that can still very reasonably be called Linux or Apache Spark, but that underneath has tons and tons of extra stuff they never feed back into the open source community.

Hell, I think with the latter (since all the major cloud providers deploy their own version of Spark on their respective data-processing cluster services), people don't even know that they aren't in fact using open source software. Eventually you get to a point where companies that choose not to use these third-party services just open source their own improvements or abstractions as, again, separate open source projects that never make it into the upstream project (and which are often heavily influenced by profit-making entities).

This has been the model for a very long time, going back at least to the likes of Red Hat, and it will certainly continue with countless future projects. Maybe there need to be new models of open source governance, but I have no clue how successful such a thing would even be.


> but underneath has tons and tons of extra stuff that they never feed back into the open source community.

Very unlikely for GPL2 projects


See cloud provider specific distros, or Android Linux kernel.

Thing is, when they misbehave, someone has to have the money to bring them to court.


It depends on what you were trying to do with the data. Hadoop would never win, but Spark lets you hold all that data in memory across multiple machines and perform various operations on it.

If all you wanted to do was filter the dataset for certain fields, you can likely do something faster programmatically on a single machine.
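That kind of single-machine filter can be a simple streaming pass, so memory stays constant no matter how big the file is. A minimal sketch in Python, with made-up field names ("status", "user_id") purely for illustration:

```python
import json

def filter_records(lines, wanted_status="active"):
    """Stream-filter newline-delimited JSON records, yielding only the
    fields we care about, one record at a time (constant memory)."""
    for line in lines:
        rec = json.loads(line)
        if rec.get("status") == wanted_status:
            yield {"user_id": rec["user_id"], "status": rec["status"]}

# Usage: pass any iterable of lines, e.g. an open file object:
#   with open("events.ndjson") as f:
#       for rec in filter_records(f):
#           ...
```

No cluster scheduling, no shuffles: for a pure filter, a loop like this is often faster end-to-end than spinning up a distributed job.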


It's not just about how much data you have, but also the sorts of things you are running on it. Joins and group-bys scale much faster than any aggregation. Additionally, you get a unified platform where large teams can share code in a structured way across all their data-processing jobs. It's similar to how companies use k8s to manage the human side of software development.

I can, however, say that when I had a job at a major cloud provider optimizing Spark core for our customers, one of the key things we found was that fewer machines with vertically scaled hardware almost always outperformed any sort of distributed system (albeit not always from a price-performance perspective).

The real value often comes from the ability to do retries, leverage leftover underutilized hardware (i.e. spot instances, or your own data center at times when load is lower), handle hardware failures, etc., all while the full suite of tools above keeps working.


Other way around. Aggregation is usually faster than a join.


Disagree, though in practice it depends on the query, the cardinality of the various columns across tables, indices, and the RDBMS implementation (so, everything).

A simple equijoin on high-cardinality, indexed columns will usually be extremely fast. The same join in a 1:M relationship might be fast, or it might result in a massive fanout. In the latter case, if your RDBMS uses a clustering index, and you've designed your schemata to exploit this (e.g. a UserPurchase table with a PK of (user_id, purchase_id)), the join can still be quite fast.
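A sketch of the clustered-PK idea using SQLite's WITHOUT ROWID tables, which store rows in primary-key order; the table and column names follow the comment's hypothetical UserPurchase example and aren't from any real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (user_id INTEGER PRIMARY KEY, name TEXT);
    -- PK (user_id, purchase_id) clusters each user's purchases together,
    -- so the 1:M join below becomes a contiguous range scan per user.
    CREATE TABLE UserPurchase (
        user_id INTEGER,
        purchase_id INTEGER,
        amount REAL,
        PRIMARY KEY (user_id, purchase_id)
    ) WITHOUT ROWID;
""")
conn.executemany("INSERT INTO User VALUES (?, ?)", [(1, "ada"), (2, "bob")])
conn.executemany("INSERT INTO UserPurchase VALUES (?, ?, ?)",
                 [(1, 10, 9.99), (1, 11, 4.50), (2, 12, 20.00)])

rows = conn.execute("""
    SELECT u.name, p.purchase_id, p.amount
    FROM User u JOIN UserPurchase p ON u.user_id = p.user_id
    ORDER BY u.user_id, p.purchase_id
""").fetchall()
```

Even with heavy fanout, each user's purchases are physically adjacent, so the join does one seek per user rather than one per purchase.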

Aggregations often imply large amounts of data being retrieved, though this is not necessarily true.


That level of database optimization is rare in practice. As soon as a non-database person gets decision-making authority, there goes your data model and disk layout.

And many important datasets never make it into any kind of database like that. Very few people provide "index columns" in their CSV files. Or they use long variable length strings as their primary key.

OP pertains to that kind of data. Some stuff in text files.


How is a proper PK choice a high level of optimization?


Unconvinced. Any join needs some kind of seek on the secondary relation's index, or a bunch of state if you're stream-joining, building a temporary index of size O(n) until the end of the batch. On the other hand, summing N numbers needs O(1) memory, and if your data is column-shaped it's like one CPU instruction to process 8 rows. In a "big data" context there's usually no traditional B-tree index to join against either. For jobs that process every row in the input set, an MR join is horrible for perf, to the point that people end up with a dedicated join job/materialized view so downstream jobs don't have to redo the work.
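The state-size difference can be sketched in a few lines of Python (illustrative only; a real engine would use vectorized columnar code, not interpreted loops):

```python
# Aggregation: O(1) state — one running total, regardless of row count.
def streaming_sum(values):
    total = 0
    for v in values:
        total += v          # constant memory, stays in cache
    return total

# Hash join: must first materialize an index over one side, so its
# working state grows O(n) with the size of the indexed relation.
def hash_join(left, right):
    index = {}              # the O(n) temporary index the comment describes
    for key, val in right:
        index.setdefault(key, []).append(val)
    return [(key, lval, rval)
            for key, lval in left
            for rval in index.get(key, [])]
```

The sum touches each input once and keeps one number; the join has to hold an entire relation in memory (or spill it) before producing a single output row.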


An aggregation is less work than a join. In ideal conditions you are segmenting the data in basically the same way for a join as for an aggregation. Think of an aggregation as an inner join against a table of buckets, plus updating a single value instead of keeping a number of copies around. In practice this holds, with the aggregation being linearly faster than a join over the same data. That delta is the extra work the join needs to do to keep around a list of rows rather than a single value that is updated (and stays in cache) repeatedly. Depending on the data, this delta might be quite small. But barring a very obtuse aggregation function (kurtosis, perhaps), the aggregation will be faster. It's updating a single value versus appending to a list, with the extra memory overhead that introduces.


I'm saying that a smaller amount of data means more compute is required for a join. Sorry if that wasn't clear.


What an amazing set of articles. One thing I think he's missed is the clear multi-year trends.

Over the past 5 years there have been significant changes and several clear winners. Databricks and Snowflake have really demonstrated the ability to stay resilient despite strong competition from the cloud providers themselves, often through the privatization of what was previously open source. This is especially relevant given the article's mention of how Cloudera and Hortonworks failed to make it.

I also think the quiet execution of databases like ClickHouse has been extremely impressive; they've filled a niche that previously had no obvious solution.


Montana likely passed such a law because it has a governor and a senator who both came from the tech sector in a big way (they sold the same company to Oracle).

That said, there are a lot of other legal hurdles that would prevent Montana from ever being significant to the tech sector, despite the fact that I'm certain many skilled people would love to live there. From being the only state without at-will employment, to a completely out-of-whack tax system (the ratio between income and sales tax, for a state heavily dependent on tourism), to countless restrictions on building large amounts of new housing (often necessary because of water constraints), it just isn't happening.


AWS (along with the vast majority of B2B services in the software development industry) is good because it allows you to focus on building your product or business without needing to worry nearly as much about managing servers.

The problems here are no different from using SaaS anywhere else in a business. You could also run all your sales tracking through Excel; it's just that once you have more than a few people doing sales, that becomes a major bottleneck, the same way not having easy-to-manage infrastructure does.


I couldn't agree more. There was clearly a big shift when Jassy became CEO of Amazon as a whole and Charlie Bell left (which is also interesting, because it's not like Azure is magically better now).

The improvements to core services at AWS haven't really happened at the same pace post-COVID as they did before, but that could also have something to do with the overall maturity of the ecosystem.

Although it's also largely the case that other cloud providers have realized it's hard for them to compete against other companies' core competencies, when they'd still be selling the infrastructure those services run on anyway.


“Avinatan has come home”: Jensen Huang hails release of Nvidia engineer after two years in Hamas captivity. Nvidia’s CEO shared the emotional news with employees after Avinatan Or, who was among those kidnapped from the Nova music festival on October 7, 2023, was freed on Monday.

