__This is a summary of a somewhat long article; it cuts a lot of corners due to character limits. Please check the article for more info.__
Some years ago I worked with a scale-up that was really focused on the way they handled data in their product. At some point they started to talk about standardizing their data transfer objects, the data that flows over the API connections, into common models. The idea was that there would be a single Invoice, User, and Customer concept that they could document, standardize, and share across their entire application landscape.
What they were inventing is now known as a Canonical Data Model. A centralized data model that you reuse for everything. And to be fair to that team, there are companies that make this work. Especially in highly regulated environments you can see this in play for some objects. In banks or medical companies it’s not uncommon to have data contracts that need to encapsulate a ledger or medical checks.
## Bounded context
While that team often talked about domain-driven design concepts (value objects, ubiquitous language), they seemed to miss the domain part. More specifically, the bounded context.
A customer can mean a lot of things to a lot of different people. This is the bounded context. For a salesperson, a customer is someone who buys things; for a support person, a customer is someone who needs help. They both have different lenses.
Now if we keep following the Canonical Data Model, this Customer object will keep on growing. Every week there will be a committee that decides which fields need to be added (you cannot remove fields, as that would impact your applications).
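To make the difference concrete, here is a minimal sketch of what the two bounded contexts might look like in code, next to the single canonical object they would otherwise have to share. All field names are made up for illustration.

```python
from dataclasses import dataclass

# Each domain keeps its own, small Customer model instead of sharing
# one ever-growing canonical object. Field names are hypothetical.

@dataclass
class SalesCustomer:          # the sales lens: someone who buys things
    customer_id: str
    company_name: str
    vat_number: str
    open_opportunities: int

@dataclass
class SupportCustomer:        # the support lens: someone who needs help
    customer_id: str
    contact_email: str
    open_tickets: int
    sla_tier: str

# The canonical alternative would be one Customer carrying *all* of these
# fields, plus every field any future committee approves, which every
# application then has to understand and can never remove.
```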
## Enter the Data Mesh
A way to solve this is Data Mesh, which takes the concept of the bounded context as a core principle. In the context of this discussion, Data Mesh sees data as a product, a product that is maintained by the people in the domain. That means the Billing domain only maintains and focuses on the Billing logic within the customer concept.
They are responsible for the quality and the contract, but not for the representation. In practice that means they can decide how a VAT number is structured, but not how the Sales team needs to format said model. They have no control over, or interest in, how other domains use the data.
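As a rough sketch of that split, assuming an EU-style VAT format and made-up function names: the Billing domain validates the structure it owns before publishing, while Sales is free to re-format the published data however it likes.

```python
import re

# Billing owns the contract: what a valid VAT number looks like.
# The pattern below is an assumption for illustration.
VAT_PATTERN = re.compile(r"^[A-Z]{2}[0-9A-Z]{8,12}$")

def publish_billing_customer(customer_id: str, vat_number: str) -> dict:
    """Billing's data product: validates its own fields before publishing."""
    if not VAT_PATTERN.match(vat_number):
        raise ValueError(f"Invalid VAT number: {vat_number}")
    return {"customer_id": customer_id, "vat_number": vat_number}

# Sales owns its own representation; Billing has no say in (or knowledge of)
# how the published record is displayed downstream.
def sales_display_vat(record: dict) -> str:
    return f"VAT: {record['vat_number'][:2]} {record['vat_number'][2:]}"
```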
It’s a very flexible design, but while Data Mesh solves the coupling problem, it introduces a new set of challenges. If I’m an analyst trying to find ‘Customer Revenue’, do I look in Sales, Billing, or Marketing? The answer is usually ‘all of the above’. In a pure Mesh, you don’t just make multiple calls; you have to build multiple Anti-Corruption Layers just to get a simple report. It requires a high level of architectural maturity, and that is something not every low-code or legacy team possesses.
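To illustrate what that means for the analyst, here is a sketch of the translation code a "simple" revenue report ends up needing: one small Anti-Corruption Layer per domain product. All record shapes and field names are hypothetical.

```python
# One translation function per domain product, because each domain
# exposes revenue in its own shape (assumed shapes below).

def revenue_from_sales(sales_record: dict) -> float:
    # Sales exposes closed deals
    return sum(deal["amount"] for deal in sales_record.get("closed_deals", []))

def revenue_from_billing(billing_record: dict) -> float:
    # Billing exposes invoices, in cents, with its own field names
    return sum(inv["total_cents"] for inv in billing_record.get("invoices", [])) / 100

def revenue_from_marketing(marketing_record: dict) -> float:
    # Marketing only knows attributed revenue
    return marketing_record.get("attributed_revenue", 0.0)

def customer_revenue_report(sales: dict, billing: dict, marketing: dict) -> dict:
    """The 'simple report': three translations, three contracts to keep in sync."""
    return {
        "sales_view": revenue_from_sales(sales),
        "billing_view": revenue_from_billing(billing),
        "marketing_view": revenue_from_marketing(marketing),
    }
```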
## Federated Hub-and-Spoke Data Strategy
Let’s try and see if we can combine these two strategies. We centralize our data in a central lake. Yes, that is back to the CDM setup. But we split it up into federated domains. You have a base Customer table, call it CustomerIdentity, that is connected to a SalesCustomer, SupportCustomer, and so on. Think of this as logical inheritance: a CustomerIdentity record that is extended by domain-specific tables through a shared primary key. When you create a new Customer in your sales tool, you trigger an event: the CustomerCreate event. The CustomerCreate trigger fills out the base information for the Customer (username, firstName, lastName) in the central data lake; at the same time we store our customer (base and domain-specific data) in our local database. You also do this for delete and update events. The base information goes to the server; the domain-specific data stays in the sales tool as its single source of truth. Every night there is a sync from the domain tools to the central lake to fill out the domain tables with a delta.
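A minimal sketch of that flow, using in-memory dictionaries as stand-ins for the central lake and the sales tool's database. The event and field names follow the description above; everything else is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime

central_lake = {"CustomerIdentity": {}, "SalesCustomer": {}}   # hub
sales_db = {"Customer": {}}                                    # spoke (source of truth)

@dataclass
class CustomerCreate:
    customer_id: str
    username: str
    first_name: str
    last_name: str
    sales_fields: dict = field(default_factory=dict)  # domain-specific data
    occurred_at: datetime = field(default_factory=datetime.now)

def handle_customer_create(event: CustomerCreate) -> None:
    # 1. Base information goes to the central lake (the shared identity record).
    central_lake["CustomerIdentity"][event.customer_id] = {
        "username": event.username,
        "firstName": event.first_name,
        "lastName": event.last_name,
    }
    # 2. Base + domain-specific data stay in the sales tool's own database,
    #    which remains the single source of truth for the Sales domain.
    sales_db["Customer"][event.customer_id] = {
        "username": event.username,
        "firstName": event.first_name,
        "lastName": event.last_name,
        **event.sales_fields,
        "updated_at": event.occurred_at,
    }

def nightly_sync(since: datetime) -> None:
    # Delta sync: copy domain-specific fields changed since the last run
    # into the SalesCustomer extension table, keyed on the shared customer_id.
    for customer_id, row in sales_db["Customer"].items():
        if row["updated_at"] >= since:
            central_lake["SalesCustomer"][customer_id] = {
                k: v for k, v in row.items()
                if k not in ("username", "firstName", "lastName", "updated_at")
            }
```

The shared primary key (customer_id) is what ties the CustomerIdentity record to its domain extensions; update and delete events would follow the same two-step pattern.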
This week I wrote about my experiences with technical and architectural debt. When I was a developer we used to distinguish between code debt (temporary hacks) and architectural debt (structural decisions that bite you later). But in enterprise architecture, it goes way beyond technical implementation.
To me, architectural debt is found at all layers.
Application/Infrastructure layer: This is about integration patterns, system overlap, and vendor lock-in. Not the code itself, but how applications interact with each other. Debt here directly hits operations through increased costs and slower delivery.
Business layer: This covers ownership, stewardship, and process documentation. When business processes are outdated or phantom processes exist, people work under wrong assumptions. Projects start on the back foot before they even begin. Issues here multiply operational problems.
Strategy layer: The most damaging level. If your business capability maps are outdated or misaligned, you're basing 3-5 year strategies on wrong assumptions. This blocks transformation and can make bad long-term strategy look appealing.
Here's a pattern I see destroying technical decisions: we've turned meetings into gladiatorial contests of quick wit instead of deliberate problem-solving.
Nemawashi, "turning the roots", is a way to sidestep that.
Pre-socialize decisions through 1-on-1s. Let people think privately, examine data, and reach consensus before the formal meeting.
People think better in private, not when they’re performing in front of others.
Since then, I’ve stopped seeing meetings as places for quick wit and started valuing the prep work. The coffee chats, the shared data, the quiet thinking. It’s slower, but it leads to better decisions and fewer grudges.
That is what I’m alluding to. But also, I work in a security-aware industry, so those things will be vetted. And my own experience makes me address some of the potential concerns already in the POC.
The real divide going forward will be between vibe coding with experience across domains vs vibe coding without, IMHO.
The whole point of vibe coding is that you don't review or attempt to understand the code. If you do, you're a developer using AI as a tool, which the video is not arguing against - indeed, he links to another video he's made explaining one way of doing that. You could call that "vibe coding with experience" but it just devalues the term, I think, and certainly misses the point in this particular case.
These people believe they're on track to create life and displace the majority of labor in the world. Nothing else makes the level of investment make sense. It's a prisoner's dilemma where they all think they need to try because regardless of the likelihood of success, the expected value remains astronomical and the risk of not being the winner is extinction.
Yeah, half of my AI skepticism isn't that the tools don't work. It's that I'm having a tough time figuring out how these things are ever going to produce ROI.
Like, at some point the end product needs to be a literal genie's lamp or fountain of youth.
Scale AI has deep cooperation with military agencies and a fresh large contract with the Gulf state with the largest US military presence. It's likely they're building a new generation of combat command systems and the like, consolidation of surveillance and management tooling in the ongoing and future US wars isn't surprising.
[Okay so the third option that I thought of but decided not to put down was: literal genie's lamp, fountain of youth, or robot army. Because then who cares if you collapse the economy if AGI, you'll be safe with your robot army. Not particularly happy that this option is potentially not far from the mark.]
Not sure what you mean, but it's not like the militarisation of "tech" is a secret. The current US administration is bragging about its militaristic totalisation efforts:
If "absurd" implies "too high": I always thought strong reactions to valuations a bit strange. Businesses are complicated and assuming that somebody who is willing to spend billions of dollars thought a bit harder about the value than what I can provide with my gut reaction seems reasonable.
So I started to treat it as more of an update, as in "Huh, my idea of what something is worth just really clashed with the market, curious."
Does not mean the market is right, of course. But most of the time, when digging into it and thinking a bit more about it, I would not be willing to take the short position and as a consequence moderate my reaction.
Cogent Core also looks neat, but I didn't have the time to play with it before I switched over to using the Odin programming language instead of Go https://www.cogentcore.org/core
I personally had nothing but issues with Fyne (especially in regard to performance, across multiple computers and operating systems), but it's probably the most popular option https://fyne.io
Not a real fan of this approach.
This is what's called emergent strategy, where you react to what's happening around you (not to be confused with agile, where you look at what's happening around you and then decide a course of action). The problem here is that you are never in control of where you are going, and you waste a lot of energy and work switching over to the new strategy.
Well, that would be subtractive: I don't know what I want, but I don't want X & Y. You would steer, yes, but it would be very broad. You're not really working towards something, you're working away from multiple things.
I find that orienting around results can help unlock whether positive or negative space (a goal or a set of constraints) is the better focus. In my experience, there are times when goals do not serve me but rather hinder me. This is purely from regularly observing results. In those cases, pivoting to a focus on some well-defined constraints has yielded better results. As long as the direction is the same between the two, that might still be considered proactive.
Looking at the charts, they all seem to take off. To me that raises the question: do they take off because they hit the front page and get all the attention, or are some of them bought upvotes like on Reddit?
I've yet to see anyone selling HN upvotes on the usual dubious forums like BHW. I'm not saying it doesn't happen, but I don't believe there is any commercial service being advertised.
I mean, professional influence campaigns exist, especially on social media. And HN also has its own dynamics when it comes to reaching the front page.
Well, let's not forget that it's an opinionated source. There is also the point that if you ask it about a topic, it will (often) give you the answer that has the most content about it (or the most easily accessible information).
I find that, for many, LLMs are addictive, a magnet, because they offer to do your work for you, or so it appears. Resisting this temptation is impossibly hard for children, for example, and many adults succumb.
A good way to maintain a healthy dose of skepticism about its output, and to keep checking that output, is to ask the LLM about something that happened after the training cutoff.
For example, I asked if lidar could damage phone camera sensors, and the LLM very convincingly argued it was highly improbable: the danger had only recently made the news and wasn't part of the training data.
This helps me stay sane and resist the temptation of just accepting LLM output =)
On a side note, the kagi assistant is nice for kids I feel because it links to its sources.
I should have been more specific, but you missed my point I believe.
I tested this at the time on Claude 3.7 Sonnet, which has an earlier cutoff date, and I just tested again with this prompt: “Can the lidar of a self-driving car damage a phone camera sensor?” The answer is still wrong in my test.
I believe the issue is the training cutoff date; that's my point. LLMs seem smart, but they have limits, and when asked about something discovered after the training cutoff date they will sometimes confidently be wrong.
I didn't miss your point; rather, I wanted you to realize some deeper points I was trying to make:
- Not all LLMs are the same, and not identifying your tool is problematic because "LLMs can't do a thing" is very different than "The particular model I used failed at this thing". I demonstrated that by showing that many LLMs get the answer right. It puts the onus of correctness entirely on the category of technology, and not on the tool used or the skill of the tool user.
- Training data cutoffs are only one part of the equation: tool use by LLMs allows them to search the internet and run arbitrary code (amongst many other things).
In both of my cases, the training data did not include the results either. Both used a tool call to search the internet for data.
Modern AI tools are more than an LLM with training data: they have tool calling and full internet access, and can access and reason about a wide variety of up-to-date data sources. Not realizing this demonstrates a fundamental misunderstanding of them.
Personally, I don't use Claude for this kind of thing because while it's proven to be very good at being a coding assistant and interacting with my IDE in an "agentic" manner, it's clearly not designed to be a deep research assistant that broadly searches the internet and other data sources to provide accurate and up-to-date information. (This would mean that AI/model selection is a skill issue and getting good results from AI tools is a skill, which is borne out by the fact that I get the right answer every time I try, and you can't get the right answer once.)