Coming from a data engineering and business analytics background, the terms "structured" and "unstructured" seem to be swapped in meaning here.
I'd expect to talk about anything in a DB as "structured data". Flexible serialization formats such as JSON or XML are "semi-structured". And something from e.g., an office document is "unstructured".
This is not a complaint or criticism. I understand the article just fine. It's just interesting how different perspectives can affect what words mean (:
Kleppmann[1] calls it schema-on-read (JSON, XML) and schema-on-write (typed columns in an RDB). I prefer that over structured/unstructured; it's a bit more specific.
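To make the distinction concrete, here is a rough sketch of how I read it, not anything taken from the book. The table, the sample documents, and the helper names `write_rejected` and `read_age` are all made up for illustration. With schema-on-write the store rejects a bad value at insert time; with schema-on-read anything gets stored and the reader applies its expectations later.

```python
import json
import sqlite3

# Schema-on-write: columns are declared up front and checked at insert time.
# SQLite is loosely typed, so an explicit CHECK enforces the integer type here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "name TEXT NOT NULL, "
    "age INTEGER NOT NULL CHECK (typeof(age) = 'integer'))"
)
conn.execute("INSERT INTO users VALUES (?, ?)", ("ada", 36))


def write_rejected(row):
    """Return True if the database refuses the row at write time."""
    try:
        conn.execute("INSERT INTO users VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        return True
    return False


# Schema-on-read: raw documents are stored as-is, whatever their shape.
raw_docs = [
    '{"name": "ada", "age": 36}',
    '{"name": "bob", "age": "unknown"}',
]


def read_age(raw):
    """Apply the expected shape only when the document is read."""
    doc = json.loads(raw)
    age = doc.get("age")
    if not isinstance(age, int):
        raise ValueError(f"expected integer age, got {age!r}")
    return age
```

In this sketch, `write_rejected(("bob", "unknown"))` fails at the database, while the equivalent JSON document is stored without complaint and only fails once `read_age` is called on it.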
> Coming from a data engineering and business analytics background, the terms "structured" and "unstructured" seem to be swapped in meaning here.
Mark Rosewater likes to write about his personal concept of "linear" Magic: The Gathering decks, which are decks in which the cards tend to pick up synergy bonuses from each other, so that having two of them together is considerably more powerful than you'd expect from the strength of those two cards individually.
This always bothers me because it is the opposite of the more normal use of "linear" relationships, in which everything contributes exactly as much to an aggregate as it's worth individually.
I think the SQL sense is more that "structured" means "it contains internal structure" (like a tree), whereas a table without JSON is free from additional structure apart from its own table structure.