Hacker News | tnorgaard's comments

This talk seems to have set out to prove that "XML is Bad". Yes, XML-DSig isn't great with XPaths, but most of these attack vectors have been known for 10 years. There is probably a reason why the vulnerabilities found were in software that is not commonly used, e.g. SAP. Many of the things possible with XML and UBL simply aren't available in protobuf or JSON. How would you digitally sign a Json document and embed the signature in the document?

Neither the article nor the talk appears to reference the XML standard that EN 16931 is built upon: Universal Business Language, https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=... - which is freely available. Examples can be found here: https://github.com/Tradeshift/tradeshift-ubl-examples/tree/m... . It is a good standard, and yes, it's complex, but it is not complicated by accident. I would any day recommend UBL over IDOC, Tradacoms, EDIFACT and the like.


Most of these attack vectors have been known for 10 years, and yet researchers keep finding bugs in major implementations to this day. Here's one from last week: https://portswigger.net/research/the-fragile-lock

> How would you digitally sign a Json document and embed the signature in the document?

You would not, because that's exactly how you get these bugs. Fortunately serialization mechanisms, whether JSON or Protobuf or XML or anything else, turn structured data into strings of bytes, and signature schemes operate on strings of bytes, so you'll have a great time signing data _after_ serializing it.


This seems like a distinction without meaning. The question is whether JSON serializations intended for canonical signing would be somehow safer than those XML serializations. Obviously people would like all the same features that caused problems before.


That is not, in fact, the question. The whole point of storing signatures separately from the serialized bytes they sign is not having to rely on any properties of the serialization scheme. It does not matter whether your serialization is canonical or not if you don't need to parse the document before you've verified the signature on it. XML-DSig, to the contrary, requires that you parse the document, apply complex transformations to it, and then reserialize it before you can verify anything, which is what makes bugs like "oops the canonicalization method errored and now my library will accept a signature over the empty string as valid for any document" (https://portswigger.net/research/the-fragile-lock#void-canon...) possible.
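
To make "verify the bytes first, parse second" concrete, here is a minimal sketch. It is an assumption-laden illustration: HMAC-SHA256 from the Python standard library stands in for a real signature scheme (a real deployment would use an asymmetric signature), and `KEY`, `sign` and `verify_then_parse` are hypothetical names.

```python
import hashlib
import hmac
import json

KEY = b"demo-shared-secret"  # assumption: stand-in for a real signing key


def sign(payload: bytes) -> bytes:
    """Produce a detached tag over the exact serialized bytes."""
    return hmac.new(KEY, payload, hashlib.sha256).digest()


def verify_then_parse(payload: bytes, signature: bytes):
    """Check the tag over the raw bytes; only parse if it matches."""
    if not hmac.compare_digest(sign(payload), signature):
        raise ValueError("bad signature")
    # The parser never sees unauthenticated input.
    return json.loads(payload)


payload = json.dumps({"invoice": 42, "total": "19.99"}).encode("utf-8")
signature = sign(payload)  # stored or transmitted next to the payload
doc = verify_then_parse(payload, signature)
```

Nothing here depends on the serialization being canonical: the verifier compares bytes, so even a single added space invalidates the tag.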


You are saying people shouldn't want what they want and since JSON has no standards for it you assume it won't happen. Not even X509 is interested in working with detached signatures.

> It does not matter whether your serialization is canonical or not if you don't need to parse the document before you've verified the signature on it.

It most certainly does. First or last duplicate key?
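
For anyone who hasn't run into the duplicate-key problem: a rough sketch of how one parser resolves it, and one way to refuse the input instead. This only shows Python's standard `json` module; other parsers may pick the first key or error out, which is exactly the ambiguity. `reject_duplicates` and `strict_loads` are hypothetical helper names.

```python
import json

raw = '{"amount": 10, "amount": 9999}'

# Python's json module silently keeps the last duplicate key.
last_wins = json.loads(raw)


def reject_duplicates(pairs):
    """object_pairs_hook that fails on repeated keys instead of picking one."""
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError(f"duplicate keys in object: {keys}")
    return dict(pairs)


def strict_loads(text: str):
    """Parse JSON, rejecting any object that repeats a key."""
    return json.loads(text, object_pairs_hook=reject_duplicates)
```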


I am comfortable saying that, when designing a signature scheme, people should not want features that are known to consistently lead to catastrophic vulnerabilities.


When I look at JSON-related crypto, say JWT or WebAuthn, I am (un)comfortable saying the CVE-causing complexities are there too, just repeated in each spec rather than consolidated in a standard layer.


I'm not sure why you take me for a JSON/JWT fan (I'm happy to agree they've had their own share of implementation bugs), or what that has to do with signature wrapping bugs in XML-DSig, which is what I've been talking about this entire time.


-----BEGIN PGP SIGNED MESSAGE-----

Hash: SHA1

> How would you digitally sign a Json document and embed the signature in the document?

Embedding a signature into the same file is easy enough.

-----BEGIN PGP SIGNATURE-----

Version: GnuPG v0.9.7 (GNU/Linux)

iEYEARECAAYFAjdYCQoACgkQJ9S6ULt1dqz6IwCfQ7wP6i/i8HhbcOSKF4ELyQB1

oCoAoOuqpRqEzr4kOkQqHRLE/b8/Rw2k =y6kj

-----END PGP SIGNATURE-----


Or use something similar to JWTs: you normalize the document, sign the hash, wrap the original document with metadata, and include the signature.


If one has a reproducible JSON serializer, then one can add a signature to any JSON object via serializing the object, signing that and then adding the resulting signature to the original object.

This avoids JSON-inside-JSON and allows pretty-printing the original object together with the signature.
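
A minimal sketch of that idea, under loud assumptions: "reproducible" here just means sorted keys and compact separators from Python's `json` module (not a standardized canonical form), HMAC-SHA256 stands in for a real signature, and `SIG_FIELD`, `canonical_bytes`, `add_signature` and `verify` are hypothetical names.

```python
import hashlib
import hmac
import json

KEY = b"demo-shared-secret"  # assumption: stand-in for a real signing key
SIG_FIELD = "signature"      # hypothetical field name for the embedded tag


def canonical_bytes(obj: dict) -> bytes:
    """Serialize the object without its signature field, reproducibly."""
    body = {k: v for k, v in obj.items() if k != SIG_FIELD}
    text = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def add_signature(obj: dict) -> dict:
    """Return a copy of obj with the tag added as a top-level field."""
    tag = hmac.new(KEY, canonical_bytes(obj), hashlib.sha256).hexdigest()
    return {**obj, SIG_FIELD: tag}


def verify(obj: dict) -> bool:
    """Recompute the tag over the canonical form and compare."""
    expected = hmac.new(KEY, canonical_bytes(obj), hashlib.sha256).hexdigest()
    actual = str(obj.get(SIG_FIELD, ""))
    return hmac.compare_digest(expected.encode("utf-8"), actual.encode("utf-8"))
```

Usage would look like `signed = add_signature({"b": 1, "a": [1, 2]})`, after which `verify(signed)` holds even if the object is pretty-printed and re-parsed. Note that the verifier has to parse the document before it can check anything.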


> If one has a reproducible JSON serializer

Pretty significant catch if interoperability is a concern at all. Whitespace is easy enough to handle but how do dict keys get ordered? Are unquoted numbers with high precision output as-is or truncated to floats/JS Numbers? Is scientific notation ever used and if so when?
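
A few of these are easy to demonstrate. The sketch below only shows what Python's standard `json` module does; other serializers may well behave differently, which is the interoperability problem.

```python
import json

# Key order follows insertion order by default, so two producers can
# emit different bytes for the same logical object.
first = json.dumps({"b": 1, "a": 2})
second = json.dumps({"a": 2, "b": 1})

# High-precision decimals are rounded to a binary double on parse,
# so the original digits are gone after one round trip.
precise = json.loads("0.1000000000000000055")
round_tripped = json.dumps(precise)

# Large floats come back out in scientific notation.
big = json.dumps(1e16)

# Integers above 2**53 survive as Python ints, but not as doubles
# (which is what a JS Number would hold).
huge = json.loads("9007199254740993")
as_double = float(huge)
```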


Just so people this far down can look it up: the term is canonicalization, and its cousin is collation.

These are non-trivial issues that, thankfully, some very smart and/or experienced people have usually handled for us. However, they still frequently lead to all sorts of vulnerabilities. "Stuffing" attacks sometimes rely on these issues, as have several major crypto incidents.


> How would you digitally sign a Json document and embed the signature in the document?

Presumably the same way you accomplish the thing in xml:

    { "signature": "…", "payload": … }


> XML-DSig isn't great with XPaths

Or at all.

> How would you digitally sign a Json document and embed the signature in the document?

Preferably you wouldn't, because that's a really bad idea.

That said, these support-every-conceivable-idea, design-by-committee systems would be equally bad built on JSON or anything else. That much is true.

There's probably no silver bullet here. But that is still not an excuse for XML-DSig.


Other answers are good. One more that you could do is put the JSON document inside a container (A zip archive for example). Then your document can effectively be

    invoice.inv (zip archive)
    ├─ payload.json
    └─ signature.asc

This has the added benefit of leaving room for multiple JSON documents within the archive.

It's effectively what the Java jar is.


don't unzip an untrusted payload


Unless you are worried about something like a gzip bomb, I don't see why this is an issue. A lot of formats are effectively just zips: xlsx, odf, etc., for example. It's a pretty common format style.

It helps to have a well defined expected structure in the archive.
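
As a rough sketch of what "well defined expected structure" could mean in practice: read only the members you expect, in memory, with a size cap, and never call extract-everything. The member names come from the layout above; `MAX_MEMBER_BYTES`, `build_invoice` and `read_invoice` are hypothetical.

```python
import io
import zipfile

EXPECTED = {"payload.json", "signature.asc"}  # names from the layout above
MAX_MEMBER_BYTES = 1_000_000                  # assumption: arbitrary size cap


def build_invoice(payload: bytes, signature: bytes) -> bytes:
    """Create the container in memory with exactly the two expected members."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("payload.json", payload)
        zf.writestr("signature.asc", signature)
    return buf.getvalue()


def read_invoice(archive: bytes) -> dict:
    """Read the expected members without writing anything to disk."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        names = zf.namelist()
        # Reject extra, missing, or repeated members up front.
        if len(names) != len(EXPECTED) or set(names) != EXPECTED:
            raise ValueError(f"unexpected archive members: {names}")
        contents = {}
        for info in zf.infolist():
            if info.file_size > MAX_MEMBER_BYTES:
                raise ValueError(f"{info.filename} is too large")
            with zf.open(info) as member:
                # Bounded read, in case the declared size is wrong.
                data = member.read(MAX_MEMBER_BYTES + 1)
            if len(data) > MAX_MEMBER_BYTES:
                raise ValueError(f"{info.filename} is too large")
            contents[info.filename] = data
        return contents
```

Because no member name is ever turned into a filesystem path, path traversal names like `../evil` are simply rejected as unexpected members.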



Right, so long as step 1 in reading your file isn't "extract everything" you're pretty safe.

This specific exploit is one that only exists when you are extracting a zip on windows.


this is just one instance of a vulnerability associated with unzipping; a curious search would yield more.


A curious search reveals that vulnerabilities that do exist are of 2 flavors.

1. Standard C memory vulnerabilities

2. Unsafe file traversal while unzipping

The entire second class is avoided in a fixed file format. The first class of vulnerabilities plague everything. A quick look at libxml2 CVEs shows that.


and the zip bombs you mentioned! i keep a dummy SD card with one hehe.

but yeah the first class of vulns is why we have advice like don’t run untrusted input, which is not dissimilar to “don’t unzip untrusted payloads”.


Having implemented EDIFACT parsers and translation layers, I find Universal Business Language (OASIS UBL) bliss to work with. Yes, it's a big standard and looks scary when starting out with it, but it is very well designed for a complicated world.


Does the "Stop broadcasting SSID" option in most Wifi access points / routers prevent wardriving or is the BSSID still leaked?


In this case the AP still beacons (which includes the BSSID), just with the SSID field set to "".
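
For the curious, a minimal sketch of where those fields sit in a beacon. It assumes a raw 802.11 management frame with no radiotap header in front, and `parse_beacon` is a hypothetical helper, not taken from any capture library.

```python
def parse_beacon(frame: bytes):
    """Return (bssid, ssid_bytes, hidden) for a raw 802.11 beacon frame."""
    # 24-byte management header plus 12 bytes of fixed beacon fields.
    if len(frame) < 36:
        raise ValueError("frame too short for a beacon")
    frame_control = frame[0]
    frame_type = (frame_control >> 2) & 0x3
    subtype = frame_control >> 4
    if frame_type != 0 or subtype != 8:  # management frame, beacon subtype
        raise ValueError("not a beacon frame")
    # Address 3 of the header carries the BSSID.
    bssid = ":".join(f"{b:02x}" for b in frame[16:22])
    # Tagged elements follow: 1 byte ID, 1 byte length, then the data.
    pos, ssid = 36, None
    while pos + 2 <= len(frame):
        elem_id, length = frame[pos], frame[pos + 1]
        data = frame[pos + 2:pos + 2 + length]
        if elem_id == 0:  # SSID element
            ssid = data
            break
        pos += 2 + length
    # A "hidden" network sends an empty or all-zero SSID element.
    hidden = ssid is not None and (len(ssid) == 0 or set(ssid) == {0})
    return bssid, ssid, hidden
```

So the BSSID is readable from the header whether or not the SSID element is blanked.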


I wish we would focus on making tooling better for W3C EXI (binary XML encoding) instead of inventing new formats. Just being fast isn't enough; I don't see many people using Aeron/SBE. It needs an ecosystem, which XML does have.


Binary XML encoding (like W3C EXI) is useful in some contexts, but it’s generally not as efficient as modern binary serialization formats. It also can’t naturally express shared or circular reference semantics, which are important for complex object graphs.

Fory’s format was designed from the ground up to handle those cases efficiently, while still enabling cross‑language compatibility and schema evolution.


I am not sure if W3C EXI, or ASN.1 BER or something else is better, but agree that using DOP (rather than OOP) design principles is the right answer -- which means focusing on the encoding first, and working backwards towards the languages / clients.


DOP is great, but there’s always a gap between DOP and OOP. That gap is where Fory comes in. Right now, Fory takes an OOP‑first approach, but next we’ll add a DOP path by introducing an optional IDL — bridging the two styles. My goal is for the IDL to also support optional OOP‑style expressiveness, so teams can choose the balance that fits their needs.


DOP is very interesting, I like this idea too — most DOP approaches are implemented via an IDL, which is another valid direction. I plan to support that in Fory. I want to give users the freedom to choose the model that works best for them.


Super interesting, compiling pg with (I assume) the same block size as zfs! It was always on our todo list to try, but we never got around to it. If possible, could you share what block size you ended up with? Have you tried zfs direct io in 2.3.x? If so, could you share any findings? Thanks for sharing - and cool website!


I don’t think Postgres will be able to benefit from direct io? I might be wrong though!

I use Postgres with 32K BLKSZ.

I am actually using default 128K zfs recordsize, in a mixed workload, I found overall performance nicer than matching at 32K, and compression is way better.

> Thanks for sharing - and cool website!

Thank you!


If I may make a suggestion: instead of a static JSON file read at boot, I'd suggest passing the feature flags down per request as a header, or as a pointer to the set of feature flags, so that all systems handling a given request observe the same features. Just my 2 cents.
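
A minimal sketch of what I mean, with everything in it hypothetical: the header name `X-Feature-Flags`, the comma-separated encoding, and the helper names are all assumptions, and plain dicts stand in for whatever request objects your framework uses.

```python
FLAG_HEADER = "X-Feature-Flags"  # hypothetical header name


def encode_flags(flags: set[str]) -> str:
    """Serialize the enabled flags once, at the edge of the system."""
    return ",".join(sorted(flags))


def decode_flags(headers: dict[str, str]) -> frozenset[str]:
    """Read the flags for this request; a missing header means no flags."""
    raw = headers.get(FLAG_HEADER, "")
    return frozenset(part.strip() for part in raw.split(",") if part.strip())


def outgoing_headers(incoming: dict[str, str]) -> dict[str, str]:
    """Headers for a downstream call: forward the flag header unchanged."""
    out = {}
    if FLAG_HEADER in incoming:
        out[FLAG_HEADER] = incoming[FLAG_HEADER]
    return out


def handle(headers: dict[str, str]) -> str:
    """Example handler that branches on a per-request flag."""
    flags = decode_flags(headers)
    return "new-checkout" if "new_checkout" in flags else "old-checkout"
```

The point of forwarding the header verbatim is that a flag flipped mid-request cannot make two services disagree about which code path that request is on.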


Viking Link, the 765 km HVDC (VSC-based) link rated at 1400 MW between England and Denmark, has a rated loss of 3.7% [0].

[0] https://www.viking-link.com/auction-faqs


I believe that Solaris (OpenSolaris) Zones predate LXC by around 3 years. Even when working with k8s and docker every day, I still find what OpenSolaris had in 2009 superior. Crossbow and zfs tied it all together so neatly. What OpenSolaris could have been in another world. :D


Answer: Materialized Views.

On an unrelated note: Still hoping for those automatically refreshed materialized views in PostgreSQL, ala what VoltDB has.


I've been looking at Materialize for a while (https://materialize.com/). It can handle automatically refreshed materialized views. Last time I checked, it didn't support some Postgres SQL constructs that I use often, but I'm really looking forward to it.


> Still hoping for those automatically refreshed materialized views in PostgreSQL, ala what VoltDB has.

Not exactly what you're hoping for, and you probably already follow this pattern, but pg_cron can help (and is now available in AWS RDS).

```sql
CREATE EXTENSION IF NOT EXISTS pg_cron;

CREATE MATERIALIZED VIEW IF NOT EXISTS activeschema.some_thing_cached AS ...;

-- Refresh the view every 5 minutes.
SELECT cron.schedule(
  'some_thing_cached',
  '*/5 * * * *',
  $CRON$ REFRESH MATERIALIZED VIEW activeschema.some_thing_cached; $CRON$
);
```


I think that the problem is when you have a materialized view which takes hours to refresh. We are lucky that 99% of our traffic is during 7-19 on weekdays, so we can just refresh at night, but that won't work for others.

I don't know much about how postgresql works internally, so I probably just don't understand the constraints. Anyway, as I understand it, there are two ways to refresh: you either refresh a view concurrently or not.

If not, then postgres rebuilds the view from its definition on the side and at the end some internal structures are switched from the old to the new query result. Seems reasonable, but for some reason, which I don't understand due to my limited knowledge, an access exclusive lock is held for the entire duration of the refresh and all read queries are blocked, which doesn't work for us.

If you refresh concurrently, postgres rebuilds the view from its definition and compares the old and the new query result with a full outer join to compute a diff. The diff is then applied to the old data (like regular table INSERT/UPDATE/DELETE I assume), so I think you get away with just an exclusive lock (not an access exclusive one) and read access still works. There are two downsides to this: first, it requires a UNIQUE index on the materialized view for the join; second, the full outer join is a lot of additional work.
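
To illustrate the diff idea only (this is a toy model of my understanding, not how postgres implements it internally): treat the old and new results as dicts keyed by the unique column, which is also why a unique key is needed at all. `diff_by_key` and `apply_diff` are hypothetical names.

```python
def diff_by_key(old: dict, new: dict):
    """Compare two keyed result sets, like a full outer join on the key."""
    inserts = {k: new[k] for k in new.keys() - old.keys()}
    deletes = set(old.keys() - new.keys())
    updates = {k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]}
    return inserts, updates, deletes


def apply_diff(table: dict, inserts: dict, updates: dict, deletes: set) -> None:
    """Patch the existing rows in place instead of swapping the whole table."""
    for key in deletes:
        del table[key]
    table.update(updates)
    table.update(inserts)


old_rows = {1: "a", 2: "b", 3: "c"}
new_rows = {2: "b", 3: "C", 4: "d"}
inserts, updates, deletes = diff_by_key(old_rows, new_rows)
apply_diff(old_rows, inserts, updates, deletes)
```

Unchanged rows (key 2 here) are never touched, but both full result sets still have to be built and compared, which is the extra work mentioned above.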

I never had the time to test Materialize, but it seems to do what I want with its continuous refresh.

I also thought about splitting the materialized view into two: one for rarely changing data and another for the smaller part of the data which changes daily. Then I would only have to refresh the smaller view and UNION ALL both materialized views in a regular view. Not sure how well that will work with the postgres query planner.


Not sure about how that would work with the PG query planner either, but a batch for rarely changing data and rapid changing data is basically the Lambda data architecture, so probably a good call!


If it's a one shot data compilation, you could use something like postgres' NOTIFY to trigger a listening external app.


There's one gotcha with this approach: if there's another DDL operation running simultaneously with REFRESH MATERIALIZED VIEW, you'd get an internal postgres error.

You cannot be sure that refresh won't coincide with a grant on all tables in the schema, for example.


Given how well they work on any non-specialised DBMS, I'd prefer Postgres take its time and do it right (AKA, differently from everybody else).


TimescaleDB (a PostgreSQL extension) has these, specific to time-series however.

https://docs.timescale.com/timescaledb/latest/how-to-guides/...


Mssql has "indexed views" which are automatically updated instantly... But they destroy your insert/update performance and their requirements are so draconian as to be completely impossible to ever actually use (no left joins, no subqueries, no self joins, etc...).


Yes, views are nice, but there is also a fair concept of not needlessly bogging down a table. Sure, they were making up data, but a flat table with stats, profile data and other easily external data is just bloat. Once you have an id then static fields can be retrieved from other services/data stores.


I'm not sure I am following. Aren't materialized views just formal, cached results of a query? That wouldn't bog down a table.


I think their point is more ‘don’t store all that junk in your primary database and then do all your work on it there too if you can just stuff it somewhere else’. Which has pros and cons and depends a lot on various scaling factors.


Materialized views are persistent tables that are typically updated when the underlying data is updated.

Typically.


I'm pretty sure most engines use the term "materialized views" for eventual consistency tables. The only db I've seen with that kind of ACID materialized view is MS SQL, which calls them "indexed views".


Perhaps he means it will bog down on refresh.


Maybe? Not sure.


Another thing I'm waiting for in Postgres is lifting and decoupling from the connection limit...


If one wanted to do server side rendering in Java with something like Turbo links, in 2020 - what would one use? JSP? Grails? JSF? Or just hit the bar instead? :-)


It depends!

JSF really is meant for quickly building internal applications that don't have to withstand "web scale" loads. It's focused on churning out data-driven applications quickly. Add something like BootsFaces or PrimeFaces and you can produce these things in very little time. That's not to say you couldn't use JSF to make a "web scale" project, but you would have to dive into your server pretty far to carefully watch state management and session creation. Not impossible, just probably not its primary purpose.

For external facing applications that need to withstand a "web scale" load, Eclipse Krazo (aka MVC spec 1.2) and JSP are what you're looking for. These things are lightning fast and give you a lot of control over session creation by default. Render times are usually under a few ms. This is probably the fastest and least resource intensive stack available (no benchmark provided, take it for what you paid for the comment).

