Encrypted DNS and NTP = Deadlock

CaliforniaKarl · on Dec 30, 2022

Assuming devices like this get their IP via DHCP, there is a solution that does not involve hard-coding IPs into software.

DHCP option 42 (defined in RFC 2132) can be used to specify multiple NTP server IPv4 addresses.

(There’s also DHCP option 4, but that’s used to specify the IP for the older RFC 868 time protocol.)

DHCPv6 has option 31 for SNTP (via deprecated RFC 4075), and option 56 for NTP (via RFC 5908).

So, that would probably be the best option: Get an NTP address from DHCP or DHCPv6, use that to set your clock, bring up DNS over TLS/HTTPS, re-configure NTP with your preferred source, re-sync your clock, and then continue booting!
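As a rough illustration of the first step, here is a minimal Python sketch that pulls the option 42 addresses out of a raw DHCPv4 options field. It assumes you already have the options bytes (everything after the magic cookie) from your DHCP client; the function name is made up for this example, and option overloading and DHCPv6 are not handled.

```python
import ipaddress

NTP_SERVERS_OPTION = 42  # RFC 2132: list of NTP server IPv4 addresses
PAD_OPTION = 0           # single byte, no length field
END_OPTION = 255         # marks the end of the options field


def ntp_servers_from_dhcp_options(options: bytes) -> list[str]:
    """Walk the code/length/value triples and return the option 42 addresses."""
    servers = []
    i = 0
    while i < len(options):
        code = options[i]
        if code == END_OPTION:
            break
        if code == PAD_OPTION:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated: no length byte
        length = options[i + 1]
        value = options[i + 2 : i + 2 + length]
        # Option 42 carries one or more 4-byte IPv4 addresses.
        if code == NTP_SERVERS_OPTION and len(value) == length and length % 4 == 0:
            for off in range(0, length, 4):
                servers.append(str(ipaddress.IPv4Address(value[off : off + 4])))
        i += 2 + length
    return servers
```

The returned addresses would then be handed to whatever does the first clock sync, before DNS over TLS/HTTPS is brought up.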

magicalhippo · on Dec 30, 2022

I've got a small Raspberry Pi clone set up with a GPS module[1] running chronyd[2], advertised via DHCP as mentioned by GP. Didn't take much time or effort.

I also added a RTC module so it could hold reasonable time during power-loss but I haven't had a chance to verify that it works as expected.

[1]: https://www.aliexpress.com/item/1005001623104119.html The 7M and M8N are both much better than the 6M. Regardless, get a module with a PPS output pin; not all have one. And grab an active antenna, you'll probably need it.

[2]: https://austinsnerdythings.com/2021/04/19/microsecond-accura...

gh02t · on Dec 30, 2022

I did something similar but went a bit more overboard because I could partially write it off as an experiment for work. I found this carrier board for the CM4:

https://www.ebay.com/itm/384876455621

which is pretty nice because it has an onboard RTC and battery backup. I then added on a timing specific GPS module that was made to fit onto the GPIO pins (https://www.tindie.com/products/nsayer/gps-timing-module-bre...). The CM4 is interesting because it supports hardware timestamping on the ethernet so it can actually run PTP for ultra-precise synchronization. Also added one of those SPI 8 digit seven segment displays and wrote a crappy little program to update the time as fast as possible for some bling. It sucks the CM4 ended up being the biggest cost of the project since I ended up paying ~80 bucks for it. Still far cheaper than the cost of a "real" PTP host.

magicalhippo · on Dec 30, 2022

> The CM4 is interesting because it supports hardware timestamping on the ethernet so it can actually run PTP for ultra-precise synchronization.

Cool! I've been reading up on it but hadn't realized the CM4 supported that. Will need to look into that a bit deeper, thanks!

a2tech · on Dec 30, 2022

When you say Raspberry Pi clone, is it a real clone? Or is it just another single board computer?

magicalhippo · on Dec 30, 2022

Sorry, yeah I mean Pi-like SBC, ie with GPIO. In my case a NanoPi NEO-LTS[1], mainly because I had one available and I didn't need much hardware just to run the NTP (it's all it does).

It runs Armbian[2] and has been sitting there doing its thing for a few years now.

All you need is a GPIO for PPS, but it's nice to have the UART so it can parse the time and such.

edit: since it's slightly different on Armbian from a Raspberry Pi, you need to enable the pps-gpio overlay similarly[3], but the pin parameter is different[4], for the H3 in the NanoPi NEO for example see here[5].

[1]: https://www.friendlyelec.com/index.php?route=product/product...

[2]: https://www.armbian.com/nanopi-neo/

[3]: https://forum.armbian.com/topic/9901-pps-gpio-no-longer-defa...

[4]: https://docs.armbian.com/User-Guide_Armbian_overlays/#overla...

[5]: https://github.com/armbian/sunxi-DT-overlays/blob/d925cfbb0c...

nine_k · on Dec 30, 2022

Potentially this has a hole: if a wrong time is set initially, then wrong certificates may be trusted, so spoofed TLS sessions with compromised (and expired) certificates would work, potentially giving a way to install something malicious. It requires some massive spoofing on the outside though.

If the initial NTP addresses are within a datacenter LAN (say, 10.x.x.x, etc), they are likely harder to spoof. Should be fine in their case (servers).

cyounkins · on Dec 30, 2022

Yeah I guess in general if the security of TLS depends on correct timekeeping (eg a compromised key enables an attacker to use an old cert), then in theory we should secure the time sync protocol. The NIST servers page [1] describes an authenticated+encrypted NTP for VIPs, but I don't know a solution for the layperson.

[1] https://tf.nist.gov/tf-cgi/servers.cgi

nine_k · on Dec 30, 2022

For TLS, you need a roughly correct time; being many minutes off is usually acceptable. No need for GPS clocks and other such stuff.

Ideally your machine should have a functioning battery-backed RTC. The vast majority of larger machines do.

In a data center, DHCP or well-known local addresses should offer hard-to-spoof pointers to local NTP servers for bootstrapping.

I don't see a large problem here; a reasonable startup sequence that makes sure a correct time is set before attempting TLS connections should just work. DNS requiring TLS and thus a correct system time is slightly novel, so approaches ignoring it expectedly fail.

RedShift1 · on Dec 30, 2022

What do you do with embedded devices where the RTC battery has died? Replace the battery but you have no way to set the clock and the device won't connect to anything anymore because its clock is off, so just throw it away then? Doesn't sound environmentally friendly... And yes CR2032 batteries can last a long time but they can also fail, just a month ago I had to replace one that was only a year old.

nine_k · on Dec 30, 2022

Attach a com port, set the date correctly, since you came to replace the battery anyway :-/

Frankly, there is a balancing act between the desire for a device to work unattended for the longest time, and security implications of running an outdated or degraded device. Now that the RTC becomes more important, equip it with better batteries :(

RedShift1 · on Dec 30, 2022

You can't expect everyone to do that. Some devices don't have a UART; hell, some don't even have the pins soldered on.

johannes1234321 · on Dec 30, 2022

... and then you have consumers, who don't know what UART, serial port or whatever might be, but just want their "smart" appliance to work.

zokier · on Dec 30, 2022

If you control dhcp, then you could also just include time info in the response, for example option 152.

ilyt · on Dec 30, 2022

Do clients support it? I have never seen an option in client settings to take the time from a DHCP response.

cyounkins · on Dec 30, 2022

Sounds pretty good in theory, but this is for a router and somehow I doubt many ISPs are doing what you describe.

tinus_hn · on Dec 30, 2022

They’d have to acquire old certificates for every domain you’d like to visit using an encrypted connection. Not very likely.

CGamesPlay · on Dec 30, 2022

I think there should be an option in TLS clients to ignore expiration times, explicitly for cases like this. All other validations are performed, just the not-before and not-after times are waived.

Of all the validations, this one seems like the one that causes problems in edge cases most frequently. I'm definitely not saying that expiry times should be ignored by default; just that clients should have the option to do it.

This is obviously a problem for embedded devices, but it even goes to the browser level: when a certificate expired yesterday but is otherwise valid, I as a user want to be able to ignore that and only that error, particularly for a pinned certificate, since it's more likely to be an incompetent sysadmin than a malicious attacker.

infotogivenm · on Dec 30, 2022

This would be the solution I’d go with. For the first DoH call made by NTP after startup, just ignore expiry timestamps.

jedisct1 · on Dec 30, 2022

You can add IP addresses of the NTP servers being used in the captive-portals.txt file, which serves records even before the operating system considers a network interface as active.

Or add cert_ignore_timestamp=true to the main configuration file. Initially, cert expiration won't be checked, but as soon as a DNS server is reachable, this feature will automatically disable itself.

This is for dnscrypt-proxy. Alternative clients may have something similar.

raggi · on Dec 30, 2022

https://datatracker.ietf.org/doc/html/draft-ietf-ntp-roughti... roughtime has been in development since agl proposed it in 2016. It'd be nice to see it get over the line - until then tlsdate is easy to apply for approximate bootstrapping.

mvip · on Dec 30, 2022

We've seen this (and similar TLS related issues) a fair bit at Screenly when working with Raspberry Pis.

The best workaround that we've found is to use the date in a HTTP Header to set the initial time (if we detect this condition):

$ curl -sI http://api.screenlyapp.com | grep Date

Date: Fri, 30 Dec 2022 08:43:56 GMT

With this set, you should be able to trigger the initial NTP service to start and set the date.

There's a Rust library here that can parse these dates for you: https://docs.rs/httpdate/0.3.2/httpdate/
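For comparison, a minimal Python sketch of the same parsing step using only the standard library. It assumes you already have the raw response headers as text (for example the output of `curl -sI`); the function name is made up for this example, and actually setting the system clock (which needs privileges) is deliberately left out.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def time_from_http_headers(raw_headers: str) -> Optional[datetime]:
    """Return the Date header as an aware UTC datetime, or None if absent or unparsable."""
    for line in raw_headers.splitlines():
        # Split on the first colon only; the date value itself contains colons.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "date":
            try:
                parsed = parsedate_to_datetime(value.strip())
            except (TypeError, ValueError):
                return None
            if parsed.tzinfo is None:
                # A "-0000" zone yields a naive datetime; treat it as UTC.
                parsed = parsed.replace(tzinfo=timezone.utc)
            return parsed.astimezone(timezone.utc)
    return None
```

Keep in mind that a Date header fetched over plain HTTP is unauthenticated, so it is only suitable as a rough starting point before a proper NTP sync.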

tjoff · on Dec 30, 2022

But that assumes you have regular dns?

So then you could just query one of the ntp servers using dns instead?

mvip · on Dec 30, 2022

Ah, good point! That was just to show the header. We have a bit different logic in our actual application where we check a number of IPs too, including 1.1.1.1:

$ curl -sI http://1.1.1.1 | grep Date

ianai · on Dec 30, 2022

Do you just pipe that into a date set command?

mvip · on Dec 30, 2022

No, we parse it and set it over DBUS. If you want to do something more turnkey, take a look at htpdate (https://linux.die.net/man/8/htpdate).

ThePowerOfFuet · on Dec 30, 2022

Why does Screenly have their API available on port 80 at all? Great way to leak cookies or tokens...

mvip · on Dec 30, 2022

We obviously don't use HTTP for serving our API. The HTTP end-point is just a 301, but it still contains the Date field:

$ curl -I http://api.screenlyapp.com

HTTP/1.1 301 Moved Permanently

Date: Fri, 30 Dec 2022 09:01:44 GMT [...]

ThePowerOfFuet · on Jan 3, 2023

Regardless of it being a 301, the cookie or token is already sent, and thus already burned, by the time the response code is returned.

Your API should not even be listening on port 80, period.

miyuru · on Dec 30, 2022

> It would be great to see Google or Cloudflare use their infrastructure to provide anycasted NTP IP addresses.

Google, Cloudflare and Facebook have vanity IPv6 addresses, pretty sure they are all static anycast IPs.

time.google.com - 2001:4860:4806::

time.cloudflare.com - 2606:4700:f1::123

time.facebook.com - 2a03:2880:ff0c::123

mike_d · on Dec 30, 2022

I also run a pair of worldwide anycasted NTP instances if you don't want to deal with smearing or depending on the same company for DNS and time.

45.127.112.2

45.127.113.2

cyounkins · on Dec 30, 2022

Cool! I searched but couldn't find any docs - looks like it might have been for ntpjs? What was the process like to get an ASN to do this?

bobdvb · on Jan 3, 2023

One of the problems with Anycast is that you can get inconsistencies in the responses because you're potentially hitting multiple servers with different RTTs. So you'd want to ensure they're a higher Stratum number than your final, accurate servers.

WirelessGigabit · on Dec 30, 2022

Mind you that Google smears the leap second. Not sure if the others do, but I think it is important to recognize when selecting an NTP server.

zokier · on Dec 30, 2022

As long as they do not start smearing leap days I don't think it matters much for bootstrapping. You only need time accurate enough for validating certs, after that you can use whatever pool you want.

plantain · on Dec 30, 2022

Smearing is probably what most people actually want...

oittaa · on Dec 30, 2022

Yeah, this comes up every now and then in these discussions. Both Amazon and Google have explained quite well why smearing is probably the best way to handle leap seconds.

jedisct1 · on Dec 30, 2022

As for IPv4, time.google.com has been 216.239.35.0 since 2016, so it's unlikely to change anytime soon either.
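To make the "hard-coded IP for bootstrapping" idea concrete, here is a minimal SNTP client sketch in Python that talks to an IPv4 address directly, with no DNS involved. The function names are made up for this example; it ignores round-trip delay, the 2036 NTP era rollover, and any response validation beyond length, so treat it as a rough bootstrap only.

```python
import socket
import struct

NTP_UNIX_DELTA = 2_208_988_800  # seconds between 1900-01-01 and 1970-01-01


def build_sntp_request() -> bytes:
    """48-byte client request: LI=0, VN=4, Mode=3 (client) -> first byte 0x23."""
    return b"\x23" + 47 * b"\x00"


def parse_sntp_response(packet: bytes) -> tuple[int, float]:
    """Return (stratum, unix_time) taken from the server's transmit timestamp."""
    if len(packet) < 48:
        raise ValueError("NTP packet shorter than 48 bytes")
    stratum = packet[1]
    # Transmit timestamp: 32-bit seconds since 1900 plus a 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return stratum, seconds - NTP_UNIX_DELTA + fraction / 2**32


def query_sntp(ip: str, timeout: float = 2.0) -> tuple[int, float]:
    """Send one request to ip:123 over UDP and parse the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_sntp_request(), (ip, 123))
        data, _ = sock.recvfrom(512)
    return parse_sntp_response(data)
```

With network access, `query_sntp("216.239.35.0")` would use the address quoted above; whether that address stays stable is exactly the open question in this thread.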

dblitt · on Dec 30, 2022

A similar issue happens when a laptop (with a bad battery) loses its time and can't connect to 802.1x WPA2 enterprise wifi.

At my high school, we had laptop carts that were notorious for losing their time, and nobody could log in because they were bound to AD over wifi. The system was offline because it would reject the RADIUS certificate of the wifi network due to the time being wrong. We had to manually log in as local admin to change the time or plug them all into ethernet until they could connect to NTP.

Bender · on Dec 30, 2022

There may be a clunky workaround, depending on what else is on the embedded devices. If it has curl, one could create a local host entry for a known site (yes, bad practice, I know) and then curl --head over plain http to get the date header and use hwclock to bootstrap the system time prior to starting up NTP. Some NTP daemons also have a way to do something like this. Another method would be to bootstrap NTP with a few known dedicated public NTP servers that are in /etc/hosts, then switch to the pool. These are all clunky methods but I have seen them used out of desperation for a myriad of "but you shouldn't do that" reasons. A cron job could check a management site for the latest configuration from a json or plain text file so that the device does not fall too far out of sync.

RedShift1 · on Dec 30, 2022

There are multiple "shouldn't do that" and "bad practice" things that saved my ass out in the wild. The "don't do that" people are usually not the ones out in the field having to keep things working...

KirillPanov · on Dec 30, 2022

Same problem with Wireguard and NTP.

You can't (usefully) tunnel NTP inside of wireguard, because if your clock is wrong your peers won't talk to you anymore.

This is my personal pet peeve.

IMHO the wireguard handshake needs to be extended to allow one peer (the one that didn't reboot) to reply to a packet with a non-monotonically-increasing nonce with some signal saying "hey, here is the last nonce I got from you". Obviously this reply would be encrypted.

Then hazmat-free hardware could use these replies to reset its nonce (for that particular peer only) if we haven't had a successful handshake with that peer since the last reboot and/or the system clock is implausible. Obviously this behavior would be off-by-default. I would enable it for my batteryless routers.

ilyt · on Dec 30, 2022

I think the solution here is to get on with finally securing NTP, not try to hack around it

KirillPanov · on Dec 30, 2022

That's a fairly glib comment; have you thought it through?

Most of the "secure this" wrappers like TLS+X.509 assume a clock.

bobdvb · on Jan 3, 2023

I think hard coding IPs is generally a bad idea, it might work for one or two users, but if this became standard practice then it would cause issues. I think it would be saner to say "if you don't have valid time (e.g. less than system/kernel build date) then don't use encrypted DNS.". Then NTP domains can be looked up, the answer will be correct enough to set a clock.
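A minimal Python sketch of that rule, under stated assumptions: `BUILD_TIME` is a hypothetical timestamp baked into the image at build time, and the two resolver labels are placeholders rather than settings of any real resolver.

```python
from datetime import datetime, timezone

# Hypothetical value: stamped into the firmware image when it is built.
BUILD_TIME = datetime(2022, 12, 30, tzinfo=timezone.utc).timestamp()


def clock_is_plausible(now: float, build_time: float = BUILD_TIME) -> bool:
    """A clock earlier than the build date cannot be right."""
    return now >= build_time


def choose_resolver(now: float) -> str:
    """Use encrypted DNS only once the clock could plausibly validate certificates."""
    return "encrypted" if clock_is_plausible(now) else "plaintext-bootstrap"
```

At boot this would be called as `choose_resolver(time.time())`, and re-evaluated once NTP has set the clock. Note the check only catches clocks that are too old, not ones that are wrong in the future.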

Alternatively it would be good to use an anycast IP for NTP. This is normally a bad idea because it makes calculating skew hard/unreliable, but that really should just mean a poorly sync'ed clock. So set the Anycast clock to be an intentionally high/poor Stratum score, list this along with a DNS based address so it's used until the encrypted DNS can be resolved with a better Stratum score.

So, Dear Akamai/Cloudflare/MANGA/etc. please provide a high stratum, Anycast address for basic, approximate NTP.

gonzo · on Dec 30, 2022

Back in the day, Unix wrote the time of day in the superblock for the root fs before unmounting it and rebooting.

phh · on Dec 30, 2022

It still does:

$ sudo LC_ALL=C /sbin/tune2fs -l /dev/nvme0n1 | grep 2022

tune2fs 1.46.6-rc1 (12-Sep-2022)

Last mount time: Thu Dec 29 20:05:22 2022

Last write time: Fri Dec 30 11:14:35 2022

Last checked: Wed Aug 31 09:37:57 2022

(It looks like "Last write time" is actually "Last umount time", because it is not refreshed during usage of the FS)

Jenda_ · on Dec 31, 2022

This is handled for example by the fake-hwclock package in Debian (also Raspberry Pi OS, installed by default) - it saves the time in a file, and even updates it every hour (so you won't teleport more than an hour back in time after an unclean reboot). However, it of course does not work when you mount read-only because you don't want your microSD card to fail.
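The general idea can be sketched in a few lines of Python. This is not fake-hwclock's actual implementation, just an illustration of the save-and-restore pattern; the function names and file format are made up for this example.

```python
import os
import tempfile


def save_time(path: str, now: float) -> None:
    """Write the timestamp atomically so a power cut cannot leave a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(f"{now:.0f}\n")
    os.replace(tmp, path)


def best_known_time(path: str, now: float) -> float:
    """Return the later of the current clock and the saved timestamp."""
    try:
        with open(path) as f:
            saved = float(f.read().strip())
    except (OSError, ValueError):
        return now  # no usable saved value; keep whatever the clock says
    return max(now, saved)
```

A periodic job would call `save_time(path, time.time())`, and early boot would set the clock to `best_known_time(path, time.time())`. The clock never moves backwards, but it can still be stale by however long the device was powered off.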

cyounkins · on Dec 30, 2022

Interesting! Unfortunately due to write endurance many networking devices mount their filesystem read-only. This means they also sometimes lack a log file describing why the device shut down!

rsimmons · on Dec 30, 2022

Can we change the title to Deadclock

tristor · on Dec 30, 2022

> It would be great to see Google or Cloudflare use their infrastructure to provide anycasted NTP IP addresses.

They do. Cloudflare does anyway. time.cloudflare.com is backed by a set of anycasted IPs covering around 275 POPs.

cyounkins · on Dec 30, 2022

That's great! But I'd want to see it documented - I searched but couldn't find IP addresses, only the domain.

For me time.cloudflare.com = 162.159.200.123

tristor · on Dec 30, 2022

The great thing about anycast is the IPs are the same for everyone, it’s just which POP routing converges on that is different. DNS and anycast are combined primarily for load balancing and failover. Anyone can get and use the IPs directly with dig.

cyounkins · on Dec 30, 2022

Right, but unless it's documented how do you know that the DNS entry won't change? They could change infra so it's DNS load-balanced instead of anycast and still at time.cloudflare.com

It wasn't clear but I provided the IP so others could validate it's the same for them and (in the future) that it hasn't changed.

Joel_Mckay · on Dec 30, 2022

In general, a time-travel sanity check at bootup is wise to handle when the RTC clock battery fails or is just installed (i.e. if the hardware jumps decades back into the past, then the local NTP/DNS/DHCP service is briefly paused while an abnormally large leap-forwards is forced from the time configuration commands. During this process, both client and server side SSL will usually be dropped due to the abnormality.)

Rookie mistake, like not using UTC time on the servers. =)

cyounkins · on Dec 30, 2022

If your clock was just reset, how do you know what time it is? How do you trust that your own clock is accurate? How do you know your clock wasn't incorrect before? Usually the answer to all these is NTP.

Joel_Mckay · on Dec 30, 2022

The epoch "zero time" on most hardware RTC is decades in the past.

Thus, it is generally a safe assumption to check if the system release date is in the future. One often doesn't want to trip this fix every boot as it can have collateral consequences in poorly written software.

We have a no time-travelers policy in most situations, and recommend others try to also constrain transactional-validity temporal-windows. Some hardware fakes an RTC by writing the current time to a cache file on disk on power cycles, but it would take an hour to explain why this is unwise (even if using GPS stratum time). =)

thayne · on Dec 30, 2022

Another potential way to work around this is to use unencrypted DNS for the NTP lookup only. In most cases you probably don't care about your lookup for an ntp server being confidential.

remram · on Dec 30, 2022

I thought the title read "encrypted (DNS and NTP)", e.g. "encrypted DNS and encrypted NTP".

eniac111 · on Dec 30, 2022

Happens to me several times a year. My PiHole is running on LXD :)

notwokeno · on Jan 3, 2023

DNS is a public database. Encrypting it is a bit silly anyway IMO. Certainly between the recursive and authoritative resolvers.

exabrial · on Dec 30, 2022

DNS over TCP/TLS is a stupid idea for a lot of reasons. First off, DNSSec already takes care of integrity protection. No need to re-implement the wheel by adding TLS on top of it… and let’s not forget we’re doing in 150 packets what we previously did in about two packets.

DNS needs to be connectionless; it’s building block protocol for TCP. DnsCurve is much closer to what we actually need.

cyounkins · on Dec 30, 2022

I've written a bit about DNSSEC. Adoption is only around 1-5%, deployment is a footgun [1], and validation currently imposes significant performance penalties [2], although that could be improved.

I've also written about performance of DNS over TLS [3] and found it to be negligible. The TLS setup is only done infrequently.

[1] https://ianix.com/pub/dnssec-outages.html

[2] https://cyounkins.medium.com/costs-and-benefits-of-local-dns...

[3] https://cyounkins.medium.com/performance-of-dns-over-tls-4f4...

gsich · on Dec 30, 2022

Negligible only if you have enough DNS queries to keep the connection alive. Last I checked Quad9 and Cloudflare will close the connection quickly, regardless of EDNS keepalive setting.

bcrl · on Dec 30, 2022

DNS over TCP is fine, and I'd far prefer it to be the default over UDP, as any packet loss with DNS over UDP results in rather lengthy timeouts (typically >1 second). DNS over TCP fixes that (or at least reduces the delay to RTT). It's DNS over TLS that's the problem. Adding HTTPS into the mix is just sheer madness.

tptacek · on Dec 30, 2022

DNSSEC isn't "connectionless"; DNSSEC responses frequently exceed the maximum UDP packet size.

TLS DNS provides confidentiality, in addition to hop-by-hop integrity; DNSSEC provides no confidentiality, which has led to a decade of rationalizing by its advocates about DNS not "needing" confidentiality.

nalllar · on Dec 30, 2022

DNSSEC has the same bootstrap issue. I've had a device fail to sync its time because DNSSEC validation was on and all DNS requests were failing, which prevented lookup of the NTP server address.

zinekeller · on Dec 30, 2022

> DNS needs to be connectionless; it’s building block protocol for TCP.

This doesn't make any sense as you could use DNS and TCP separately (for example finding a hostname using DNS to connect your video streaming ingestion server running via UDP and hardcoded addresses to bootstrap installation files via TCP respectively).

philsnow · on Dec 30, 2022

Another reason it's nonsensical: when a DNS response is too large to fit in the payload of a UDP datagram, the server sets the TC bit in the response header (alongside whatever truncated results it feels like including), notifying the client of the truncation. The client optionally (but SHOULD) falls back to retrying the query over TCP.

https://serverfault.com/a/698254
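As a small illustration, checking for truncation comes down to testing one bit in the 12-byte DNS header. A minimal Python sketch, with a made-up function name; the actual retry over TCP is left out:

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the 16-bit DNS header flags field


def is_truncated(response: bytes) -> bool:
    """Return True if the DNS response has the TC bit set."""
    if len(response) < 12:
        raise ValueError("DNS header is 12 bytes")
    # The header starts with a 16-bit ID followed by the 16-bit flags field.
    _query_id, flags = struct.unpack("!HH", response[:4])
    return bool(flags & TC_FLAG)
```

A client that sees `is_truncated(reply)` return True would then repeat the same query over TCP.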

zinekeller · on Dec 30, 2022

Slowly looks at musl's direction.

(musl doesn't even try DNS/TCP after receiving a TC packet)

tptacek · on Dec 30, 2022

Yes, this is a pretty grave flaw.

liveoneggs · on Dec 30, 2022

DNSSEC breaks DNS by giving fake answers to solve problems no one ever had.