Recently I spent some time zooming in on Bosch's The Garden of Earthly Delights. The floor's level of interactivity would be so nice there. At least on this floor, I can guess what's going on fairly reliably. The experience is quite similar at some level, though. I saw Bosch's originals (or 1-to-1 scale reproductions) many years ago, and without zooming in they were incomprehensible. With zoom, the details are overwhelming.
Note that this digitisation was done by a company called Mad Pixel and supported by Google in 2009. It was the first experiment of what later became the Google Art Project in 2011 (now Google Arts & Culture).
Agreed. I had to run Windows recovery only once over the last 5+ years, after running some debloating script with many thousands of stars on GitHub.
I think the Pro version is enough for a reasonable experience; most of the terrible stories originate from the Home version, which should be avoided like the plague.
I've spent several days trying to get the Pro version to a usable state.
By usable I mean that it doesn't kill my work session because some random app I've never installed, used or asked for fails its auto-update in the middle of the day and kills the WSL process.
It still has settings that magically reset themselves, so if you are not careful, telemetry/ads/spying will be back on the menu. It still has hostile settings that keep your computer connected while it sleeps, and they are very hard to turn off.
There are multiple hidden settings in Windows that only appear in the menus after you add a registry entry.
There are so many anti-patterns in Windows that it feels like defending against a determined hacker who is trying to make your life worse and hunting for a slight misstep to turn the shit back on.
The Group Policy Editor is the way to restrict many things. Disabling automatic updates helps. I have had forced reboots very rarely, and I believe those were fixes for severe vulnerabilities.
But my use case is never 24/7: I hibernate it overnight and every time I leave for longer than a trip to the grocery store, and I have several Proxmox boxes with proper OSes for hosting stuff. Windows + WSL is my dev/media/web/files/OneDrive machine, a compact, silent SFF box that is powerful enough for 90+% of my daily tasks. Lately I have been trying Linux desktop on Fedora/Ubuntu with every major version, however an RDP server and Secure Boot that I can trust to work and not break things remain unsatisfactory.
I disabled auto updates by pinning the target version in group policy and then finding some hacks on the web to make it always ask before downloading. I've run many other random scripts and then found Windhawk to remove more annoyances (the taskbar and sections of the Start menu).
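For reference, the pinning part boils down to a few registry values behind the "Select the target Feature Update version" policy. This is a sketch from memory, not an exact export; "23H2" is just an example of a release to stay on, so double-check the value names against your build:

    Windows Registry Editor Version 5.00

    ; Roughly what the "Select the target Feature Update version" policy sets
    [HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate]
    "ProductVersion"="Windows 11"
    "TargetReleaseVersion"=dword:00000001
    "TargetReleaseVersionInfo"="23H2"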
I then shut down more things and disabled Bluetooth on lock. It is now usable and doesn't crash, but it feels very fragile. I will soon face the dilemma of allowing "feature" updates or going without security ones.
Then at a lower level, and at smaller latencies, it's often interrupt moderation that must be disabled. It's conceptually similar to the Nagle algorithm: coalesce overhead by waiting, but on the receiving end, in hardware.
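To make the analogy concrete, here is a minimal C sketch of the software side (disabling Nagle on a TCP socket). The hardware side, NIC interrupt moderation, is tuned outside the application, typically with something like `ethtool -C eth0 rx-usecs 0` on Linux (interface name and values are illustrative):

    #include <netinet/in.h>
    #include <netinet/tcp.h>   /* TCP_NODELAY */
    #include <stdio.h>
    #include <sys/socket.h>

    /* Disable Nagle's algorithm so small writes are sent immediately
     * instead of being coalesced while waiting for more data. */
    static int disable_nagle(int fd)
    {
        int one = 1;
        if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one) != 0) {
            perror("setsockopt(TCP_NODELAY)");
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }
        /* The receive-side analogue -- interrupt moderation -- lives in the
         * NIC driver, not the socket API, hence ethtool rather than code. */
        return disable_nagle(fd) == 0 ? 0 : 1;
    }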
I could write half-screen nested array formulas back when Excel was still pre-ribbon (and screen resolutions were smaller), out of necessity and because I could. That was for quite demanding uni homework calculations and then mostly while working as an intern in IB. But having a life is also important...
The only thing I still enjoy is that any dataset smaller than 1M rows can be sliced and diced almost without thinking. I am sometimes really grateful that MS did not break the shortcuts while almost breaking the product overall. The muscle memory works perfectly.
Just yesterday The B1M published an interesting video about the future longest tunnel, which will run between Lyon, France and Turin, Italy. It will be more than 50 km long, deep below the Alps. The project has finally secured funding from both countries and the EU, and is on track.
It would be brilliant. Currently the Paris-Milan train line is barely competitive with flying between the two; knocking off 2-3 hours from the trip would make it around 4 hours in total, which is very competitive with flying (1h30 flight, but both CDG and Malpensa are big airports far outside the city, with significant time wasted getting to them, through security, etc). And of course it would be massive for Lyon - Turin, and Lyon - Milan too, where flying wouldn't even make sense any more.
Tunnels are actually pretty safe in earthquakes; Japan, for example, is criss-crossed with them.
A tunnel is actually the part least likely to shake: if you shake a jello with fruit inside it, the surface moves a lot but the fruit in the interior doesn't move all that much.
The 57 km Gotthard Base Tunnel has been in operation since 2016. There's also a 3 km long tunnel between France and Italy that opened in 1882. Nowadays there are probably hundreds of 1 km+ tunnels in the Alps.
Italy isn't a puny country: it's over 1,000 km between Sicily and the Alps (like LA to Albuquerque). The fault lines seem to reach northern Italy (about 100 km from the Alps), but the number of larger quakes seems smaller there.
Velib, the main bike rental in Paris, has its app not working, but the bikes can still be taken with NFC. However, my station, which is always full at this time, is now empty apart from 2 broken bikes. It may be related. Yet push notifications are still working.
I'm going to take the metro now, wondering how long we have until the entire transit network goes down because of a similar incident.
Unsafety means different things. In C#, SIMD is possible via `ref`s, which maintains GC safety (no GC holes) but removes bounds safety (the array length check). The API is appropriately called Vector.LoadUnsafe.
It looks suspicious at 25x. Even 2.5x would be suspicious unless reading very small records.
I assume that in both cases the file is already fully cached in RAM, given its tiny 100MB size. But the read-based version actually copies the data into a given buffer, which involves cache misses to get the data from RAM into L1 for copying. The mmap version just returns the slice, which is discarded immediately; the actual data is never touched. Each record is 2 cache lines, and with random indices it is not prefetched. For the AMD Ryzen 7 9800X3D mentioned in the repo, just reading 100 bytes from RAM into L1 should take ~100 nanos.
The benchmark compares actually getting the data vs. getting the data's location. Single-digit nanos is the scale of good hash-table lookups with data already in CPU caches, not of actual IO. For fairness, both versions should use/touch the data, e.g. copy it.
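Something like this would be a fairer comparison in C (names and the 100-byte record size are my assumptions, not the repo's actual code): both paths copy the record out, so both pay for actually moving the bytes from RAM into cache.

    #include <string.h>
    #include <unistd.h>

    #define RECORD_SIZE 100  /* assumed record size, ~2 cache lines */

    /* mmap path: copy the record out of the mapping so the data is touched */
    void record_from_mmap(const unsigned char *map, size_t offset,
                          unsigned char out[RECORD_SIZE])
    {
        memcpy(out, map + offset, RECORD_SIZE);  /* forces RAM -> L1 traffic */
    }

    /* read() path: pread() copies into the caller's buffer via the kernel */
    ssize_t record_from_pread(int fd, off_t offset,
                              unsigned char out[RECORD_SIZE])
    {
        return pread(fd, out, RECORD_SIZE, offset);
    }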
doing these sorts of benchmarks is actually quite tricky. you must clear the page cache by allocating >1x physical ram before each attempt.
moreover, mmap by default will load lazily, whereas mmap with MAP_POPULATE will prefetch. in the former case, reporting average operation times is not valid because the access time distributions are not gaussian (they have a one-time big hit at first touch). with MAP_POPULATE (linux only), there is a long delay when mmap is first called, but then the average access times will be very low. when pages are released is determined by the operating system's page cache eviction policy.
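for illustration, the difference is a single flag at mmap time (linux-specific, error handling trimmed, helper name made up):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stddef.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    /* map a file either lazily or prefetched up front via MAP_POPULATE */
    void *map_file(const char *path, int populate, size_t *len_out)
    {
        int fd = open(path, O_RDONLY);
        struct stat st;
        fstat(fd, &st);
        int flags = MAP_PRIVATE | (populate ? MAP_POPULATE : 0);
        /* lazy: first touch of each page pays a fault (and maybe disk i/o);
         * MAP_POPULATE pays that cost once, inside this mmap() call */
        void *p = mmap(NULL, st.st_size, PROT_READ, flags, fd, 0);
        *len_out = (size_t)st.st_size;
        return p;  /* caller munmap()s later; fd may be closed after mmap */
    }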
the data structure on top is best chosen based on desired runtime characteristics. if it's all going in ram, go ahead and use a standard randomized hash table. if it's too big to fit in ram, designing a structure that is aware of lru style page eviction semantics may make sense (ie, a hash table or other layout that preserves locality for things that are expected to be accessed in a temporally local fashion.)
> For the CPU AMD Ryzen 7 9800X3D mentioned in the repo, just reading 100 bytes from RAM to L1 should take ~100 nanos.
I think this is the wrong order of magnitude. One core of my Ryzen 5 3500U seems to be able to run memcpy() at 10 gigabytes per second (0.1 nanoseconds per byte) and memset() at 31 gigabytes per second (0.03 nanoseconds per byte). I'd expect a sequential read of 100 bytes to take about 3 nanoseconds, not 100 nanoseconds.
However, I think random accesses do take close to 100 nanoseconds to transmit the starting row and column address and open the row. I haven't measured this on this hardware because I don't have a test I'm confident in.
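For what it's worth, the measurement is roughly of this shape (not my exact code; buffer size and iteration count are arbitrary, and a serious version would pin the core and repeat runs):

    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define BUF_SIZE (256u * 1024u * 1024u)  /* big enough to defeat caches */
    #define ITERS    10

    int main(void)
    {
        unsigned char *src = malloc(BUF_SIZE), *dst = malloc(BUF_SIZE);
        if (!src || !dst) return 1;
        memset(src, 1, BUF_SIZE);  /* fault the pages in before timing */
        memset(dst, 2, BUF_SIZE);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < ITERS; i++)
            memcpy(dst, src, BUF_SIZE);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        double gbps = (double)BUF_SIZE * ITERS / secs / 1e9;
        /* print a byte of dst so the copies cannot be optimized away */
        printf("memcpy: %.1f GB/s (%.3f ns/byte), dst[0]=%u\n",
               gbps, 1.0 / gbps, dst[0]);
        free(src);
        free(dst);
        return 0;
    }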
100 nanoseconds from RAM is correct. Latency != bandwidth. 3 nanoseconds would be from cache or so on a Ryzen. You ain't gonna get the benefits of prefetching on the first 100 bytes.
Yes, my comment clearly specified that I was talking about sequential reads, which do get the benefits of prefetching, and said, "I think random accesses do take close to 100 nanoseconds".
If you're doing large amounts of sequential reads from a filesystem, it's probably not in cache. You only get latency that low if you're doing nothing else that stresses the memory subsystem, which is rather unlikely. Real applications have overhead, which is why microbenchmarks like this are useless. Microbenchmarks are not the best first-order estimate for programmers to rely on.
Yes, I went into more detail on those issues in https://news.ycombinator.com/item?id=45689464, but overhead is irrelevant to the issue we were discussing, which is about how long it takes to read 100 bytes from memory. Microbenchmarks are generally exactly the right way to answer that question.
Memory subsystem bottlenecks are real, but even in real applications, it's common for the memory subsystem to not be the bottleneck. For example, in this case we're discussing system call overhead, which tends to move the system bottleneck inside the CPU (even though a significant part of that effect is due to L1I cache evictions).
Moreover, even if the memory subsystem is the bottleneck, on the system I was measuring, it will not push the sequential memory access time anywhere close to 1 nanosecond per byte. I just don't have enough cores to oversubscribe the memory bus 30×. (1.5×, I think.) Having such a large ratio of processor speed to RAM interconnect bandwidth is in fact very unusual, because it tends to perform very poorly in some workloads.
If microbenchmarks don't give you a pretty good first-order performance estimate, either you're doing the wrong microbenchmarks or you're completely mistaken about what your application's major bottlenecks are (plural, because in a sequential program you can have multiple "bottlenecks", colloquially, unlike in concurrent systems where you almost always have exactly one bottleneck). Both of these problems do happen often, but the good news is that they're fixable. But giving up on microbenchmarking will not fix them.
If you're bottlenecked on a 100-byte read, the app is probably doing something really stupid, like not using syscalls the way they're supposed to be used. Buffered I/O has existed from fairly early on in Unix history, and it exists because it is needed to deal with the mismatch between what stupid applications want to do and the guarantees the kernel has to provide for file I/O.
The main benefit from the mmap approach is that the fast path then avoids all the code the kernel has to execute, the data structures the kernel has to touch, and everything needed to ensure the correctness of the system. In modern systems that means all kinds of synchronization and serialization of the CPU needed to deal with $randomCPUdataleakoftheweek (pipeline flushes ftw!).
However, real applications need to deal with correctness. For example, a real database is not going to just do 100-byte reads of records. It's going to have to take measures (locks) to ensure the data isn't being written to by another thread.
Rarely is it just a sequential read of the next 100 bytes from a file.
I'm firmly in the camp that focusing on microbenchmarks like this is frequently a waste of time in the general case. You have to look at the application as a whole first. I've implemented optimizations that looked great in a microbenchmark, but showed absolutely no difference whatsoever at the application level.
Moreover, my main hatred for mmap() as a file I/O mechanism is that it moves the context switches when the data is not present in RAM from somewhere obvious (doing a read() or pread() system call) to somewhere implicit (reading 100 bytes from memory that happens to be mmap()ed and was passed as a pointer to a function written by some other poor unknowing programmer). Additionally, read ahead performance for mmap()s when bringing data into RAM is quite a bit slower than on read()s in large part because it means that the application is not providing a hint (the size argument to the read() syscall) to the kernel for how much data to bring in (and if everything is sequential as you claim, your code really should know that ahead of time).
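To spell out the contrast (function names are made up): with read()/pread() the possible blocking point and the amount of data wanted are both explicit at the call site, while with mmap() they are hidden behind an ordinary memory access.

    #include <unistd.h>

    /* explicit: the len argument is a hint about how much to bring in,
     * and the call site is obviously a place where I/O can block */
    ssize_t load_explicit(int fd, off_t off, void *buf, size_t len)
    {
        return pread(fd, buf, len, off);
    }

    /* implicit: looks like a plain memory read to whoever gets the pointer,
     * but it may stall on a page fault and disk I/O, and carries no hint
     * about how much of the mapping will be wanted next */
    unsigned char load_implicit(const unsigned char *map, size_t off)
    {
        return map[off];
    }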
So, sure, your 100 byte read in the ideal case when everything is cached is faster, but warming up the cache is now significantly slower. Is shifting costs that way always the right thing to do? Rarely in my experience.
And if you don't think about it (as there's no obvious pread() syscall anymore), those microseconds and sometimes milliseconds to fault in the page for that 100-byte read will hurt you. It impacts your main event loop, the size of your pool of processes/threads, etc. The programmer needs to think about these things, and the article mentioned none of this. That makes me think the author is actually quite naive and merely proud of thinking he discovered the magic Go Faster button, without having been burned by the downsides that arise in the Real World from possible overuse of mmap().
Perhaps surprisingly, I agree with your entire comment from beginning to end.
Sometimes mmap can be a real win, though. The poster child for this is probably LMDB. Varnish also does pretty well with mmap, though see my caveat on that in my linked comment.
Varnish was very well done. It's disappointing that with HTTPS-first nowadays there is very little opportunity to make good use of local web caches of web content across browsers/clients. Caches would have been a godsend back in the 1990s, when we had to use shared dialup to connect to the internet while using Netscape in a classroom full of computers.
> For the CPU AMD Ryzen 7 9800X3D mentioned in the repo, just reading 100 bytes from RAM to L1 should take ~100 nanos.
It's important to note that throughput is not just the inverse of latency, because modern OoO CPUs with modern memory subsystems can have hundreds of requests in flight. If your code doesn't serialize accesses, latency numbers are irrelevant to throughput.
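A small C sketch of the distinction (names and sizes made up): the first loop chains each load on the previous one and pays something close to the full memory latency per step, while the second issues independent loads that an out-of-order core can keep in flight concurrently.

    #include <stddef.h>

    /* pointer chasing: each address depends on the previous load, so the
     * ~100 ns memory latency is paid on nearly every step */
    size_t chase(const size_t *next, size_t start, size_t steps)
    {
        size_t i = start;
        while (steps--)
            i = next[i];          /* serialized dependency chain */
        return i;
    }

    /* independent gathers: addresses are known up front, so many cache-line
     * fetches can be outstanding at once and the effective cost per element
     * ends up far below the raw latency */
    unsigned long long gather(const unsigned char *data, const size_t *idx,
                              size_t n)
    {
        unsigned long long sum = 0;
        for (size_t i = 0; i < n; i++)
            sum += data[idx[i]];  /* no dependency between iterations */
        return sum;
    }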
That's such an obvious error in their benchmark code. In my benchmark code I make sure to touch the data so at least the 1st page is actually paged in from disk.
I wonder if most complaints are about pre-installed OEM Windows Home (the one with Candy Crush and dozens of other crap apps, including from the vendor) and the bundled, crappy, cut-down OneDrive? I have the Windows Pro and Office 365 Family option (5 accounts, full Office and 1TB of OneDrive each). Most user-hidden Windows settings are in the Group Policy Editor, or the registry still works. OneDrive proper has toggles for every folder (Desktop, Documents, Pictures) discussed in the post.
After I lost 8 months of photos with a phone ~10 years ago, while being sure it was all backed up to Google Photos, I would rather trust Microsoft than risk losing data, and I now back up to both clouds. The paid Office+OneDrive is great value.
It just works. Yes, the defaults are annoying, but they can be changed. I recently enabled the blocked-by-default outgoing firewall, and I have far more questions for JetBrains Rider, which tries to ignore my system DNS setting and thereby bypass Pi-hole multiple times per minute, than for Microsoft.
That is probably the case, but Windows has always been schizophrenic when it comes to settings: there's the UI, the Control Panel, the second new control panel, the CLI, Group Policy, the registry...
Frame it as "it's 2025 and this is my first look at Windows" and it's pretty bad, and it sucks because if they installed the Home SKU, we end up having to tell them to reinstall to get control.
Microsoft is notoriously bad with naming, in this case likely intentionally. The SKUs: Home = crap, Pro = OK-ish Windows, Enterprise = the actual Pro. But people who do not care about the lack of an RDP server, Hyper-V, or BitLocker probably do not care about the rest either. Then confusion arises from a "first look at Windows" by pro users.
https://en.wikipedia.org/wiki/The_Garden_of_Earthly_Delights...