> I do wonder why there isn't an API for "lazy munmap()"... it would behave exactly like munmap(), except that the pages might remain accessible in other threads until the end of their timeslices, when the kernel can apply queued TLB flushes.
That does exist, it's called MADV_DONTNEED and most operating systems have implemented it. However, MADV_DONTNEED on Linux was incorrectly implemented and (from memory) always results in the pages being dropped immediately, contents included -- making it behave more like an eager MADV_FREE than like advice. To quote the man page:
> All of the advice values listed here have analogs in the POSIX-specified posix_madvise(3) function, and the values have the same meanings, with the exception of MADV_DONTNEED.
> MADV_DONTNEED: Allows the VM system to decrease the in-memory priority of pages in the specified address range. Consequently, future references to this address range are more likely to incur a page fault.
Neither of these suggests that the contents of the memory can be discarded (as Linux does), only that they can be swapped out.
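For anyone who wants to see the Linux behavior for themselves, here is a minimal sketch, assuming Linux and a private anonymous mapping. The function name `first_byte_after_dontneed` is made up for illustration, and `assert` stands in for real error handling. It dirties one page, calls `madvise(MADV_DONTNEED)`, and reads the first byte back.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS and madvise() in glibc */
#endif
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns the first byte of a private anonymous page
   after madvise(MADV_DONTNEED). 0 means the 0x5A pattern was discarded. */
int first_byte_after_dontneed(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(p != MAP_FAILED);

    memset(p, 0x5A, len);                 /* dirty the page */
    int rc = madvise(p, len, MADV_DONTNEED);
    assert(rc == 0);

    int first = p[0];                     /* touching the page faults it back in */
    munmap(p, len);
    return first;
}
```

On Linux I would expect this to return 0 rather than 0x5A, since the next access to a private anonymous page after MADV_DONTNEED gets a fresh zero-filled page. Other systems may well keep the data.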
So it doesn't appear that this provides the "lazy unmap to avoid shootdown" behavior on any OS.
After reading that I wondered what happens on Linux if you actually call posix_madvise(POSIX_MADV_DONTNEED). The answer, from glibc:
    /* We have one problem: the kernel's MADV_DONTNEED does not
       correspond to POSIX's POSIX_MADV_DONTNEED.  The former simply
       discards changes made to the memory without writing it back to
       disk, if this would be necessary.  The POSIX behavior does not
       allow this.  There is no functionality mapping the POSIX behavior
       so far so we ignore that advice for now.  */
    if (advice == POSIX_MADV_DONTNEED)
      return 0;
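A quick way to check this from user space, as a rough sketch only: the function name `first_byte_after_posix_dontneed` is hypothetical and `assert` stands in for proper error handling. It writes a pattern into a private anonymous page, passes the page to `posix_madvise()` with `POSIX_MADV_DONTNEED`, and reads the first byte back.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS in glibc */
#endif
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns the first byte of a private anonymous page
   after posix_madvise(POSIX_MADV_DONTNEED). POSIX does not allow the data
   to be discarded, so the 0x5A pattern should still be there. */
int first_byte_after_posix_dontneed(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(p != MAP_FAILED);

    memset(p, 0x5A, len);                 /* dirty the page */

    /* posix_madvise() returns an error number directly, 0 on success. */
    int rc = posix_madvise(p, len, POSIX_MADV_DONTNEED);
    assert(rc == 0);

    int first = p[0];
    munmap(p, len);
    return first;
}
```

With the glibc code quoted above the call is a no-op, so this should return 0x5A. Comparing that with the raw `madvise(MADV_DONTNEED)` call on the same kind of mapping shows the difference between the two interfaces.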