
> I do wonder why there isn't an API for "lazy munmap()"... it would behave exactly like munmap(), except that the pages might remain accessible in other threads until the end of their timeslices, when the kernel can apply queued TLB flushes.

That does exist: it's called MADV_DONTNEED, and most operating systems have implemented it. However, MADV_DONTNEED on Linux was incorrectly implemented and (from memory) would always result in the page being unmapped immediately, making it roughly equivalent to MADV_FREE. To quote the man page:

> All of the advice values listed here have analogs in the POSIX-specified posix_madvise(3) function, and the values have the same meanings, with the exception of MADV_DONTNEED.



AFAICT the POSIX behavior for MADV_DONTNEED is just a hint that the memory won't be accessed soon.

https://pubs.opengroup.org/onlinepubs/009695399/functions/po... says:

    POSIX_MADV_DONTNEED
        Specifies that the application expects that it will not access the specified
        range in the near future.
https://www.freebsd.org/cgi/man.cgi?query=madvise&sektion=2 says:

     MADV_DONTNEED    Allows the VM system to decrease the in-memory priority
        of pages in the specified address range. Consequently,
        future references to this address range are more likely
        to incur a page fault.
Neither of these suggests that the contents of the memory can be discarded (as Linux does), only that it can be swapped out.

So it doesn't appear that this provides the "lazy unmap to avoid shootdown" behavior on any OS.
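
For anyone who wants to see the Linux discard behavior directly, here is a minimal sketch, assuming Linux and a private anonymous mapping. The helper name `byte_after_madv_dontneed` is made up for illustration; only `mmap`, `memset`, `madvise` and `munmap` are real calls.

```c
#define _DEFAULT_SOURCE   /* exposes madvise() and MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper (not from any library): dirties a private anonymous
   mapping with 0xAB, calls madvise(MADV_DONTNEED) on it, and returns the
   first byte read back afterwards. Returns -1 if setup fails. */
int byte_after_madv_dontneed(size_t len)
{
    assert(len > 0);

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAB, len);                 /* dirty the pages */

    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    /* The mapping is still valid; on Linux this read faults in a
       fresh zero-filled page instead of the old contents. */
    int first = p[0];

    munmap(p, len);
    return first;
}
```

On Linux, `byte_after_madv_dontneed(4096)` should return 0 rather than 0xAB: the range stays mapped, but the dirtied page is dropped and the next access gets a zero-filled page. Other systems may well preserve the data, which is the point of the comparison above.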


After reading that I wondered what happens on Linux if you actually call posix_madvise(POSIX_MADV_DONTNEED). The answer, from glibc:

    /* We have one problem: the kernel's MADV_DONTNEED does not
       correspond to POSIX's POSIX_MADV_DONTNEED.  The former simply
       discards changes made to the memory without writing it back to
       disk, if this would be necessary.  The POSIX behavior does not
       allow this.  There is no functionality mapping the POSIX behavior
       so far so we ignore that advice for now.  */
    if (advice == POSIX_MADV_DONTNEED)
      return 0;
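
A quick way to check this from user space is a sketch like the following, assuming Linux with glibc. The helper name `byte_after_posix_dontneed` is hypothetical; it writes to an anonymous mapping, passes it to `posix_madvise` with `POSIX_MADV_DONTNEED`, and reads the first byte back.

```c
#define _DEFAULT_SOURCE   /* exposes posix_madvise() and MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper (not from any library): dirties a private anonymous
   mapping with 0xAB, calls posix_madvise(POSIX_MADV_DONTNEED) on it, and
   returns the first byte read back afterwards. Returns -1 if setup fails. */
int byte_after_posix_dontneed(size_t len)
{
    assert(len > 0);

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAB, len);                 /* dirty the pages */

    /* posix_madvise() returns an error number directly, not -1/errno. */
    if (posix_madvise(p, len, POSIX_MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    int first = p[0];                     /* contents must survive the advice */

    munmap(p, len);
    return first;
}
```

With glibc, `byte_after_posix_dontneed(4096)` should return 0xAB (171), since the advice is ignored and the data is left alone. Swapping the call for plain `madvise(p, len, MADV_DONTNEED)` on Linux would give 0 instead.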



