Intel's menu of page sizes is an artifact of its page table structure.
On x86 in 64-bit mode, page table entries are 64 bits each; the lowest level in the hierarchy (L1) is a 4K page containing 512 64-bit PTEs, which together map 2M of memory. Not coincidentally, that is the large page size.
The L1 page table pages are themselves found via a PTE in an L2 page table; one L2 page table page maps 512*2M = 1G of virtual address space, which is, again not coincidentally, the huge page size.
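To make the arithmetic concrete, here is a minimal sketch of how the sizes fall out of the entry width. The names (`coverage`, `ENTRIES`) are just for illustration and not taken from any kernel.

```python
PTE_BYTES = 8                        # 64-bit page table entries
BASE_PAGE = 4096                     # 4K base page, also the size of one table page
ENTRIES = BASE_PAGE // PTE_BYTES     # entries that fit in one table page

def coverage(level):
    """Bytes of address space mapped by a single entry at `level` (1 = lowest)."""
    return BASE_PAGE * ENTRIES ** (level - 1)

for level in (1, 2, 3):
    print(f"one L{level} entry maps {coverage(level)} bytes")
```

An L2 entry covering 2M and an L3 entry covering 1G are exactly the large and huge page sizes.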
Large pages are mapped by an L2 PTE (sometimes called a PDE, "page directory entry") with a particular bit set indicating that the PTE points at the large page itself rather than at a page of L1 PTEs. The hardware page table walker just stops at that point.
Huge pages are similarly mapped by an L3 PTE with a bit set indicating that the L3 PTE maps a huge page directly.
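A toy model of the walk described above, as a sketch only. It assumes 4-level paging, uses bit 0 as "present" and bit 7 as the page-size bit, and ignores permissions, the other flag bits, and caching; `tables` is a hypothetical stand-in for physical memory, mapping each table's address to a dict of its entries.

```python
PRESENT = 1 << 0                 # bit 0: entry is valid
PS = 1 << 7                      # bit 7 in an L2/L3 entry: maps a page, not a table
ADDR_MASK = 0x000FFFFFFFFFF000   # bits 51:12 hold the physical address

def walk(tables, root, vaddr):
    """Return (physical address, page size) for vaddr, or raise LookupError."""
    table = root
    for level in (4, 3, 2, 1):
        shift = 12 + 9 * (level - 1)          # 39, 30, 21, 12
        index = (vaddr >> shift) & 0x1FF      # 9 index bits per level
        entry = tables[table].get(index, 0)
        if not entry & PRESENT:
            raise LookupError(f"not present at level {level}")
        if level == 1 or (level in (2, 3) and entry & PS):
            # Leaf: the walker stops here; the rest of vaddr is the offset.
            page_size = 1 << shift
            base = entry & ADDR_MASK & ~(page_size - 1)
            return base + (vaddr & (page_size - 1)), page_size
        table = entry & ADDR_MASK             # descend to the next table
    raise AssertionError("unreachable: level 1 always returns")

# Made-up tables: one 4K mapping, one 2M mapping, one 1G mapping.
tables = {
    0x1000: {0: 0x2000 | PRESENT},                                  # L4
    0x2000: {0: 0x3000 | PRESENT, 1: 0x80000000 | PRESENT | PS},    # L3
    0x3000: {0: 0x4000 | PRESENT, 1: 0x800000 | PRESENT | PS},      # L2
    0x4000: {5: 0x9000 | PRESENT},                                  # L1
}

for va in (0x5123, 0x201234, 0x40012345):
    pa, size = walk(tables, 0x1000, va)
    print(hex(va), "->", hex(pa), "page size", size)
```

Note that the only size-specific logic is the check for the PS bit; an intermediate size would need a different encoding somewhere in this loop.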
Shoehorning an intermediate size would complicate page table updates or walks or probably both.
Note that an OS can, of its own accord and independently of the hardware, maintain allocations at a coarser granularity and sometimes get some savings out of this. For one historic example, the VAX had a tiny 512-byte page size; IIRC, BSD Unix pretended it had a 1K page size and always updated PTEs in pairs.
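Roughly what "updating PTEs in pairs" amounts to, as a hypothetical sketch (this is not the actual BSD code, and `map_soft_page` is a made-up name). The hardware PTE array is modeled as a dict from hardware page number to hardware frame number.

```python
HW_PAGE = 512                      # hardware page size in bytes
CLUSTER = 2                        # hardware pages per software page
SOFT_PAGE = HW_PAGE * CLUSTER      # 1K page as seen by the rest of the OS

def map_soft_page(hw_ptes, soft_vpn, soft_pfn):
    """Map one software page by writing CLUSTER adjacent hardware PTEs."""
    for i in range(CLUSTER):
        hw_ptes[soft_vpn * CLUSTER + i] = soft_pfn * CLUSTER + i

hw_ptes = {}
map_soft_page(hw_ptes, soft_vpn=3, soft_pfn=10)
print(hw_ptes)   # two hardware entries for one software page
```

The hardware still sees two separate 512-byte translations; only the OS bookkeeping is coarser.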
Hmm? Pretending the page size is larger than it is would not yield the primary performance benefit of fewer TLB misses. Unless I am missing something, that seems more like a hack to save a tiny bit of kernel memory on a constrained system by having two PTEs backed by the same internal page structure.
Unless we can change the size of the smallest page entry on Intel, I doubt there is room to do anything interesting there. If we could do as ARM does and just multiply all the page sizes by 4, we would avoid any “shoehorning”.
The smallest page size tends to get entrenched in the rest of the system (for things like linker page sizes, IOMMU interfaces, etc.); growing the smallest page size might not be a viable option in existing systems, and it might be easier to introduce intermediate-size TLB entries, perhaps formed by consolidating adjacent contiguous PTEs.
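One way to picture the consolidation idea is the sketch below. It is a hypothetical model, not a description of any particular CPU: it assumes a group qualifies only if it is naturally aligned in both virtual and physical space, fully populated, and uniform in flags. The `group` size and the `(pfn, flags)` tuple layout are assumptions made for illustration.

```python
def coalescible_groups(ptes, group=8):
    """Return starting VPNs of aligned groups that could share one TLB entry.

    `ptes` is a list indexed by virtual page number; each element is either
    None (unmapped) or a (pfn, flags) tuple.
    """
    starts = []
    for start in range(0, len(ptes) - group + 1, group):
        chunk = ptes[start:start + group]
        if any(p is None for p in chunk):
            continue                       # hole in the group
        pfn0, flags0 = chunk[0]
        if pfn0 % group != 0:
            continue                       # physical side not aligned
        if all(pfn == pfn0 + i and flags == flags0
               for i, (pfn, flags) in enumerate(chunk)):
            starts.append(start)
    return starts

ptes = ([(16 + i, "rw") for i in range(8)]      # contiguous and aligned
        + [(100, "rw")] + [None] * 7            # mostly unmapped
        + [(32 + i, "rw") for i in range(8)])   # contiguous and aligned
print(coalescible_groups(ptes))
```

The base page size and the PTE format stay untouched here; only the TLB would need to know about the intermediate size.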