linux/mm
Adam Litke 348ea204cc hugetlb: split alloc_huge_page into private and shared components
Hugetlbfs implements a quota system which can limit the amount of memory that
can be used by the filesystem.  Before allocating a new huge page for a file,
the quota is checked and debited.  The quota is then credited when truncating
the file.  I found a few bugs in the code for both MAP_PRIVATE and MAP_SHARED
mappings.  Before detailing the problems and my proposed solutions, we should
agree on a definition of quotas that properly addresses both private and
shared pages.  Since the purpose of quotas is to limit total memory
consumption on a per-filesystem basis, I argue that all pages allocated by the
fs (private and shared) should be charged against quota.

Private Mappings
================

The current code will debit quota for private pages sometimes, but will never
credit it.  At a minimum, this causes a leak in the quota accounting which
renders the accounting essentially useless as it is.  Shared pages have a one
to one mapping with a hugetlbfs file and are easy to account by debiting on
allocation and crediting on truncate.  Private pages are anonymous in nature
and have a many to one relationship with their hugetlbfs files (due to copy on
write).  Because private pages are not indexed by the mapping's radix tree,
thier quota cannot be credited at file truncation time.  Crediting must be
done when the page is unmapped and freed.

Shared Pages
============

I discovered an issue concerning the interaction between the MAP_SHARED
reservation system and quotas.  Since quota is not checked until page
instantiation, an over-quota mmap/reservation will initially succeed.  When
instantiating the first over-quota page, the program will receive SIGBUS.
This is inconsistent since the reservation is supposed to be a guarantee.  The
solution is to debit the full amount of quota at reservation time and credit
the unused portion when the reservation is released.

This patch series brings quotas back in line by making the following
modifications:
 * Private pages
   - Debit quota in alloc_huge_page()
   - Credit quota in free_huge_page()
 * Shared pages
   - Debit quota for entire reservation at mmap time
   - Credit quota for instantiated pages in free_huge_page()
   - Credit quota for unused reservation at munmap time

This patch:

The shared page reservation and dynamic pool resizing features have made the
allocation of private vs.  shared huge pages quite different.  By splitting
out the private/shared-specific portions of the process into their own
functions, readability is greatly improved.  alloc_huge_page now calls the
proper helper and performs common operations.

[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: David Gibson <hermes@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14 18:45:39 -08:00
..
Kconfig small documentation fixes 2007-10-20 02:46:58 +02:00
Makefile memory unplug: page isolation 2007-10-16 09:43:02 -07:00
allocpercpu.c Slab allocators: Replace explicit zeroing with __GFP_ZERO 2007-07-17 10:23:02 -07:00
backing-dev.c mm: per device dirty threshold 2007-10-17 08:42:45 -07:00
bootmem.c [PATCH] remove EXPORT_UNUSED_SYMBOL'ed symbols 2006-12-07 08:39:44 -08:00
bounce.c block: Initial support for data-less (or empty) barrier support 2007-10-16 11:03:56 +02:00
fadvise.c [PATCH] mm: change uses of f_{dentry,vfsmnt} to use f_path 2006-12-08 08:28:43 -08:00
filemap.c Remove broken ptrace() special-case code from file mapping 2007-10-31 09:19:46 -07:00
filemap_xip.c mm: write iovec cleanup 2007-10-16 09:42:54 -07:00
fremap.c remap_file_pages: kernel-doc corrections 2007-10-17 08:43:07 -07:00
highmem.c Create the ZONE_MOVABLE zone 2007-07-17 10:22:59 -07:00
hugetlb.c hugetlb: split alloc_huge_page into private and shared components 2007-11-14 18:45:39 -08:00
internal.h Breakout page_order() to internal.h to avoid special knowledge of the buddy allocator 2007-10-16 09:43:01 -07:00
madvise.c speed up madvise_need_mmap_write() usage 2007-07-16 09:05:36 -07:00
memory.c hugetlb: follow_hugetlb_page() for write access 2007-11-14 18:45:39 -08:00
memory_hotplug.c Add IORESOUCE_BUSY flag for System RAM 2007-11-14 18:45:39 -08:00
mempolicy.c Migration: find correct vma in new_vma_page() 2007-11-14 18:45:38 -08:00
mempool.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
migrate.c Typo fixes retrun -> return 2007-10-20 02:13:26 +02:00
mincore.c [PATCH] mincore: vma crossing fix 2007-02-15 09:57:03 -08:00
mlock.c do not limit locked memory when RLIMIT_MEMLOCK is RLIM_INFINITY 2007-07-16 09:05:37 -07:00
mmap.c fix mprotect vma_wants_writenotify prot 2007-10-23 08:32:06 -07:00
mmzone.c [PATCH] remove EXPORT_UNUSED_SYMBOL'ed symbols 2006-12-07 08:39:44 -08:00
mprotect.c fix mprotect vma_wants_writenotify prot 2007-10-23 08:32:06 -07:00
mremap.c sparse pointer use of zero as null 2007-10-18 14:37:31 -07:00
msync.c Detach sched.h from mm.h 2007-05-21 09:18:19 -07:00
nommu.c NOMMU: mm/nommu.c needs linux/module.h 2007-10-29 07:53:26 -07:00
oom_kill.c oom_kill bug 2007-10-20 15:04:06 -07:00
page-writeback.c mm: speed up writeback ramp-up on clean systems 2007-11-14 18:45:38 -08:00
page_alloc.c Revert "Bias the placement of kernel pages at lower PFNs" 2007-11-12 14:14:44 -08:00
page_io.c Drop 'size' argument from bio_endio and bi_end_io 2007-10-10 09:25:57 +02:00
page_isolation.c memory hotremove: unset migrate type "ISOLATE" after removal 2007-11-14 18:45:38 -08:00
pdflush.c Freezer: make kernel threads nonfreezable by default 2007-07-17 10:23:02 -07:00
prio_tree.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
quicklist.c Quicklists for page table pages 2007-05-07 12:12:54 -07:00
readahead.c mm: bdi init hooks 2007-10-17 08:42:45 -07:00
rmap.c Migration: find correct vma in new_vma_page() 2007-11-14 18:45:38 -08:00
shmem.c fix tmpfs BUG and AOP_WRITEPAGE_ACTIVATE 2007-10-30 08:06:55 -07:00
shmem_acl.c [PATCH] Fix typos in mm/shmem_acl.c 2006-10-11 11:14:23 -07:00
slab.c slab: fix typo in allocation failure handling 2007-11-14 18:45:36 -08:00
slob.c Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
slub.c SLUB: killed the unused "end" variable 2007-11-12 10:32:29 -08:00
sparse-vmemmap.c mm/sparse-vmemmap.c: make sure init_mm is included 2007-10-30 08:06:55 -07:00
sparse.c Revert "x86_64: allocate sparsemem memmap above 4G" 2007-10-29 14:05:37 -07:00
swap.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
swap_state.c mm: clarify __add_to_swap_cache locking 2007-10-16 09:42:53 -07:00
swapfile.c Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION 2007-07-29 16:45:38 -07:00
thrash.c Bug in mm/thrash.c function grab_swap_token() 2007-05-11 08:29:32 -07:00
tiny-shmem.c r/o bind mounts: filesystem helpers for custom 'struct file's 2007-10-17 08:43:04 -07:00
truncate.c Drop some headers from mm.h 2007-10-17 08:42:55 -07:00
util.c Slab allocators: fail if ksize is called with a NULL parameter 2007-10-16 09:42:53 -07:00
vmalloc.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
vmscan.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
vmstat.c oom: change all_unreclaimable zone member to flags 2007-10-17 08:42:45 -07:00