linux/mm
Mel Gorman de74f1cc3b mm: have order > 0 compaction start near a pageblock with free pages
Commit 7db8889ab0 ("mm: have order > 0 compaction start off where it
left") introduced a caching mechanism to reduce the amount work the free
page scanner does in compaction.  However, it has a problem.  Consider
two process simultaneously scanning free pages

					    			C
	Process A		M     S     			F
			|---------------------------------------|
	Process B		M 	FS

	C is zone->compact_cached_free_pfn
	S is cc->start_pfree_pfn
	M is cc->migrate_pfn
	F is cc->free_pfn

In this diagram, Process A has just reached its migrate scanner, wrapped
around and updated compact_cached_free_pfn accordingly.

Simultaneously, Process B finishes isolating in a block and updates
compact_cached_free_pfn again to the location of its free scanner.

Process A moves to "end_of_zone - one_pageblock" and runs this check

                if (cc->order > 0 && (!cc->wrapped ||
                                      zone->compact_cached_free_pfn >
                                      cc->start_free_pfn))
                        pfn = min(pfn, zone->compact_cached_free_pfn);

compact_cached_free_pfn is above where it started so the free scanner
skips almost the entire space it should have scanned.  When there are
multiple processes compacting it can end in a situation where the entire
zone is not being scanned at all.  Further, it is possible for two
processes to ping-pong update to compact_cached_free_pfn which is just
random.

Overall, the end result wrecks allocation success rates.

There is not an obvious way around this problem without introducing new
locking and state so this patch takes a different approach.

First, it gets rid of the skip logic because it's not clear that it
matters if two free scanners happen to be in the same block but with
racing updates it's too easy for it to skip over blocks it should not.

Second, it updates compact_cached_free_pfn in a more limited set of
circumstances.

If a scanner has wrapped, it updates compact_cached_free_pfn to the end
	of the zone. When a wrapped scanner isolates a page, it updates
	compact_cached_free_pfn to point to the highest pageblock it
	can isolate pages from.

If a scanner has not wrapped when it has finished isolated pages it
	checks if compact_cached_free_pfn is pointing to the end of the
	zone. If so, the value is updated to point to the highest
	pageblock that pages were isolated from. This value will not
	be updated again until a free page scanner wraps and resets
	compact_cached_free_pfn.

This is not optimal and it can still race but the compact_cached_free_pfn
will be pointing to or very near a pageblock with free pages.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-08-21 16:45:03 -07:00
..
Kconfig mm: factor out memory isolate functions 2012-07-31 18:42:45 -07:00
Kconfig.debug mm: more intensive memory corruption debugging 2012-01-10 16:30:42 -08:00
Makefile mm: factor out memory isolate functions 2012-07-31 18:42:45 -07:00
backing-dev.c vfs: kill write_super and sync_supers 2012-08-04 01:24:44 +04:00
bootmem.c bootmem: make ___alloc_bootmem_node_nopanic() really nopanic 2012-07-17 16:21:29 -07:00
bounce.c bounce: allow use of bounce pool via config option 2012-07-18 16:40:35 -04:00
cleancache.c ->encode_fh() API change 2012-05-29 23:28:33 -04:00
compaction.c mm: have order > 0 compaction start near a pageblock with free pages 2012-08-21 16:45:03 -07:00
debug-pagealloc.c mm, x86: Remove debug_pagealloc_enabled 2011-12-06 09:24:07 +01:00
dmapool.c mm: fix implicit stat.h usage in dmapool.c 2011-10-31 09:20:12 -04:00
fadvise.c mm, fadvise: don't return -EINVAL when filesystem cannot implement fadvise() 2012-07-31 18:42:42 -07:00
failslab.c switch debugfs to umode_t 2012-01-03 22:54:56 -05:00
filemap.c fs: Protect write paths by sb_start_write - sb_end_write 2012-07-31 09:45:47 +04:00
filemap_xip.c fs: Protect write paths by sb_start_write - sb_end_write 2012-07-31 09:45:47 +04:00
fremap.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
frontswap.c mm/frontswap: cleanup doc and comment error 2012-07-23 11:16:20 -04:00
highmem.c mm: add support for direct_IO to highmem pages 2012-07-31 18:42:47 -07:00
huge_memory.c mm/memcg: apply add/del_page to lruvec 2012-05-29 16:22:28 -07:00
hugetlb.c mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables 2012-07-31 18:42:50 -07:00
hugetlb_cgroup.c hugetlb/cgroup: remove exclude and wakeup rmdir calls from migrate 2012-07-31 18:42:41 -07:00
hwpoison-inject.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
init-mm.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
internal.h netvm: allow skb allocation to use PFMEMALLOC reserves 2012-07-31 18:42:46 -07:00
kmemcheck.c kmemcheck: Fix build errors due to missing slab.h 2010-03-30 22:02:32 +09:00
kmemleak-test.c kmemleak: remove memset by using kzalloc 2011-01-27 18:31:51 +00:00
kmemleak.c kmemleak: Disable early logging when kmemleak is off by default 2012-01-20 16:57:05 +00:00
ksm.c ksm: cleanup: introduce find_mergeable_vma() 2012-03-21 17:54:59 -07:00
maccess.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
madvise.c mm: Hold a file reference in madvise_remove 2012-07-06 10:34:38 -07:00
memblock.c mm/memblock.c:memblock_double_array(): cosmetic cleanups 2012-07-31 18:42:41 -07:00
memcontrol.c memcg: add mem_cgroup_from_css() helper 2012-07-31 18:42:49 -07:00
memory-failure.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
memory.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
memory_hotplug.c mm/hotplug: free zone->pageset when a zone becomes empty 2012-07-31 18:42:44 -07:00
mempolicy.c Merge branch 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2012-07-30 11:32:24 -07:00
mempool.c mempool: add @gfp_mask to mempool_create_node() 2012-06-25 11:53:47 +02:00
migrate.c mm: memcg: fix compaction/migration failing due to memcg limits 2012-07-31 18:42:48 -07:00
mincore.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
mlock.c vm: avoid using find_vma_prev() unnecessarily 2012-03-06 18:23:36 -08:00
mm_init.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mmap.c mm: change nr_ptes BUG_ON to WARN_ON 2012-08-21 16:45:02 -07:00
mmu_context.c mm, counters: remove task argument to sync_mm_rss() and __sync_task_rss_stat() 2012-03-21 17:54:59 -07:00
mmu_notifier.c mm: mmu_notifier: fix freed page still mapped in secondary MMU 2012-07-31 18:42:49 -07:00
mmzone.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
mprotect.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-22 09:04:48 -07:00
mremap.c mm: account the total_vm in the vm_stat_account() 2012-07-31 18:42:39 -07:00
msync.c sanitize vfs_fsync calling conventions 2010-05-21 18:31:21 -04:00
nobootmem.c memblock: free allocated memblock_reserved_regions later 2012-07-11 16:04:50 -07:00
nommu.c nommu: fix compilation of nommu.c 2012-06-04 17:17:31 -04:00
oom_kill.c mm, memcg: move all oom handling to memcontrol.c 2012-07-31 18:42:45 -07:00
page-writeback.c vfs: kill write_super and sync_supers 2012-08-04 01:24:44 +04:00
page_alloc.c mm: correct page->pfmemalloc to fix deactivate_slab regression 2012-08-21 16:45:03 -07:00
page_cgroup.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
page_io.c mm: add support for direct_IO to highmem pages 2012-07-31 18:42:47 -07:00
page_isolation.c memory-hotplug: fix kswapd looping forever problem 2012-07-31 18:42:45 -07:00
pagewalk.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu-km.c percpu: clear memory allocated with the km allocator 2010-10-02 10:28:42 +03:00
percpu-vm.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu.c kmemleak: Fix the kmemleak tracking of the percpu areas with !SMP 2012-05-09 10:13:29 -07:00
pgtable-generic.c arch/tile: allow building Linux with transparent huge pages enabled 2012-05-25 12:48:21 -04:00
prio_tree.c sanitize <linux/prefetch.h> usage 2011-05-20 12:50:29 -07:00
process_vm_access.c aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector() 2012-05-31 17:49:32 -07:00
quicklist.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
readahead.c mm: move readahead syscall to mm/readahead.c 2012-05-29 16:22:23 -07:00
rmap.c mm: remove swap token code 2012-05-29 16:22:19 -07:00
shmem.c tmpfs: distribute interleave better across nodes 2012-07-31 18:42:50 -07:00
slab.c mm: micro-optimise slab to avoid a function call 2012-07-31 18:42:46 -07:00
slab.h mm, sl[aou]b: Use a common mutex definition 2012-07-09 12:13:41 +03:00
slab_common.c mm: Fix build warning in kmem_cache_create() 2012-07-30 13:15:40 +03:00
slob.c slob: Fix early boot kernel crash 2012-07-12 10:13:22 +03:00
slub.c mm: slub: optimise the SLUB fast path to avoid pfmemalloc checks 2012-07-31 18:42:45 -07:00
sparse-vmemmap.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
sparse.c mm/sparse: remove index_init_lock 2012-07-31 18:42:49 -07:00
swap.c mm: add support for direct_IO to highmem pages 2012-07-31 18:42:47 -07:00
swap_state.c mm: add support for a filesystem to activate swap files and use direct_IO for writing swap pages 2012-07-31 18:42:47 -07:00
swapfile.c mm: swapfile: clean up unuse_pte race handling 2012-07-31 18:42:48 -07:00
truncate.c mm/fs: remove truncate_range 2012-05-29 16:22:23 -07:00
util.c new helper: vm_mmap_pgoff() 2012-06-01 10:37:18 -04:00
vmalloc.c mm: make vb_alloc() more foolproof 2012-07-31 18:42:39 -07:00
vmscan.c memcg: gix memory accounting scalability in shrink_page_list 2012-07-31 18:42:49 -07:00
vmstat.c mm: account for the number of times direct reclaimers get throttled 2012-07-31 18:42:46 -07:00