linux/mm
David Rientjes 3a5dda7a17 oom: prevent unnecessary oom kills or kernel panics
This patch prevents unnecessary oom kills or kernel panics by reverting
two commits:

	495789a5 (oom: make oom_score to per-process value)
	cef1d352 (oom: multi threaded process coredump don't make deadlock)

First, 495789a5 (oom: make oom_score to per-process value) ignores the
fact that all threads in a thread group do not necessarily exit at the
same time.

It is imperative that select_bad_process() detect threads that are in the
exit path, specifically those with PF_EXITING set, to prevent needlessly
killing additional tasks.  If a process is oom killed and the thread group
leader exits, select_bad_process() cannot detect the other threads that
are PF_EXITING by iterating over only processes.  Thus, it currently
chooses another task unnecessarily for oom kill or panics the machine when
nothing else is eligible.

By iterating over threads instead, it is possible to detect threads that
are exiting and nominate them for oom kill so they get access to memory
reserves.

Second, cef1d352 (oom: multi threaded process coredump don't make
deadlock) erroneously avoids making the oom killer a no-op when an
eligible thread other than current isfound to be exiting.  We want to
detect this situation so that we may allow that exiting thread time to
exit and free its memory; if it is able to exit on its own, that should
free memory so current is no loner oom.  If it is not able to exit on its
own, the oom killer will nominate it for oom kill which, in this case,
only means it will get access to memory reserves.

Without this change, it is easy for the oom killer to unnecessarily target
tasks when all threads of a victim don't exit before the thread group
leader or, in the worst case, panic the machine.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: <stable@kernel.org>		[2.6.38.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22 17:43:58 -07:00
..
Kconfig mm: compaction: don't depend on HUGETLB_PAGE 2011-01-26 10:50:02 +10:00
Kconfig.debug trivial: improve help text for mm debug config options 2009-09-21 15:14:57 +02:00
Makefile bootmem: Separate out CONFIG_NO_BOOTMEM code into nobootmem.c 2011-02-24 14:43:05 +01:00
backing-dev.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2010-10-26 17:58:44 -07:00
bootmem.c bootmem: Move contig_page_data definition to bootmem.c/nobootmem.c 2011-02-24 14:43:06 +01:00
bounce.c bounce: call flush_dcache_page() after bounce_copy_vec() 2010-09-09 18:57:25 -07:00
compaction.c mm: compaction: prevent division-by-zero during user-requested compaction 2011-01-20 17:02:05 -08:00
debug-pagealloc.c
dmapool.c mm/dmapool.c: use TASK_UNINTERRUPTIBLE in dma_pool_alloc() 2011-01-13 17:32:48 -08:00
fadvise.c readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM 2010-03-06 11:26:25 -08:00
failslab.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
filemap.c mm: remove likely() from grab_cache_page_write_begin() 2011-01-13 17:32:36 -08:00
filemap_xip.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
fremap.c Avoid pgoff overflow in remap_file_pages 2010-09-25 09:34:58 -07:00
highmem.c mm,x86: fix kmap_atomic_push vs ioremap_32.c 2010-10-27 18:03:05 -07:00
huge_memory.c thp+memcg-numa: fix BUG at include/linux/mm.h:370! 2011-03-14 08:29:50 -07:00
hugetlb.c hugetlb: fix handling of parse errors in sysfs 2011-01-13 17:32:49 -08:00
hwpoison-inject.c HWPOISON, hugetlb: support hwpoison injection for hugepage 2010-08-11 09:23:11 +02:00
init-mm.c mm: provide init_mm mm_context initializer 2010-08-09 20:44:54 -07:00
internal.h mm: export __get_user_pages 2011-03-17 13:08:27 -03:00
kmemcheck.c kmemcheck: Fix build errors due to missing slab.h 2010-03-30 22:02:32 +09:00
kmemleak-test.c kmemleak: remove memset by using kzalloc 2011-01-27 18:31:51 +00:00
kmemleak.c kmemleak: Allow kmemleak metadata allocations to fail 2011-01-27 18:32:06 +00:00
ksm.c ksm: drain pagevecs to lru 2011-01-13 17:32:49 -08:00
maccess.c MN10300: Save frame pointer in thread_info struct rather than global var 2010-10-27 17:29:01 +01:00
madvise.c thp: khugepaged: make khugepaged aware about madvise 2011-01-13 17:32:47 -08:00
memblock.c memblock: don't adjust size in memblock_find_base() 2011-02-11 16:12:20 -08:00
memcontrol.c memcg: fix event counting breakage from recent THP update 2011-02-02 16:03:19 -08:00
memory-failure.c mm: remove is_hwpoison_address 2011-03-17 13:08:27 -03:00
memory.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-03-18 10:37:40 -07:00
memory_hotplug.c Merge branch 'slub/hotplug' into slab/urgent 2011-01-15 13:28:17 +02:00
mempolicy.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-03-18 10:37:40 -07:00
mempool.c mm: remove broken 'kzalloc' mempool 2009-09-22 07:17:35 -07:00
migrate.c mm: grab rcu read lock in move_pages() 2011-02-25 15:07:36 -08:00
mincore.c thp: mincore transparent hugepage support 2011-01-13 17:32:44 -08:00
mlock.c mlock: operate on any regions with protection != PROT_NONE 2011-02-02 10:20:50 +11:00
mm_init.c
mmap.c brk: fix min_brk lower bound computation for COMPAT_BRK 2011-01-13 17:32:48 -08:00
mmu_context.c exit: fix oops in sync_mm_rss 2010-03-24 16:31:21 -07:00
mmu_notifier.c thp: mmu_notifier_test_young 2011-01-13 17:32:46 -08:00
mmzone.c mm: page allocator: adjust the per-cpu counter threshold when memory is low 2011-01-13 17:32:31 -08:00
mprotect.c thp: mprotect: transparent huge page support 2011-01-13 17:32:44 -08:00
mremap.c mm: fix possible cause of a page_mapped BUG 2011-02-23 21:55:06 -08:00
msync.c sanitize vfs_fsync calling conventions 2010-05-21 18:31:21 -04:00
nobootmem.c bootmem: Move __alloc_memory_core_early() to nobootmem.c 2011-02-24 14:43:06 +01:00
nommu.c mlock: do not hold mmap_sem for extended periods of time 2011-01-13 17:32:36 -08:00
oom_kill.c oom: prevent unnecessary oom kills or kernel panics 2011-03-22 17:43:58 -07:00
page-writeback.c writeback: avoid unnecessary determine_dirtyable_memory call 2011-01-13 17:32:38 -08:00
page_alloc.c mm: PageBuddy and mapcount robustness 2011-03-17 16:31:13 -07:00
page_cgroup.c kmemleak: Annotate false positive in init_section_page_cgroup() 2010-07-19 11:54:14 +01:00
page_io.c block: unify flags for struct bio and struct request 2010-08-07 18:20:39 +02:00
page_isolation.c mm: page_isolation: codeclean fix comment and rm unneeded val init 2010-10-26 16:52:11 -07:00
pagewalk.c thp: split_huge_page_mm/vma 2011-01-13 17:32:41 -08:00
percpu-km.c percpu: clear memory allocated with the km allocator 2010-10-02 10:28:42 +03:00
percpu-vm.c mm: remove gfp mask from pcpu_get_vm_areas 2011-01-13 17:32:34 -08:00
percpu.c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-01-13 10:05:56 -08:00
pgtable-generic.c mm/pgtable-generic.c: fix CONFIG_SWAP=n build 2011-01-26 10:49:58 +10:00
prio_tree.c
quicklist.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
readahead.c readahead.c: fix comment 2010-05-25 08:07:00 -07:00
rmap.c thp: fix page_referenced to modify mapcount/vm_flags only if page is found 2011-03-13 15:35:57 -07:00
shmem.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-03-18 10:37:40 -07:00
slab.c Merge branch 'slab/urgent' into slab/next 2011-03-11 18:11:19 +02:00
slob.c mm: Remove support for kmem_cache_name() 2011-01-23 21:00:05 +02:00
slub.c slub: Add statistics for this_cmpxchg_double failures 2011-03-22 20:48:04 +02:00
sparse-vmemmap.c tree-wide: fix comment/printk typos 2010-11-01 15:38:34 -04:00
sparse.c thp: remove PG_buddy 2011-01-13 17:32:43 -08:00
swap.c Revert "mm: simplify code of swap.c" 2011-01-17 14:42:34 -08:00
swap_state.c thp: split_huge_page paging 2011-01-13 17:32:41 -08:00
swapfile.c mm: swap: unlock swapfile inode mutex before closing file on bad swapfiles 2011-03-22 17:43:58 -07:00
thrash.c mm: pass mm to grab_swap_token 2009-06-23 12:50:05 -07:00
truncate.c memcg: more mem_cgroup_uncharge() batching 2011-02-25 15:07:37 -08:00
util.c kernel: kmem_ptr_validate considered harmful 2011-01-07 17:50:16 +11:00
vmalloc.c Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 2011-01-13 20:15:35 -08:00
vmscan.c mm: vmscan: stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT 2011-02-25 15:07:36 -08:00
vmstat.c thp: transparent hugepage vmstat 2011-01-13 17:32:43 -08:00