linux/arch
Shaohua Li 70e4a36973 x86: Scale up the number of TLB invalidate vectors with NR_CPUs, up to 32
Make the maxium TLB invalidate vectors depend on NR_CPUS linearly,
with a maximum of 32 vectors.

We currently only have 8 vectors for TLB invalidation and that is clearly
inadequate. If we have a lot of CPUs, the CPUs need share the 8 vectors and
tlbstate_lock is used to protect them. flush_tlb_page() is
heavily used in page reclaim, which will cause a lot of lock
contention for tlbstate_lock.

Andi Kleen suggested increasing the vectors number to 32, which should be
good for current typical systems to reduce the tlbstate_lock contention.

My test system has 4 sockets and 64G memory, and 64 CPUs. My
workload creates 64 processes. Each process mmap reads a big
empty sparse file. The total size of the files are 2*total_mem,
so this will cause a lot of page reclaim.

Below is the result I get from perf call-graph profiling:

 without the patch:
 ------------------

    24.25%           usemem  [kernel]                                   [k] _raw_spin_lock
                     |
                     --- _raw_spin_lock
                        |
                        |--42.15%-- native_flush_tlb_others

 with the patch:
 ------------------

    14.96%           usemem  [kernel]                                   [k] _raw_spin_lock
                     |
                     --- _raw_spin_lock
                        |--13.89%-- native_flush_tlb_others

So this heavily reduces the tlbstate_lock contention.

Suggested-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1295232727.1949.709.camel@sli10-conroe>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14 13:03:08 +01:00
..
alpha alpha: Use generic irq Kconfig 2011-01-21 11:55:31 +01:00
arm ARM: SAMSUNG: Ensure struct sys_device is declared in plat/pm.h 2011-02-11 10:25:57 +09:00
avr32 avr32: add missing include causing undefined pgtable_page_* references 2011-01-26 12:35:15 +01:00
blackfin serial: bfin_5xx: split uart RX lock from uart port lock to avoid deadlock 2011-02-03 14:44:54 -08:00
cris cris: Use generic irq Kconfig 2011-01-21 11:55:25 +01:00
frv frv: Use generic irq Kconfig 2011-01-21 11:55:32 +01:00
h8300 h8300: Use generic irq Kconfig 2011-01-21 11:55:24 +01:00
ia64 ia64: Use generic irq Kconfig 2011-01-21 11:55:32 +01:00
m32r m32r: Fixup last __do_IRQ leftover 2011-02-05 21:46:35 +01:00
m68k m68k/amiga: Fix "debug=mem" 2011-01-23 11:24:42 +01:00
m68knommu m68knommu: Use generic irq Kconfig 2011-01-21 11:55:32 +01:00
microblaze microblaze: Fix msr instruction detection 2011-02-07 19:13:01 +01:00
mips genirq: Remove __do_IRQ 2011-01-21 11:55:31 +01:00
mn10300 mn10300: Use generic irq Kconfig 2011-01-21 11:55:33 +01:00
parisc console: rename acquire/release_console_sem() to console_lock/unlock() 2011-01-26 10:50:06 +10:00
powerpc powerpc: Fix hcall tracepoint recursion 2011-02-07 13:06:08 +11:00
s390 [S390] reset default for CONFIG_CHSC_SCH 2011-01-31 11:30:21 +01:00
score score: Use generic irq Kconfig 2011-01-21 11:55:34 +01:00
sh Merge branch 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 2011-01-26 09:00:17 +10:00
sparc sparc: Use generic irq Kconfig 2011-01-21 11:55:34 +01:00
tile tile: Use generic irq Kconfig 2011-01-21 11:55:34 +01:00
um um: Use generic irq Kconfig 2011-01-21 11:55:35 +01:00
x86 x86: Scale up the number of TLB invalidate vectors with NR_CPUs, up to 32 2011-02-14 13:03:08 +01:00
xtensa kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
.gitignore
Kconfig [S390] mutex: Introduce arch_mutex_cpu_relax() 2011-01-05 12:47:31 +01:00