linux

Commit Graph

Author	SHA1	Message	Date
Christoph Lameter	b263295dbf	x86: 64-bit, make sparsemem vmemmap the only memory model Use sparsemem as the only memory model for UP, SMP and NUMA. Measurements indicate that DISCONTIGMEM has a higher overhead than sparsemem. And FLATMEMs benefits are minimal. So I think its best to simply standardize on sparsemem. Results of page allocator tests (test can be had via git from slab git tree branch tests) Measurements in cycle counts. 1000 allocations were performed and then the average cycle count was calculated. Order FlatMem Discontig SparseMem 0 639 665 641 1 567 647 593 2 679 774 692 3 763 967 781 4 961 1501 962 5 1356 2344 1392 6 2224 3982 2336 7 4869 7225 5074 8 12500 14048 12732 9 27926 28223 28165 10 58578 58714 58682 (Note that FlatMem is an SMP config and the rest NUMA configurations) Memory use: SMP Sparsemem ------------- Kernel size: text data bss dec hex filename 3849268 397739 1264856 5511863 541ab7 vmlinux total used free shared buffers cached Mem: 8242252 41164 8201088 0 352 11512 -/+ buffers/cache: 29300 8212952 Swap: 9775512 0 9775512 SMP Flatmem ----------- Kernel size: text data bss dec hex filename 3844612 397739 1264536 5506887 540747 vmlinux So 4.5k growth in text size vs. FLATMEM. total used free shared buffers cached Mem: 8244052 40544 8203508 0 352 11484 -/+ buffers/cache: 28708 8215344 2k growth in overall memory use after boot. NUMA discontig: text data bss dec hex filename 3888124 470659 1276504 5635287 55fcd7 vmlinux total used free shared buffers cached Mem: 8256256 56908 8199348 0 352 11496 -/+ buffers/cache: 45060 8211196 Swap: 9775512 0 9775512 NUMA sparse: text data bss dec hex filename 3896428 470659 1276824 5643911 561e87 vmlinux 8k text growth. Given that we fully inline virt_to_page and friends now that is rather good. total used free shared buffers cached Mem: 8264720 57240 8207480 0 352 11516 -/+ buffers/cache: 45372 8219348 Swap: 9775512 0 9775512 The total available memory is increased by 8k. This patch makes sparsemem the default and removes discontig and flatmem support from x86. [ akpm@linux-foundation.org: allnoconfig build fix ] Acked-by: Andi Kleen <ak@suse.de> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:30:47 +01:00
Yinghai Lu	3f6e5a12b8	x86: use core id bits for apicid_to_node initialization We shoud use core id bits instead of max cores, in case later with AMD downcores Quad core Opteron. [ tglx: arch/x86 adaptation ] Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:30:39 +01:00
Thomas Gleixner	7462894a7c	x86: fixup numa 64 namespace Using a variable name, which is the same as a macro name is not really smart. Change the variable names and fixup all users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-30 13:30:38 +01:00
Thomas Gleixner	e3cfe529dd	x86: cleanup numa_64.c Clean it up before applying more patches. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-30 13:30:37 +01:00
Thomas Gleixner	0b9c99b6f2	x86: cleanup tlbflush.h variants Bring the tlbflush.h variants into sync to prepare merging and paravirt support. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:30:35 +01:00
Thomas Gleixner	3abf024d2a	x86: nuke a ton of unused exports Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-30 13:30:28 +01:00
Thomas Gleixner	4ec08da02f	x86: remove the duplicated arch/x86/ia32/mmap32.c Use mmap_32.c in arch/x86/mm instead Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-30 13:30:26 +01:00
Thomas Gleixner	f8eeae6821	x86: clean up arch/x86/mm/mmap_32/64.c White space and coding style clenaup. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-30 13:30:25 +01:00
Thomas Gleixner	aaa64e04f9	x86: move numa related declarations More stuff shuffeled to the correct place Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-30 13:30:17 +01:00
Thomas Gleixner	718fc13b46	x86: move debug related declarations to kdebug.h Move them and fixup some users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-30 13:30:17 +01:00
Thomas Gleixner	c9ff03428f	x86: move k8 related declarations Move k8 related declarations to k8.h and fix numa_64.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-30 13:30:16 +01:00
Ingo Molnar	23be8c7ddf	x86: fix boot crash on HIGHMEM4G && SPARSEMEM Denys Fedoryshchenko reported a bootup crash when he upgraded his system from 3GB to 4GB RAM: http://lkml.org/lkml/2008/1/7/9 the bug is due to HIGHMEM4G && SPARSEMEM kernels making pfn_to_page() to return an invalid pointer when the pfn is in a memory hole. The 256 MB PCI aperture at the end of RAM was not mapped by sparsemem, and hence the pfn was not valid. But set_highmem_pages_init() iterated this range without checking the pfn's validity first. this bug was probably present in the sparsemem code ever since sparsemem has been introduced in v2.6.13. It was masked due to HIGHMEM64G using larger memory regions in sparsemem_32.h: #ifdef CONFIG_X86_PAE #define SECTION_SIZE_BITS 30 #define MAX_PHYSADDR_BITS 36 #define MAX_PHYSMEM_BITS 36 #else #define SECTION_SIZE_BITS 26 #define MAX_PHYSADDR_BITS 32 #define MAX_PHYSMEM_BITS 32 #endif which creates 1GB sparsemem regions instead of 64MB sparsemem regions. So in practice we only ever created true sparsemem holes on x86 with HIGHMEM4G - but that was rarely used by distros. ( btw., we could probably save 2MB of mem_map[]s on X86_PAE if we reduced the sparsemem region size to 256 MB. ) Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-15 16:44:37 +01:00
KAMEZAWA Hiroyuki	b6fd6ecb83	memory hotplug x86_64: fix section mismatch in init_memory_mapping() Changes __meminit to __init_refok. WARNING: vmlinux.o(.text+0x1d07c): Section mismatch: reference to .init.text:find_e820_area (between 'init_memory_mapping' and 'arch_add_memory') Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-29 09:24:54 -08:00
Linus Torvalds	a7e1e001f4	Merge branch 'v2.6.24-rc1-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep * 'v2.6.24-rc1-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep: lockdep: fix a typo in the __lock_acquire comment sched: fix unconditional irq lock lockdep: fixup irq tracing	2007-11-03 12:42:52 -07:00
Adrian Bunk	fb8c177fe0	x86: mm/discontig_32.c: make code static node0_bdata and paddr_to_nid() can become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-30 00:22:22 +01:00
Linus Torvalds	6a22c57b8d	Revert "x86_64: allocate sparsemem memmap above 4G" This reverts commit `2e1c49db4c`. First off, testing in Fedora has shown it to cause boot failures, bisected down by Martin Ebourne, and reported by Dave Jobes. So the commit will likely be reverted in the 2.6.23 stable kernels. Secondly, in the 2.6.24 model, x86-64 has now grown support for SPARSEMEM_VMEMMAP, which disables the relevant code anyway, so while the bug is not visible any more, it's become invisible due to the code just being irrelevant and no longer enabled on the only architecture that this ever affected. Reported-by: Dave Jones <davej@redhat.com> Tested-by: Martin Ebourne <fedora@ebourne.me.uk> Cc: Zou Nan hai <nanhai.zou@intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-29 14:05:37 -07:00
Peter Zijlstra	143a5d325d	lockdep: fixup irq tracing Ensure we fixup the IRQ state before we hit any locking code. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-25 14:01:10 +02:00
Alexey Dobriyan	eec407c9ac	x86: fix bogus KERN_ALERT on oops fix this: printing eip: f881b9f3 pdpt = 0000000000003001 <1>pde = 000000000480a067 *pte = 0000000000000000 ^^^ [ mingo: added KERN_CONT as suggested by Pekka Enberg ] Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-24 12:58:02 +02:00
Keshavamurthy, Anil S	a9c55b3ba8	Intel IOMMU: clflush_cache_range now takes size param Introduce the size param for clflush_cache_range(). Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Christoph Lameter <clameter@sgi.com> Cc: Greg KH <greg@kroah.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:18 -07:00
Linus Torvalds	c00046c279	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (74 commits) fix do_sys_open() prototype sysfs: trivial: fix sysfs_create_file kerneldoc spelling mistake Documentation: Fix typo in SubmitChecklist. Typo: depricated -> deprecated Add missing profile=kvm option to Documentation/kernel-parameters.txt fix typo about TBI in e1000 comment proc.txt: Add /proc/stat field small documentation fixes Fix compiler warning in smount example program from sharedsubtree.txt docs/sysfs: add missing word to sysfs attribute explanation documentation/ext3: grammar fixes Documentation/java.txt: typo and grammar fixes Documentation/filesystems/vfs.txt: typo fix include/asm-*/system.h: remove unused set_rmb(), set_wmb() macros trivial copy_data_pages() tidy up Fix typo in arch/x86/kernel/tsc_32.c file link fix for Pegasus USB net driver help remove unused return within void return function Typo fixes retrun -> return x86 hpet.h: remove broken links ...	2007-10-19 20:36:17 -07:00
Simon Arlott	676b1855de	spelling fixes: arch/x86_64/ Spelling fixes in arch/x86_64/. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-20 01:25:36 +02:00
Simon Arlott	27b46d7661	spelling fixes: arch/i386/ Spelling fixes in arch/i386/. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-20 01:13:56 +02:00
Linus Torvalds	60812a4a99	Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86 * ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86: (33 commits) x86: convert cpuinfo_x86 array to a per_cpu array x86: introduce frame_pointer() and stack_pointer() x86 & generic: change to __builtin_prefetch() i386: do not BUG_ON() when MSR is unknown x86: acpi use cpu_physical_id x86: convert cpu_llc_id to be a per cpu variable x86: convert cpu_to_apicid to be a per cpu variable i386: introduce "used_vectors" bitmap which can be used to reserve vectors. x86: use raw locks during oopses x86: honor _PAGE_PSE bit on page walks i386: do cpuid_device_create() in CPU_UP_PREPARE instead of CPU_ONLINE. x86: implement missing x86_64 function smp_call_function_mask() x86: use descriptor's functions instead of inline assembly i386: consolidate show_regs and show_registers for i386 i386: make callgraph use dump_trace() on i386/x86_64 x86: enable iommu_merge by default i386: i386 add AMD64 Barcelona PMU MSR definitions to msr.h x86: Unify i386 and x86-64 early quirks x86: enable HPET on ICH3 and ICH4 x86: force enable HPET on VT8235/8237 chipsets ... Manually fix trivial conflict with task pid container helper changes in arch/x86/kernel/process_32.c	2007-10-19 15:06:00 -07:00
Linus Torvalds	26790656d7	Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86 * ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86: x86: fix global_flush_tlb() bug	2007-10-19 11:57:05 -07:00
Alexey Dobriyan	19c5870c0e	Use helpers to obtain task pid in printks (arch code) One of the easiest things to isolate is the pid printed in kernel log. There was a patch, that made this for arch-independent code, this one makes so for arch/xxx files. It took some time to cross-compile it, but hopefully these are all the printks in arch code. Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:43 -07:00
Serge E. Hallyn	b460cbc581	pid namespaces: define is_global_init() and is_container_init() is_init() is an ambiguous name for the pid==1 check. Split it into is_global_init() and is_container_init(). A cgroup init has it's tsk->pid == 1. A global init also has it's tsk->pid == 1 and it's active pid namespace is the init_pid_ns. But rather than check the active pid namespace, compare the task structure with 'init_pid_ns.child_reaper', which is initialized during boot to the /sbin/init process and never changes. Changelog: 2.6.22-rc4-mm2-pidns1: - Use 'init_pid_ns.child_reaper' to determine if a given task is the global init (/sbin/init) process. This would improve performance and remove dependence on the task_pid(). 2.6.21-mm2-pidns2: - [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc, ppc,avr32}/traps.c for the _exception() call to is_global_init(). This way, we kill only the cgroup if the cgroup's init has a bug rather than force a kernel panic. [akpm@linux-foundation.org: fix comment] [sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c] [bunk@stusta.de: kernel/pid.c: remove unused exports] [sukadev@us.ibm.com: Fix capability.c to work with threaded init] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: Pavel Emelianov <xemul@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Herbert Poetzel <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:37 -07:00
Mike Travis	71fff5e6ca	x86: convert cpu_to_apicid to be a per cpu variable This patch converts the x86_cpu_to_apicid array to be a per cpu variable. This saves sizeof(apicid) * NR unused cpus. Access is mostly from startup and CPU HOTPLUG functions. MP_processor_info() is one of the functions that require access to the x86_cpu_to_apicid array before the per_cpu data area is setup. For this case, a pointer to the __initdata array is initialized in setup_arch() and removed in smp_prepare_cpus() after the per_cpu data area is initialized. A second change is included to change the initial array value of ARCH i386 from 0xff to BAD_APICID to be consistent with ARCH x86_64. Signed-off-by: Mike Travis <travis@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-19 20:35:03 +02:00
Jan Beulich	b1992df3f0	x86: honor _PAGE_PSE bit on page walks [ tglx: arch/x86 adaptation ] Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-19 20:35:03 +02:00
Andi Kleen	b3f5dde15e	x86: don't zero pad addresses in segfault message don't zero pad addresses in segfault message. Matches the other trap messages. This leaves some more space for the new file name. [ mingo: arch/x86 adaptation ] Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-19 20:35:02 +02:00
Andi Kleen	af93ebc0b3	x86: remove page_fault_trace Old debugging code that is not really needed anymore. If someone wants it it would be better replaced with a systemtap script or kprobe. This avoids a potential cache miss during page fault processing. [ mingo: arch/x86 adaptation ] Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-19 20:35:02 +02:00
Ingo Molnar	9a24d04a3c	x86: fix global_flush_tlb() bug While we were reviewing pageattr_32/64.c for unification, Thomas Gleixner noticed the following serious SMP bug in global_flush_tlb(): down_read(&init_mm.mmap_sem); list_replace_init(&deferred_pages, &l); up_read(&init_mm.mmap_sem); this is SMP-unsafe because list_replace_init() done on two CPUs in parallel can corrupt the list. This bug has been introduced about a year ago in the 64-bit tree: commit `ea7322decb` Author: Andi Kleen <ak@suse.de> Date: Thu Dec 7 02:14:05 2006 +0100 [PATCH] x86-64: Speed and clean up cache flushing in change_page_attr down_read(&init_mm.mmap_sem); - dpage = xchg(&deferred_pages, NULL); + list_replace_init(&deferred_pages, &l); up_read(&init_mm.mmap_sem); the xchg() based version was SMP-safe, but list_replace_init() is not. So this "cleanup" introduced a nasty bug. why this bug never become prominent is a mystery - it can probably be explained with the (still) relative obscurity of the x86_64 architecture. the safe fix for now is to write-lock init_mm.mmap_sem. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-19 12:19:26 +02:00
Linus Torvalds	d20ead9e86	Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86 * ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86: (114 commits) x86: delete vsyscall files during make clean kbuild: fix typo SRCARCH in find_sources x86: fix kernel rebuild due to vsyscall fallout .gitignore update for x86 arch x86: unify include/asm/debugreg_32/64.h x86: unify include/asm/unwind_32/64.h x86: unify include/asm/types_32/64.h x86: unify include/asm/tlb_32/64.h x86: unify include/asm/siginfo_32/64.h x86: unify include/asm/bug_32/64.h x86: unify include/asm/mman_32/64.h x86: unify include/asm/agp_32/64.h x86: unify include/asm/kdebug_32/64.h x86: unify include/asm/ioctls_32/64.h x86: unify include/asm/floppy_32/64.h x86: apply missing DMA/OOM prevention to floppy_32.h x86: unify include/asm/cache_32/64.h x86: unify include/asm/cache_32/64.h x86: unify include/asm/dmi_32/64.h x86: unify include/asm/delay_32/64.h ...	2007-10-17 13:13:16 -07:00
Luiz Fernando N. Capitulino	de8aacbe6a	x86: convert mm_context_t semaphore to a mutex convert mm_context_t semaphore to a mutex. Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-17 20:17:05 +02:00
Pavel Emelyanov	9aa8d7195a	i386: clean up oops/bug reports Typically the oops first lines look like this: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000 printing eip: c049dfbd pde = 00000000 Oops: 0002 [#1] PREEMPT SMP ... Such output is gained with some ugly if (!nl) printk("\n"); code and besides being a waste of lines, this is also annoying to read. The following output looks better (and it is how it looks on x86_64): BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000 printing eip: c049dfbd pde = 00000000 Oops: 0002 [#1] PREEMPT SMP ... [ tglx: arch/x86 adaptation ] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-17 20:16:52 +02:00
Mike Travis	98c9e27a56	x86: fix cpu_to_node references In x86_64 and i386 architectures most arrays that are sized using NR_CPUS lay in local memory on node 0. Not only will most (99%?) of the systems not use all the slots in these arrays, particularly when NR_CPUS is increased to accommodate future very high cpu count systems, but a number of cache lines are passed unnecessarily on the system bus when these arrays are referenced by cpus on other nodes. Typically, the values in these arrays are referenced by the cpu accessing it's own values, though when passing IPI interrupts, the cpu does access the data relevant to the targeted cpu/node. Of course, if the referencing cpu is not on node 0, then the reference will still require cross node exchanges of cache lines. A common use of this is for an interrupt service routine to pass the interrupt to other cpus local to that node. Ideally, all the elements in these arrays should be moved to the per_cpu data area. In some cases (such as x86_cpu_to_apicid) the array is referenced before the per_cpu data areas are setup. In this case, a static array is declared in the __initdata area and initialized by the booting cpu (BSP). The values are then moved to the per_cpu area after it is initialized and the original static array is freed with the rest of the __initdata. This patch: Fix four instances where cpu_to_node is referenced by array instead of via the cpu_to_node macro. This is preparation to moving it to the per_cpu data area. Signed-off-by: Mike Travis <travis@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-17 20:16:37 +02:00
H. Peter Anvin	6619a8fb59	x86: Create clflush() inline, remove hardcoded wbinvd Create an inline function for clflush(), with the proper arguments, and use it instead of hard-coding the instruction. This also removes one instance of hard-coded wbinvd, based on a patch by Bauder de Oliveira Costa. [ tglx: arch/x86 adaptation ] Cc: Andi Kleen <andi@firstfloor.org> Cc: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-17 20:16:12 +02:00
Adrian Bunk	59659f14b6	i386: make some variables static This patch makes some needlessly global variables static. [ tglx: arch/x86 adaptation ] Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-17 20:15:56 +02:00
Yoann Padioleau	83e83d546c	x86: 0 -> NULL, for arch/x86_64 When comparing a pointer, it's clearer to compare it to NULL than to 0. [ tglx: arch/x86 adaptation ] Signed-off-by: Yoann Padioleau <padator@wanadoo.fr> Signed-off-by: Andi Kleen <ak@suse.de> Cc: ak@suse.de Cc: discuss@x86-64.org Cc: akpm@linux-foundation.org Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-17 20:15:52 +02:00
Huang, Ying	84e0fdb175	x86: NX bit handling in change_page_attr() This patch fixes a bug of change_page_attr/change_page_attr_addr on Intel x86_64 CPUs. After changing page attribute to be executable with these functions, the page remains un-executable on Intel x86_64 CPU. Because on Intel x86_64 CPU, only if the "NX" bits of all four level page tables are cleared, the corresponding page is executable (refer to section 4.13.2 of Intel 64 and IA-32 Architectures Software Developer's Manual). So, the bug is fixed through clearing the "NX" bit of PMD when splitting the huge PMD. Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-17 20:15:43 +02:00
Prarit Bhargava	27eb0b288f	x86: stop nmi softlockup warnings in show_mem() When dumping memory via sysrq-m it is possible to take a bogus NMI watchdog or softlockup watchdog because the dump can take a long time on big memory systems. Occasionally tickle the watchdog when doing the dump. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-10-17 20:15:41 +02:00
Ingo Molnar	509a80c49c	x86: fix CONFIG_PAGEALLOC related boot hangs/OOMs if CONFIG_PAGEALLOC is enabled then X86_FEATURE_PSE is disabled and all the kernel physical RAM pagetables are set up as 4K pages. This is needed so that CONFIG_PAGEALLOC can do finegrained mapping and unmapping of pages. as a side-effect though, the total size of memory allocated as kernel pagetables increases significantly. All these pagetables are allocated via alloc_bootmem_low_pages(), straight out of the lowmem DMA pool. If the system has enough RAM and a large kernel image then almost all of the 16 MB lowmem DMA pool is allocated to the image and to pagetables - leaving no space for __GFP_DMA allocations. this results in drivers failing and the bootup hanging: swapper invoked oom-killer: gfp_mask=0x80d1, order=0, oomkilladj=0 [<4015059f>] out_of_memory+0x17f/0x1c0 [<40151f3c>] __alloc_pages+0x37c/0x3a0 [<40168cd7>] slob_new_page+0x37/0x50 [<40168dff>] slob_alloc+0x10f/0x190 [<40169010>] __kmalloc_node+0x80/0x90 [<405a17e3>] scsi_host_alloc+0x33/0x2c0 [<405a1a82>] scsi_register+0x12/0x60 [<40d5889e>] aha1542_detect+0x9e/0x940 [<405c5ba5>] ultrastor_detect+0x265/0x5f0 [<401352f5>] getnstimeofday+0x35/0xf0 [<40d58751>] init_this_scsi_driver+0x41/0xf0 [<40d0b856>] kernel_init+0x136/0x310 [<40d58710>] init_this_scsi_driver+0x0/0xf0 [<40d0b720>] kernel_init+0x0/0x310 [<40105547>] kernel_thread_helper+0x7/0x10 ======================= the fix is to first allocate from above the DMA pool, and if that fails (for example due to it being a machine with less than 16 MB of RAM), allocate from the DMA pool as a fallback. With this fix applied i was able to boot a PAGEALLOC=y kernel that would hang before. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-17 20:15:39 +02:00
Jan Beulich	aa506dc7b1	i386: avoid temporarily inconsistent pte-s One more of these issues (which were considered fixed a few releases back): other than on x86-64, i386 allows set_fixmap() to replace already present mappings. Consequently, on PAE, care must be taken to not update the high half of a pte while the low half is still holding the old value. [ tglx: arch/x86 adaptation ] Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> arch/x86/mm/pgtable_32.c \| 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)	2007-10-17 20:15:28 +02:00
Linus Torvalds	fb9fc39517	Merge branch 'xen-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'xen-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xfs: eagerly remove vmap mappings to avoid upsetting Xen xen: add some debug output for failed multicalls xen: fix incorrect vcpu_register_vcpu_info hypercall argument xen: ask the hypervisor how much space it needs reserved xen: lock pte pages while pinning/unpinning xen: deal with stale cr3 values when unpinning pagetables xen: add batch completion callbacks xen: yield to IPI target if necessary Clean up duplicate includes in arch/i386/xen/ remove dead code in pgtable_cache_init paravirt: clean up lazy mode handling paravirt: refactor struct paravirt_ops into smaller pv_*_ops	2007-10-17 11:10:11 -07:00
Linus Torvalds	5c8e191e84	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-x86setup * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-x86setup: Remove magic macros for screen_info structure members [x86] remove uses of magic macros for boot_params access	2007-10-17 09:00:30 -07:00
Christoph Lameter	4ba9b9d0ba	Slab API: remove useless ctor parameter and reorder parameters Slab constructors currently have a flags parameter that is never used. And the order of the arguments is opposite to other slab functions. The object pointer is placed before the kmem_cache pointer. Convert ctor(void object, struct kmem_cache s, unsigned long flags) to ctor(struct kmem_cache s, void object) throughout the kernel [akpm@linux-foundation.org: coupla fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:45 -07:00
H. Peter Anvin	30c826451d	[x86] remove uses of magic macros for boot_params access Instead of using magic macros for boot_params access, simply use the boot_params structure. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2007-10-16 17:38:31 -07:00
Jeremy Fitzhardinge	4f81784774	remove dead code in pgtable_cache_init The conversion from using a slab cache to quicklist left some residual dead code. I note that in the conversion it now always allocates a whole page for the pgd, rather than the 32 bytes needed for a PAE pgd. Was this intended? Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de>	2007-10-16 11:51:29 -07:00
KAMEZAWA Hiroyuki	48e94196a5	fix memory hot remove not configured case. Now, arch dependent code around CONFIG_MEMORY_HOTREMOVE is a mess. This patch cleans up them. This is against 2.6.23-rc6-mm1. - fix compile failure on ia64/ CONFIG_MEMORY_HOTPLUG && !CONFIG_MEMORY_HOTREMOVE case. - For !CONFIG_MEMORY_HOTREMOVE, add generic no-op remove_memory(), which returns -EINVAL. - removed remove_pages() only used in powerpc. - removed no-op remove_memory() in i386, sh, sparc64, x86_64. - only powerpc returns -ENOSYS at memory hot remove(no-op). changes it to return -EINVAL. Note: Currently, only ia64 supports CONFIG_MEMORY_HOTREMOVE. I welcome other archs if there are requirements and testers. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:43:02 -07:00
Will Schmidt	dcca2bde4f	During VM oom condition, kill all threads in process group We have had complaints where a threaded application is left in a bad state after one of it's threads is killed when we hit a VM: out_of_memory condition. Killing just one of the process threads can leave the application in a bad state, whereas killing the entire process group would allow for the application to restart, or be otherwise handled, and makes it very obvious that something has gone wrong. This change allows the entire process group to be taken down, rather than just the one thread. Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Matthew Wilcox <willy@debian.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:52 -07:00
Christoph Lameter	0889eba5b3	x86_64: SPARSEMEM_VMEMMAP 2M page size support x86_64 uses 2M page table entries to map its 1-1 kernel space. We also implement the virtual memmap using 2M page table entries. So there is no additional runtime overhead over FLATMEM, initialisation is slightly more complex. As FLATMEM still references memory to obtain the mem_map pointer and SPARSEMEM_VMEMMAP uses a compile time constant, SPARSEMEM_VMEMMAP should be superior. With this SPARSEMEM becomes the most efficient way of handling virt_to_page, pfn_to_page and friends for UP, SMP and NUMA on x86_64. [apw@shadowen.org: code resplit, style fixups] [apw@shadowen.org: vmemmap x86_64: ensure end of section memmap is initialised] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andy Whitcroft <apw@shadowen.org> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <ak@suse.de> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:51 -07:00

1 2

53 Commits (2c0b8a7578f7653e1e5312a5232e8ead563cf477)