Commit graph

281 commits

Author SHA1 Message Date
Ingo Molnar
30cd324e97 Merge branches 'tracing/ftrace', 'tracing/ring-buffer' and 'tracing/urgent' into tracing/core
Conflicts:
	include/linux/ftrace.h
2008-12-19 09:42:40 +01:00
Hiroshi Shimamoto
9f22149599 x86: ia32.h: remove unused struct sigfram32 and rt_sigframe32
Impact: cleanup

Remove struct sigfram32 and rt_sigframe32 because there is no user.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-12-18 15:01:25 -08:00
Hiroshi Shimamoto
b2fa739c06 x86: sigframe.h: include headers for dependency
Impact: cleanup

Include following headers for dependency.
asm/sigcontext.h
asm/siginfo.h
asm/ucontext.h

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-12-18 15:01:22 -08:00
Jaswinder Singh
d1769d5475 x86: traps.c declare functions before they get used
Impact: cleanup

 In asm/traps.h :-
 do_double_fault : added under X86_64
 sync_regs : added under X86_64
 math_error : moved out from X86_32 as it is common for both 32 and 64 bit
 math_emulate : moved from X86_32 as it is common for both 32 and 64 bit
 smp_thermal_interrupt : added under X86_64
 mce_threshold_interrupt : added under X86_64

Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18 22:33:13 +01:00
venkatesh.pallipadi@intel.com
2520bd3123 x86: PAT: add pgprot_writecombine() interface for drivers - v3
Impact: New mm functionality.

Add pgprot_writecombine. pgprot_writecombine will be aliased to
pgprot_noncached when not supported by the architecture.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-12-18 13:30:16 -08:00
venkatesh.pallipadi@intel.com
8a7b12f70f x86: PAT: change pgprot_noncached to uc_minus instead of strong uc - v3
Impact: mm behavior change.

Make pgprot_noncached uc_minus instead of strong UC. This will make
pgprot_noncached to be in line with ioremap_nocache() and all the other
APIs that map page uc_minus on uc request.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-12-18 13:30:16 -08:00
venkatesh.pallipadi@intel.com
5899329b19 x86: PAT: implement track/untrack of pfnmap regions for x86 - v3
Impact: New mm functionality.

Hookup remap_pfn_range and vm_insert_pfn and corresponding copy and free
routines with reserve and free tracking.

reserve and free here only takes care of non RAM region mapping. For RAM
region, driver should use set_memory_[uc|wc|wb] to set the cache type and
then setup the mapping for user pte. We can bypass below
reserve/free in that case.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-12-18 13:30:16 -08:00
Hiroshi Shimamoto
5c2628e8b4 x86: sigframe.h: add guard macro
Impact: cleanup

Add missing guard macro _ASM_X86_SIGFRAME_H.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18 22:04:13 +01:00
Jaswinder Singh
7c9c160c54 x86: tls.c declare sys_set_thread_area and sys_get_thread_area before they get used
Impact: cleanup

In asm/syscalls.h move out sys_set_thread_area() and sys_get_thread_area()
as they are common for both 32 and 64 bit.

Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18 13:33:48 +01:00
Mike Travis
a775a38b13 x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
Impact: fix potential APIC crash

In determining the destination apicid, there are usually three cpumasks
that are considered: the incoming cpumask arg, cfg->domain and the
cpu_online_mask.  Since we are just introducing the cpu_mask_to_apicid_and
function, make sure it includes the cpu_online_mask in it's evaluation.
[Added with this patch.]

There are two io_apic.c functions that did not previously use the
cpu_online_mask:  setup_IO_APIC_irq and msi_compose_msg.  Both of these
simply used cpu_mask_to_apicid(cfg->domain & TARGET_CPUS), and all but
one arch (NUMAQ[*]) returns only online cpus in the TARGET_CPUS mask,
so the behavior is identical for all cases.

[*: NUMAQ bug?]

Note that alloc_cpumask_var is only used for the 32-bit cases where
it's highly likely that the cpumask set size will be small and therefore
CPUMASK_OFFSTACK=n.  But if that's not the case, failing the allocate
will cause the same return value as the default.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18 11:59:24 +01:00
Hiroshi Shimamoto
c85c2ff877 x86: signal: prepare to include from ia32_signal.c
Impact: cleanup, prepare to use from ia32_signal.c

Make struct sigframe_ia32 and rt_sigframe_ia32 visible to ia32_signal.c.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18 11:28:55 +01:00
Hiroshi Shimamoto
41af86fad3 x86: signal: move sigframe.h to arch/x86/include/asm
Impact: cleanup, move header file

Move arch/x86/kernel/sigframe.h to arch/x86/include/asm/sigframe.h.
It will be used in arch/x86/ia32/ia32_signal.c.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18 11:28:54 +01:00
Jeremy Fitzhardinge
cfb80c9eae x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
swiotlb on 32 bit will be used by Xen domain 0 support.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-17 18:58:19 +01:00
Ingo Molnar
855caa37b9 Merge branch 'x86/crashdump' into cpus4096
Conflicts:
	arch/x86/kernel/crash.c

Merged for semantic conflict:
	arch/x86/kernel/reboot.c
2008-12-17 13:24:52 +01:00
Ingo Molnar
9466d6036f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/travis/linux-2.6-cpus4096-for-ingo into cpus4096 2008-12-17 13:08:34 +01:00
Ingo Molnar
1f3f424a6b Merge branch 'linus' into cpus4096 2008-12-17 13:07:48 +01:00
Mike Travis
83b19597f7 x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
Impact: new API

The old topology_core_siblings() and topology_thread_siblings() return
a cpumask_t; these new ones return a (const) struct cpumask *.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
2008-12-16 17:40:59 -08:00
Mike Travis
bcda016edd x86: cosmetic changes apic-related files.
This patch simply changes cpumask_t to struct cpumask and similar
trivial modernizations.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
2008-12-16 17:40:57 -08:00
Mike Travis
d7b381bb7b x86: fixup_irqs() doesnt need an argument.
Impact: cleanup, remove on-stack cpumask.

The "map" arg is always cpu_online_mask.  Importantly, set_affinity
always ands the argument with cpu_online_mask anyway, so we don't need
to do it in fixup_irqs(), avoiding a temporary.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
2008-12-16 17:40:57 -08:00
Mike Travis
6eeb7c5a99 x86: update add-cpu_mask_to_apicid_and to use struct cpumask*
Impact: use updated APIs

Various API updates for x86:add-cpu_mask_to_apicid_and

(Note: separate because previous patch has been "backported" to 2.6.27.)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
2008-12-16 17:40:56 -08:00
Mike Travis
95d313cf1c x86: Add cpu_mask_to_apicid_and
Impact: new API

Add a helper function that takes two cpumask's, and's them and then
returns the apicid of the result.  This removes a need in io_apic.c
that uses a temporary cpumask to hold (mask & cfg->domain).

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-12-16 17:40:56 -08:00
Mike Travis
e7986739a7 x86 smp: modify send_IPI_mask interface to accept cpumask_t pointers
Impact: cleanup, change parameter passing

  * Change genapic interfaces to accept cpumask_t pointers where possible.

  * Modify external callers to use cpumask_t pointers in function calls.

  * Create new send_IPI_mask_allbutself which is the same as the
    send_IPI_mask functions but removes smp_processor_id() from list.
    This removes another common need for a temporary cpumask_t variable.

  * Functions that used a temp cpumask_t variable for:

	cpumask_t allbutme = cpu_online_map;

	cpu_clear(smp_processor_id(), allbutme);
	if (!cpus_empty(allbutme))
		...

    become:

	if (!cpus_equal(cpu_online_map, cpumask_of_cpu(cpu)))
		...

  * Other minor code optimizations (like using cpus_clear instead of
    CPU_MASK_NONE, etc.)

Applies to linux-2.6.tip/master.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 17:40:56 -08:00
Cyrill Gorcunov
d680fe4477 x86: entry_64 - introduce FTRACE_ frame macro v2
Impact: clean up

Itroduce MCOUNT_SAVE/RESTORE_FRAME which allow us to
save a number of lines on source level.

Also fix a comment in ftrace.h.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-17 00:26:38 +01:00
Russ Anderson
c8182f0016 sgi-xp: xpc needs to pass the physical address, not virtual
Impact: fix crash

xpc needs to pass the physical address, not virtual.

Testing uncovered this problem.  The virtual address happens to work
most of the time due to the way bios was masking off the node bits.
Passing the physical address makes it work all of the time.

Signed-off-by: Russ Anderson <rja@sgi.com>
Acked-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 23:04:24 +01:00
Jack Steiner
189f67c440 x86: UV fix for global physical addresses
Impact: fix UV boot crash

This fixes a UV bug related to generating global memory addresses
on partitioned systems. Partition systems do not have physical memory
at address 0. Instead, a chunk of high memory is remapped by the chipset
so that it appears to be at address 0. This remapping is INVISIBLE to most
of the OS. The only OS functions that need to be aware of the remaping are
functions that directly interface to the chipset. The GRU is one example.

Also, delete a couple of unused macros related to global memory addresses.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 22:54:03 +01:00
Ingo Molnar
c15cb37cc4 Merge commit 'v2.6.28-rc8' into x86/uv 2008-12-16 22:53:53 +01:00
Ingo Molnar
78f902ccc5 Merge commit 'v2.6.28-rc8' into x86/doc 2008-12-16 22:04:48 +01:00
Jeremy Fitzhardinge
ecbf29cdb3 xen: clean up asm/xen/hypervisor.h
Impact: cleanup

hypervisor.h had accumulated a lot of crud, including lots of spurious
#includes.  Clean it all up, and go around fixing up everything else
accordingly.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 21:50:31 +01:00
Jeremy Fitzhardinge
a79b7a2a75 x86: remove unused iommu_nr_pages
Impact: cleanup, remove dead code

The last usage was removed by the patch set culminating in

| commit e3c449f526
| Author: Joerg Roedel <joerg.roedel@amd.com>
| Date:   Wed Oct 15 22:02:11 2008 -0700
|
|     x86, AMD IOMMU: convert driver to generic iommu_num_pages function

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 21:31:37 +01:00
Jaswinder Singh
c0195b6da0 x86: ldt.c declare sys_modify_ldt before they get used
Impact: cleanup

In asm/syscalls.h moved out sys_modify_ldt from CONFIG_X86_32 as it is
common for both 32 and 64 bit.

Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 21:18:21 +01:00
Jaswinder Singh
7b5b50f1be x86: signal.c declare do_notify_resume before they get used
Impact: cleanup

In asm/signal.h moved out do_notify_resume from __i386__ as it is common
for both 32 and 64 bit.

Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 arch/x86/include/asm/signal.h |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
2008-12-16 21:10:28 +01:00
Jaswinder Singh
aab02f0ae2 x86: process_64.c declare __switch_to() and sys_arch_prctl before they get used
Impact: cleanup

In asm/system.h moved out __switch_to from CONFIG_X86_32 as it is common for
both 32 and 64 bit.

In asm/pctl.h defined sys_arch_prctl
Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 21:10:27 +01:00
Ingo Molnar
7e91a122b1 Merge branch 'x86/cpufeature' into x86/tsc
Merge itto in x86/tsc because an upcoming patch relies on a new
cpuid bit defined in the x86/cpufeature branch.
2008-12-16 21:02:13 +01:00
Ingo Molnar
d437797406 x86: support always running TSC on Intel CPUs, add cpufeature definition
Impact: add new synthetic-cpuid bit definition

add X86_FEATURE_NONSTOP_TSC to the cpufeature bits - this is in
preparation of Venki's always-running-TSC patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 21:01:15 +01:00
Ingo Molnar
dd7a5230cd Merge commit 'v2.6.28-rc8' into x86/cpufeature 2008-12-16 20:57:41 +01:00
Andreas Herrmann
29d0887ffd x86: microcode_amd: replace inline asm by common rdmsr/wrmsr functions
Impact: cleanup

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 19:58:04 +01:00
Ingo Molnar
b6fd6f2673 x86, mm: limit MAXMEM on 64-bit
on 64-bit x86 the physical memory limit is controlled by the sparsemem
bits - which are 44 bits right now. But MAXMEM (the max pfn number
e820 parsing will allow to enter our sizing routines) is set to
0x00003fffffffffff, i.e. 46 bits - that's too large because it overlaps
into the vmalloc range.

So couple MAXMEM to MAX_PHYSMEM_BITS, and add a comment that the
maximum of MAX_PHYSMEM_BITS is 45 bits.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 19:31:52 +01:00
Jan Beulich
cfc319833b x86, 32-bit: improve lazy TLB handling code
Impact: micro-optimize the 32-bit TLB flush code

Use the faster x86_{read,write}_percpu() accessors here.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 18:47:17 +01:00
Jan Beulich
b93a531e31 allow bug table entries to use relative pointers (and use it on x86-64)
Impact: reduce bug table size

This allows reducing the bug table size by half. Perhaps there are
other 64-bit architectures that could also make use of this.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 18:40:32 +01:00
Jan Beulich
1796316a8b x86: consolidate __swp_XXX() macros
Impact: cleanup, code robustization

The __swp_...() macros silently relied upon which bits are used for
_PAGE_FILE and _PAGE_PROTNONE. After having changed _PAGE_PROTNONE in
our Xen kernel to no longer overlap _PAGE_PAT, live locks and crashes
were reported that could have been avoided if these macros properly
used the symbolic constants. Since, as pointed out earlier, for Xen
Dom0 support mainline likewise will need to eliminate the conflict
between _PAGE_PAT and _PAGE_PROTNONE, this patch does all the necessary
adjustments, plus it introduces a mechanism to check consistency
between MAX_SWAPFILES_SHIFT and the actual encoding macros.

This also fixes a latent bug in that x86-64 used a 6-bit mask in
__swp_type(), and if MAX_SWAPFILES_SHIFT was increased beyond 5 in (the
seemingly unrelated) linux/swap.h, this would have resulted in a
collision with _PAGE_FILE.

Non-PAE 32-bit code gets similarly adjusted for its pte_to_pgoff() and
pgoff_to_pte() calculations.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 18:34:51 +01:00
Ken Chen
205516c12d x86: convert rdtscll() to use __native_read_tsc
Impact: micro-optimization

Is there any reason why x86 rdtscll have to use the out of line
function instead of inline __native_read_tsc()?  native_read_tsc and
__native_read_tsc is essentially the same functions.

Patch to let x86 rdtscll() to use the inline version of read_tsc.

Signed-off-by: Ken Chen <kenchen@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 10:17:02 +01:00
Zachary Amsden
ae8d04e2ec x86 Fix VMI crash on boot in 2.6.28-rc8
VMI initialiation can relocate the fixmap, causing early_ioremap to
malfunction if it is initialized before the relocation.  To fix this,
VMI activation is split into two phases; the detection, which must
happen before setting up ioremap, and the activation, which must happen
after parsing early boot parameters.

This fixes a crash on boot when VMI is enabled under VMware.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-14 16:24:38 -08:00
Ingo Molnar
8299608f14 Merge branches 'irq/sparseirq', 'x86/quirks' and 'x86/reboot' into cpus4096
We merge the irq/sparseirq, x86/quirks and x86/reboot trees into the
cpus4096 tree because the io-apic changes in the sparseirq change
conflict with the cpumask changes in the cpumask tree, and we
want to resolve those.
2008-12-12 13:49:24 +01:00
Ingo Molnar
45ab6b0c76 Merge branch 'sched/core' into cpus4096
Conflicts:
	include/linux/ftrace.h
	kernel/sched.c
2008-12-12 13:48:57 +01:00
Ingo Molnar
81444a7995 Merge branch 'tracing/fastboot' into cpus4096 2008-12-12 12:43:05 +01:00
Hiroshi Shimamoto
915b0d0104 x86: hardirq: introduce inc_irq_stat()
Impact: cleanup

Introduce inc_irq_stat() macro and unify irq_stat accounting code.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12 11:59:49 +01:00
Ingo Molnar
fd10902797 Merge commit 'v2.6.28-rc8' into x86/irq 2008-12-12 11:59:39 +01:00
Hiroshi Shimamoto
8f2466f45f x86: kill #ifdef for exit_idle()
Impact: cleanup

Introduce helper inline function in arch/x86/include/asm/idle.h
to remove #ifdefs around exit_idle().

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12 11:58:36 +01:00
Hiroshi Shimamoto
16855f878d x86: uaccess: return value of __{get|put}_user() can be int
Impact: cleanup

The type of return value of __{get|put}_user() can be int.
There is no user to refer the return value of __{get|put}_user() as long.
This reduces code size a bit on 64-bit.

 $ size vmlinux.*
     text	   data	    bss	    dec	    hex	filename
  4509265	 479988	 673588	5662841	 566879	vmlinux.new
  4511462	 479988	 673588	5665038	 56710e	vmlinux.old

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12 11:54:43 +01:00
Markus Metzger
c2724775ce x86, bts: provide in-kernel branch-trace interface
Impact: cleanup

Move the BTS bits from ptrace.c into ds.c.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12 08:08:12 +01:00
Ingo Molnar
50dd94e017 sparseirq: fix typo in !CONFIG_IO_APIC case
Impact: build fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 18:47:51 +01:00
Hiroshi Shimamoto
4217458daf x86: signal: change type of paramter for sys_rt_sigreturn()
Impact: cleanup on 32-bit

Peter pointed this parameter can be changed.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 15:21:35 +01:00
Ingo Molnar
aa9c9b8c58 Merge branch 'linus' into x86/quirks 2008-12-08 15:07:49 +01:00
Yinghai Lu
be5d5350a9 x86: MSI start irq numbering from nr_irqs_gsi
Impact: sanitize MSI irq number ordering from top-down to bottom-up

Increase new MSI IRQs starting from nr_irqs_gsi (which is somewhere below
256), instead of decreasing from NR_IRQS. (The latter method can result
in confusingly high IRQ numbers - if NR_CPUS is set to a high value and
NR_IRQS scales up to a high value.)

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 14:31:54 +01:00
Yinghai Lu
99d093d128 x86: use NR_IRQS_LEGACY
Impact: cleanup

Introduce NR_IRQS_LEGACY instead of hard coded number.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 14:31:52 +01:00
Yinghai Lu
0b8f1efad3 sparse irq_desc[] array: core kernel and x86 changes
Impact: new feature

Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with
NR_CPUS set to large values. The goal is to be able to scale up to much
larger NR_IRQS value without impacting the (important) common case.

To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of
irq_desc pointers.

When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc,
this also makes the IRQ descriptors NUMA-local (to the site that calls
request_irq()).

This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now
uses desc->chip_data for x86 to store irq_cfg.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 14:31:51 +01:00
Rafael J. Wysocki
3e1e9002aa x86: change static allocation of trampoline area
Impact: fix trampoline sizing bug, save space

While debugging a suspend-to-RAM related issue it occured to me that
if the trampoline code had grown past 4 KB, we would have been
allocating too little memory for it, since the 4 KB size of the
trampoline is hardcoded into arch/x86/kernel/e820.c .  Change that
by making the kernel compute the trampoline size and allocate as much
memory as necessary.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 13:49:45 +01:00
Linus Torvalds
e948990f95 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix early panic with boot option "nosmp"
  x86/oprofile: fix Intel cpu family 6 detection
  oprofile: fix CPU unplug panic in ppro_stop()
  AMD IOMMU: fix possible race while accessing iommu->need_sync
  AMD IOMMU: set device table entry for aliased devices
  AMD IOMMU: struct amd_iommu remove padding on 64 bit
  x86: fix broken flushing in GART nofullflush path
  x86: fix dma_mapping_error for 32bit x86
2008-12-04 21:40:08 -08:00
Joe Korty
affa219b60 x86: change thread_info's flag field back to 32 bits
Impact: pack struct thread_info more tightly

Change x86_64's thread_info 'flags' field back to __u32.

This was changed to 'unsigned long' when the thread_info*.h
for i386 and x86_64 were merged.  Change it back.  We can
do this as only 27 bits of 'flags' are actually used.

This change actually packs down thread_info by 64 bits:
32 bits are saved by the smaller flags, and 32 bits are
saved by the following 'mm_segment_t field' becoming
naturally 64-bit aligned.

Signed-off-by: Joe Korty <joe.korty@ccur.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-04 11:06:03 +01:00
Ingo Molnar
c0515566f3 Merge commit 'v2.6.28-rc7' into x86/cleanups 2008-12-04 11:05:26 +01:00
Ingo Molnar
b8307db247 Merge commit 'v2.6.28-rc7' into tracing/core 2008-12-04 09:07:19 +01:00
Ingo Molnar
cb9c34e6d0 Merge commit 'v2.6.28-rc7' into core/locking 2008-12-04 08:52:14 +01:00
Ingo Molnar
c36910c147 Merge branch 'iommu-fixes-2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2008-12-03 12:54:45 +01:00
Richard Kennedy
eac9fbc6a9 AMD IOMMU: struct amd_iommu remove padding on 64 bit
Remove 16 bytes of padding from struct amd_iommu on 64bit builds
reducing its size to 120 bytes, allowing it to span one fewer
cachelines.

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-12-03 12:20:46 +01:00
FUJITA Tomonori
181de82ee3 x86: remove dead BIO_VMERGE_BOUNDARY definition
Impact: cleanup, remove dead code

The block layer dropped the virtual merge feature
(b8b3e16cfe).

BIO_VMERGE_BOUNDARY definition is meaningless now.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 08:26:40 +01:00
Ingo Molnar
6083aa485c Merge branch 'x86/io' into x86/iommu
Merge x86/io into x86/iommu due to a small patch conflict in io.h.
2008-12-03 08:25:59 +01:00
Ingo Molnar
a64d31baed Merge branch 'linus' into cpus4096
Conflicts:
	kernel/trace/ring_buffer.c
2008-12-02 20:09:50 +01:00
FUJITA Tomonori
50cec5c51c x86: fix dma_mapping_error for 32bit x86, cleanup
This removes ifdef CONFIG_X86_64 in dma_mapping_error():

1) Xen people plan to use swiotlb on X86_32 for Dom0 support. swiotlb
   uses ops->mapping_error so X86_32 also needs to check
   ops->mapping_error.

2) Removing #ifdef hack is almost always a good thing.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-01 20:36:17 +01:00
Ingo Molnar
f6d2e6f57b Merge branch 'x86/urgent' into x86/iommu 2008-12-01 20:36:13 +01:00
Mahesh Salgaonkar
43714539ea sched: don't export sched_mc_power_savings in laptops
Impact: do not expose a control that has no effect

Fix to prevent sched_mc_power_saving from being exported through sysfs
on single-socket systems. (Say multicore single socket (Laptop))

CPU core map of the boot cpu should be equal to possible number
of cpus for single socket system.

This fix has been developed at FOSS.in kernel workout.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-01 08:44:00 +01:00
Linus Torvalds
66a45cc4cc Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: always define DECLARE_PCI_UNMAP* macros
  x86: fixup config space size of CPU functions for AMD family 11h
  x86, bts: fix wrmsr and spinlock over kmalloc
  x86, pebs: fix PEBS record size configuration
  x86, bts: turn macro into static inline function
  x86, bts: exclude ds.c from build when disabled
  arch/x86/kernel/pci-calgary_64.c: change simple_strtol to simple_strtoul
  x86: use limited register constraint for setnz
  xen: pin correct PGD on suspend
  x86: revert irq number limitation
  x86: fixing __cpuinit/__init tangle, xsave_cntxt_init()
  x86: fix __cpuinit/__init tangle in init_thread_xstate()
  oprofile: fix an overflow in ppro code
2008-11-30 13:01:04 -08:00
Christoph Hellwig
96b8936a9e remove __ARCH_WANT_COMPAT_SYS_PTRACE
All architectures now use the generic compat_sys_ptrace, as should every
new architecture that needs 32bit compat (if we'll ever get another).

Remove the now superflous __ARCH_WANT_COMPAT_SYS_PTRACE define, and also
kill a comment about __ARCH_SYS_PTRACE that was added after
__ARCH_SYS_PTRACE was already gone.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 11:00:15 -08:00
Ingo Molnar
93093d099e x86: provide readq()/writeq() on 32-bit too, complete
if HAVE_READQ/HAVE_WRITEQ are defined, the full range of readq/writeq
APIs has to be provided to drivers:

 drivers/infiniband/hw/amso1100/c2.c: In function 'c2_tx_ring_alloc':
 drivers/infiniband/hw/amso1100/c2.c:133: error: implicit declaration of function '__raw_writeq'

So provide them on 32-bit as well. Also, map all the APIs to the
strongest ordering variant. It's way too easy to mess such details
up in drivers and the difference between "memory" and "" constrained
asm() constructs is in the noise range.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-30 10:26:18 +01:00
Ingo Molnar
a0b1131e47 x86: provide readq()/writeq() on 32-bit too, cleanup
Impact: cleanup

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-30 09:38:19 +01:00
Hitoshi Mitake
2c5643b1c5 x86: provide readq()/writeq() on 32-bit too
Impact: add new API for drivers

Add implementation of readq/writeq to x86_32, and add config value to
the x86 architecture to determine existence of readq/writeq.

Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-30 09:38:16 +01:00
Thomas Bogendoerfer
7b1dedca42 x86: fix dma_mapping_error for 32bit x86
Devices like b44 ethernet can't dma from addresses above 1GB. The driver
handles this cases by falling back to GFP_DMA allocation. But for detecting
the problem it needs to get an indication from dma_mapping_error.
The bug is triggered by using a VMSPLIT option of 2G/2G.

Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-29 21:00:38 +01:00
Ingo Molnar
3bdae4f464 Merge branch 'x86/debug' into x86/irq
We merge this branch because x86/debug touches code that we started
cleaning up in x86/irq. The two branches started out independent,
but as unexpected amount of activity went into x86/irq, they became
dependent. Resolve that by this cross-merge.
2008-11-28 15:00:48 +01:00
Joerg Roedel
1d9b16d169 x86: move GART specific stuff from iommu.h to gart.h
Impact: cleanup

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-28 13:06:27 +01:00
Joerg Roedel
b627c8b17c x86: always define DECLARE_PCI_UNMAP* macros
Impact: fix boot crash on AMD IOMMU if CONFIG_GART_IOMMU is off

Currently these macros evaluate to a no-op except the kernel is compiled
with GART or Calgary support. But we also need these macros when we have
SWIOTLB, VT-d or AMD IOMMU in the kernel. Since we always compile at
least with SWIOTLB we can define these macros always.

This patch is also for stable backport for the same reason the SWIOTLB
default selection patch is.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 12:44:08 +01:00
Frederic Weisbecker
fb52607afc tracing/function-return-tracer: change the name into function-graph-tracer
Impact: cleanup

This patch changes the name of the "return function tracer" into
function-graph-tracer which is a more suitable name for a tracing
which makes one able to retrieve the ordered call stack during
the code flow.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 01:59:45 +01:00
Markus Metzger
6abb11aecd x86, bts, ptrace: move BTS buffer allocation from ds.c into ptrace.c
Impact: restructure DS memory allocation to be done by the usage site of DS

Require pre-allocated buffers in ds.h.

Move the BTS buffer allocation for ptrace into ptrace.c.
The pointer to the allocated buffer is stored in the traced task's
task_struct together with the handle returned by ds_request_bts().

Removes memory accounting code.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 17:31:12 +01:00
Markus Metzger
ca0002a179 x86, bts: base in-kernel ds interface on handles
Impact: generalize the DS code to shared buffers

Change the in-kernel ds.h interface to identify the tracer via a
handle returned on ds_request_~().

Tracers used to be identified via their task_struct.

The changes are required to allow DS to be shared between different
tasks, which is needed for perfmon2 and for ftrace.

For ptrace, the handle is stored in the traced task's task_struct.
This should probably go into a (arch-specific) ptrace context some
time.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 17:31:11 +01:00
Ingo Molnar
7d55718b0c Merge branches 'tracing/core', 'x86/urgent' and 'x86/ptrace' into tracing/hw-branch-tracing
This pulls together all the topic branches that are needed
for the DS/BTS/PEBS tracing work.
2008-11-25 17:30:30 +01:00
Markus Metzger
e5e8ca633b x86, bts: turn macro into static inline function
Impact: cleanup

Replace a macro with a static inline function.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 17:28:51 +01:00
Ingo Molnar
943f3d0300 Merge branches 'sched/core', 'core/core' and 'tracing/core' into cpus4096 2008-11-24 17:46:57 +01:00
Ingo Molnar
b19b3c74c7 Merge branches 'core/debug', 'core/futexes', 'core/locking', 'core/rcu', 'core/signal', 'core/urgent' and 'core/xen' into core/core 2008-11-24 17:44:55 +01:00
Cyrill Gorcunov
3b6c52b5b6 x86: introduce ENTRY(KPROBE_ENTRY)_X86 assembly helpers to catch unbalanced declaration v3
Impact: make ENTRY()/END() macros more capable

It's usefull to catch unbalanced or messed or mixed declarations of ENTRY and
KPROBES. These macros would help a bit.

For example the following code would compile without problems

        ENTRY_X86(mcount)
                retq
        END_X86(mcount)

But if you forget and mess the following form

        ENTRY_X86(mcount)
                retq
        END(mcount)

        ENTRY_X86(ftrace_caller)

The assembler will issue the following message:
Error: ENTRY_X86/KPROBE_X86 unbalanced,missed,mixed

Actually the checking is performed at every _X86 macro
so maybe it's good idea to put ENTRY_KPROBE_FINAL_X86
at the end of .S file to be sure you didn't miss anything.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Alexander van Heukelum <heukelum@mailshack.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 19:57:01 +01:00
Hannes Eder
050dc6944b x86: remove duplicate #define from 'cpufeature.h'
Impact: cleanup

Remove duplicate #define from 'cpufeature.h'.

This also fixes the following sparse warning:

 arch/x86/kernel/cpu/capflags.c:54:3: warning: Initializer entry defined twice
 arch/x86/kernel/cpu/capflags.c:58:3:   also defined here

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 13:38:50 +01:00
Cyrill Gorcunov
8a2503fa4a x86: move dwarf2 related macro to dwarf2.h
Impact: cleanup

Move recently introduced dwarf2 macros to dwarf2.h file.
It allow us to not duplicate them in assembly files.

Active usage of _cfi macros don't make assembly files
more obvious to understand but we already have a lot of
macros there which requires to search the definitions
of them *anyway*. But at least it make every cfi usage
one line shorter.

Also some code alignment is done.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 13:20:52 +01:00
Frederic Weisbecker
f201ae2356 tracing/function-return-tracer: store return stack into task_struct and allocate it dynamically
Impact: use deeper function tracing depth safely

Some tests showed that function return tracing needed a more deeper depth
of function calls. But it could be unsafe to store these return addresses
to the stack.

So these arrays will now be allocated dynamically into task_struct of current
only when the tracer is activated.

Typical scheme when tracer is activated:
- allocate a return stack for each task in global list.
- fork: allocate the return stack for the newly created task
- exit: free return stack of current
- idle init: same as fork

I chose a default depth of 50. I don't have overruns anymore.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 09:17:26 +01:00
Ingo Molnar
a0a70c735e Merge branches 'tracing/profiling', 'tracing/options' and 'tracing/urgent' into tracing/core 2008-11-23 09:10:32 +01:00
Linus Torvalds
0260da162f Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: uaccess_64: fix return value in __copy_from_user()
  x86: quirk for reboot stalls on a Dell Optiplex 330
2008-11-20 13:09:32 -08:00
Ingo Molnar
c032a2de4c Merge branch 'x86/cleanups' into x86/irq
[ merged x86/cleanups into x86/irq to enable a wider IRQ entry code
  patch to be applied, which depends on a cleanup patch in x86/cleanups. ]
2008-11-20 10:48:31 +01:00
Ingo Molnar
90accd6fab Merge branch 'linus' into x86/memory-corruption-check 2008-11-20 09:03:38 +01:00
Ingo Molnar
fbc2a06056 Merge branch 'linus' into x86/uv 2008-11-20 09:02:39 +01:00
Linus Torvalds
3108864e2d Merge branch 'x86/numa' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86/numa' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: make NUMA on 32-bit depend on EXPERIMENTAL again
  x86, hibernate: fix breakage on x86_32 with CONFIG_NUMA set
2008-11-19 18:53:02 -08:00
Linus Torvalds
4f7dbc7ff4 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: more general identifier for Phoenix BIOS
  AMD IOMMU: check for next_bit also in unmapped area
  AMD IOMMU: fix fullflush comparison length
  AMD IOMMU: enable device isolation per default
  AMD IOMMU: add parameter to disable device isolation
  x86, PEBS/DS: fix code flow in ds_request()
  x86: add rdtsc barrier to TSC sync check
  xen: fix scrub_page()
  x86: fix es7000 compiling
  x86, bts: fix unlock problem in ds.c
  x86, voyager: fix smp generic helper voyager breakage
  x86: move iomap.h to the new include location
2008-11-19 18:51:56 -08:00
Ulrich Drepper
de11defebf reintroduce accept4
Introduce a new accept4() system call.  The addition of this system call
matches analogous changes in 2.6.27 (dup3(), evenfd2(), signalfd4(),
inotify_init1(), epoll_create1(), pipe2()) which added new system calls
that differed from analogous traditional system calls in adding a flags
argument that can be used to access additional functionality.

The accept4() system call is exactly the same as accept(), except that
it adds a flags bit-mask argument.  Two flags are initially implemented.
(Most of the new system calls in 2.6.27 also had both of these flags.)

SOCK_CLOEXEC causes the close-on-exec (FD_CLOEXEC) flag to be enabled
for the new file descriptor returned by accept4().  This is a useful
security feature to avoid leaking information in a multithreaded
program where one thread is doing an accept() at the same time as
another thread is doing a fork() plus exec().  More details here:
http://udrepper.livejournal.com/20407.html "Secure File Descriptor Handling",
Ulrich Drepper).

The other flag is SOCK_NONBLOCK, which causes the O_NONBLOCK flag
to be enabled on the new open file description created by accept4().
(This flag is merely a convenience, saving the use of additional calls
fcntl(F_GETFL) and fcntl (F_SETFL) to achieve the same result.

Here's a test program.  Works on x86-32.  Should work on x86-64, but
I (mtk) don't have a system to hand to test with.

It tests accept4() with each of the four possible combinations of
SOCK_CLOEXEC and SOCK_NONBLOCK set/clear in 'flags', and verifies
that the appropriate flags are set on the file descriptor/open file
description returned by accept4().

I tested Ulrich's patch in this thread by applying against 2.6.28-rc2,
and it passes according to my test program.

/* test_accept4.c

  Copyright (C) 2008, Linux Foundation, written by Michael Kerrisk
       <mtk.manpages@gmail.com>

  Licensed under the GNU GPLv2 or later.
*/
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

#define PORT_NUM 33333

#define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)

/**********************************************************************/

/* The following is what we need until glibc gets a wrapper for
  accept4() */

/* Flags for socket(), socketpair(), accept4() */
#ifndef SOCK_CLOEXEC
#define SOCK_CLOEXEC    O_CLOEXEC
#endif
#ifndef SOCK_NONBLOCK
#define SOCK_NONBLOCK   O_NONBLOCK
#endif

#ifdef __x86_64__
#define SYS_accept4 288
#elif __i386__
#define USE_SOCKETCALL 1
#define SYS_ACCEPT4 18
#else
#error "Sorry -- don't know the syscall # on this architecture"
#endif

static int
accept4(int fd, struct sockaddr *sockaddr, socklen_t *addrlen, int flags)
{
   printf("Calling accept4(): flags = %x", flags);
   if (flags != 0) {
       printf(" (");
       if (flags & SOCK_CLOEXEC)
           printf("SOCK_CLOEXEC");
       if ((flags & SOCK_CLOEXEC) && (flags & SOCK_NONBLOCK))
           printf(" ");
       if (flags & SOCK_NONBLOCK)
           printf("SOCK_NONBLOCK");
       printf(")");
   }
   printf("\n");

#if USE_SOCKETCALL
   long args[6];

   args[0] = fd;
   args[1] = (long) sockaddr;
   args[2] = (long) addrlen;
   args[3] = flags;

   return syscall(SYS_socketcall, SYS_ACCEPT4, args);
#else
   return syscall(SYS_accept4, fd, sockaddr, addrlen, flags);
#endif
}

/**********************************************************************/

static int
do_test(int lfd, struct sockaddr_in *conn_addr,
       int closeonexec_flag, int nonblock_flag)
{
   int connfd, acceptfd;
   int fdf, flf, fdf_pass, flf_pass;
   struct sockaddr_in claddr;
   socklen_t addrlen;

   printf("=======================================\n");

   connfd = socket(AF_INET, SOCK_STREAM, 0);
   if (connfd == -1)
       die("socket");
   if (connect(connfd, (struct sockaddr *) conn_addr,
               sizeof(struct sockaddr_in)) == -1)
       die("connect");

   addrlen = sizeof(struct sockaddr_in);
   acceptfd = accept4(lfd, (struct sockaddr *) &claddr, &addrlen,
                      closeonexec_flag | nonblock_flag);
   if (acceptfd == -1) {
       perror("accept4()");
       close(connfd);
       return 0;
   }

   fdf = fcntl(acceptfd, F_GETFD);
   if (fdf == -1)
       die("fcntl:F_GETFD");
   fdf_pass = ((fdf & FD_CLOEXEC) != 0) ==
              ((closeonexec_flag & SOCK_CLOEXEC) != 0);
   printf("Close-on-exec flag is %sset (%s); ",
           (fdf & FD_CLOEXEC) ? "" : "not ",
           fdf_pass ? "OK" : "failed");

   flf = fcntl(acceptfd, F_GETFL);
   if (flf == -1)
       die("fcntl:F_GETFD");
   flf_pass = ((flf & O_NONBLOCK) != 0) ==
              ((nonblock_flag & SOCK_NONBLOCK) !=0);
   printf("nonblock flag is %sset (%s)\n",
           (flf & O_NONBLOCK) ? "" : "not ",
           flf_pass ? "OK" : "failed");

   close(acceptfd);
   close(connfd);

   printf("Test result: %s\n", (fdf_pass && flf_pass) ? "PASS" : "FAIL");
   return fdf_pass && flf_pass;
}

static int
create_listening_socket(int port_num)
{
   struct sockaddr_in svaddr;
   int lfd;
   int optval;

   memset(&svaddr, 0, sizeof(struct sockaddr_in));
   svaddr.sin_family = AF_INET;
   svaddr.sin_addr.s_addr = htonl(INADDR_ANY);
   svaddr.sin_port = htons(port_num);

   lfd = socket(AF_INET, SOCK_STREAM, 0);
   if (lfd == -1)
       die("socket");

   optval = 1;
   if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &optval,
                  sizeof(optval)) == -1)
       die("setsockopt");

   if (bind(lfd, (struct sockaddr *) &svaddr,
            sizeof(struct sockaddr_in)) == -1)
       die("bind");

   if (listen(lfd, 5) == -1)
       die("listen");

   return lfd;
}

int
main(int argc, char *argv[])
{
   struct sockaddr_in conn_addr;
   int lfd;
   int port_num;
   int passed;

   passed = 1;

   port_num = (argc > 1) ? atoi(argv[1]) : PORT_NUM;

   memset(&conn_addr, 0, sizeof(struct sockaddr_in));
   conn_addr.sin_family = AF_INET;
   conn_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
   conn_addr.sin_port = htons(port_num);

   lfd = create_listening_socket(port_num);

   if (!do_test(lfd, &conn_addr, 0, 0))
       passed = 0;
   if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, 0))
       passed = 0;
   if (!do_test(lfd, &conn_addr, 0, SOCK_NONBLOCK))
       passed = 0;
   if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, SOCK_NONBLOCK))
       passed = 0;

   close(lfd);

   exit(passed ? EXIT_SUCCESS : EXIT_FAILURE);
}

[mtk.manpages@gmail.com: rewrote changelog, updated test program]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: <linux-api@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-19 18:49:57 -08:00
Ingo Molnar
9676e73a9e Merge branches 'tracing/ftrace' and 'tracing/urgent' into tracing/core
Conflicts:
	kernel/trace/ftrace.c

[ We conflicted here because we backported a few fixes to
  tracing/urgent - which has different internal APIs. ]
2008-11-19 10:04:25 +01:00
Hiroshi Shimamoto
20a4a236c7 x86: uaccess_64: fix return value in __copy_from_user()
__copy_from_user() will return invalid value 16 when it fails to
access user space and the size is 10.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 22:28:58 +01:00
Yinghai Lu
b5fe363b7d x86: use update_genapic to get rid of ES7000_CLUSTERED_APIC v2
Impact: clean up

We can autodetect those system that need cluster apic, and update genapic
accordingly.

We can also remove wakeup.h for e7000, because it's default one is now
the same as overall default mach_wakecpu.h

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 17:35:40 +01:00
Ingo Molnar
73f56c0d35 Merge branch 'iommu-fixes-2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2008-11-18 16:48:49 +01:00
Ingo Molnar
cbe9ee00ce Merge branch 'x86/urgent' into x86/cleanups 2008-11-18 15:41:36 +01:00
Frederic Weisbecker
0231022cc3 tracing/function-return-tracer: add the overrun field
Impact: help to find the better depth of trace

We decided to arbitrary define the depth of function return trace as
"20". Perhaps this is not enough. To help finding an optimal depth, we
measure now the overrun: the number of functions that have been missed
for the current thread. By default this is not displayed, we have to
do set a particular flag on the return tracer: echo overrun >
/debug/tracing/trace_options And the overrun will be printed on the
right.

As the trace shows below, the current 20 depth is not enough.

update_wall_time+0x37f/0x8c0 -> update_xtime_cache (345 ns) (Overruns: 2838)
update_wall_time+0x384/0x8c0 -> clocksource_get_next (1141 ns) (Overruns: 2838)
do_timer+0x23/0x100 -> update_wall_time (3882 ns) (Overruns: 2838)
tick_do_update_jiffies64+0xbf/0x160 -> do_timer (5339 ns) (Overruns: 2838)
tick_sched_timer+0x6a/0xf0 -> tick_do_update_jiffies64 (7209 ns) (Overruns: 2838)
vgacon_set_cursor_size+0x98/0x120 -> native_io_delay (2613 ns) (Overruns: 274)
vgacon_cursor+0x16e/0x1d0 -> vgacon_set_cursor_size (33151 ns) (Overruns: 274)
set_cursor+0x5f/0x80 -> vgacon_cursor (36432 ns) (Overruns: 274)
con_flush_chars+0x34/0x40 -> set_cursor (38790 ns) (Overruns: 274)
release_console_sem+0x1ec/0x230 -> up (721 ns) (Overruns: 274)
release_console_sem+0x225/0x230 -> wake_up_klogd (316 ns) (Overruns: 274)
con_flush_chars+0x39/0x40 -> release_console_sem (2996 ns) (Overruns: 274)
con_write+0x22/0x30 -> con_flush_chars (46067 ns) (Overruns: 274)
n_tty_write+0x1cc/0x360 -> con_write (292670 ns) (Overruns: 274)
smp_apic_timer_interrupt+0x2a/0x90 -> native_apic_mem_write (330 ns) (Overruns: 274)
irq_enter+0x17/0x70 -> idle_cpu (413 ns) (Overruns: 274)
smp_apic_timer_interrupt+0x2f/0x90 -> irq_enter (1525 ns) (Overruns: 274)
ktime_get_ts+0x40/0x70 -> getnstimeofday (465 ns) (Overruns: 274)
ktime_get_ts+0x60/0x70 -> set_normalized_timespec (436 ns) (Overruns: 274)
ktime_get+0x16/0x30 -> ktime_get_ts (2501 ns) (Overruns: 274)
hrtimer_interrupt+0x77/0x1a0 -> ktime_get (3439 ns) (Overruns: 274)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 11:11:00 +01:00
Yinghai Lu
54ac14a8e9 x86: fix wakeup_cpu with numaq/es7000, v2, fix
Impact: fix wakeup_secondary_cpu with hotplug

We can not put that into x86_quirks, because that is __initdata.
So try to move that to genapic, and add update_genapic in x86_quirks.

later we even could use that stub to:

 1. autodetect CONFIG_ES7000_CLUSTERED_APIC
 2. more correct inquire_remote_apic with apic_verbosity setting.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 00:27:24 +01:00
Yinghai Lu
569712b2b0 x86: fix wakeup_cpu with numaq/es7000, v2
Impact: fix secondary-CPU wakeup/init path with numaq and es7000

While looking at wakeup_secondary_cpu for WAKE_SECONDARY_VIA_NMI:

|#ifdef WAKE_SECONDARY_VIA_NMI
|/*
| * Poke the other CPU in the eye via NMI to wake it up. Remember that the normal
| * INIT, INIT, STARTUP sequence will reset the chip hard for us, and this
| * won't ... remember to clear down the APIC, etc later.
| */
|static int __devinit
|wakeup_secondary_cpu(int logical_apicid, unsigned long start_eip)
|{
|        unsigned long send_status, accept_status = 0;
|        int maxlvt;
|...
|        if (APIC_INTEGRATED(apic_version[phys_apicid])) {
|                maxlvt = lapic_get_maxlvt();

I noticed that there is no warning about undefined phys_apicid...

because WAKE_SECONDARY_VIA_NMI and WAKE_SECONDARY_VIA_INIT can not be
defined at the same time. So NUMAQ is using wrong wakeup_secondary_cpu.

WAKE_SECONDARY_VIA_NMI, WAKE_SECONDARY_VIA_INIT and
WAKE_SECONDARY_VIA_MIP are variants of a weird and fragile
preprocessor-driven "HAL" mechanisms to specify the kind of secondary-CPU
wakeup strategy a given x86 kernel will use.

The vast majority of systems want to use INIT for secondary wakeup - NUMAQ
uses an NMI, (old-style-) ES7000 uses 'MIP' (a firmware driven in-memory
flag to let secondaries continue).

So convert these mechanisms to x86_quirks and add a
->wakeup_secondary_cpu() method to specify the rare exception
to the sane default.

Extend genapic accordingly as well, for 32-bit.

While looking further, I noticed that functions in wakecup.h for numaq
and es7000 are different to the default in mach_wakecpu.h - but smpboot.c
will only use default mach_wakecpu.h with smphook.h.

So we need to add mach_wakecpu.h for mach_generic, to properly support
numaq and es7000, and vectorize the following SMP init methods:

	int trampoline_phys_low;
	int trampoline_phys_high;
	void (*wait_for_init_deassert)(atomic_t *deassert);
	void (*smp_callin_clear_local_apic)(void);
	void (*store_NMI_vector)(unsigned short *high, unsigned short *low);
	void (*restore_NMI_vector)(unsigned short *high, unsigned short *low);
	void (*inquire_remote_apic)(int apicid);

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-17 17:57:34 +01:00
Ingo Molnar
9dacc71ff3 Merge commit 'v2.6.28-rc5' into x86/cleanups 2008-11-17 10:46:18 +01:00
Steven Rostedt
31e889098a ftrace: pass module struct to arch dynamic ftrace functions
Impact: allow archs more flexibility on dynamic ftrace implementations

Dynamic ftrace has largly been developed on x86. Since x86 does not
have the same limitations as other architectures, the ftrace interaction
between the generic code and the architecture specific code was not
flexible enough to handle some of the issues that other architectures
have.

Most notably, module trampolines. Due to the limited branch distance
that archs make in calling kernel core code from modules, the module
load code must create a trampoline to jump to what will make the
larger jump into core kernel code.

The problem arises when this happens to a call to mcount. Ftrace checks
all code before modifying it and makes sure the current code is what
it expects. Right now, there is not enough information to handle modifying
module trampolines.

This patch changes the API between generic dynamic ftrace code and
the arch dependent code. There is now two functions for modifying code:

  ftrace_make_nop(mod, rec, addr) - convert the code at rec->ip into
       a nop, where the original text is calling addr. (mod is the
       module struct if called by module init)

  ftrace_make_caller(rec, addr) - convert the code rec->ip that should
       be a nop into a caller to addr.

The record "rec" now has a new field called "arch" where the architecture
can add any special attributes to each call site record.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 07:36:02 +01:00
David Woodhouse
52168e60f7 Revert "x86: blacklist DMAR on Intel G31/G33 chipsets"
This reverts commit e51af66308, which was
wrongly hoovered up and submitted about a month after a better fix had
already been merged.

The better fix is commit cbda1ba898
("PCI/iommu: blacklist DMAR on Intel G31/G33 chipsets"), where we do
this blacklisting based on the DMI identification for the offending
motherboard, since sometimes this chipset (or at least a chipset with
the same PCI ID) apparently _does_ actually have an IOMMU.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-15 11:37:16 -08:00
Rafael J. Wysocki
97a70e548b x86, hibernate: fix breakage on x86_32 with CONFIG_NUMA set
Impact: fix crash during hibernation on 32-bit NUMA

The NUMA code on x86_32 creates special memory mapping that allows
each node's pgdat to be located in this node's memory.  For this
purpose it allocates a memory area at the end of each node's memory
and maps this area so that it is accessible with virtual addresses
belonging to low memory.  As a result, if there is high memory,
these NUMA-allocated areas are physically located in high memory,
although they are mapped to low memory addresses.

Our hibernation code does not take that into account and for this
reason hibernation fails on all x86_32 systems with CONFIG_NUMA=y and
with high memory present.  Fix this by adding a special mapping for
the NUMA-allocated memory areas to the temporary page tables created
during the last phase of resume.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 23:28:51 +01:00
Eduardo Habkost
c370e5e089 x86 kdump: make nmi_shootdown_cpus() non-static
Impact: make API available to the rest of x86 platform code

Add prototype to asm/reboot.h.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 18:55:46 +01:00
Ingo Molnar
eb42c75878 Merge branch 'linus' into x86/crashdump 2008-11-12 15:43:39 +01:00
Ingo Molnar
708b8eae0f Merge branch 'linus' into core/locking 2008-11-12 12:39:21 +01:00
Len Brown
3e0fe36483 Merge branch 'misc' into release 2008-11-11 21:14:11 -05:00
Bjorn Helgaas
32836259ff ACPI: pci_link: remove acpi_irq_balance_set() interface
This removes the acpi_irq_balance_set() interface from the PCI
interrupt link driver.

x86 used acpi_irq_balance_set() to tell the PCI interrupt link
driver to configure links to minimize IRQ sharing.  But the link
driver can easily figure out whether to turn on IRQ balancing
based on the IRQ model (PIC/IOAPIC/etc), so we can get rid of
that external interface.

It's better for the driver to figure this out at init-time.  If
we set it externally via the x86 code, the interface reduces
modularity, and we depend on the fact that acpi_process_madt()
happens before we process the kernel command line.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-11-11 21:12:05 -05:00
H. Peter Anvin
14d7ca5c57 x86: attempt reboot via port CF9 if we have standard PCI ports
Impact: Changes reboot behavior.

If port CF9 seems to be safe to touch, attempt it before trying the
keyboard controller.  Port CF9 is not available on all chipsets (a
significant but decreasing number of modern chipsets don't implement
it), but port CF9 itself should in general be safe to poke (no ill
effects if unimplemented) on any system which has PCI Configuration
Method #1 or #2, as it falls inside the PCI configuration port range
in both cases.  No chipset without PCI is known to have port CF9,
either, although an explicit "pci=bios" would mean we miss this and
therefore don't use port CF9.  An explicit "reboot=pci" can be used to
force the use of port CF9.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-11-11 16:19:48 -08:00
H. Peter Anvin
939b787130 x86: 64 bits: shrink and align IRQ stubs
Move the IRQ stub generation to assembly to simplify it and for
consistency with 32 bits.  Doing it in a C file with asm() statements
doesn't help clarity, and it prevents some optimizations.

Shrink the IRQ stubs down to just over four bytes per (we fit seven
into a 32-byte chunk.)  This shrinks the total icache consumption of
the IRQ stubs down to an even kilobyte, if all of them are in active
use.

The downside is that we end up with a double jump, which could have a
negative effect on some pipelines.  The double jump is always inside
the same cacheline on any modern chips.

To get the most effect, cache-align the IRQ stubs.

This makes the 64-bit code match changes already done to the 32-bit
code, and should open up irqinit*.c for unification.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-11-11 13:51:52 -08:00
H. Peter Anvin
4687518c4c x86: 32 bit: interrupt stub consistency with 64 bit
Don't generate interrupt stubs for interrupt vectors below
FIRST_EXTERNAL_VECTOR, and make the table of interrupt vectors
(interrupt[]) __initconst.  Both of these changes both conserve memory
and improve consistency with 64 bits.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-11-11 13:03:07 -08:00
Ivan Vecera
d3ec5cae09 x86: call machine_shutdown and stop all CPUs in native_machine_halt
Impact: really halt all CPUs on halt

Function machine_halt (resp. native_machine_halt) is empty for x86
architectures. When command 'halt -f' is invoked, the message "System
halted." is displayed but this is not really true because all CPUs are
still running.

There are also similar inconsistencies for other arches (some uses
power-off for halt or forever-loop with IRQs enabled/disabled).

IMO there should be used the same approach for all architectures OR
what does the message "System halted" really mean?

This patch fixes it for x86.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-11 14:50:02 +01:00
Frederic Weisbecker
caf4b323b0 tracing, x86: add low level support for ftrace return tracing
Impact: add infrastructure for function-return tracing

Add low level support for ftrace return tracing.

This plug-in stores return addresses on the thread_info structure of
the current task.

The index of the current return address is initialized when the task
is the first one (init) and when a process forks (the child). It is
not needed when a task does a sys_execve because after this syscall,
it still needs to return on the kernel functions it called.

Note that the code of return_to_handler has been suggested by Steven
Rostedt as almost all of the ideas of improvements in this V3.

For purpose of security, arch/x86/kernel/process_32.c is not traced
because __switch_to() changes the current task during its execution.
That could cause inconsistency in the stored return address of this
function even if I didn't have any crash after testing with tracing on
this function enabled.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-11 10:29:11 +01:00
Ingo Molnar
e0cb4ebcd9 Merge branch 'tracing/urgent' into tracing/ftrace
Conflicts:
	kernel/trace/trace.c
2008-11-11 09:40:18 +01:00
Harvey Harrison
19f47c634e x86: x86_32 has its own irq_regs definition
Impact: cleanup

Arches that have their own irq_regs definition are expected to
define ARCH_HAS_OWN_IRQ_REGS or else a generic (unused) set
will also be defined in lib/irq_regs.c

Sparse noticed the unused generic one had no prototype:
lib/irq_regs.c:15:1: warning: symbol 'per_cpu____irq_regs' was not declared. Should it be static?

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-10 08:41:47 +01:00
Ingo Molnar
4fcc50abdf x86: clean up vget_cycles()
Impact: remove unused variable

I forgot to remove the now unused "cycles_t cycles" parameter from
vget_cycles() - which triggers build warnings as tsc.h is included
in a number of files.

Remove it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-09 21:05:43 +01:00
Arjan van de Ven
3044646148 x86: move iomap.h to the new include location
a new file was accidentally added to include/asm-x86;
move it to the new arch/x86/include/asm location

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
2008-11-09 10:07:58 -08:00
Ingo Molnar
cb9e35dce9 x86: clean up rdtsc_barrier() use
Impact: cleanup

Move rdtsc_barrier() use to vsyscall_64.c where it's relied on,
and point out its role in the context of its use.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-08 20:27:00 +01:00
Ingo Molnar
895e031707 Merge branch 'linus' into x86/cleanups 2008-11-08 20:23:02 +01:00
Ingo Molnar
0d12cdd5f8 sched: improve sched_clock() performance
in scheduler-intense workloads native_read_tsc() overhead accounts for
20% of the system overhead:

 659567 system_call                              41222.9375
 686796 schedule                                 435.7843
 718382 __switch_to                              665.1685
 823875 switch_mm                                4526.7857
 1883122 native_read_tsc                          55385.9412
 9761990 total                                      2.8468

this is large part due to the rdtsc_barrier() that is done before
and after reading the TSC.

But sched_clock() is not a precise clock in the GTOD sense, using such
barriers is completely pointless. So remove the barriers and only use
them in vget_cycles().

This improves lat_ctx performance by about 5%.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-08 16:48:19 +01:00
Ingo Molnar
a6b0786f7f Merge branches 'tracing/ftrace', 'tracing/fastboot', 'tracing/nmisafe' and 'tracing/urgent' into tracing/core 2008-11-08 09:34:35 +01:00
Linus Torvalds
a15a82f42c Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  Revert "x86: default to reboot via ACPI"
  x86: align DirectMap in /proc/meminfo
  AMD IOMMU: fix lazy IO/TLB flushing in unmap path
  x86: add smp_mb() before sending INVALIDATE_TLB_VECTOR
  x86: remove VISWS and PARAVIRT around NR_IRQS puzzle
  x86: mention ACPI in top-level Kconfig menu
  x86: size NR_IRQS on 32-bit systems the same way as 64-bit
  x86: don't allow nr_irqs > NR_IRQS
  x86/docs: remove noirqbalance param docs
  x86: don't use tsc_khz to calculate lpj if notsc is passed
  x86, voyager: fix smp_intr_init() compile breakage
  AMD IOMMU: fix detection of NP capable IOMMUs
2008-11-06 15:57:24 -08:00
Yinghai Lu
7db282fa67 x86: remove VISWS and PARAVIRT around NR_IRQS puzzle
Impact: fix warning message when PARAVIRT is set in config

Remove stale #ifdef components from our IRQ sizing logic.
x86/Voyager is the only holdout.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-06 09:35:34 +01:00
Yinghai Lu
1b48976880 x86: size NR_IRQS on 32-bit systems the same way as 64-bit
Impact: make NR_IRQS big enough for system with lots of apic/pins

If lots of IO_APIC's are there (or can be there), size the same way
as 64-bit, depending on MAX_IO_APICS and NR_CPUS.

This fixes the boot problem reported by Ben Hutchings on a 32-bit
server with 5 IO-APICs and 240 IO-APIC pins.

Signed-off-by: Yinghai <yinghai@kernel.org>
Tested-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-06 07:23:22 +01:00
Russ Anderson
23c357003b x86: uv: Add UV reserved page bios call
Add UV bios call to get the address of the reserved page.

Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-11-05 20:30:25 -08:00
Russ Anderson
e8929c8a6a x86: uv: Add UV memory protection bios call
Add UV bios call to change memory protections.

Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-11-05 20:30:20 -08:00
Russ Anderson
64ccf2f9a7 x86: uv: Add UV watchlist bios call
Add UV bios calls to allocate and free watchlists.

Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-11-05 20:30:15 -08:00
Uros Bizjak
838e8bb71d x86: Implement change_bit with immediate operand as "lock xorb"
Impact: Minor optimization.

Implement change_bit with immediate bit count as "lock xorb". This is
similar to  "lock orb" and "lock andb"  for set_bit and clear_bit
functions.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-11-05 09:51:02 -08:00
Ingo Molnar
9fcd18c9e6 sched: re-tune balancing
Impact: improve wakeup affinity on NUMA systems, tweak SMP systems

Given the fixes+tweaks to the wakeup-buddy code, re-tweak the domain
balancing defaults on NUMA and SMP systems.

Turn on SD_WAKE_AFFINE which was off on x86 NUMA - there's no reason
why we would not want to have wakeup affinity across nodes as well.
(we already do this in the standard NUMA template.)

lat_ctx on a NUMA box is particularly happy about this change:

before:

 |   phoenix:~/l> ./lat_ctx -s 0 2
 |   "size=0k ovr=2.60
 |   2 5.70

after:

 |   phoenix:~/l> ./lat_ctx -s 0 2
 |   "size=0k ovr=2.65
 |   2 2.07

a 2.75x speedup.

pipe-test is similarly happy about it too:

 |  phoenix:~/sched-tests> ./pipe-test
 |   18.26 usecs/loop.
 |   14.70 usecs/loop.
 |   14.38 usecs/loop.
 |   10.55 usecs/loop.              # +WAKE_AFFINE on domain0+domain1
 |   8.63 usecs/loop.
 |   8.59 usecs/loop.
 |   9.03 usecs/loop.
 |   8.94 usecs/loop.
 |   8.96 usecs/loop.
 |   8.63 usecs/loop.

Also:

 - disable SD_BALANCE_NEWIDLE on NUMA and SMP domains (keep it for siblings)
 - enable SD_WAKE_BALANCE on SMP domains

Sysbench+postgresql improves all around the board, quite significantly:

           .28-rc3-11474e2c  .28-rc3-11474e2c-tune
-------------------------------------------------
    1:             571              688    +17.08%
    2:            1236             1206    -2.55%
    4:            2381             2642    +9.89%
    8:            4958             5164    +3.99%
   16:            9580             9574    -0.07%
   32:            7128             8118    +12.20%
   64:            7342             8266    +11.18%
  128:            7342             8064    +8.95%
  256:            7519             7884    +4.62%
  512:            7350             7731    +4.93%
-------------------------------------------------
  SUM:           55412            59341    +6.62%

So it's a win both for the runup portion, the peak area and the tail.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05 18:04:38 +01:00
Linus Torvalds
da4a22cba7 Merge branch 'io-mappings-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'io-mappings-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  io mapping: clean up #ifdefs
  io mapping: improve documentation
  i915: use io-mapping interfaces instead of a variety of mapping kludges
  resources: add io-mapping functions to dynamically map large device apertures
  x86: add iomap_atomic*()/iounmap_atomic() on 32-bit using fixmaps
2008-11-03 10:15:40 -08:00
Steven Rostedt
7e5e26a3d8 ftrace: fix hardirq header for non ftrace archs
Impact: build fix for non-ftrace architectures

Not all archs implement ftrace, and therefore do not have an asm/ftrace.h.
This patch corrects the problem.

The ftrace_nmi_enter/exit now must be defined for all archs that implement
dynamic ftrace. Currently, only x86 does.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-03 11:03:43 +01:00
James Bottomley
73557af5bf x86, voyager: fix smp_intr_init() compile breakage
Impact: fix x86/Voyager build

Looks like this became static on the rest of x86.  Fix it up by adding
an external definition to mach-voyager/setup.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-03 10:52:21 +01:00
Ingo Molnar
7a895f53cd Merge branches 'tracing/ftrace', 'tracing/markers', 'tracing/mmiotrace', 'tracing/nmisafe', 'tracing/tracepoints' and 'tracing/urgent' into tracing/core 2008-11-03 10:34:23 +01:00
Alok Kataria
eca0cd028b x86: Add a synthetic TSC_RELIABLE feature bit.
Impact: Changes timebase calibration on Vmware.

Use the synthetic TSC_RELIABLE bit to workaround virtualization anomalies.

Virtual TSCs can be kept nearly in sync, but because the virtual TSC
offset is set by software, it's not perfect.  So, the TSC
synchronization test can fail. Even then the TSC can be used as a
clocksource since the VMware platform exports a reliable TSC to the
guest for timekeeping purposes. Use this bit to check if we need to
skip the TSC sync checks.

Along with this also set the CONSTANT_TSC bit when on VMware, since we
still want to use TSC as clocksource on VM running over hardware which
has unsynchronized TSC's (opteron's), since the hypervisor will take
care of providing consistent TSC to the guest.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Dan Hecht <dhecht@vmware.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-11-01 18:58:01 -07:00
Alok Kataria
88b094fb8d x86: Hypervisor detection and get tsc_freq from hypervisor
Impact: Changes timebase calibration on Vmware.

v3->v2 : Abstract the hypervisor detection and feature (tsc_freq) request
	 behind a hypervisor.c file
v2->v1 : Add a x86_hyper_vendor field to the cpuinfo_x86 structure.
	 This avoids multiple calls to the hypervisor detection function.

This patch adds function to detect if we are running under VMware.
The current way to check if we are on VMware is following,
#  check if "hypervisor present bit" is set, if so read the 0x40000000
   cpuid leaf and check for "VMwareVMware" signature.
#  if the above fails, check the DMI vendors name for "VMware" string
   if we find one we query the VMware hypervisor port to check if we are
   under VMware.

The DMI + "VMware hypervisor port check" is needed for older VMware products,
which don't implement the hypervisor signature cpuid leaf.
Also note that since we are checking for the DMI signature the hypervisor
port should never be accessed on native hardware.

This patch also adds a hypervisor_get_tsc_freq function, instead of
calibrating the frequency which can be error prone in virtualized
environment, we ask the hypervisor for it. We get the frequency from
the hypervisor by accessing the hypervisor port if we are running on VMware.
Other hypervisors too can add code to the generic routine to get frequency on
their platform.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Dan Hecht <dhecht@vmware.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-11-01 18:57:08 -07:00
Alok Kataria
49ab56ac6e x86: add X86_FEATURE_HYPERVISOR feature bit
Impact: Number declaration only.

Add X86_FEATURE_HYPERVISOR bit (CPUID level 1, ECX, bit 31).

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-11-01 18:45:37 -07:00
Alok Kataria
b2bcc7b299 x86: add a synthetic TSC_RELIABLE feature bit
Impact: None, bit reservation only

Add a synthetic TSC_RELIABLE feature bit which will be used to mark
TSC as reliable so that we could skip all the runtime checks for
TSC stablity, which have false positives in virtual environment.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Dan Hecht <dhecht@vmware.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-10-31 14:44:19 -07:00
Venki Pallipadi
2576c99917 x86: fix AMDC1E and XTOPOLOGY conflict in cpufeature
Impact: fix xsave slowdown regression

Fix two features from conflicting in feature bits.

Fixes this performance regression:

   Subject: cpu2000(both float and int) 13% regression with 2.6.28-rc1
   http://lkml.org/lkml/2008/10/28/36

Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Bisected-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-31 11:01:40 +01:00
Steven Rostedt
a26a2a2739 ftrace: nmi safe code clean ups
Impact: cleanup

This patch cleans up the NMI safe code for dynamic ftrace as suggested
by Andrew Morton.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-31 10:29:17 +01:00
Keith Packard
fd94093435 x86: add iomap_atomic*()/iounmap_atomic() on 32-bit using fixmaps
Impact: introduce new APIs, separate kmap code from CONFIG_HIGHMEM

This takes the code used for CONFIG_HIGHMEM memory mappings except that
it's designed for dynamic IO resource mapping.

These fixmaps are available even with CONFIG_HIGHMEM turned off.

Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-31 10:12:38 +01:00
Huang Ying
9868ee63b8 kexec/i386: setup kexec page table in C
Impact: change the kexec bootstrap code implementation from assembly to C

This patch transforms the kexec page tables setup code from assembler
code to C code in machine_kexec_prepare. This improves readability and
reduces code line number.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-31 10:01:57 +01:00
Huang Ying
92be3d6bdf kexec/i386: allocate page table pages dynamically
Impact: save .text size when kexec is built in but not loaded

This patch adds an architecture specific struct kimage_arch into
struct kimage. The pointers to page table pages used by kexec are
added to struct kimage_arch. The page tables pages are dynamically
allocated in machine_kexec_prepare instead of statically from BSS
segment. This will save up to 20k memory when kexec image is not
loaded.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-31 10:01:56 +01:00
Linus Torvalds
74c75f524e Merge branch 'x86-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: cpu_index build fix
  x86/voyager: fix missing cpu_index initialisation
  x86/voyager: fix compile breakage caused by dc1e35c6e9
  x86: fix /dev/mem mmap breakage when PAT is disabled
  x86/voyager: fix compile breakage casued by x86: move prefill_possible_map calling early
  x86: use CONFIG_X86_SMP instead of CONFIG_SMP
  x86/voyager: fix boot breakage caused by x86: boot secondary cpus through initial_code
  x86, uv: fix compile error in uv_hub.h
  i386/PAE: fix pud_page()
  x86: remove debug code from arch_add_memory()
  x86: start annotating early ioremap pointers with __iomem
  x86: two trivial sparse annotations
  x86: fix init_memory_mapping for [dc000000 - e0000000) - v2
2008-10-30 18:33:46 -07:00