Commit Graph

17298 Commits (2b3b9bb03a9fb1e4c72947cc235771c6455ec7c9)

Author SHA1 Message Date
Olaf Hering 32068f6527 x86: Hyper-V: register clocksource only if its advertised
Enable hyperv_clocksource only if its advertised as a feature.
XenServer 6 returns the signature which is checked in
ms_hyperv_platform(), but it does not offer all features. Currently the
clocksource is enabled unconditionally in ms_hyperv_init_platform(), and
the result is a hanging guest.

Hyper-V spec Bit 1 indicates the availability of Partition Reference
Counter.  Register the clocksource only if this bit is set.

The guest in question prints this in dmesg:
 [    0.000000] Hypervisor detected: Microsoft HyperV
 [    0.000000] HyperV: features 0x70, hints 0x0

This bug can be reproduced easily be setting 'viridian=1' in a HVM domU
.cfg file. A workaround without this patch is to boot the HVM guest with
'clocksource=jiffies'.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Link: http://lkml.kernel.org/r/1359940959-32168-1-git-send-email-kys@microsoft.com
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-12 16:25:48 -08:00
Borislav Petkov 5e2a044daf x86, head_32: Give the 6 label a real name
Jumping here we are about to enable paging so rename the label
accordingly.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1360592538-10643-5-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-12 15:48:42 -08:00
Borislav Petkov c3a22a26d0 x86, head_32: Remove second CPUID detection from default_entry
We do that once earlier now and cache it into new_cpu_data.cpuid_level
so no need for the EFLAGS.ID toggling dance anymore.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1360592538-10643-4-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-12 15:48:42 -08:00
Borislav Petkov 9efb58de91 x86: Detect CPUID support early at boot
We detect CPUID function support on each CPU and save it for later use,
obviating the need to play the toggle EFLAGS.ID game every time. C code
is looking at ->cpuid_level anyway.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1360592538-10643-3-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-12 15:48:41 -08:00
Borislav Petkov 166df91daf x86, head_32: Remove i386 pieces
Remove code fragments detecting a 386 CPU since we don't support those
anymore. Also, do not do alignment checks because they're done only at
CPL3. Also, no need to preserve EFLAGS.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1360592538-10643-2-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-12 15:48:40 -08:00
H. Peter Anvin 8ecba5af94 Linux 3.8-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJRFWw4AAoJEHm+PkMAQRiGnTAH/jBHA2umNc3lc7VkUpusve4q
 GGIlNzYh6iuvIGwKQVj9YPsl37qtQnkDUmY8f8WxZjfxiIDw3TkRKDgNLJaM3Jy5
 E426/FGlRx/Iia5+4tuBeoVYMoIPnndgW5lEAMRK1SvhTByhIYAXsaM0UwPBetb+
 Z5NMdH1f1HVF7RCCmHAkzEk9z78UpdeCzI0t0XuasP2hp2ARAcE1okdO7fNaLiyo
 EmenGhRvy9bAsbRwV0rCdl0rQiZXEYM353DWS7n6q4fyVm8MXFwloUxnWCJTzOIL
 ZLJaz18adFj7Ip/X6ksnMQiQU2Q3B7aThs5ycv0QGxxL2rDFveYRRQ5ICrXOy3M=
 =jjBc
 -----END PGP SIGNATURE-----

Merge tag 'v3.8-rc7' into x86/asm

Merge in the updates to head_32.S from the previous urgent branch, as
upcoming patches will make further changes.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-12 15:47:45 -08:00
H. Peter Anvin ff52c3b02b x86, doc: Clarify the use of asm("%edx") in uaccess.h
Put in a comment that explains that the use of asm("%edx") in
uaccess.h doesn't actually necessarily mean %edx alone.

Cc: Jamie Lokier <jamie@shareable.org>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. J. Lu <hjl.tools@gmail.com>
Link: http://lkml.kernel.org/r/511ACDFB.1050707@zytor.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-12 15:37:02 -08:00
Steven Rostedt f431b634f2 tracing/syscalls: Allow archs to ignore tracing compat syscalls
The tracing of ia32 compat system calls has been a bit of a pain as they
use different system call numbers than the 64bit equivalents.

I wrote a simple 'lls' program that lists files. I compiled it as a i686
ELF binary and ran it under a x86_64 box. This is the result:

echo 0 > /debug/tracing/tracing_on
echo 1 > /debug/tracing/events/syscalls/enable
echo 1 > /debug/tracing/tracing_on ; ./lls ; echo 0 > /debug/tracing/tracing_on

grep lls /debug/tracing/trace

[.. skipping calls before TS_COMPAT is set ...]

             lls-1127  [005] d...   936.409188: sys_recvfrom(fd: 0, ubuf: 4d560fc4, size: 0, flags: 8048034, addr: 8, addr_len: f7700420)
             lls-1127  [005] d...   936.409190: sys_recvfrom -> 0x8a77000
             lls-1127  [005] d...   936.409211: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22)
             lls-1127  [005] d...   936.409215: sys_lgetxattr -> 0xf76ff000
             lls-1127  [005] d...   936.409223: sys_dup2(oldfd: 4d55ae9b, newfd: 4)
             lls-1127  [005] d...   936.409228: sys_dup2 -> 0xfffffffffffffffe
             lls-1127  [005] d...   936.409236: sys_newfstat(fd: 4d55b085, statbuf: 80000)
             lls-1127  [005] d...   936.409242: sys_newfstat -> 0x3
             lls-1127  [005] d...   936.409243: sys_removexattr(pathname: 3, name: ffcd0060)
             lls-1127  [005] d...   936.409244: sys_removexattr -> 0x0
             lls-1127  [005] d...   936.409245: sys_lgetxattr(pathname: 0, name: 19614, value: 1, size: 2)
             lls-1127  [005] d...   936.409248: sys_lgetxattr -> 0xf76e5000
             lls-1127  [005] d...   936.409248: sys_newlstat(filename: 3, statbuf: 19614)
             lls-1127  [005] d...   936.409249: sys_newlstat -> 0x0
             lls-1127  [005] d...   936.409262: sys_newfstat(fd: f76fb588, statbuf: 80000)
             lls-1127  [005] d...   936.409279: sys_newfstat -> 0x3
             lls-1127  [005] d...   936.409279: sys_close(fd: 3)
             lls-1127  [005] d...   936.421550: sys_close -> 0x200
             lls-1127  [005] d...   936.421558: sys_removexattr(pathname: 3, name: ffcd00d0)
             lls-1127  [005] d...   936.421560: sys_removexattr -> 0x0
             lls-1127  [005] d...   936.421569: sys_lgetxattr(pathname: 4d564000, name: 1b1abc, value: 5, size: 802)
             lls-1127  [005] d...   936.421574: sys_lgetxattr -> 0x4d564000
             lls-1127  [005] d...   936.421575: sys_capget(header: 4d70f000, dataptr: 1000)
             lls-1127  [005] d...   936.421580: sys_capget -> 0x0
             lls-1127  [005] d...   936.421580: sys_lgetxattr(pathname: 4d710000, name: 3000, value: 3, size: 812)
             lls-1127  [005] d...   936.421589: sys_lgetxattr -> 0x4d710000
             lls-1127  [005] d...   936.426130: sys_lgetxattr(pathname: 4d713000, name: 2abc, value: 3, size: 32)
             lls-1127  [005] d...   936.426141: sys_lgetxattr -> 0x4d713000
             lls-1127  [005] d...   936.426145: sys_newlstat(filename: 3, statbuf: f76ff3f0)
             lls-1127  [005] d...   936.426146: sys_newlstat -> 0x0
             lls-1127  [005] d...   936.431748: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22)

Obviously I'm not calling newfstat with a fd of 4d55b085. The calls are
obviously incorrect, and confusing.

Other efforts have been made to fix this:

https://lkml.org/lkml/2012/3/26/367

But the real solution is to rewrite the syscall internals and come up
with a fixed solution. One that doesn't require all the kluge that the
current solution has.

Thus for now, instead of outputting incorrect data, simply ignore them.
With this patch the changes now have:

 #> grep lls /debug/tracing/trace
 #>

Compat system calls simply are not traced. If users need compat
syscalls, then they should just use the raw syscall tracepoints.

For an architecture to make their compat syscalls ignored, it must
define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS (done in asm/ftrace.h) and also
define an arch_trace_is_compat_syscall() function that will return true
if the current task should ignore tracing the syscall.

I want to stress that this change does not affect actual syscalls in any
way, shape or form. It is only used within the tracing system and
doesn't interfere with the syscall logic at all. The changes are
consolidated nicely into trace_syscalls.c and asm/ftrace.h.

I had to make one small modification to asm/thread_info.h and that was
to remove the include of asm/ftrace.h. As asm/ftrace.h required the
current_thread_info() it was causing include hell. That include was
added back in 2008 when the function graph tracer was added:

 commit caf4b323 "tracing, x86: add low level support for ftrace return tracing"

It does not need to be included there.

Link: http://lkml.kernel.org/r/1360703939.21867.99.camel@gandalf.local.home

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-02-12 17:46:28 -05:00
H. Peter Anvin 3578baaed4 x86, mm: Redesign get_user with a __builtin_choose_expr hack
Instead of using a bitfield, use an odd little trick using typeof,
__builtin_choose_expr, and sizeof.  __builtin_choose_expr is
explicitly defined to not convert its type (its argument is required
to be a constant expression) so this should be well-defined.

The code is still not 100% preturbation-free versus the baseline
before 64-bit get_user(), but the differences seem to be very small,
mostly related to padding and to gcc deciding when to spill registers.

Cc: Jamie Lokier <jamie@shareable.org>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. J. Lu <hjl.tools@gmail.com>
Link: http://lkml.kernel.org/r/511A8922.6050908@zytor.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-12 12:46:40 -08:00
H. Peter Anvin 16640165c9 x86: Be consistent with data size in getuser.S
Consistently use the data register by name and use a sized assembly
instruction in getuser.S.  There is never any reason to macroize it,
and being inconsistent in the same file is just annoying.

No actual code change.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-02-11 23:14:48 -08:00
H. Peter Anvin b390784dc1 x86, mm: Use a bitfield to mask nuisance get_user() warnings
Even though it is never executed, gcc wants to warn for casting from
a large integer to a pointer.  Furthermore, using a variable with
__typeof__() doesn't work because __typeof__ retains storage
specifiers (const, restrict, volatile).

However, we can declare a bitfield using sizeof(), which is legal
because sizeof() is a constant expression.  This quiets the warning,
although the code generated isn't 100% identical from the baseline
before 96477b4 x86-32: Add support for 64bit get_user():

[x86-mb is baseline, x86-mm is this commit]

   text      data        bss     filename
113716147  15858380   35037184   tip.x86-mb/o.i386-allconfig/vmlinux
113716145  15858380   35037184   tip.x86-mm/o.i386-allconfig/vmlinux
 12989837   3597944   12255232   tip.x86-mb/o.i386-modconfig/vmlinux
 12989831   3597944   12255232   tip.x86-mm/o.i386-modconfig/vmlinux
  1462784    237608    1401988   tip.x86-mb/o.i386-noconfig/vmlinux
  1462837    237608    1401964   tip.x86-mm/o.i386-noconfig/vmlinux
  7938994    553688    7639040   tip.x86-mb/o.i386-pae/vmlinux
  7943136    557784    7639040   tip.x86-mm/o.i386-pae/vmlinux
  7186126    510572    6574080   tip.x86-mb/o.i386/vmlinux
  7186124    510572    6574080   tip.x86-mm/o.i386/vmlinux
103747269  33578856   65888256   tip.x86-mb/o.x86_64-allconfig/vmlinux
103746949  33578856   65888256   tip.x86-mm/o.x86_64-allconfig/vmlinux
 12116695  11035832   20160512   tip.x86-mb/o.x86_64-modconfig/vmlinux
 12116567  11035832   20160512   tip.x86-mm/o.x86_64-modconfig/vmlinux
  1700790    380524     511808   tip.x86-mb/o.x86_64-noconfig/vmlinux
  1700790    380524     511808   tip.x86-mm/o.x86_64-noconfig/vmlinux
 12413612   1133376    1101824   tip.x86-mb/o.x86_64/vmlinux
 12413484   1133376    1101824   tip.x86-mm/o.x86_64/vmlinux

Cc: Jamie Lokier <jamie@shareable.org>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20130209110031.GA17833@n2100.arm.linux.org.uk
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-02-11 17:26:51 -08:00
Mike Travis d924f947a4 x86, uv, uv3: Trim MMR register definitions after code changes for SGI UV3
This patch trims the MMR register definitions after the updates for the
SGI UV3 system have been applied.  Note that because these definitions
are automatically generated from the RTL we cannot control the length
of the names.  Therefore there are lines that exceed 80 characters.

Signed-off-by: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20130211194509.173026880@gulag1.americas.sgi.com
Acked-by: Russ Anderson <rja@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-11 17:18:25 -08:00
Mike Travis 0af6352045 x86, uv, uv3: Update Time Support for SGI UV3
This patch updates time support for the SGI UV3 hub.  Since the UV2
and UV3 time support is identical, "is_uvx_hub" is used instead of
having both "is_uv2_hub" and "is_uv3_hub".

Signed-off-by: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20130211194508.893907185@gulag1.americas.sgi.com
Acked-by: Russ Anderson <rja@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-11 17:18:12 -08:00
Mike Travis b15cc4a12b x86, uv, uv3: Update x2apic Support for SGI UV3
This patch adds support for the SGI UV3 hub to the common x2apic
functions.  The primary changes are to account for the similarities
between UV2 and UV3 which are encompassed within the "UVX" nomenclature.

One significant difference within UV3 is the handling of the MMIOH
regions which are redirected to the target blade (with the device) in
a different manner.  It also now has two MMIOH regions for both small and
large BARs.  This aids in limiting the amount of physical address space
removed from real memory that's used for I/O in the max config of 64TB.

Signed-off-by: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20130211194508.752924185@gulag1.americas.sgi.com
Acked-by: Russ Anderson <rja@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Steffen Persvold <sp@numascale.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-11 17:18:03 -08:00
Mike Travis 6edbd4714e x86, uv, uv3: Update Hub Info for SGI UV3
This patch updates the UV HUB info for UV3.  The "is_uv3_hub" and
"is_uvx_hub" (UV2 or UV3) functions are added as well as the addresses
and sizes of the MMR regions for UV3.

Signed-off-by: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20130211194508.610723192@gulag1.americas.sgi.com
Acked-by: Russ Anderson <rja@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-11 17:17:50 -08:00
Mike Travis 526018bc5e x86, uv, uv3: Update ACPI Check to include SGI UV3
Add UV3 to exclusion list.  Instead of adding every new series of
SGI UV systems, just check oem_id to have a prefix of "SGI".

Signed-off-by: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20130211194508.457937455@gulag1.americas.sgi.com
Acked-by: Russ Anderson <rja@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-11 17:17:44 -08:00
Mike Travis 60fe7be34d x86, uv, uv3: Update MMR register definitions for SGI Ultraviolet System 3 (UV3)
This patch updates the MMR register definitions for the SGI UV3 system.
Note that because these definitions are automatically generated from
the RTL we cannot control the length of the names.  Therefore there are
lines that exceed 80 characters.

All the new MMR definitions are added in this patch.  The patches that
follow then update the references. The last patch is a "trim" patch
which reduces the size of the MMR definitions file by about a third.
This keeps "bi-sectability" in place as the intermediate patches would
not compile correctly if the trimmed MMR defines were done first.

Signed-off-by: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20130211194508.326204556@gulag1.americas.sgi.com
Acked-by: Russ Anderson <rja@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-11 17:17:33 -08:00
Rafael J. Wysocki 17b1639b30 Merge branch 'acpi-lpss'
* acpi-lpss:
  ACPI / platform: create LPSS clocks if Lynxpoint devices are found during scan
  clk: x86: add support for Lynxpoint LPSS clocks
  x86: add support for Intel Low Power Subsystem
  ACPI / platform: fix comment about the platform device name
  ACPI: add support for CSRT table
2013-02-11 13:21:27 +01:00
Rafael J. Wysocki 48694bdb38 Merge branch 'acpica'
* acpica: (56 commits)
  ACPICA: Update version to 20130117
  ACPICA: Update predefined info table for _MLS method
  ACPICA: Remove some extraneous newlines in ACPI_ERROR type calls
  ACPICA: iASL/Disassembler: Add option to ignore NOOP opcodes/operators
  ACPICA: AcpiGetSleepTypeData: Allow \_Sx to return either 1 or 2 integers
  ACPICA: Update ACPICA copyrights to 2013
  ACPICA: Update predefined info table
  ACPICA: Cleanup table handler naming conflicts.
  ACPICA: Source restructuring: split large files into 8 new files.
  ACPICA: Cleanup PM_TIMER_FREQUENCY definition.
  ACPICA: Cleanup ACPI_DEBUG_PRINT macros to fix potential build breakages.
  ACPICA: Update version to 20121220.
  ACPICA: Interpreter: Fix Store() when implicit conversion is not possible.
  ACPICA: Resources: Split interrupt share/wake bits into two fields.
  ACPICA: Resources: Support for ACPI 5 wake bit in ExtendedInterrupt descriptor.
  ACPICA: Interpreter: Add warning if 64-bit constant appears in 32-bit table.
  ACPICA: Update ACPICA initialization messages.
  ACPICA: Namespace: Eliminate dot...dot output during initialization.
  ACPICA: Resource manager: Add support for ACPI 5 wake bit in IRQ descriptor.
  ACPICA: Fix possible memory leak in dispatcher error path.
  ...
2013-02-11 13:20:33 +01:00
Stoney Wang cb214ede76 x86/apic: Work around boot failure on HP ProLiant DL980 G7 Server systems
When a HP ProLiant DL980 G7 Server boots a regular kernel,
there will be intermittent lost interrupts which could
result in a hang or (in extreme cases) data loss.

The reason is that this system only supports x2apic physical
mode, while the kernel boots with a logical-cluster default
setting.

This bug can be worked around by specifying the "x2apic_phys" or
"nox2apic" boot option, but we want to handle this system
without requiring manual workarounds.

The BIOS sets ACPI_FADT_APIC_PHYSICAL in FADT table.
As all apicids are smaller than 255, BIOS need to pass the
control to the OS with xapic mode, according to x2apic-spec,
chapter 2.9.

Current code handle x2apic when BIOS pass with xapic mode
enabled:

When user specifies x2apic_phys, or FADT indicates PHYSICAL:

1. During madt oem check, apic driver is set with xapic logical
   or xapic phys driver at first.

2. enable_IR_x2apic() will enable x2apic_mode.

3. if user specifies x2apic_phys on the boot line, x2apic_phys_probe()
   will install the correct x2apic phys driver and use x2apic phys mode.
   Otherwise it will skip the driver will let x2apic_cluster_probe to
   take over to install x2apic cluster driver (wrong one) even though FADT
   indicates PHYSICAL, because x2apic_phys_probe does not check
   FADT PHYSICAL.

Add checking x2apic_fadt_phys in x2apic_phys_probe() to fix the
problem.

Signed-off-by: Stoney Wang <song-bo.wang@hp.com>
[ updated the changelog and simplified the code ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1360263182-16226-1-git-send-email-yinghai@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-11 11:13:00 +01:00
Shuah Khan 136867f517 x86/kvm: Fix compile warning in kvm_register_steal_time()
Fix the following compile warning in kvm_register_steal_time():

  CC      arch/x86/kernel/kvm.o
  arch/x86/kernel/kvm.c: In function ‘kvm_register_steal_time’: arch/x86/kernel/kvm.c:302:3:
  warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘phys_addr_t’ [-Wformat]

Introduced via:

  5dfd486c47 x86, kvm: Fix kvm's use of __pa() on percpu areas
  d765653445 x86, mm: Create slow_virt_to_phys()
  f3c4fbb68e x86, mm: Use new pagetable helpers in try_preserve_large_page()
  4cbeb51b86 x86, mm: Pagetable level size/shift/mask helpers
  a25b931684 x86, mm: Make DEBUG_VIRTUAL work earlier in boot

Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: shuahkhan@gmail.com
Cc: avi@redhat.com
Cc: gleb@redhat.com
Cc: mst@redhat.com
Link: http://lkml.kernel.org/r/1360119442.8356.8.camel@lorien2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-11 11:04:31 +01:00
Takuya Yoshikawa 7a905b1485 KVM: Remove user_alloc from struct kvm_memory_slot
This field was needed to differentiate memory slots created by the new
API, KVM_SET_USER_MEMORY_REGION, from those by the old equivalent,
KVM_SET_MEMORY_REGION, whose support was dropped long before:

  commit b74a07beed
  KVM: Remove kernel-allocated memory regions

Although we also have private memory slots to which KVM allocates
memory with vm_mmap(), !user_alloc slots in other words, the slot id
should be enough for differentiating them.

Note: corresponding function parameters will be removed later.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-02-11 11:52:00 +02:00
Yang Zhang 257090f702 KVM: VMX: disable apicv by default
Without Posted Interrupt, current code is broken. Just disable by
default until Posted Interrupt is ready.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-02-11 10:51:13 +02:00
Len Brown 27be457000 x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok flag
Remove 32-bit x86 a cmdline param "no-hlt",
and the cpuinfo_x86.hlt_works_ok that it sets.

If a user wants to avoid HLT, then "idle=poll"
is much more useful, as it avoids invocation of HLT
in idle, while "no-hlt" failed to do so.

Indeed, hlt_works_ok was consulted in only 3 places.

First, in /proc/cpuinfo where "hlt_bug yes"
would be printed if and only if the user booted
the system with "no-hlt" -- as there was no other code
to set that flag.

Second, check_hlt() would not invoke halt() if "no-hlt"
were on the cmdline.

Third, it was consulted in stop_this_cpu(), which is invoked
by native_machine_halt()/reboot_interrupt()/smp_stop_nmi_callback() --
all cases where the machine is being shutdown/reset.
The flag was not consulted in the more frequently invoked
play_dead()/hlt_play_dead() used in processor offline and suspend.

Since Linux-3.0 there has been a run-time notice upon "no-hlt" invocations
indicating that it would be removed in 2012.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
2013-02-10 03:32:22 -05:00
Len Brown 69fb3676df x86 idle: remove mwait_idle() and "idle=mwait" cmdline param
mwait_idle() is a C1-only idle loop intended to be more efficient
than HLT, starting on Pentium-4 HT-enabled processors.

But mwait_idle() has been replaced by the more general
mwait_idle_with_hints(), which handles both C1 and deeper C-states.
ACPI processor_idle and intel_idle use only mwait_idle_with_hints(),
and no longer use mwait_idle().

Here we simplify the x86 native idle code by removing mwait_idle(),
and the "idle=mwait" bootparam used to invoke it.

Since Linux 3.0 there has been a boot-time warning when "idle=mwait"
was invoked saying it would be removed in 2012.  This removal
was also noted in the (now removed:-) feature-removal-schedule.txt.

After this change, kernels configured with
(CONFIG_ACPI=n && CONFIG_INTEL_IDLE=n) when run on hardware
that supports MWAIT will simply use HLT.  If MWAIT is desired
on those systems, cpuidle and the cpuidle drivers above
can be enabled.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
2013-02-10 03:03:41 -05:00
Len Brown 6a377ddc4e xen idle: make xen-specific macro xen-specific
This macro is only invoked by Xen,
so make its definition specific to Xen.

> set_pm_idle_to_default()
< xen_set_default_idle()

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: xen-devel@lists.xensource.com
2013-02-10 01:06:34 -05:00
Len Brown e022e7eb90 intel_idle: remove assumption of one C-state per MWAIT flag
Remove the assumption that cstate_tables are
indexed by MWAIT flag values.  Each entry
identifies itself via its own flags value.
This change is needed to support multiple states
that share the same MWAIT flags.

Note that this can have an effect on what state is described
by 'N' on cmdline intel_idle.max_cstate=N on some systems.

intel_idle.max_cstate=0 still disables the driver
intel_idle.max_cstate=1 still results in just C1(E)
However, "place holders" in the sparse C-state name-space
(eg. Atom) have been removed.

Signed-off-by: Len Brown <len.brown@intel.com>
2013-02-08 19:29:16 -05:00
Len Brown 137ecc779c intel_idle: remove use and definition of MWAIT_MAX_NUM_CSTATES
Cosmetic only.

Replace use of MWAIT_MAX_NUM_CSTATES with CPUIDLE_STATE_MAX.
They are both 8, so this patch has no functional change.

The reason to change is that intel_idle will soon be able
to export more than the 8 "major" states supported by MWAIT.
When we hit that limit, it is important to know
where the limit comes from.

Signed-off-by: Len Brown <len.brown@intel.com>
2013-02-08 19:28:10 -05:00
Len Brown 6792041834 tools/power turbostat: decode MSR_IA32_POWER_CTL
When verbose is enabled, print the C1E-Enable
bit in MSR_IA32_POWER_CTL.

also delete some redundant tests on the verbose variable.

Signed-off-by: Len Brown <len.brown@intel.com>
2013-02-08 19:26:16 -05:00
David S. Miller fd5023111c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Synchronize with 'net' in order to sort out some l2tp, wireless, and
ipv6 GRE fixes that will be built on top of in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-08 18:02:14 -05:00
Oleg Nesterov 74e59dfc6b uprobes: Change handle_swbp() to expose bp_vaddr to handler_chain()
Change handle_swbp() to set regs->ip = bp_vaddr in advance, this is
what consumer->handler() needs but uprobe_get_swbp_addr() is not
exported.

This also simplifies the code and makes it more consistent across
the supported architectures. handle_swbp() becomes the only caller
of uprobe_get_swbp_addr().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
2013-02-08 17:47:11 +01:00
Oleg Nesterov cf31ec3f7f uprobes/x86: Change __skip_sstep() to actually skip the whole insn
__skip_sstep() doesn't update regs->ip. Currently this is correct
but only "by accident" and it doesn't skip the whole insn. Change
it to advance ->ip by the length of the detected 0x66*0x90 sequence.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-02-08 17:47:11 +01:00
Ville Syrjälä 96477b4cd7 x86-32: Add support for 64bit get_user()
Implement __get_user_8() for x86-32. It will return the
64-bit result in edx:eax register pair, and ecx is used
to pass in the address and return the error value.

For consistency, change the register assignment for all
other __get_user_x() variants, so that address is passed in
ecx/rcx, the error value is returned in ecx/rcx, and eax/rax
contains the actual value.

[ hpa: I modified the patch so that it does NOT change the calling
  conventions for the existing callsites, this also means that the code
  is completely unchanged for 64 bits.

  Instead, continue to use eax for address input/error output and use
  the ecx:edx register pair for the output. ]

This is a partial refresh of a patch [1] by Jamie Lokier from
2004. Only the minimal changes to implement 64bit get_user()
were picked from the original patch.

[1] http://article.gmane.org/gmane.linux.kernel/198823

Originally-by: Jamie Lokier <jamie@shareable.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link:
http://lkml.kernel.org/r/1355312043-11467-1-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-07 15:07:28 -08:00
Kees Cook e575a86fdc x86: Do not leak kernel page mapping locations
Without this patch, it is trivial to determine kernel page
mappings by examining the error code reported to dmesg[1].
Instead, declare the entire kernel memory space as a violation
of a present page.

Additionally, since show_unhandled_signals is enabled by
default, switch branch hinting to the more realistic
expectation, and unobfuscate the setting of the PF_PROT bit to
improve readability.

[1] http://vulnfactory.org/blog/2013/02/06/a-linux-memory-trick/

Reported-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Suggested-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20130207174413.GA12485@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-07 19:57:44 +01:00
Xiao Guangrong 24db2734ad KVM: MMU: cleanup __direct_map
Use link_shadow_page to link the sp to the spte in __direct_map

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-06 22:42:09 -02:00
Xiao Guangrong f761620377 KVM: MMU: remove pt_access in mmu_set_spte
It is only used in debug code, so drop it

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-06 22:42:08 -02:00
Xiao Guangrong 55dd98c3a8 KVM: MMU: cleanup mapping-level
Use min() to cleanup mapping_level

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-06 22:42:08 -02:00
Xiao Guangrong caf6900f2d KVM: MMU: lazily drop large spte
Currently, kvm zaps the large spte if write-protected is needed, the later
read can fault on that spte. Actually, we can make the large spte readonly
instead of making them not present, the page fault caused by read access can
be avoided

The idea is from Avi:
| As I mentioned before, write-protecting a large spte is a good idea,
| since it moves some work from protect-time to fault-time, so it reduces
| jitter.  This removes the need for the return value.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-06 22:28:01 -02:00
Gleb Natapov 5037878e22 KVM: VMX: cleanup vmx_set_cr0().
When calculating hw_cr0 teh current code masks bits that should be always
on and re-adds them back immediately after. Cleanup the code by masking
only those bits that should be dropped from hw_cr0. This allow us to
get rid of some defines.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-06 22:00:02 -02:00
H. Peter Anvin bb9b1a834f Retract MCE-specific UAPI exports which are unused and shouldn't be
used.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ7XZeAAoJEBLB8Bhh3lVKzAAQAKJPmNM4jK3o7QixkySFOKAN
 GKDmciwBCSLDvhDU5bXSKEzhxaWq+/Wp8aTy0u28TUtIq+Dnya8+9xAMZqe56lNv
 slBNk1N2IIJBqZoul1FCY7MyBnss19nh2h/jN9PChcY9dGpgkRnKc3rkjjy4UJ10
 YibUm/wLuzNGCskt6nGO4uVSDXyv5W6PCnpOb6IW5KFywHE0ojsVwZQM5Vnakjrd
 vsyicHHaIXSnRtyDWGZSX4JxY7HDK7+v03OHNvO3FtFzNjTJ8r1qU7LVbC1yBPP6
 uFN+piBUYn6q6Gsy67H1FcufGTz0IzFwyx/akW1MkgdN4FggrLiFf0a10Fh0Dx3s
 jd2tuCv72rfOQUZAYg9kobHk1NecCZt0Wt3NWul3WjZHZC35nFgznnUiznAPhoCI
 /sBKE2VQt0CejigNTIEpaToc/kHuQy3RzmMphknROEehXdm79TqdirNNHKrDkYug
 U9SV4nsOaRFcSZDPXK52xFIEb9Uc90zSWqNblhBRXt7sBC34EqZk95jYDgOrZxHQ
 v8iNieYgOezXzIyyUAHC7dlIUbkHlBsPG+8eEoWFcvOBI3UbGvOv3abMO18yBZqr
 28An+v8pMsT9Q5sUwHFwvblD/vDaj/NIO0OP+y40Z96KnXz8Cm4mpkzBvBdVVxUs
 cmaDId47QRqdvIW2mBZZ
 =2Zr4
 -----END PGP SIGNATURE-----

Merge tag 'ras_for_3.8' into x86/urgent

Retract MCE-specific UAPI exports which are unused and shouldn't be
used.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-06 14:18:53 -08:00
Jacob Shin 0fbdad078a perf/x86: Allow for architecture specific RDPMC indexes
Similar to config_base and event_base, allow architecture
specific RDPMC ECX values.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1360171589-6381-6-git-send-email-jacob.shin@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-06 19:45:24 +01:00
Jacob Shin 4c1fd17a1c perf/x86: Move MSR address offset calculation to architecture specific files
Move counter index to MSR address offset calculation to
architecture specific files. This prepares the way for
perf_event_amd to enable counter addresses that are not
contiguous -- for example AMD Family 15h processors have 6 core
performance counters starting at 0xc0010200 and 4 northbridge
performance counters starting at 0xc0010240.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Stephane Eranian <eranian@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1360171589-6381-5-git-send-email-jacob.shin@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-06 19:45:24 +01:00
Jacob Shin 9f19010af8 perf/x86/amd: Use proper naming scheme for AMD bit field definitions
Update these AMD bit field names to be consistent with naming
convention followed by the rest of the file.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1360171589-6381-4-git-send-email-jacob.shin@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-06 19:45:23 +01:00
Robert Richter 4dd4c2ae55 perf/x86/amd: Generalize northbridge constraints code for family 15h
Generalize northbridge constraints code for family 10h so that
later we can reuse the same code path with other AMD processor
families that have the same northbridge event constraints.

Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Stephane Eranian <eranian@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1360171589-6381-3-git-send-email-jacob.shin@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-06 19:45:23 +01:00
Robert Richter 2c53c3dd0b perf/x86/amd: Rework northbridge event constraints handler
Code simplification. No functional changes.

Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Robert Richter <rric@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1360171589-6381-2-git-send-email-jacob.shin@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-06 19:45:22 +01:00
Gleb Natapov b0da5bec30 KVM: VMX: add missing exit names to VMX_EXIT_REASONS array
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-05 23:29:46 -02:00
Dongxiao Xu c08800a56c KVM: VMX: disable SMEP feature when guest is in non-paging mode
SMEP is disabled if CPU is in non-paging mode in hardware.
However KVM always uses paging mode to emulate guest non-paging
mode with TDP. To emulate this behavior, SMEP needs to be manually
disabled when guest switches to non-paging mode.

We met an issue that, SMP Linux guest with recent kernel (enable
SMEP support, for example, 3.5.3) would crash with triple fault if
setting unrestricted_guest=0. This is because KVM uses an identity
mapping page table to emulate the non-paging mode, where the page
table is set with USER flag. If SMEP is still enabled in this case,
guest will meet unhandlable page fault and then crash.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-05 23:28:07 -02:00
Gleb Natapov 834be0d83f Revert "KVM: MMU: split kvm_mmu_free_page"
This reverts commit bd4c86eaa6.

There is not user for kvm_mmu_isolate_page() any more.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-05 22:47:39 -02:00
Ingo Molnar b2c77a57e4 This implements the cputime accounting on full dynticks CPUs.
Typical cputime stats infrastructure relies on the timer tick and
 its periodic polling on the CPU to account the amount of time
 spent by the CPUs and the tasks per high level domains such as
 userspace, kernelspace, guest, ...
 
 Now we are preparing to implement full dynticks capability on
 Linux for Real Time and HPC users who want full CPU isolation.
 This feature requires a cputime accounting that doesn't depend
 on the timer tick.
 
 To implement it, this new cputime infrastructure plugs into
 kernel/user/guest boundaries to take snapshots of cputime and
 flush these to the stats when needed. This performs pretty
 much like CONFIG_VIRT_CPU_ACCOUNTING except that context location
 and cputime snaphots are synchronized between write and read
 side such that the latter can safely retrieve the pending tickless
 cputime of a task and add it to its latest cputime snapshot to
 return the correct result to the user.
 
 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRBsKnAAoJEIUkVEdQjox3lMgP/2R6DU2f8PyGIao3hne4M3Pu
 L3q+mAG53b24Dy014KeW7gd8yv45fE7wp/rs8CGLte9VzbLkRCDSFQPgBuXVagRj
 tV5nfAuqD0wHTnA+HhBE3l3C2RKAPGIu79rBpnIR/QIPPl8Z3Dby8YgmxEQKDf8G
 j7MEBu2LthSuqEi2ZXemnO5r0oEnQAzAp4TTi/M38k0Fmt59nOGyjLnI+xHYCBMa
 1pnz7j3jjR9NJExGu8iVvbo+jupuQngP8qmkLXHvYnj/TEJNwzO1hHVoSwOpjYpS
 9ycl+T8IKQLbAkBywLtq3Mzde43xt/t8wYyGZ0oAV+Z7MIpz/9YIfDJwqQeqoNbD
 dAdbNjKMbsxCgmrnyqSagfMQg/r3CPZ4vf40TMCaN4gNUJC4Ie+E4kPRKRh59+PB
 Ukthmqujn0f40LAa+HXTUuzafd3b0s/ewH+8FuQ6LAG9b5+WnoN8JTJ5u6+ydokO
 ZleeOowuRZZEg+abQ8Sm2GRm/BzN29gi/npb//I+ZDXWv/+3yccgsiPjCRzCAAaO
 g1RmYryFSRUwHQbGNNypVWVuOLWvrBQ4jqbGO7BBuBByZMSHryKxR6mb+inH3qLE
 xIDM9SdSJisc292OzoFKwVZki4MaXaadJXJduVvqYlZQvXXs7eAa4wo3euhtVITD
 NLQO5OZXE4oIQmDFb0FV
 =1Tzp
 -----END PGP SIGNATURE-----

Merge tag 'full-dynticks-cputime-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into sched/core

Pull full-dynticks (user-space execution is undisturbed and
receives no timer IRQs) preparation changes that convert the
cputime accounting code to be full-dynticks ready,
from Frederic Weisbecker:

 "This implements the cputime accounting on full dynticks CPUs.

  Typical cputime stats infrastructure relies on the timer tick and
  its periodic polling on the CPU to account the amount of time
  spent by the CPUs and the tasks per high level domains such as
  userspace, kernelspace, guest, ...

  Now we are preparing to implement full dynticks capability on
  Linux for Real Time and HPC users who want full CPU isolation.
  This feature requires a cputime accounting that doesn't depend
  on the timer tick.

  To implement it, this new cputime infrastructure plugs into
  kernel/user/guest boundaries to take snapshots of cputime and
  flush these to the stats when needed. This performs pretty
  much like CONFIG_VIRT_CPU_ACCOUNTING except that context location
  and cputime snaphots are synchronized between write and read
  side such that the latter can safely retrieve the pending tickless
  cputime of a task and add it to its latest cputime snapshot to
  return the correct result to the user."

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-05 13:10:33 +01:00
Gleb Natapov eb3fce87cc KVM: MMU: drop superfluous is_present_gpte() check.
Gust page walker puts only present ptes into ptes[] array. No need to
check it again.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-04 23:24:28 -02:00
Gleb Natapov 116eb3d30e KVM: MMU: drop superfluous min() call.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-04 23:24:28 -02:00
Gleb Natapov 2c9afa52ef KVM: MMU: set base_role.nxe during mmu initialization.
Move base_role.nxe initialisation to where all other roles are initialized.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-04 23:24:28 -02:00
Gleb Natapov 9bb4f6b15e KVM: MMU: drop unneeded checks.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-04 23:24:28 -02:00
Gleb Natapov feb3eb704a KVM: MMU: make spte_is_locklessly_modifiable() more clear
spte_is_locklessly_modifiable() checks that both SPTE_HOST_WRITEABLE and
SPTE_MMU_WRITEABLE are present on spte. Make it more explicit.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-04 23:24:28 -02:00
Linus Torvalds 3f4e5aacf7 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Three small fixlets"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel/cacheinfo: Shut up annoying warning
  x86, doc: Boot protocol 2.12 is in 3.8
  x86-64: Replace left over sti/cli in ia32 audit exit code
2013-02-05 07:59:44 +11:00
Linus Torvalds 51c1abb95f Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Three fixlets and two small (and low risk) hw-enablement changes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix event group context move
  x86/perf: Add IvyBridge EP support
  perf/x86: Fix P6 driver section warning
  arch/x86/tools/insn_sanity.c: Identify source of messages
  perf/x86: Enable Intel Lincroft/Penwell/Cloverview Atom support
2013-02-05 07:57:09 +11:00
Borislav Petkov f76e39c531 x86/intel/cacheinfo: Shut up annoying warning
I've been getting the following warning when doing randbuilds
since forever. Now it finally pissed me off just the perfect
amount so that I can fix it.

  arch/x86/kernel/cpu/intel_cacheinfo.c:489:27: warning: ‘cache_disable_0’ defined but not used [-Wunused-variable]
  arch/x86/kernel/cpu/intel_cacheinfo.c:491:27: warning: ‘cache_disable_1’ defined but not used [-Wunused-variable] arch/x86/kernel/cpu/intel_cacheinfo.c:524:27: warning: ‘subcaches’ defined but not used [-Wunused-variable]

It happens because in randconfigs where CONFIG_SYSFS is not set,
the whole sysfs-interface to L3 cache index disabling is
remaining unused and gcc correctly warns about it. Make it
optional, depending on CONFIG_SYSFS too, as is the case with
other sysfs-related machinery in this file.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Link: http://lkml.kernel.org/r/1359969195-27362-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-04 11:29:52 +01:00
Thomas Gleixner 90889a635a Merge branch 'fortglx/3.9/time' of git://git.linaro.org/people/jstultz/linux into timers/core
Trivial conflict in arch/x86/Kconfig

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-02-04 11:03:03 +01:00
Al Viro 5b3eb3ade4 x86: switch to generic old sigaction
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-03 18:16:27 -05:00
Al Viro 29fd448084 x86: switch to generic compat rt_sigaction()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-03 18:16:26 -05:00
Al Viro d7c43e4afb x86: switch to generic compat sched_rr_get_interval()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-03 18:16:26 -05:00
Al Viro 15ce1f7154 x86,um: switch to generic old sigsuspend()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-03 18:16:26 -05:00
Al Viro 7b83d1a297 x86: switch to generic compat rt_sigqueueinfo()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-03 18:16:25 -05:00
Al Viro f45adb0499 x86: switch to generic compat rt_sigpending()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-03 18:16:25 -05:00
Al Viro 49cb25e929 x86: get rid of pt_regs argument in vm86/vm86old
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-03 18:16:24 -05:00
Al Viro 3fe26fa34d x86: get rid of pt_regs argument in sigreturn variants
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-03 18:16:24 -05:00
Al Viro b3af11afe0 x86: get rid of pt_regs argument of iopl(2)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-03 18:16:24 -05:00
Al Viro ea93a6e2e7 amd64: get rid of useless RESTORE_TOP_OF_STACK in stub_execve()
we are not going to return via SYSRET anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-03 18:16:23 -05:00
Al Viro 574c4866e3 consolidate kernel-side struct sigaction declarations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-03 15:09:22 -05:00
Al Viro 92a3ce4a1e consolidate declarations of k_sigaction
Only alpha and sparc are unusual - they have ka_restorer in it.
And nobody needs that exposed to userland.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-03 15:09:22 -05:00
Al Viro eaca6eae3e sanitize rt_sigaction() situation a bit
Switch from __ARCH_WANT_SYS_RT_SIGACTION to opposite
(!CONFIG_ODD_RT_SIGACTION); the only two architectures that
need it are alpha and sparc.  The reason for use of CONFIG_...
instead of __ARCH_... is that it's needed only kernel-side
and doing it that way avoids a mess with include order on many
architectures.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-03 15:09:18 -05:00
H. Peter Anvin 68d00bbebb Merge remote-tracking branch 'origin/x86/mm' into x86/mm2
Explicitly merging these two branches due to nontrivial conflicts and
to allow further work.

Resolved Conflicts:
	arch/x86/kernel/head32.c
	arch/x86/kernel/head64.c
	arch/x86/mm/init_64.c
	arch/x86/realmode/init.c

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-01 02:28:36 -08:00
H. Peter Anvin b5831174f9 Linux 3.8-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJRCxWdAAoJEHm+PkMAQRiG3bAH/28D2NbRLIDo6QIzk3seCCh3
 A6vqDWbX671JRa0RO38DrWhqv6L/EisvjNZMVxBNaN635+3+yC7wsKziXYxA4+AL
 Ef9JISiwXykRWwrC8Q34YBWfWgeFTmaau71IRv45x2OTwiijd6cvjXhLDrOhHEGL
 rdAGJxIdBOfuASw1zpn9PxzfAtg1j/FGP03B4yy+JFhYzuDp3pvnDIysAnXefLml
 UheuBELdNf8Jx7NtW80PTa+TQtruWPC8otoiJevV4TrAWmAOctJiM1izL+VOZqqD
 I/v5aGf9mCN+cxLq06imwgEHWmP1I7yDdjjTHrr4wwIMws1vg8PNsJqG3AsPcwM=
 =dR8S
 -----END PGP SIGNATURE-----

Merge tag 'v3.8-rc6' into x86/urgent

Linux 3.8-rc6

Merged in order to add a documentation update versus new code in
upstream.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 20:22:57 -08:00
H. Peter Anvin 07f4207a30 x86-32, mm: Remove reference to alloc_remap()
We have removed the remap allocator for x86-32, and x86-64 never had
it (and doesn't need it).  Remove residual reference to it.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/CAE9FiQVn6_QZi3fNQ-JHYiR-7jeDJ5hT0SyT_%2BzVvfOj=PzF3w@mail.gmail.com
2013-01-31 14:12:30 -08:00
H. Peter Anvin bb112aec5e x86-32, mm: Remove reference to resume_map_numa_kva()
Remove reference to removed function resume_map_numa_kva().

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20130131005616.1C79F411@kernel.stglabs.ibm.com
2013-01-31 14:12:30 -08:00
Dave Hansen f03574f2d5 x86-32, mm: Rip out x86_32 NUMA remapping code
This code was an optimization for 32-bit NUMA systems.

It has probably been the cause of a number of subtle bugs over
the years, although the conditions to excite them would have
been hard to trigger.  Essentially, we remap part of the kernel
linear mapping area, and then sometimes part of that area gets
freed back in to the bootmem allocator.  If those pages get
used by kernel data structures (say mem_map[] or a dentry),
there's no big deal.  But, if anyone ever tried to use the
linear mapping for these pages _and_ cared about their physical
address, bad things happen.

For instance, say you passed __GFP_ZERO to the page allocator
and then happened to get handed one of these pages, it zero the
remapped page, but it would make a pte to the _old_ page.
There are probably a hundred other ways that it could screw
with things.

We don't need to hang on to performance optimizations for
these old boxes any more.  All my 32-bit NUMA systems are long
dead and buried, and I probably had access to more than most
people.

This code is causing real things to break today:

	https://lkml.org/lkml/2013/1/9/376

I looked in to actually fixing this, but it requires surgery
to way too much brittle code, as well as stuff like
per_cpu_ptr_to_phys().

[ hpa: Cc: this for -stable, since it is a memory corruption issue.
  However, an alternative is to simply mark NUMA as depends BROKEN
  rather than EXPERIMENTAL in the X86_32 subclause... ]

Link: http://lkml.kernel.org/r/20130131005616.1C79F411@kernel.stglabs.ibm.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
2013-01-31 14:12:30 -08:00
Boris Ostrovsky f0322bd341 x86, AMD: Enable WC+ memory type on family 10 processors
In some cases BIOS may not enable WC+ memory type on family 10
processors, instead converting what would be WC+ memory to CD type.
On guests using nested pages this could result in performance
degradation. This patch enables WC+.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Link: http://lkml.kernel.org/r/1359495169-23278-1-git-send-email-ostr@amd64.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 13:35:38 -08:00
Boris Ostrovsky 6bf08a8dcd x86, AMD: Clean up init_amd()
Clean up multiple declarations of variable used for rd/wrmsr.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Link: http://lkml.kernel.org/r/1359495136-23244-1-git-send-email-ostr@amd64.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 13:35:32 -08:00
Fenghua Yu da76f64e7e x86/Kconfig: Make early microcode loading a configuration feature
MICROCODE_INTEL_LIB, MICROCODE_INTEL_EARLY, and MICROCODE_EARLY are three new
configurations to enable or disable the feature.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-13-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 13:20:42 -08:00
Fenghua Yu cd745be89e x86/mm/init.c: Copy ucode from initrd image to kernel memory
Before initrd image is freed, copy valid ucode patches from initrd image
to kernel memory. The saved ucode will be used to update AP in resume
or hotplug.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-12-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 13:20:26 -08:00
Fenghua Yu feddc9de8b x86/head64.c: Early update ucode in 64-bit
This updates ucode on BSP in 64-bit mode. Paging and virtual address are
working now.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-11-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 13:20:24 -08:00
Fenghua Yu 63b553c68d x86/head_32.S: Early update ucode in 32-bit
This updates ucode in 32-bit kernel on BSP and AP. At this point, there is no
paging and no virtual address yet.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-10-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 13:19:20 -08:00
Fenghua Yu ec400ddeff x86/microcode_intel_early.c: Early update ucode on Intel's CPU
Implementation of early update ucode on Intel's CPU.

load_ucode_intel_bsp() scans ucode in initrd image file which is a cpio format
ucode followed by ordinary initrd image file. The binary ucode file is stored
in kernel/x86/microcode/GenuineIntel.bin in the cpio data. All ucode
patches with the same model as BSP are saved in memory. A matching ucode patch
is updated on BSP.

load_ucode_intel_ap() reads saved ucoded patches and updates ucode on AP.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-9-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 13:19:18 -08:00
Fenghua Yu 086fc8f803 x86/tlbflush.h: Define __native_flush_tlb_global_irq_disabled()
This function is called in __native_flush_tlb_global() and after
apply_microcode_early().

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-8-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 13:19:16 -08:00
Fenghua Yu e666dfa273 x86/microcode_intel_lib.c: Early update ucode on Intel's CPU
Define interfaces microcode_sanity_check() and get_matching_microcode(). They
are called both in early boot time and in microcode Intel driver.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-7-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 13:19:14 -08:00
Fenghua Yu a8ebf6d1d6 x86/microcode_core_early.c: Define interfaces for early loading ucode
Define interfaces load_ucode_bsp() and load_ucode_ap() to load ucode on BSP and
AP in early boot time. These are generic interfaces. Internally they call
vendor specific implementations.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-6-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 13:19:12 -08:00
Fenghua Yu e6ebf5deaa x86/common.c: load ucode in 64 bit or show loading ucode info in 32 bit on AP
In 64 bit, load ucode on AP in cpu_init().

In 32 bit, show ucode loading info on AP in cpu_init(). Microcode has been
loaded earlier before paging. Now it is safe to show the loading microcode
info on this AP.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-5-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 13:19:06 -08:00
Fenghua Yu d288e1cf8e x86/common.c: Make have_cpuid_p() a global function
Remove static declaration in have_cpuid_p() to make it a global function. The
function will be called in early loading microcode.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-4-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 13:18:58 -08:00
Fenghua Yu 9cd4d78e21 x86/microcode_intel.h: Define functions and macros for early loading ucode
Define some functions and macros that will be used in early loading ucode. Some
of them are moved from microcode_intel.c driver in order to be called in early
boot phase before module can be called.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-3-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31 13:18:50 -08:00
Thomas Gleixner a9037430c6 Merge branch 'timers/for-arm' into timers/core 2013-01-31 22:17:10 +01:00
Sukadev Bhattiprolu 2663960c15 perf: Make EVENT_ATTR global
Rename EVENT_ATTR() to PMU_EVENT_ATTR() and make it global so it is
available to all architectures.

Further to allow architectures flexibility, have PMU_EVENT_ATTR() pass
in the variable name as a parameter.

Changelog[v2]
	- [Jiri Olsa] No need to define PMU_EVENT_PTR()

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: linuxppc-dev@ozlabs.org
Link: http://lkml.kernel.org/r/20130123062422.GC13720@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-31 13:07:50 -03:00
Lee, Chun-Yi deb94101c4 x86, efi: Allow slash in file path of initrd
When initrd file didn't put at the same place with stub kernel, we
need give the file path of initrd, but need use backslash to separate
directory and file. It's not friendly to unix/linux user, and not so
intuitive for bootloader forward paramters to efi stub kernel by
chainloading.

This patch add support to handle_ramdisks for allow slash in file path
of initrd, it convert slash to backlash when parsing path.

In additional, this patch also separates print code of efi_char16_t from
efi_printk, and print out the path/filename of initrd when failed to open
initrd file. It's good for debug and discover typo.

Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-01-31 14:44:44 +00:00
Borislav Petkov 1e9209edc7 x86/numa: Use __pa_nodebug() instead
... and fix the following warning:

  arch/x86/mm/numa.c: In function ‘setup_node_data’:
  arch/x86/mm/numa.c:222:3: warning: passing argument 1 of ‘__phys_addr_nodebug’ makes integer from pointer without a cast

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1359245901-8512-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-31 11:40:55 +01:00
Jan Beulich 40a1ef95da x86-64: Replace left over sti/cli in ia32 audit exit code
For some reason they didn't get replaced so far by their
paravirt equivalents, resulting in code to be run with
interrupts disabled that doesn't expect so (causing, in the
observed case, a BUG_ON() to trigger) when syscall auditing is
enabled.

David (Cc-ed) came up with an identical fix, so likely this can
be taken to count as an ack from him.

Reported-by: Peter Moody <pmoody@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/5108E01902000078000BA9C5@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Tested-by: Peter Moody <pmoody@google.com>
2013-01-31 10:36:01 +01:00
Linus Torvalds 04c2eee5b9 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI fixes from Peter Anvin:
 "This is a collection of fixes for the EFI support.  The controversial
  bit here is a set of patches which bumps the boot protocol version as
  part of fixing some serious problems with the EFI handover protocol,
  used when booting under EFI using a bootloader as opposed to directly
  from EFI.  These changes should also make it a lot saner to support
  cross-mode 32/64-bit EFI booting in the future.  Getting these changes
  into 3.8 means we avoid presenting an inconsistent ABI to bootloaders.

  Other changes are display detection and fixing efivarfs."

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, efi: remove attribute check from setup_efi_pci
  x86, build: Dynamically find entry points in compressed startup code
  x86, efi: Fix PCI ROM handing in EFI boot stub, in 32-bit mode
  x86, efi: Fix 32-bit EFI handover protocol entry point
  x86, efi: Fix display detection in EFI boot stub
  x86, boot: Define the 2.12 bzImage boot protocol
  x86/boot: Fix minor fd leakage in tools/relocs.c
  x86, efi: Set runtime_version to the EFI spec revision
  x86, efi: fix 32-bit warnings in setup_efi_pci()
  efivarfs: Delete dentry from dcache in efivarfs_file_write()
  efivarfs: Never return ENOENT from firmware
  efi, x86: Pass a proper identity mapping in efi_call_phys_prelog
  efivarfs: Drop link count of the right inode
2013-01-31 17:10:36 +11:00
Linus Torvalds bdb0ae6a76 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "This is a collection of miscellaneous fixes, the most important one is
  the fix for the Samsung laptop bricking issue (auto-blacklisting the
  samsung-laptop driver); the efi_enabled() changes you see below are
  prerequisites for that fix.

  The other issues fixed are booting on OLPC XO-1.5, an UV fix, NMI
  debugging, and requiring CAP_SYS_RAWIO for MSR references, just as
  with I/O port references."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  samsung-laptop: Disable on EFI hardware
  efi: Make 'efi_enabled' a function to query EFI facilities
  smp: Fix SMP function call empty cpu mask race
  x86/msr: Add capabilities check
  x86/dma-debug: Bump PREALLOC_DMA_DEBUG_ENTRIES
  x86/olpc: Fix olpc-xo1-sci.c build errors
  arch/x86/platform/uv: Fix incorrect tlb flush all issue
  x86-64: Fix unwind annotations in recent NMI changes
  x86-32: Start out cr0 clean, disable paging before modifying cr3/4
2013-01-31 17:08:43 +11:00
Eric Dumazet 3b58908a92 x86: bpf_jit_comp: add pkt_type support
Supporting access to skb->pkt_type is a bit tricky if we want
to have a generic code, allowing pkt_type to be moved in struct sk_buff

pkt_type is a bit field, so compiler cannot really help us to find
its offset. Let's use a helper for this : It will throw a one time
message if pkt_type no longer starts at a byte boundary or is
no longer a 3bit field.

Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-30 22:38:34 -05:00
H. Peter Anvin becbd66080 Various urgent EFI fixes and some warning cleanups for v3.8
* EFI boot stub fix for Macbook Pro's from Maarten Lankhorst
   * Fix an oops in efivarfs from Lingzhu Xiang
   * 32-bit warning cleanups from Jan Beulich
   * Patch to Boot on >512GB RAM systems from Nathan Zimmer
   * Set efi.runtime_version correctly
   * efivarfs updates
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRCBrMAAoJEC84WcCNIz1VTdcP/2u3ZqohOKJAwwMkyzB3nkrQ
 1mhxKGFDitAAvGQQCOq3oIMgBZHOevKznH3hZtX+hxBxwu7AuNL+qw6Baz8GYZpz
 guFvAZjm2JX2ko1PgtNvPUFZ1krw7TObLW2YstTWhSDoOlRK5kqmA+idaJf1aHDe
 /cwV6Mr6u5N/egyBBcQI1ydKLA6ogmx1zfDsS9b2Vzavw168RGqfrpH3ybcokYND
 /E2NtcRVZagBw35eZHEDNKcoPt5z+skCA4nJyA6bLbxMsq51ZKaK0PKKaA8vd70s
 6Pc7d6zkQG/ZmaxrRfsdQUAYfJRJq/cpeTgS4YurkZB0r0gdxk6I86vYlg+xXi0X
 eqLAkUJJJasVY/1NK/c2vsJ03W9wDYkd2IJpUcl7rWz7Aa/RurY32QmT3SnLop7m
 Tzj3CgXAu/RH8FyMNMWpI85tOis7OcMUfrjmnxquQdCZpLXSsh7Rf5EgBRiv9xhH
 txDOX3y21Jnv2A5efAVWm5EbyI204Wq2nVDzSu0xTMXWkzdBg+/OeyYfzV0Sdguf
 3/MzYTn7mVXh/EZtnvsTyNjgvVxzpXW6mAf+ne9iJaC8MUJVIeSjB7xzSfuHXUBU
 aUc9OnbkHRJCdVSeKqZbLwO3X5mTXqmDMfIcRle3BPewvZ9pOEv8VrGgsNxh9ixW
 JaCpiTdxJDFtz6cLVsNa
 =QrJx
 -----END PGP SIGNATURE-----

Merge tag 'efi-for-3.8' into x86/efi

Various urgent EFI fixes and some warning cleanups for v3.8

  * EFI boot stub fix for Macbook Pro's from Maarten Lankhorst
  * Fix an oops in efivarfs from Lingzhu Xiang
  * 32-bit warning cleanups from Jan Beulich
  * Patch to Boot on >512GB RAM systems from Nathan Zimmer
  * Set efi.runtime_version correctly
  * efivarfs updates

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-30 14:43:05 -08:00
Matt Fleming 83e6818974 efi: Make 'efi_enabled' a function to query EFI facilities
Originally 'efi_enabled' indicated whether a kernel was booted from
EFI firmware. Over time its semantics have changed, and it now
indicates whether or not we are booted on an EFI machine with
bit-native firmware, e.g. 64-bit kernel with 64-bit firmware.

The immediate motivation for this patch is the bug report at,

    https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557

which details how running a platform driver on an EFI machine that is
designed to run under BIOS can cause the machine to become
bricked. Also, the following report,

    https://bugzilla.kernel.org/show_bug.cgi?id=47121

details how running said driver can also cause Machine Check
Exceptions. Drivers need a new means of detecting whether they're
running on an EFI machine, as sadly the expression,

    if (!efi_enabled)

hasn't been a sufficient condition for quite some time.

Users actually want to query 'efi_enabled' for different reasons -
what they really want access to is the list of available EFI
facilities.

For instance, the x86 reboot code needs to know whether it can invoke
the ResetSystem() function provided by the EFI runtime services, while
the ACPI OSL code wants to know whether the EFI config tables were
mapped successfully. There are also checks in some of the platform
driver code to simply see if they're running on an EFI machine (which
would make it a bad idea to do BIOS-y things).

This patch is a prereq for the samsung-laptop fix patch.

Cc: David Airlie <airlied@linux.ie>
Cc: Corentin Chary <corentincj@iksaif.net>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Peter Jones <pjones@redhat.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Steve Langasek <steve.langasek@canonical.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-30 11:51:59 -08:00
Yinghai Lu 8b78c21d72 x86, 64bit, mm: hibernate use generic mapping_init
We should set mappings only for usable memory ranges under max_pfn
Otherwise causes same problem that is fixed by

	x86, mm: Only direct map addresses that are marked as E820_RAM

Make it only map range in pfn_mapped array.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-34-git-send-email-yinghai@kernel.org
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: linux-pm@vger.kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 19:32:59 -08:00
Yinghai Lu 72212675d1 x86, 64bit, mm: Mark data/bss/brk to nx
HPA said, we should not have RW and +x set at the time.

for kernel layout:
[    0.000000] Kernel Layout:
[    0.000000]   .text: [0x01000000-0x021434f8]
[    0.000000] .rodata: [0x02200000-0x02a13fff]
[    0.000000]   .data: [0x02c00000-0x02dc763f]
[    0.000000]   .init: [0x02dc9000-0x0312cfff]
[    0.000000]    .bss: [0x0313b000-0x03dd6fff]
[    0.000000]    .brk: [0x03dd7000-0x03dfffff]

before the patch, we have
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000          16M                           pmd
0xffffffff81000000-0xffffffff82200000          18M     ro         PSE GLB x  pmd
0xffffffff82200000-0xffffffff82c00000          10M     ro         PSE GLB NX pmd
0xffffffff82c00000-0xffffffff82dc9000        1828K     RW             GLB x  pte
0xffffffff82dc9000-0xffffffff82e00000         220K     RW             GLB NX pte
0xffffffff82e00000-0xffffffff83000000           2M     RW         PSE GLB NX pmd
0xffffffff83000000-0xffffffff8313a000        1256K     RW             GLB NX pte
0xffffffff8313a000-0xffffffff83200000         792K     RW             GLB x  pte
0xffffffff83200000-0xffffffff83e00000          12M     RW         PSE GLB x  pmd
0xffffffff83e00000-0xffffffffa0000000         450M                           pmd

after patch,, we get
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000          16M                           pmd
0xffffffff81000000-0xffffffff82200000          18M     ro         PSE GLB x  pmd
0xffffffff82200000-0xffffffff82c00000          10M     ro         PSE GLB NX pmd
0xffffffff82c00000-0xffffffff82e00000           2M     RW             GLB NX pte
0xffffffff82e00000-0xffffffff83000000           2M     RW         PSE GLB NX pmd
0xffffffff83000000-0xffffffff83200000           2M     RW             GLB NX pte
0xffffffff83200000-0xffffffff83e00000          12M     RW         PSE GLB NX pmd
0xffffffff83e00000-0xffffffffa0000000         450M                           pmd

so data, bss, brk get NX ...

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-33-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 19:32:58 -08:00
Yinghai Lu 6c902b656c x86: Merge early kernel reserve for 32bit and 64bit
They are the same, and we could move them out from head32/64.c to setup.c.

We are using memblock, and it could handle overlapping properly, so
we don't need to reserve some at first to hold the location, and just
need to make sure we reserve them before we are using memblock to find
free mem to use.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-32-git-send-email-yinghai@kernel.org
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 19:32:58 -08:00
Yinghai Lu 0212f91596 x86: Add Crash kernel low reservation
During kdump kernel's booting stage, it need to find low ram for
swiotlb buffer when system does not support intel iommu/dmar remapping.

kexed-tools is appending memmap=exactmap and range from /proc/iomem
with "Crash kernel", and that range is above 4G for 64bit after boot
protocol 2.12.

We need to add another range in /proc/iomem like "Crash kernel low",
so kexec-tools could find that info and append to kdump kernel
command line.

Try to reserve some under 4G if the normal "Crash kernel" is above 4G.

User could specify the size with crashkernel_low=XX[KMG].

-v2: fix warning that is found by Fengguang's test robot.
-v3: move out get_mem_size change to another patch, to solve compiling
     warning that is found by Borislav Petkov <bp@alien8.de>
-v4: user must specify crashkernel_low if system does not support
     intel or amd iommu.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-31-git-send-email-yinghai@kernel.org
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Rob Landley <rob@landley.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 19:32:58 -08:00
Yinghai Lu 7d41a8a4a2 x86, kdump: Remove crashkernel range find limit for 64bit
Now kexeced kernel/ramdisk could be above 4g, so remove 896 limit for
64bit.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-30-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 19:32:58 -08:00
Yinghai Lu 595ad9af85 memblock: Add memblock_mem_size()
Use it to get mem size under the limit_pfn.
to replace local version in x86 reserved_initrd.

-v2: remove not needed cast that is pointed out by HPA.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-29-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 19:32:57 -08:00
Yinghai Lu d1af6d045f x86, boot: Not need to check setup_header version for setup_data
That is for bootloaders.

setup_data is in setup_header, and bootloader is copying that from bzImage.
So for old bootloader should keep that as 0 already.

old kexec-tools till now for elf image set setup_data to 0, so it is ok.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-28-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 19:32:57 -08:00
Yinghai Lu 8ee2f2dfdb x86, boot: Update comments about entries for 64bit image
Now 64bit entry is fixed on 0x200, can not be changed anymore.

Update the comments to reflect that.

Also put info about it in boot.txt

-v2: fix some grammar error

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-27-git-send-email-yinghai@kernel.org
Cc: Rob Landley <rob@landley.net>
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 19:32:57 -08:00
Yinghai Lu ee92d81502 x86, boot: Support loading bzImage, boot_params and ramdisk above 4G
xloadflags bit 1 indicates that we can load the kernel and all data
structures above 4G; it is set if kernel is relocatable and 64bit.

bootloader will check if xloadflags bit 1 is set to decide if
it could load ramdisk and kernel high above 4G.

bootloader will fill value to ext_ramdisk_image/size for high 32bits
when it load ramdisk above 4G.
kernel use get_ramdisk_image/size to use ext_ramdisk_image/size to get
right positon for ramdisk.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Rob Landley <rob@landley.net>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Gokul Caushik <caushik1@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joe Millenbach <jmillenbach@gmail.com>
Link: http://lkml.kernel.org/r/1359058816-7615-26-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 19:32:33 -08:00
Yinghai Lu 0e691cf824 x86, kexec, 64bit: Only set ident mapping for ram.
We should set mappings only for usable memory ranges under max_pfn
Otherwise causes same problem that is fixed by

	x86, mm: Only direct map addresses that are marked as E820_RAM

This patch exposes pfn_mapped array, and only sets ident mapping for ranges
in that array.

This patch relies on new kernel_ident_mapping_init that could handle existing
pgd/pud between different calls.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-25-git-send-email-yinghai@kernel.org
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:26:35 -08:00
Yinghai Lu 9ebdc79f7a x86, kexec: Replace ident_mapping_init and init_level4_page
Now ident_mapping_init is checking if pgd/pud is present for every 2M,
so several 2Ms are in same PUD, it will keep checking if pud is there
with same pud.

init_level4_page just does not check existing pgd/pud.

We could use generic mapping_init with different settings in info to
replace those two local grown version functions.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-24-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:26:29 -08:00
Yinghai Lu 084d128398 x86, kexec: Set ident mapping for kernel that is above max_pfn
When first kernel is booted with memmap= or mem=  to limit max_pfn.
kexec can load second kernel above that max_pfn.

We need to set ident mapping for whole image in this case instead of just
for first 2M.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-23-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:26:26 -08:00
Yinghai Lu 577af55d80 x86, kexec: Remove 1024G limitation for kexec buffer on 64bit
Now 64bit kernel supports more than 1T ram and kexec tools
could find buffer above 1T, remove that obsolete limitation.
and use MAXMEM instead.

Tested on system with more than 1024G ram.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-22-git-send-email-yinghai@kernel.org
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:26:23 -08:00
Yinghai Lu d3c433bf9a x86, boot: Move lldt/ltr out of 64bit code section
commit 08da5a2ca

    x86_64: Early segment setup for VT

sets up LDT and TR into a valid state in order to speed up boot
decompression under VT.

Those code are put in code64, and it is using GDT that is only
loaded from code32 path.

That breaks booting with 64bit bootloader that does not go through
code32 path and jump to startup_64 directly, and it has different
GDT.

Move those lines into code32 after their GDT is loaded.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-21-git-send-email-yinghai@kernel.org
Cc: Zachary Amsden <zamsden@gmail.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:26:19 -08:00
Yinghai Lu 187a8a73ce x86, boot: Move verify_cpu.S and no_longmode down
We need to move some code to 32bit section in following patch:

   x86, boot: Move lldt/ltr out of 64bit code section

but that will push startup_64 down from 0x200.

According to hpa, we can not change startup_64 position and that
is an ABI.

We could move function verify_cpu and no_longmode down, because
verify_cpu is used via function call and no_longmode will not
return, then we don't need to add extra code for jumping back.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-20-git-send-email-yinghai@kernel.org
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:26:15 -08:00
Yinghai Lu 3db07e70f0 x86, boot: Pass cmd_line_ptr with unsigned long instead
boot/compressed/misc.c is used for bzImage in 64bit and 32bit, and
cmd_line_ptr could point to buffer that is above 4g, cmd_line_ptr
should be 64bit otherwise high 32bit will be capped out.

So need to change data type to unsigned long, that will be 64bit get
correct address of command line buffer.

And it is still ok with 32bit bzImage, because unsigned long on 32bit kernel
is still 32bit.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-19-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:26:09 -08:00
Yinghai Lu 16a4baa642 x86, boot: Move checking of cmd_line_ptr out of common path
cmdline.c::__cmdline_find_option... are shared between 16-bit setup code
and 32/64 bit decompressor code.

for 32/64 only path via kexec, we should not check if ptr is less 1M.
as those cmdline could be put above 1M, or even 4G.

Move out accessible checking out of __cmdline_find_option()
So decompressor in misc.c can parse cmdline correctly.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-18-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:26:01 -08:00
Yinghai Lu f1da834cd9 x86, boot: Add get_cmd_line_ptr()
Add an accessor function for the command line address.
Later we will add support for holding a 64-bit address via ext_cmd_line_ptr.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-17-git-send-email-yinghai@kernel.org
Cc: Gokul Caushik <caushik1@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joe Millenbach <jmillenbach@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:25:45 -08:00
Yinghai Lu a8a51a88d5 x86: Add get_ramdisk_image/size()
There are several places to find ramdisk information early for reserving
and relocating.

Use accessor functions to make code more readable and consistent.

Later will add ext_ramdisk_image/size in those functions to support
loading ramdisk above 4g.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-16-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:21:05 -08:00
Yinghai Lu 1b8c78be01 x86: Merge early_reserve_initrd for 32bit and 64bit
They are the same, could move them out from head32/64.c to setup.c.

We are using memblock, and it could handle overlapping properly, so
we don't need to reserve some at first to hold the location, and just
need to make sure we reserve them before we are using memblock to find
free mem to use.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-15-git-send-email-yinghai@kernel.org
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:20:41 -08:00
Yinghai Lu 100542306f x86, 64bit: Don't set max_pfn_mapped wrong value early on native path
We are not having max_pfn_mapped set correctly until init_memory_mapping.
So don't print its initial value for 64bit

Also need to use KERNEL_IMAGE_SIZE directly for highmap cleanup.

-v2: update comments about max_pfn_mapped according to Stefano Stabellini.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-14-git-send-email-yinghai@kernel.org
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:20:16 -08:00
Yinghai Lu 6b9c75aca6 x86, 64bit: #PF handler set page to cover only 2M per #PF
We only map a single 2 MiB page per #PF, even though we should be able
to do this a full gigabyte at a time with no additional memory cost.
This is a workaround for a broken AMD reference BIOS (and its
derivatives in shipping system) which maps a large chunk of memory as
WB in the MTRR system but will #MC if the processor wanders off and
tries to prefetch that memory, which can happen any time the memory is
mapped in the TLB.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-13-git-send-email-yinghai@kernel.org
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
[ hpa: rewrote the patch description ]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:20:13 -08:00
H. Peter Anvin 8170e6bed4 x86, 64bit: Use a #PF handler to materialize early mappings on demand
Linear mode (CR0.PG = 0) is mutually exclusive with 64-bit mode; all
64-bit code has to use page tables.  This makes it awkward before we
have first set up properly all-covering page tables to access objects
that are outside the static kernel range.

So far we have dealt with that simply by mapping a fixed amount of
low memory, but that fails in at least two upcoming use cases:

1. We will support load and run kernel, struct boot_params, ramdisk,
   command line, etc. above the 4 GiB mark.
2. need to access ramdisk early to get microcode to update that as
   early possible.

We could use early_iomap to access them too, but it will make code to
messy and hard to be unified with 32 bit.

Hence, set up a #PF table and use a fixed number of buffers to set up
page tables on demand.  If the buffers fill up then we simply flush
them and start over.  These buffers are all in __initdata, so it does
not increase RAM usage at runtime.

Thus, with the help of the #PF handler, we can set the final kernel
mapping from blank, and switch to init_level4_pgt later.

During the switchover in head_64.S, before #PF handler is available,
we use three pages to handle kernel crossing 1G, 512G boundaries with
sharing page by playing games with page aliasing: the same page is
mapped twice in the higher-level tables with appropriate wraparound.
The kernel region itself will be properly mapped; other mappings may
be spurious.

early_make_pgtable is using kernel high mapping address to access pages
to set page table.

-v4: Add phys_base offset to make kexec happy, and add
	init_mapping_kernel()   - Yinghai
-v5: fix compiling with xen, and add back ident level3 and level2 for xen
     also move back init_level4_pgt from BSS to DATA again.
     because we have to clear it anyway.  - Yinghai
-v6: switch to init_level4_pgt in init_mem_mapping. - Yinghai
-v7: remove not needed clear_page for init_level4_page
     it is with fill 512,8,0 already in head_64.S  - Yinghai
-v8: we need to keep that handler alive until init_mem_mapping and don't
     let early_trap_init to trash that early #PF handler.
     So split early_trap_pf_init out and move it down. - Yinghai
-v9: switchover only cover kernel space instead of 1G so could avoid
     touch possible mem holes. - Yinghai
-v11: change far jmp back to far return to initial_code, that is needed
     to fix failure that is reported by Konrad on AMD systems.  - Yinghai

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-12-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:20:06 -08:00
Yinghai Lu 4f7b92263a x86, realmode: Separate real_mode reserve and setup
After we switch to use #PF handler help to set page table, init_level4_pgt
will only have entries set after init_mem_mapping().
We need to move copying init_level4_pgt to trampoline_pgd after that.

So split reserve and setup, and move the setup after init_mem_mapping()

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-11-git-send-email-yinghai@kernel.org
Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:13:24 -08:00
Yinghai Lu 9735e91e9c x86, 64bit, realmode: Use init_level4_pgt to set trampoline_pgd directly
with #PF handler way to set early page table, level3_ident will go away with
64bit native path.

So just use entries in init_level4_pgt to set them in trampoline_pgd.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-10-git-send-email-yinghai@kernel.org
Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:12:30 -08:00
Yinghai Lu fa2bbce985 x86, 64bit: Copy struct boot_params early
We want to support struct boot_params (formerly known as the
zero-page, or real-mode data) above the 4 GiB mark.  We will have #PF
handler to set page table for not accessible ram early, but want to
limit it before x86_64_start_reservations to limit the code change to
native path only.

Also we will need the ramdisk info in struct boot_params to access the microcode
blob in ramdisk in x86_64_start_kernel, so copy struct boot_params early makes
it accessing ramdisk info simple.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-9-git-send-email-yinghai@kernel.org
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:12:26 -08:00
Yinghai Lu aece27851d x86, 64bit, mm: Add generic kernel/ident mapping helper
It is simple version for kernel_physical_mapping_init.
it will work to build one page table that will be used later.

Use mapping_info to control
        1. alloc_pg_page method
        2. if PMD is EXEC,
        3. if pgd is with kernel low mapping or ident mapping.

Will use to replace some local versions in kexec, hibernation and etc.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-8-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:12:25 -08:00
Yinghai Lu 231b3642a3 x86, realmode: Set real_mode permissions early
Trampoline code is executed by APs with kernel low mapping on 64bit.
We need to set trampoline code to EXEC early before we boot APs.

Found the problem after switching to #PF handler set page table,
and we do not set initial kernel low mapping with EXEC anymore in
arch/x86/kernel/head_64.S.

Change to use early_initcall instead that will make sure trampoline
will have EXEC set.

-v2: Merge two comments according to Borislav Petkov <bp@alien8.de>

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-7-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:12:25 -08:00
Yinghai Lu c2bdee594e x86, 64bit, mm: Make pgd next calculation consistent with pud/pmd
Just like the way we calculate next for pud and pmd, aka round down and
add size.

Also, do not do boundary-checking with 'next', and just pass 'end' down
to phys_pud_init() instead. Because the loop in phys_pud_init() stops at
PTRS_PER_PUD and thus can handle a possibly bigger 'end' properly.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-6-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:12:24 -08:00
Yinghai Lu b422a30917 x86: Factor out e820_add_kernel_range()
Separate out the reservation of the kernel static memory areas into a
separate function.

Also add support for case when memmap=xxM$yyM is used without exactmap.
Need to remove reserved range at first before we add E820_RAM
range, otherwise added E820_RAM range will be ignored.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-5-git-send-email-yinghai@kernel.org
Cc: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:12:24 -08:00
Yinghai Lu c9b3234a6a x86, mm: Fix page table early allocation offset checking
During debugging loading kernel above 4G, found that one page is not used
in pre-allocated BRK area for early page allocation.
pgt_buf_top is address that can not be used, so should check if that new
end is above that top, otherwise last page will not be used.

Fix that checking and also add print out for allocation from pre-allocated
BRK area to catch possible bugs later.

But after we get back that page for pgt, it tiggers one bug in pgt allocation
with xen: We need to avoid to use page as pgt to map range that is
overlapping with that pgt page.

Add checking about overlapping, when it happens, use memblock allocation
instead.  That fixes crash on Xen PV guest with 2G that Stefan found.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-2-git-send-email-yinghai@kernel.org
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:12:23 -08:00
H. Peter Anvin de65d816aa Merge remote-tracking branch 'origin/x86/boot' into x86/mm2
Coming patches to x86/mm2 require the changes and advanced baseline in
x86/boot.

Resolved Conflicts:
	arch/x86/kernel/setup.c
	mm/nobootmem.c

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:10:15 -08:00
John Stultz 6f16eebe1f timekeeping: Switch HAS_PERSISTENT_CLOCK to ALWAYS_USE_PERSISTENT_CLOCK
Jason pointed out the HAS_PERSISTENT_CLOCK name isn't
quite accurate for the config, as some systems may have
the persistent_clock in some cases, but not always.

So change the config name to the more clear
ALWAYS_USE_PERSISTENT_CLOCK.

Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-01-29 14:40:12 -08:00
David S. Miller f1e7b73acc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Bring in the 'net' tree so that we can get some ipv4/ipv6 bug
fixes that some net-next work will build upon.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 15:32:13 -05:00
Maarten Lankhorst 739701888f x86, efi: remove attribute check from setup_efi_pci
It looks like the original commit that copied the rom contents from
efi always copied the rom, and the fixup in setup_efi_pci from commit
886d751a2e ("x86, efi: correct precedence of operators in
setup_efi_pci") broke that.

This resulted in macbook pro's no longer finding the rom images, and
thus not being able to use the radeon card any more.

The solution is to just remove the check for now, and always copy the
rom if available.

Reported-by: Vitaly Budovski <vbudovski+news@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-01-29 17:52:06 +00:00
Rafael J. Wysocki 8b4e2fa4ff Merge branch 'acpi-lpss' into acpi-cleanup
The following commits depend on the 'acpi-lpss' material.
2013-01-29 13:59:00 +01:00
Olaf Hering ffca80b556 x86, efi: fix comment typo in head_32.S
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-01-29 10:54:06 +01:00
Jiri Kosina 617677295b Merge branch 'master' into for-next
Conflicts:
	drivers/devfreq/exynos4_bus.c

Sync with Linus' tree to be able to apply patches that are
against newer code (mvneta).
2013-01-29 10:48:30 +01:00
H. Peter Anvin 5dcd14ecd4 x86, boot: Sanitize boot_params if not zeroed on creation
Use the new sentinel field to detect bootloaders which fail to follow
protocol and don't initialize fields in struct boot_params that they
do not explicitly initialize to zero.

Based on an original patch and research by Yinghai Lu.
Changed by hpa to be invoked both in the decompression path and in the
kernel proper; the latter for the case where a bootloader takes over
decompression.

Originally-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-26-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 01:22:17 -08:00
Yang Zhang c7c9c56ca2 x86, apicv: add virtual interrupt delivery support
Virtual interrupt delivery avoids KVM to inject vAPIC interrupts
manually, which is fully taken care of by the hardware. This needs
some special awareness into existing interrupr injection path:

- for pending interrupt, instead of direct injection, we may need
  update architecture specific indicators before resuming to guest.

- A pending interrupt, which is masked by ISR, should be also
  considered in above update action, since hardware will decide
  when to inject it at right time. Current has_interrupt and
  get_interrupt only returns a valid vector from injection p.o.v.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-29 10:48:19 +02:00
Yang Zhang 8d14695f95 x86, apicv: add virtual x2apic support
basically to benefit from apicv, we need to enable virtualized x2apic mode.
Currently, we only enable it when guest is really using x2apic.

Also, clear MSR bitmap for corresponding x2apic MSRs when guest enabled x2apic:
0x800 - 0x8ff: no read intercept for apicv register virtualization,
               except APIC ID and TMCCT which need software's assistance to
               get right value.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-29 10:48:06 +02:00
Yang Zhang 83d4c28693 x86, apicv: add APICv register virtualization support
- APIC read doesn't cause VM-Exit
- APIC write becomes trap-like

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-29 10:47:54 +02:00
Ingo Molnar e9b6025bf8 Cleanup X86 IOAPIC code from interrupt remapping details
These patches move all interrupt remapping specific checks out of the
 x86 core code and replaces the respective call-sites with function
 pointers. As a result the interrupt remapping code is better abstraced
 from x86 core interrupt handling code.
 
 The code was rebased to v3.8-rc4 and tested on systems with AMD-Vi and
 Intel VT-d (both capable of interrupt remapping). The systems were
 tested with IOMMU enabled and with IOMMU disabled. No issues were found.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRBpGoAAoJECvwRC2XARrjxn4P/2sNpncRptR37HBP4Dl+Rd+u
 lnp9agUzzHQ67j+1o4bYGknhgfxGqDHJKNW41AinciE1EiyX1QAdFUK1mNEmtRrL
 dgsoZZdZjKIlNU8rsTon18UJvUT8wR9s8I/+DAVzO5WkRbCQWQ0kC9Ft4QQSKIfs
 M0dg8eAauIGp7/UsmttZIY9PT1KOhMWPDAaCsEYSLgfXO3VzXNtlZIjQI8bQJhSs
 DLCoge7bHfUIk13p5WM73qMFDXEXNkHgTDqMjXYB9uJiHSWyGg1vs2dV5rc0fbHQ
 KBin1NwFwT2HveMlGVy1DVgbdDEy5e7uvPnyHIwI6JedAA2MxYqOsL/VbLnRCgvt
 OH3bHoGrBJDBEwv/tleyf8SyJ6TO4px6TlOzUlIHHu+33Bc0uKigKXLuXwvTus0q
 3cP++X6jW2yn0f65Lpr7LTu/02FRaKMvHWpU1T7yc+rXwJREn1/0bEdYC/fb1u3N
 GT2aI2orlGMwLtJ1mcxbveKWBnPN5RBrOe0Zha6ycVw7aJol2fAIczSnUq/6SHDQ
 eFXS7nMqb2J5p8IgQbB5wHDovsqWEBbGC9bEYkjxQEjqD8YgLnIHDR4KgRzhFI1f
 LX19+wsBkxzwQX8sImEaKzWMoSAl0Sui01s0tc7nJkwfhnbMhIes7pfqkUx0d4ZM
 AK0It3VSCw4hCnCUC+b2
 =2d89
 -----END PGP SIGNATURE-----

Merge tag 'ioapic-cleanups-for-tip' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu into x86/apic

Pull "x86 IOAPIC code from interrupt remapping details cleanups" from
Joerg Roedel:

 "These patches move all interrupt remapping specific checks out of the
  x86 core code and replaces the respective call-sites with function
  pointers. As a result the interrupt remapping code is better abstraced
  from x86 core interrupt handling code.

  The code was rebased to v3.8-rc4 and tested on systems with AMD-Vi and
  Intel VT-d (both capable of interrupt remapping). The systems were
  tested with IOMMU enabled and with IOMMU disabled. No issues were found."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-29 09:14:11 +01:00
Alok N Kataria 3b4a505821 x86, kvm: Fix intialization warnings in kvm.c
With commit:

  4cca6ea04d ("x86/apic: Allow x2apic without IR on VMware platform")

we started seeing "incompatible initialization" warning messages,
since x2apic_available() expects a bool return type while
kvm_para_available() returns an int.

Reported by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-29 09:13:49 +01:00
David Woodhouse 2b9b6d8c71 x86: Require MOVBE feature in cpuid when we use it
Add MOVBE to asm/required-features.h so we check for it during startup
and don't bother checking for it later.

CONFIG_MATOM is used because it corresponds to -march=atom in the
Makefiles.  If the rules get more complicated it may be necessary to
make this an explicit Kconfig option which uses -mmovbe/-mno-movbe to
control the use of this instruction explicitly.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Link: http://lkml.kernel.org/r/1359395390.3529.65.camel@shinybook.infradead.org
[ hpa: added a patch description ]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-28 16:59:55 -08:00
David Woodhouse 83a57a4de1 x86: Enable ARCH_USE_BUILTIN_BSWAP
With -mmovbe enabled (implicit with -march=atom), this allows the
compiler to use the movbe instruction. This doesn't have a significant
effect on code size (unlike on PowerPC), because the movbe instruction
actually takes as many bytes to encode as a simple mov and a bswap. But
for Atom in particular I believe it should give a performance win over
the mov+bswap alternative. That was kind of why movbe was invented in
the first place, after all...

I've done basic functionality testing with IPv6 and Legacy IP, but no
performance testing. The EFI firmware on my test box unfortunately no
longer starts up.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Link: http://lkml.kernel.org/r/1355966180.18919.102.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-28 08:48:57 -08:00
Joerg Roedel a1bb20c232 x86, irq: Move irq_remapped out of x86 core code
The irq_remapped function is only used in IOMMU code after
the last patch. So move its definition there too.

Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28 12:51:52 +01:00
Joerg Roedel da165322df x86, io_apic: Introduce eoi_ioapic_pin call-back
This callback replaces the old __eoi_ioapic_pin function
which needs a special path for interrupt remapping.

Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28 12:51:52 +01:00
Joerg Roedel 7601384f91 x86, msi: Introduce x86_msi.compose_msi_msg call-back
This call-back points to the right function for initializing
the msi_msg structure. The old code for msi_msg generation
was split up into the irq-remapped and the default case.

The irq-remapped case just calls into the specific Intel or
AMD implementation when the device is behind an IOMMU.
Otherwise the default function is called.

Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28 12:42:48 +01:00
Joerg Roedel 2976fd8417 x86, irq: Introduce setup_remapped_irq()
This function does irq-remapping specific interrupt setup
like modifying the chip defaults.

Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28 12:17:28 +01:00
Joerg Roedel 11b4a1cc38 x86, irq: Move irq_remapped() check into free_remapped_irq
The function is called unconditionally now in IO-APIC code
removing another irq_remapped() check from x86 core code.

Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28 12:17:27 +01:00
Joerg Roedel 9f9d39e403 x86, io-apic: Remove !irq_remapped() check from __target_IO_APIC_irq()
This function is only called from default_ioapic_set_affinity()
which is only used when interrupt remapping is disabled
since the introduction of the set_affinity function pointer.
So the check will always evaluate as true and can be
removed.

Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28 12:17:27 +01:00
Joerg Roedel 9b1b0e42f5 x86, io-apic: Move CONFIG_IRQ_REMAP code out of x86 core
Move all the code to either to the header file
asm/irq_remapping.h or to drivers/iommu/.

Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28 12:17:27 +01:00
Joerg Roedel 819508d302 x86, irq: Add data structure to keep AMD specific irq remapping information
Add a data structure to store information the IOMMU driver
can use to get from a 'struct irq_cfg' to the remapping
entry.

Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28 12:17:27 +01:00
Joerg Roedel 078e1ee26a x86, irq: Move irq_remapping_enabled declaration to iommu code
Remove the last left-over from this flag from x86 code.

Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28 12:17:26 +01:00
Joerg Roedel 1d254428c0 x86, io_apic: Remove irq_remapping_enabled check in setup_timer_IRQ0_pin
This function is only called when irq-remapping is disabled.

Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28 12:17:26 +01:00
Joerg Roedel 6a9f5de272 x86, io_apic: Move irq_remapping_enabled checks out of check_timer()
Move these checks to IRQ remapping code by introducing the
panic_on_irq_remap() function.

Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28 12:17:26 +01:00
Joerg Roedel a6a25dd327 x86, io_apic: Convert setup_ioapic_entry to function pointer
This pointer is changed to a different function when IRQ
remapping is enabled.

Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28 12:17:26 +01:00
Joerg Roedel 373dd7a27f x86, io_apic: Introduce set_affinity function pointer
With interrupt remapping a special function is used to
change the affinity of an IO-APIC interrupt. Abstract this
with a function pointer.

Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28 12:17:26 +01:00
Joerg Roedel 5afba62cc8 x86, msi: Use IRQ remapping specific setup_msi_irqs routine
Use seperate routines to setup MSI IRQs for both
irq_remapping_enabled cases.

Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28 12:17:25 +01:00
Joerg Roedel 71054d8841 x86, hpet: Introduce x86_msi_ops.setup_hpet_msi
This function pointer can be overwritten by the IRQ
remapping code. The irq_remapping_enabled check can be
removed from default_setup_hpet_msi.

Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28 10:48:30 +01:00
Joerg Roedel afcc8a40a0 x86, io_apic: Introduce x86_io_apic_ops.print_entries for debugging
This call-back is used to dump IO-APIC entries for debugging
purposes into the kernel log. VT-d needs a special routine
for this and will overwrite the default.

Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28 10:48:30 +01:00
Joerg Roedel 1c4248ca4e x86, io_apic: Introduce x86_io_apic_ops.disable()
This function pointer is used to call a system-specific
function for disabling the IO-APIC. Currently this is used
for IRQ remapping which has its own disable routine.

Also introduce the necessary infrastructure in the interrupt
remapping code to overwrite this and other function pointers
as necessary by interrupt remapping.

Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28 10:48:30 +01:00
Joerg Roedel 336224ba5e x86, apic: Mask IO-APIC and PIC unconditionally on LAPIC resume
IO-APIC and PIC use the same resume routines when IRQ
remapping is enabled or disabled. So it should be safe to
mask the other APICs for the IRQ-remapping-disabled case
too.

Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28 10:48:29 +01:00
Joerg Roedel 70733e0c7e x86, apic: Move irq_remapping_enabled checks into IRQ-remapping code
Move the three easy to move checks in the x86' apic.c file
into the IRQ-remapping code.

Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28 10:48:29 +01:00
David Woodhouse 99f857db88 x86, build: Dynamically find entry points in compressed startup code
We have historically hard-coded entry points in head.S just so it's easy
to build the executable/bzImage headers with references to them.

Unfortunately, this leads to boot loaders abusing these "known" addresses
even when they are *explicitly* told that they "should look at the ELF
header to find this address, as it may change in the future". And even
when the address in question *has* actually been changed in the past,
without fanfare or thought to compatibility.

Thus we have bootloaders doing stunningly broken things like jumping
to offset 0x200 in the kernel startup code in 64-bit mode, *hoping*
that startup_64 is still there (it has moved at least once
before). And hoping that it's actually a 64-bit kernel despite the
fact that we don't give them any indication of that fact.

This patch should hopefully remove the temptation to abuse internal
addresses in future, where sternly worded comments have not sufficed.
Instead of having hard-coded addresses and saying "please don't abuse
these", we actually pull the addresses out of the ELF payload into
zoffset.h, and make build.c shove them back into the right places in
the bzImage header.

Rather than including zoffset.h into build.c and thus having to rebuild
the tool for every kernel build, we parse it instead. The parsing code
is small and simple.

This patch doesn't actually move any of the interesting entry points, so
any offending bootloader will still continue to "work" after this patch
is applied. For some version of "work" which includes jumping into the
compressed payload and crashing, if the bzImage it's given is a 32-bit
kernel. No change there then.

[ hpa: some of the issues in the description are addressed or
  retconned by the 2.12 boot protocol.  This patch has been edited to
  only remove fixed addresses that were *not* thus retconned. ]

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Link: http://lkml.kernel.org/r/1358513837.2397.247.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Fleming <matt.fleming@intel.com>
2013-01-27 20:19:37 -08:00
David Woodhouse b607e21267 x86, efi: Fix PCI ROM handing in EFI boot stub, in 32-bit mode
The 'Attributes' argument to pci->Attributes() function is 64-bit. So
when invoking in 32-bit mode it takes two registers, not just one.

This fixes memory corruption when booting via the 32-bit EFI boot stub.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1358513837.2397.247.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Fleming <matt.fleming@intel.com>
2013-01-27 20:19:37 -08:00
David Woodhouse f791620fa7 x86, efi: Fix 32-bit EFI handover protocol entry point
If the bootloader calls the EFI handover entry point as a standard function
call, then it'll have a return address on the stack. We need to pop that
before calling efi_main(), or the arguments will all be out of position on
the stack.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1358513837.2397.247.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Fleming <matt.fleming@intel.com>
2013-01-27 20:19:37 -08:00
David Woodhouse 70a479cbe8 x86, efi: Fix display detection in EFI boot stub
When booting under OVMF we have precisely one GOP device, and it
implements the ConOut protocol.

We break out of the loop when we look at it... and then promptly abort
because 'first_gop' never gets set. We should set first_gop *before*
breaking out of the loop. Yes, it doesn't really mean "first" any more,
but that doesn't matter. It's only a flag to indicate that a suitable
GOP was found.

In fact, we'd do just as well to initialise 'width' to zero in this
function, then just check *that* instead of first_gop. But I'll do the
minimal fix for now (and for stable@).

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1358513837.2397.247.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Fleming <matt.fleming@intel.com>
2013-01-27 20:19:37 -08:00
H. Peter Anvin 09c205afde x86, boot: Define the 2.12 bzImage boot protocol
Define the 2.12 bzImage boot protocol: add xloadflags and additional
fields to allow the command line, initramfs and struct boot_params to
live above the 4 GiB mark.

The xloadflags now communicates if this is a 64-bit kernel with the
legacy 64-bit entry point and which of the EFI handover entry points
are supported.

Avoid adding new read flags to loadflags because of claimed
bootloaders testing the whole byte for == 1 to determine bzImageness
at least until the issue can be researched further.

This is based on patches by Yinghai Lu and David Woodhouse.

Originally-by: Yinghai Lu <yinghai@kernel.org>
Originally-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1359058816-7615-26-git-send-email-yinghai@kernel.org
Cc: Rob Landley <rob@landley.net>
Cc: Gokul Caushik <caushik1@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joe Millenbach <jmillenbach@gmail.com>
2013-01-27 15:56:37 -08:00
Cong Ding 65315d4889 x86/boot: Fix minor fd leakage in tools/relocs.c
The opened file should be closed.

Signed-off-by: Cong Ding <dinggnu@gmail.com>
Cc: Kusanagi Kouichi <slash@ac.auone-net.jp>
Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1358183628-27784-1-git-send-email-dinggnu@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-27 10:24:28 -08:00
Frederic Weisbecker 6fac4829ce cputime: Use accessors to read task cputime stats
This is in preparation for the full dynticks feature. While
remotely reading the cputime of a task running in a full
dynticks CPU, we'll need to do some extra-computation. This
way we can account the time it spent tickless in userspace
since its last cputime snapshot.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
2013-01-27 19:23:31 +01:00
Avi Kivity 3f0c3d0bb2 KVM: x86 emulator: fix test_cc() build failure on i386
'pushq' doesn't exist on i386.  Replace with 'push', which should work
since the operand is a register.

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-27 11:09:38 +02:00
Greg Kroah-Hartman 422d26b6ec Merge 3.8-rc5 into driver-core-next
This resolves a gpio driver merge issue pointed out in linux-next.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-25 21:06:30 -08:00
Dave Hansen 5dfd486c47 x86, kvm: Fix kvm's use of __pa() on percpu areas
In short, it is illegal to call __pa() on an address holding
a percpu variable.  This replaces those __pa() calls with
slow_virt_to_phys().  All of the cases in this patch are
in boot time (or CPU hotplug time at worst) code, so the
slow pagetable walking in slow_virt_to_phys() is not expected
to have a performance impact.

The times when this actually matters are pretty obscure
(certain 32-bit NUMA systems), but it _does_ happen.  It is
important to keep KVM guests working on these systems because
the real hardware is getting harder and harder to find.

This bug manifested first by me seeing a plain hang at boot
after this message:

	CPU 0 irqstacks, hard=f3018000 soft=f301a000

or, sometimes, it would actually make it out to the console:

[    0.000000] BUG: unable to handle kernel paging request at ffffffff

I eventually traced it down to the KVM async pagefault code.
This can be worked around by disabling that code either at
compile-time, or on the kernel command-line.

The kvm async pagefault code was injecting page faults in
to the guest which the guest misinterpreted because its
"reason" was not being properly sent from the host.

The guest passes a physical address of an per-cpu async page
fault structure via an MSR to the host.  Since __pa() is
broken on percpu data, the physical address it sent was
bascially bogus and the host went scribbling on random data.
The guest never saw the real reason for the page fault (it
was injected by the host), assumed that the kernel had taken
a _real_ page fault, and panic()'d.  The behavior varied,
though, depending on what got corrupted by the bad write.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20130122212435.4905663F@kernel.stglabs.ibm.com
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-25 16:34:55 -08:00
Dave Hansen d765653445 x86, mm: Create slow_virt_to_phys()
This is necessary because __pa() does not work on some kinds of
memory, like vmalloc() or the alloc_remap() areas on 32-bit
NUMA systems.  We have some functions to do conversions _like_
this in the vmalloc() code (like vmalloc_to_page()), but they
do not work on sizes other than 4k pages.  We would potentially
need to be able to handle all the page sizes that we use for
the kernel linear mapping (4k, 2M, 1G).

In practice, on 32-bit NUMA systems, the percpu areas get stuck
in the alloc_remap() area.  Any __pa() call on them will break
and basically return garbage.

This patch introduces a new function slow_virt_to_phys(), which
walks the kernel page tables on x86 and should do precisely
the same logical thing as __pa(), but actually work on a wider
range of memory.  It should work on the normal linear mapping,
vmalloc(), kmap(), etc...

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20130122212433.4D1FCA62@kernel.stglabs.ibm.com
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-25 16:33:23 -08:00
Dave Hansen f3c4fbb68e x86, mm: Use new pagetable helpers in try_preserve_large_page()
try_preserve_large_page() can be slightly simplified by using
the new page_level_*() helpers.  This also moves the 'level'
over to the new pg_level enum type.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20130122212432.14F3D993@kernel.stglabs.ibm.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-25 16:33:23 -08:00
Dave Hansen 4cbeb51b86 x86, mm: Pagetable level size/shift/mask helpers
I plan to use lookup_address() to walk the kernel pagetables
in a later patch.  It returns a "pte" and the level in the
pagetables where the "pte" was found.  The level is just an
enum and needs to be converted to a useful value in order to
do address calculations with it.  These helpers will be used
in at least two places.

This also gives the anonymous enum a real name so that no one
gets confused about what they should be passing in to these
helpers.

"PTE_SHIFT" was chosen for naming consistency with the other
pagetable levels (PGD/PUD/PMD_SHIFT).

Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20130122212431.405D3A8C@kernel.stglabs.ibm.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-25 16:33:22 -08:00
Dave Hansen a25b931684 x86, mm: Make DEBUG_VIRTUAL work earlier in boot
The KVM code has some repeated bugs in it around use of __pa() on
per-cpu data.  Those data are not in an area on which using
__pa() is valid.  However, they are also called early enough in
boot that __vmalloc_start_set is not set, and thus the
CONFIG_DEBUG_VIRTUAL debugging does not catch them.

This adds a check to also verify __pa() calls against max_low_pfn,
which we can use earler in boot than is_vmalloc_addr().  However,
if we are super-early in boot, max_low_pfn=0 and this will trip
on every call, so also make sure that max_low_pfn is set before
we try to use it.

With this patch applied, CONFIG_DEBUG_VIRTUAL will actually
catch the bug I was chasing (and fix later in this series).

I'd love to find a generic way so that any __pa() call on percpu
areas could do a BUG_ON(), but there don't appear to be any nice
and easy ways to check if an address is a percpu one.  Anybody
have ideas on a way to do this?

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20130122212430.F46F8159@kernel.stglabs.ibm.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-25 16:33:22 -08:00
H. Peter Anvin 7b5c4a65cc Linux 3.8-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJRAuO3AAoJEHm+PkMAQRiGbfAH/1C3QQKB11aBpYLAw7qijAze
 yOui26UCnwRryxsO8zBCQjGoByy5DvY/Q0zyUCWUE6nf/JFSoKGUHzfJ1ATyzGll
 3vENP6Fnmq0Hgc4t8/gXtXrZ1k/c43cYA2XEhDnEsJlFNmNj2wCQQj9njTNn2cl1
 k6XhZ9U1V2hGYpLL5bmsZiLVI6dIpkCVw8d4GZ8BKxSLUacVKMS7ml2kZqxBTMgt
 AF6T2SPagBBxxNq8q87x4b7vyHYchZmk+9tAV8UMs1ecimasLK8vrRAJvkXXaH1t
 xgtR0sfIp5raEjoFYswCK+cf5NEusLZDKOEvoABFfEgL4/RKFZ8w7MMsmG8m0rk=
 =m68Y
 -----END PGP SIGNATURE-----

Merge tag 'v3.8-rc5' into x86/mm

The __pa() fixup series that follows touches KVM code that is not
present in the existing branch based on v3.7-rc5, so merge in the
current upstream from Linus.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-25 16:31:21 -08:00
H. Peter Anvin 3596f5bb0a Merge branch 'x86/mm' of ssh://ra.kernel.org/pub/scm/linux/kernel/git/tip/tip into x86/mm
Add missing patch from the __pa_symbol conversion series by Alexander
Duyck.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-25 16:05:40 -08:00
Paul Gortmaker 43720bd601 PM / tracing: remove deprecated power trace API
The text in Documentation said it would be removed in 2.6.41;
the text in the Kconfig said removal in the 3.1 release.  Either
way you look at it, we are well past both, so push it off a cliff.

Note that the POWER_CSTATE and the POWER_PSTATE are part of the
legacy tracing API.  Remove all tracepoints which use these flags.
As can be seen from context, most already have a trace entry via
trace_cpu_idle anyways.

Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as
compared to the CSTATE ones which all have a clear start/stop.
As part of this, the trace_power_frequency also becomes orphaned,
so it too is deleted.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-26 00:39:12 +01:00
Rafael J. Wysocki 51fac8388a ACPI: Remove useless type argument of driver .remove() operation
The second argument of ACPI driver .remove() operation is only used
by the ACPI processor driver and the value passed to that driver
through it is always available from the given struct acpi_device
object's removal_type field.  For this reason, the second ACPI driver
.remove() argument is in fact useless, so drop it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
2013-01-26 00:37:24 +01:00
Vivien Didelot 7d0291256c x86: Add TS-5500 platform support
The Technologic Systems TS-5500 is an x86-based (AMD Elan SC520)
single board computer. This driver registers most of its devices
and exposes sysfs attributes for information such as jumpers'
state or presence of some of its options.

This driver currently registers the TS-5500 platform, its
on-board LED, 2 pin blocks (GPIO) and its analog/digital
converter. It can be extended to support other Technologic
Systems products, such as the TS-5600.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Savoir-faire Linux Inc. <kernel@savoirfairelinux.com>
Link: http://lkml.kernel.org/r/1357334294-12760-1-git-send-email-vivien.didelot@savoirfairelinux.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-25 19:40:23 +01:00
Matt Fleming 712ba9e9af x86, efi: Set runtime_version to the EFI spec revision
efi.runtime_version is erroneously being set to the value of the
vendor's firmware revision instead of that of the implemented EFI
specification. We can't deduce which EFI functions are available based
on the revision of the vendor's firmware since the version scheme is
likely to be unique to each vendor.

What we really need to know is the revision of the implemented EFI
specification, which is available in the EFI System Table header.

Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: stable@vger.kernel.org # 3.7.x
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-01-25 12:00:16 +00:00
Jan Beulich bc754790f9 x86, efi: fix 32-bit warnings in setup_efi_pci()
Fix four similar build warnings on 32-bit (casts between different
size pointers and integers).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Stefan Hasko <hasko.stevo@gmail.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-01-25 10:22:53 +00:00
Jan Beulich f317820cb6 x86/xor: Add alternative SSE implementation only prefetching once per 64-byte line
On CPUs with 64-byte last level cache lines, this yields roughly
10% better performance, independent of CPU vendor or specific
model (as far as I was able to test).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/5093E4B802000078000A615E@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-25 09:23:50 +01:00
Jan Beulich e8f6e3f8a1 x86/xor: Unify SSE-base xor-block routines
Besides folding duplicate code, this has the advantage of fixing
x86-64's failure to use proper (para-virtualizable) accessors
for dealing with CR0.TS.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/5093E47602000078000A615B@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-25 09:23:50 +01:00
Chen Gang 349eab6eb0 x86/process: Change %8s to %s for pr_warn() in release_thread()
the length of dead_task->comm[] is 16 (TASK_COMM_LEN)
on pr_warn(), it is not meaningful to use %8s for task->comm[].

So change it to %s, since the line is not solid anyway.

Additional information:

 %8s  limit the width, not for the original string output length
      if name length is more than 8, it still can be fully displayed.
      if name length is less than 8, the ' ' will be filled before name.

 %.8s truly limit the original string output length (precision)

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Link: http://lkml.kernel.org/n/tip-nridm1zvreai1tgfLjuexDmd@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 18:11:03 +01:00
Alan Cox c903f0456b x86/msr: Add capabilities check
At the moment the MSR driver only relies upon file system
checks. This means that anything as root with any capability set
can write to MSRs. Historically that wasn't very interesting but
on modern processors the MSRs are such that writing to them
provides several ways to execute arbitary code in kernel space.
Sample code and documentation on doing this is circulating and
MSR attacks are used on Windows 64bit rootkits already.

In the Linux case you still need to be able to open the device
file so the impact is fairly limited and reduces the security of
some capability and security model based systems down towards
that of a generic "root owns the box" setup.

Therefore they should require CAP_SYS_RAWIO to prevent an
elevation of capabilities. The impact of this is fairly minimal
on most setups because they don't have heavy use of
capabilities. Those using SELinux, SMACK or AppArmor rules might
want to consider if their rulesets on the MSR driver could be
tighter.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Horses <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 17:37:51 +01:00
Maarten Lankhorst 73b664ceb5 x86/dma-debug: Bump PREALLOC_DMA_DEBUG_ENTRIES
I ran out of free entries when I had CONFIG_DMA_API_DEBUG
enabled. Some other archs seem to default to 65536, so increase
this limit for x86 too.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/50A612AA.7040206@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
----
2013-01-24 17:34:18 +01:00
Alexander Gordeev 51906e779f x86/MSI: Support multiple MSIs in presense of IRQ remapping
The MSI specification has several constraints in comparison with
MSI-X, most notable of them is the inability to configure MSIs
independently. As a result, it is impossible to dispatch
interrupts from different queues to different CPUs. This is
largely devalues the support of multiple MSIs in SMP systems.

Also, a necessity to allocate a contiguous block of vector
numbers for devices capable of multiple MSIs might cause a
considerable pressure on x86 interrupt vector allocator and
could lead to fragmentation of the interrupt vectors space.

This patch overcomes both drawbacks in presense of IRQ remapping
and lets devices take advantage of multiple queues and per-IRQ
affinity assignments.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c8bd86ff56b5fc118257436768aaa04489ac0a4c.1353324359.git.agordeev@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 17:25:12 +01:00
Jan Beulich 9611dc7a8d x86: Convert a few mistaken __cpuinit annotations to __init
The first two are functions serving as initcalls; the SFI one is
only being called from __init code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/50AFB35102000078000AAECA@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 17:12:19 +01:00
Jan Beulich 13f0e4d2b9 x86/EFI: Properly init-annotate BGRT code
These items are only ever referenced from initialization code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: <mjg@redhat.com>
Link: http://lkml.kernel.org/r/50AFB29F02000078000AAE8E@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 17:12:18 +01:00
Yuanhan Liu e3e81aca8d x86: Fix a typo
legact -> legacy

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 16:22:10 +01:00
Youquan Song 923d8697e2 x86/perf: Add IvyBridge EP support
Running the perf utility on a Ivybridge EP server we encounter
"not supported" events:

   <not supported> L1-dcache-loads
   <not supported> L1-dcache-load-misses
   <not supported> L1-dcache-stores
   <not supported> L1-dcache-store-misses
   <not supported> L1-dcache-prefetches
   <not supported> L1-dcache-prefetch-misses

This patch adds support for this processor.

Signed-off-by: Youquan Song <youquan.song@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Youquan Song <youquan.song@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1355851223-27705-1-git-send-email-youquan.song@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 16:14:04 +01:00
Wen Congyang f73568a059 x86/mm: Fix the argument passed to sync_global_pgds()
The address range of sync_global_pgds() should be [start, end],
but we pass [start, end) to this function.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 16:12:21 +01:00
Kirill A. Shutemov 602e018607 x86/mm: Convert update_mmu_cache() and update_mmu_cache_pmd() to functions
Converting macros to functions unhide type problems before
changes will be integrated and trigger problems on other
architectures.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 16:12:13 +01:00
yangyongqiang 9faec5be3a perf/x86: Fix P6 driver section warning
Fix a compile warning - 'a section type conflict' by removing
__initconst.

Signed-off-by: yangyongqiang <yangyongqiang01@baidu.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 16:04:56 +01:00
Randy Dunlap ed8e47fefc x86/olpc: Fix olpc-xo1-sci.c build errors
Fix build errors when CONFIG_INPUT=m.  This is not pretty, but
all of the OLPC kconfig options are bool instead of tristate.

  arch/x86/built-in.o: In function `send_lid_state':
    olpc-xo1-sci.c:(.text+0x1d323): undefined reference to `input_event'
    olpc-xo1-sci.c:(.text+0x1d338): undefined reference to `input_event'
  ...

In the long run, fixing this driver kconfig to be tristate
instead of bool would be a very good change.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Andres Salomon <dilinger@queued.net>
Cc: Chris Ball <cjb@laptop.org>
Cc: Jon Nettleton <jon.nettleton@gmail.com>
Cc: Daniel Drake <dsd@laptop.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 16:00:23 +01:00
Alex Shi 57c4f43043 arch/x86/platform/uv: Fix incorrect tlb flush all issue
The flush tlb optimization code has logical issue on UV
platform.  It doesn't flush the full range at all, since it
simply ignores its 'end' parameter (and hence also the "all"
indicator) in uv_flush_tlb_others() function.

Cliff's notes:

 | I tested the patch on a UV.  It has the effect of either
 | clearing 1 or all TLBs in a cpu.  I added some debugging to
 | test for the cases when clearing all TLBs is overkill, and in
 | practice it happens very seldom.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Tested-by: Cliff Wickman <cpw@sgi.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 15:58:54 +01:00
Andrew Morton 55a6e622e6 arch/x86/tools/insn_sanity.c: Identify source of messages
The kernel build prints:

  Building modules, stage 2.
  TEST    posttest
  MODPOST 3821 modules
  TEST    posttest
 Success: decoded and checked 1000000 random instructions with 0
 errors (seed:0xaac4bc47)   CC      arch/x86/boot/a20.o
  CC      arch/x86/boot/cmdline.o
  AS      arch/x86/boot/copy.o
  HOSTCC  arch/x86/boot/mkcpustr
  CC      arch/x86/boot/cpucheck.o
  CC      arch/x86/boot/early_serial_console.o

which is irritating because you don't know what program is
proudly pronouncing its success.

So, as described in "console mode programming user interface
guidelines version 101" which doesn't exist, change this program
to identify the source of its messages.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 15:57:53 +01:00
ShuoX Liu 0927b482ae perf/x86: Enable Intel Lincroft/Penwell/Cloverview Atom support
These three chip are based on Atom and have different model id.
So add such three id for perf HW event support.

Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
Cc: yanmin_zhang@intel.linux.com
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1356713324-12442-1-git-send-email-shuox.liu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 15:10:03 +01:00
Bjorn Helgaas 6125bc8b86 x86/time/rtc: Don't print extended CMOS year when reading RTC
We shouldn't print the current century every time we read the
RTC.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130104224146.15189.14874.stgit@bhelgaas.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 14:56:35 +01:00
Davidlohr Bueso 479a99a8e5 x86/srat: Simplify memory affinity init error handling
The acpi_numa_memory_affinity_init() function can fail in
several scenarios, use a single point of error return.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Link: http://lkml.kernel.org/r/1357690721.1890.15.camel@buesod1.americas.hpqcorp.net
[ Cleaned up the label naming a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 14:20:00 +01:00
Ingo Molnar 4913ae3991 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core
Pull tracing updates from Steve Rostedt.

This commit:

      tracing: Remove the extra 4 bytes of padding in events

changes the ABI. All involved parties seem to agree that it's safe to
do now, but the devil is in the details ...

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 13:39:31 +01:00
Alok N Kataria 4cca6ea04d x86/apic: Allow x2apic without IR on VMware platform
This patch updates x2apic initializaition code to allow x2apic
on VMware platform even without interrupt remapping support.
The hypervisor_x2apic_available hook was added in x2apic
initialization code and used by KVM and XEN, before this.
I have also cleaned up that code to export this hook through the
hypervisor_x86 structure.

Compile tested for KVM and XEN configs, this patch doesn't have
any functional effect on those two platforms.

On VMware platform, verified that x2apic is used in physical
mode on products that support this.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Reviewed-by: Doug Covelli <dcovelli@vmware.com>
Reviewed-by: Dan Hecht <dhecht@vmware.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Avi Kivity <avi@redhat.com>
Link: http://lkml.kernel.org/r/1358466282.423.60.camel@akataria-dtop.eng.vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 13:11:18 +01:00
Cong Ding b9975dabe3 x86/apb/timer: Remove unnecessary "if"
adev has no chance to be NULL, so we don't need to check it. It
is also dereferenced just before the check .

Signed-off-by: Cong Ding <dinggnu@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Link: http://lkml.kernel.org/r/1358199561-15518-1-git-send-email-dinggnu@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 13:03:26 +01:00
Ingo Molnar 786133f6e8 Merge branch 'core/irq_work' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into irq/core
irq_work fixes and cleanups, in preparation for full dyntics support.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 12:48:41 +01:00
Dave Jones e3f0f36ddf x86/apic: Remove noisy zero-mask warning from default_send_IPI_mask_logical()
Since circa 3.5, we've had dozens of reports of people hitting
this warning. Forwarded reports have been met with silence, so
just remove the warning if no-one cares.

Example reports:

  https://bugzilla.redhat.com/show_bug.cgi?id=797687
  https://bugzilla.redhat.com/show_bug.cgi?id=867174
  https://bugzilla.redhat.com/show_bug.cgi?id=894865

Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20130118175847.GA27662@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 12:12:42 +01:00
Jan Beulich d59fe3f13d ix86: Tighten asmlinkage_protect() constraints
While the description of the commit that originally introduced
asmlinkage_protect() validly says that this doesn't guarantee
clobbering of the function arguments, using "m" constraints
rather than "g" ones reduces the risk (by making it less
attractive to the compiler to move those variables into
registers) and generally results in better code (because we know
the arguments are in memory anyway, and are frequently - if not
always - used just once, with the second [compiler visible] use
in asmlinkage_protect() itself being a fake one).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: <roland@hack.frob.com>
Cc: <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/50FE84EC02000078000B83B7@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 11:25:59 +01:00
Kees Cook 2c922cd07a x86/cpu/hotplug: Remove CONFIG_EXPERIMENTAL dependency
The CONFIG_EXPERIMENTAL config item has not carried much meaning
for a while now and is almost always enabled by default. As
agreed during the Linux kernel summit, remove it from any
"depends on" lines in Kconfigs.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: http://lkml.kernel.org/r/20130122210119.GA311@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 11:16:30 +01:00
Jan Beulich 444723dccc x86-64: Fix unwind annotations in recent NMI changes
While in one case a plain annotation is necessary, in the other
case the stack adjustment can simply be folded into the
immediately preceding RESTORE_ALL, thus getting the correct
annotation for free.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander van Heukelum <heukelum@mailshack.com>
Link: http://lkml.kernel.org/r/51010C9302000078000B9045@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24 10:56:32 +01:00
Gleb Natapov 141687869f KVM: VMX: set vmx->emulation_required only when needed.
If emulate_invalid_guest_state=false vmx->emulation_required is never
actually used, but it ends up to be always set to true since
handle_invalid_guest_state(), the only place it is reset back to
false, is never called. This, besides been not very clean, makes vmexit
and vmentry path to check emulate_invalid_guest_state needlessly.

The patch fixes that by keeping emulation_required coherent with
emulate_invalid_guest_state setting.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24 00:40:31 -02:00
Gleb Natapov 378a8b099f KVM: x86: fix use of uninitialized memory as segment descriptor in emulator.
If VMX reports segment as unusable, zero descriptor passed by the emulator
before returning. Such descriptor will be considered not present by the
emulator.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24 00:40:31 -02:00
Gleb Natapov 91b0aa2ca6 KVM: VMX: rename fix_pmode_dataseg to fix_pmode_seg.
The function deals with code segment too.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24 00:40:30 -02:00
Gleb Natapov 25391454e7 KVM: VMX: don't clobber segment AR of unusable segments.
Usability is returned in unusable field, so not need to clobber entire
AR. Callers have to know how to deal with unusable segments already
since if emulate_invalid_guest_state=true AR is not zeroed.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24 00:40:28 -02:00
Gleb Natapov 218e763f45 KVM: VMX: skip vmx->rmode.vm86_active check on cr0 write if unrestricted guest is enabled
vmx->rmode.vm86_active is never true is unrestricted guest is enabled.
Make it more explicit that neither enter_pmode() nor enter_rmode() is
called in this case.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24 00:40:28 -02:00
Gleb Natapov 286da4156d KVM: VMX: remove hack that disables emulation on vcpu reset/init
There is no reason for it. If state is suitable for vmentry it
will be detected during guest entry and no emulation will happen.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24 00:40:27 -02:00
Gleb Natapov c5e97c80b5 KVM: VMX: if unrestricted guest is enabled vcpu state is always valid.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24 00:40:26 -02:00
Gleb Natapov 2f143240cb KVM: VMX: reset CPL only on CS register write.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24 00:40:26 -02:00
Gleb Natapov 1f3141e80b KVM: VMX: remove special CPL cache access during transition to real mode.
Since vmx_get_cpl() always returns 0 when VCPU is in real mode it is no
longer needed. Also reset CPL cache to zero during transaction to
protected mode since transaction may happen while CS.selectors & 3 != 0,
but in reality CPL is 0.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-24 00:40:25 -02:00
Avi Kivity 158de57f90 KVM: x86 emulator: convert a few freestanding emulations to fastop
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23 22:15:40 -02:00
Avi Kivity 34b77652b9 KVM: x86 emulator: rearrange fastop definitions
Make fastop opcodes usable in other emulations.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23 22:15:39 -02:00
Avi Kivity 4d7583493e KVM: x86 emulator: convert 2-operand IMUL to fastop
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23 22:15:38 -02:00
Avi Kivity 11c363ba8f KVM: x86 emulator: convert BT/BTS/BTR/BTC/BSF/BSR to fastop
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23 22:15:37 -02:00
Avi Kivity 95413dc413 KVM: x86 emulator: convert INC/DEC to fastop
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23 22:15:37 -02:00
Avi Kivity 9ae9febae9 KVM: x86 emulator: covert SETCC to fastop
This is a bit of a special case since we don't have the usual
byte/word/long/quad switch; instead we switch on the condition code embedded
in the instruction.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23 22:15:36 -02:00
Avi Kivity 007a3b5475 KVM: x86 emulator: convert shift/rotate instructions to fastop
SHL, SHR, ROL, ROR, RCL, RCR, SAR, SAL

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23 22:15:35 -02:00
Avi Kivity 0bdea06892 KVM: x86 emulator: Convert SHLD, SHRD to fastop
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-23 22:15:33 -02:00
Mika Westerberg 3d48aab1d5 x86: add support for Intel Low Power Subsystem
We are starting to see traditional SoC peripherals also in the x86 world in
chips like Intel Lynxpoint. Typically we already have a Linux driver for
the peripheral but it takes advantage of the common clk framework to
control and retrieve information about the peripheral clock.

So far there hasn't been a standard way on x86 to pass information such as
clock rate from whatever the configuration system is used to the driver,
but instead different variations have emerged, like adding this information
to the platform data.

Solve this by adding a new config option X86_INTEL_LPSS. If this is
selected we enable common clk framework (and everything else) that is
needed to support the Intel LPSS drivers.

Enabling common clk framework on x86 was originally proposed by Mark Brown.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-23 21:14:22 +01:00
David S. Miller 93b9c1ddd3 Merge branch 'testing' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
1) Add a statistic counter for invalid output states and
   remove a superfluous state valid check, from Li RongQing.

2) Probe for asynchronous block ciphers instead of synchronous block
   ciphers to make the asynchronous variants available even if no
   synchronous block ciphers are found, from Jussi Kivilinna.

3) Make rfc3686 asynchronous block cipher and make use of
   the new asynchronous variant, from Jussi Kivilinna.

4) Replace some rwlocks by rcu, from Cong Wang.

5) Remove some unused defines.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23 14:00:16 -05:00
Linus Torvalds ed06ef318a perf/urgent fixes:
. revert 20b279 - require exclude_guest to use PEBS - kernel side,
   now older binaries will continue working for things like cycles:pp
   without needing to pass extra modifiers, from David Ahern.
 
 . Fix building from 'make perf-*-src-pkg' tarballs, broken by UAPI, from
   Sebastian Andrzej Siewior
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ7yQhAAoJENZQFvNTUqpAjj4P/RR+8WeXpV02yzndhry+Yjav
 L/WQH0CkRRmyUA5akTcpgFrohJCEOi+auHl7ivmDX4XWFavcAkX3H1Yz1FytCOkb
 Cbb9Lv1rGdlTno8fTVUn8mNyTG64AlWrAw3ICixWw6q6I/k6SO7EkKigCxPhmY+2
 BE2EvkZmOY/PEUXgM6HtUdifORatX48p1toS7p3CDQ31cxBN5OVNZUXa1FakJpyH
 7R+1imKLsjuyi/G7Bt061LyWQkOh7L/ITWN+5Rx4RsUwRRT3vm1H9nlqUBsPS0PW
 qfkktkCmn/cFpKbfBipuglnt16jHPMfI/pghKvzx8n2uJMNEGXbfFDJefpzcdih9
 wIRgB6a5bvA8VF6Xpcn0I5JhqLAcnWTer07JgjZevjqYCdZStpbJjvE5131JjTLw
 Dnm7UshE+VFBcA3iXNX64p/X7WDJSk+SIDsJDuNe57dktFVLw76Ibb55XG18Ex7e
 c9QcIEhD1P19VzOniDZQZNEJhqnu5Vjle/eG+JRVRCm/BgoQFyJuD3EooKPN7hHR
 Op4oqf5RhDf7XNH0+Y4rOdRMZRiumdfcEl6kdcQGPJPycxpD7xCJNzBLWK/BvQgT
 Kl0AEkRC0KE2c5LyFttW+g1Byu1rctlMz2TVJDTskTm0XGOQ9mTzsQBP5rgBVt+b
 hQsMMWNVSI+jfm8bTJwx
 =LTjZ
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

 . revert 20b279 - require exclude_guest to use PEBS - kernel side, now
   older binaries will continue working for things like cycles:pp
   without needing to pass extra modifiers, from David Ahern.

 . Fix building from 'make perf-*-src-pkg' tarballs, broken by UAPI,
   from Sebastian Andrzej Siewior

[ Pulling directly, Ingo would normally pull but has been unresponsive ]

* tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf tools: Fix building from 'make perf-*-src-pkg' tarballs
  perf x86: revert 20b279 - require exclude_guest to use PEBS - kernel side
2013-01-22 14:32:07 -08:00
Oleg Nesterov 9899d11f65 ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL
putreg() assumes that the tracee is not running and pt_regs_access() can
safely play with its stack.  However a killed tracee can return from
ptrace_stop() to the low-level asm code and do RESTORE_REST, this means
that debugger can actually read/modify the kernel stack until the tracee
does SAVE_REST again.

set_task_blockstep() can race with SIGKILL too and in some sense this
race is even worse, the very fact the tracee can be woken up breaks the
logic.

As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
call, this ensures that nobody can ever wakeup the tracee while the
debugger looks at it.  Not only this fixes the mentioned problems, we
can do some cleanups/simplifications in arch_ptrace() paths.

Probably ptrace_unfreeze_traced() needs more callers, for example it
makes sense to make the tracee killable for oom-killer before
access_process_vm().

While at it, add the comment into may_ptrace_stop() to explain why
ptrace_stop() still can't rely on SIGKILL and signal_pending_state().

Reported-by: Salman Qazi <sqazi@google.com>
Reported-by: Suleiman Souhlal <suleiman@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-22 10:08:00 -08:00
H. Peter Anvin d29a4a5fe8 This patchset adds support for federated systems where multiple memory
controllers can exist and see each other over multiple PCI domains. This
 basically means that AMD node ids can be more than 8 now and the code
 handling this is taught to incorporate PCI domain into those IDs.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ7tvQAAoJEBLB8Bhh3lVKAdcP/iiHMUvovmwYL4H/PUUrBmKH
 JZnESlNIxWwRsecIQSqJO+O0SQa489cnpBV59yEMrs7Cu2sOhi7/ubbwzngzannS
 ab/M1GhaXn8UJ8N9NzUnxJ0KW/1cbtN1J3YgyqJ7zx9m058l3KDpDkuIyCejzRGR
 cFQHZUgjIUL7LNaAQmGZnAgsVbVUv+yzZhzeYxXQ2h6445H10pd6JEb6/aZTUFlL
 nYv2ypWdBXr27Mc8FkKdftiFw4dLUEQsNgFbBVJZtKbi4WrSTloP3dWjwo8KCtFz
 1QgvsKlNOIl5DEXl0sSoTc8Jhi05eqPANke2oTu46fSwoHGKP2atmcroSdiHmDDM
 vH6X5K6bs1fNMxqyAjvHUnbUs2aupBMjIxobZaKBLM+arK4sSt+elRGZMZxHtjgh
 tHeMpVvkXDAtNX+C9frDSiFEkj2NNvuCJc8naXdQ1cDNLuanP7dnTfS3ZcHQShoI
 FBzz0RbRkXKXcE1sQRfzT5dmt2gdgDzFjqN4rz4fEOM3+1MFRZ/B/cKLcZ25abvc
 3uweD6e6XSfFnnpz8xkr6eek3CpwHPiNYQWLXq9ick7BW5fjcHiU0Fo2rJO6G8tq
 vRqGU6P81sHlGrE0sLMiaYv9NqTYEERuKO05WSP9ryn3tdGebvpy116DeD5w/olo
 MTcWtwp69J+Eoqr7bb9z
 =ZzvJ
 -----END PGP SIGNATURE-----

Merge tag 'numascale' into x86/platform

This patchset adds support for federated systems where multiple memory
controllers can exist and see each other over multiple PCI domains. This
basically means that AMD node ids can be more than 8 now and the code
handling this is taught to incorporate PCI domain into those IDs.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-22 08:37:34 -08:00
Xiao Guangrong 93c05d3ef2 KVM: x86: improve reexecute_instruction
The current reexecute_instruction can not well detect the failed instruction
emulation. It allows guest to retry all the instructions except it accesses
on error pfn

For example, some cases are nested-write-protect - if the page we want to
write is used as PDE but it chains to itself. Under this case, we should
stop the emulation and report the case to userspace

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-21 22:58:33 -02:00
Xiao Guangrong 95b3cf69bd KVM: x86: let reexecute_instruction work for tdp
Currently, reexecute_instruction refused to retry all instructions if
tdp is enabled. If nested npt is used, the emulation may be caused by
shadow page, it can be fixed by dropping the shadow page. And the only
condition that tdp can not retry the instruction is the access fault
on error pfn

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-21 22:58:32 -02:00
Xiao Guangrong 22368028fe KVM: x86: clean up reexecute_instruction
Little cleanup for reexecute_instruction, also use gpa_to_gfn in
retry_instruction

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-21 22:58:31 -02:00
Jun Nakajima ddd70cf93d goldfish: platform device for x86
Based on code by Jun Nakajima but stripped of all the old x86 mach-foo
stuff and turned into a single file for the Goldfish virtual bus layer.

The actual created platform device and bus enumeration is portable between the
ARM and x86 Goldfish emulations.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Link: http://lkml.kernel.org/r/20130121172205.19517.22535.stgit@bob.linux.org.uk
Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Xiaohui Xin <xiaohui.xin@intel.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Bruce Beare <bruce.j.beare@intel.com>
[Ported to 3.7 and reorganised so that we can keep most of the code
 shared properly]
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
2013-01-21 12:09:19 -08:00
Masami Hiramatsu f684199f5d kprobes/x86: Move kprobes stuff under arch/x86/kernel/kprobes/
Move arch-dep kprobes stuff under arch/x86/kernel/kprobes.

Link: http://lkml.kernel.org/r/20120928081522.3560.75469.stgit@ltc138.sdl.hitachi.co.jp

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
[ fixed whitespace and s/__attribute__((packed))/__packed/ ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21 13:22:37 -05:00
Masami Hiramatsu e7dbfe349d kprobes/x86: Move ftrace-based kprobe code into kprobes-ftrace.c
Split ftrace-based kprobes code from kprobes, and introduce
CONFIG_(HAVE_)KPROBES_ON_FTRACE Kconfig flags.
For the cleanup reason, this also moves kprobe_ftrace check
into skip_singlestep.

Link: http://lkml.kernel.org/r/20120928081520.3560.25624.stgit@ltc138.sdl.hitachi.co.jp

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21 13:22:36 -05:00
Masami Hiramatsu 06aeaaeabf ftrace: Move ARCH_SUPPORTS_FTRACE_SAVE_REGS in Kconfig
Move SAVE_REGS support flag into Kconfig and rename
it to CONFIG_DYNAMIC_FTRACE_WITH_REGS. This also introduces
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS which indicates
the architecture depending part of ftrace has a code
that saves full registers.
On the other hand, CONFIG_DYNAMIC_FTRACE_WITH_REGS indicates
the code is enabled.

Link: http://lkml.kernel.org/r/20120928081516.3560.72534.stgit@ltc138.sdl.hitachi.co.jp

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21 13:22:35 -05:00
Rusty Russell 373d4d0997 taint: add explicit flag to show whether lock dep is still OK.
Fix up all callers as they were before, with make one change: an
unsigned module taints the kernel, but doesn't turn off lockdep.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-01-21 17:17:57 +10:30
Herbert Xu 7983627657 crypto: crc32-pclmul - Kill warning on x86-32
This patch removes a gratuitous warning on x86-32:

arch/x86/crypto/crc32-pclmul_asm.S:87:2: warning: #warning Using 32bit code support [-Wcpp]

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 18:05:02 +11:00
Jussi Kivilinna d3f5188dfe crypto: x86/twofish - assembler clean-ups: use ENTRY/ENDPROC, localize jump labels
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:51 +11:00
Jussi Kivilinna ac9d55dd42 crypto: x86/sha1 - assembler clean-ups: use ENTRY/ENDPROC
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:51 +11:00
Jussi Kivilinna 2dcfd44dee crypto: x86/serpent - use ENTRY/ENDPROC for assember functions and localize jump targets
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:50 +11:00
Jussi Kivilinna 044438082c crypto: x86/salsa20 - assembler cleanup, use ENTRY/ENDPROC for assember functions and rename ECRYPT_* to salsa20_*
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:50 +11:00
Jussi Kivilinna b05d3f3756 crypto: x86/ghash - assembler clean-up: use ENDPROC at end of assember functions
Signed-off-by: Jussi Kivilinna <jussi.kivilinn@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:49 +11:00
Jussi Kivilinna 698a5abbb0 crypto: x86/crc32c - assembler clean-up: use ENTRY/ENDPROC
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:49 +11:00
Jussi Kivilinna 1985fecf01 crypto: cast6-avx: use ENTRY()/ENDPROC() for assembler functions
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:49 +11:00
Jussi Kivilinna e17e209ea4 crypto: cast5-avx: use ENTRY()/ENDPROC() for assembler functions and localize jump targets
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:48 +11:00
Jussi Kivilinna 59990684b0 crypto: camellia-x86_64/aes-ni: use ENTRY()/ENDPROC() for assembler functions and localize jump targets
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:48 +11:00
Jussi Kivilinna 5186e395fe crypto: blowfish-x86_64: use ENTRY()/ENDPROC() for assembler functions and localize jump targets
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:48 +11:00
Jussi Kivilinna 8309b745bb crypto: aesni-intel - add ENDPROC statements for assembler functions
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:47 +11:00
Jussi Kivilinna 3f29974383 crypto: x86/aes - assembler clean-ups: use ENTRY/ENDPROC, localize jump targets
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:47 +11:00
Alexander Boyko 78c37d191d crypto: crc32 - add crc32 pclmulqdq implementation and wrappers for table implementation
This patch adds crc32 algorithms to shash crypto api. One is wrapper to
gerneric crc32_le function. Second is crc32 pclmulqdq implementation. It
use hardware provided PCLMULQDQ instruction to accelerate the CRC32 disposal.
This instruction present from Intel Westmere and AMD Bulldozer CPUs.

For intel core i5 I got 450MB/s for table implementation and 2100MB/s
for pclmulqdq implementation.

Signed-off-by: Alexander Boyko <alexander_boyko@xyratex.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20 10:16:45 +11:00
H. Peter Anvin 021ef050fc x86-32: Start out cr0 clean, disable paging before modifying cr3/4
Patch

  5a5a51db78 x86-32: Start out eflags and cr4 clean

... made x86-32 match x86-64 in that we initialize %eflags and %cr4
from scratch.  This broke OLPC XO-1.5, because the XO enters the
kernel with paging enabled, which the kernel doesn't expect.

Since we no longer support 386 (the source of most of the variability
in %cr0 configuration), we can simply match further x86-64 and
initialize %cr0 to a fixed value -- the one variable part remaining in
%cr0 is for FPU control, but all that is handled later on in
initialization; in particular, configuring %cr0 as if the FPU is
present until proven otherwise is correct and necessary for the probe
to work.

To deal with the XO case sanely, explicitly disable paging in %cr0
before we muck with %cr3, %cr4 or EFER -- those operations are
inherently unsafe with paging enabled.

NOTE: There is still a lot of 386-related junk in head_32.S which we
can and should get rid of, however, this is intended as a minimal fix
whereas the cleanup can be deferred to the next merge window.

Reported-by: Andres Salomon <dilinger@queued.net>
Tested-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/50FA0661.2060400@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-19 11:01:22 -08:00
Linus Torvalds 5c69bed266 Fixes:
- CVE-2013-0190/XSA-40 (or stack corruption for 32-bit PV kernels)
  - Fix racy vma access spotted by Al Viro
  - Fix mmap batch ioctl potentially resulting in large O(n) page allcations.
  - Fix vcpu online/offline BUG:scheduling while atomic..
  - Fix unbound buffer scanning for more than 32 vCPUs.
  - Fix grant table being incorrectly initialized
  - Fix incorrect check in pciback
  - Allow privcmd in backend domains.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJQ+L7qAAoJEFjIrFwIi8fJLNIH/jUsneraEggWeh0L4GGWZvWL
 cNCf0zjQt/pi1Q5drbleW2/6Wv6s6N1QA9pGRsJ+rrliC73HVTqIWFh0TjpwmCVy
 hZal7jDXOuFVIR7GbGEPn004T6mkEnYDb/O2fyojwMVg0NQYwtMYJfTBkKdjKnmV
 z6sWpQPVqO3/nZ17k2DipYRldbeiqS6LLOiUWd72b2W8bV4ySY5iVPVsqFusSEr6
 PNyW33RPs5H0jEPR1uJlLD+l/uIbENykpEPeAS2uHGlch129+xHH5h79dwYJTbw6
 x5nAOveO9VNJscUoqhpE7YbySzJmrUwxnBerZ6YTW6WCknYXrx4uiVAlfWem7uY=
 =26Sq
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen fixes from Konrad Rzeszutek Wilk:
 - CVE-2013-0190/XSA-40 (or stack corruption for 32-bit PV kernels)
 - Fix racy vma access spotted by Al Viro
 - Fix mmap batch ioctl potentially resulting in large O(n) page allcations.
 - Fix vcpu online/offline BUG:scheduling while atomic..
 - Fix unbound buffer scanning for more than 32 vCPUs.
 - Fix grant table being incorrectly initialized
 - Fix incorrect check in pciback
 - Allow privcmd in backend domains.

Fix up whitespace conflict due to ugly merge resolution in Xen tree in
arch/arm/xen/enlighten.c

* tag 'stable/for-linus-3.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: Fix stack corruption in xen_failsafe_callback for 32bit PVOPS guests.
  Revert "xen/smp: Fix CPU online/offline bug triggering a BUG: scheduling while atomic."
  xen/gntdev: remove erronous use of copy_to_user
  xen/gntdev: correctly unmap unlinked maps in mmu notifier
  xen/gntdev: fix unsafe vma access
  xen/privcmd: Fix mmap batch ioctl.
  Xen: properly bound buffer access when parsing cpu/*/availability
  xen/grant-table: correctly initialize grant table version 1
  x86/xen : Fix the wrong check in pciback
  xen/privcmd: Relax access control in privcmd_ioctl_mmap
2013-01-18 12:02:52 -08:00
Rafael J. Wysocki a090b22f3f Merge branch 'acpica' into acpi-lpss
The following commits depend on the 'acpica' material.
2013-01-18 13:49:29 +01:00
Nathan Zimmer b8f2c21db3 efi, x86: Pass a proper identity mapping in efi_call_phys_prelog
Update efi_call_phys_prelog to install an identity mapping of all available
memory.  This corrects a bug on very large systems with more then 512 GB in
which bios would not be able to access addresses above not in the mapping.

The result is a crash that looks much like this.

BUG: unable to handle kernel paging request at 000000effd870020
IP: [<0000000078bce331>] 0x78bce330
PGD 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU 0
Pid: 0, comm: swapper/0 Tainted: G        W    3.8.0-rc1-next-20121224-medusa_ntz+ #2 Intel Corp. Stoutland Platform
RIP: 0010:[<0000000078bce331>]  [<0000000078bce331>] 0x78bce330
RSP: 0000:ffffffff81601d28  EFLAGS: 00010006
RAX: 0000000078b80e18 RBX: 0000000000000004 RCX: 0000000000000004
RDX: 0000000078bcf958 RSI: 0000000000002400 RDI: 8000000000000000
RBP: 0000000078bcf760 R08: 000000effd870000 R09: 0000000000000000
R10: 0000000000000000 R11: 00000000000000c3 R12: 0000000000000030
R13: 000000effd870000 R14: 0000000000000000 R15: ffff88effd870000
FS:  0000000000000000(0000) GS:ffff88effe400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000effd870020 CR3: 000000000160c000 CR4: 00000000000006b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff81614400)
Stack:
 0000000078b80d18 0000000000000004 0000000078bced7b ffff880078b81fff
 0000000000000000 0000000000000082 0000000078bce3a8 0000000000002400
 0000000060000202 0000000078b80da0 0000000078bce45d ffffffff8107cb5a
Call Trace:
 [<ffffffff8107cb5a>] ? on_each_cpu+0x77/0x83
 [<ffffffff8102f4eb>] ? change_page_attr_set_clr+0x32f/0x3ed
 [<ffffffff81035946>] ? efi_call4+0x46/0x80
 [<ffffffff816c5abb>] ? efi_enter_virtual_mode+0x1f5/0x305
 [<ffffffff816aeb24>] ? start_kernel+0x34a/0x3d2
 [<ffffffff816ae5ed>] ? repair_env_string+0x60/0x60
 [<ffffffff816ae2be>] ? x86_64_start_reservations+0xba/0xc1
 [<ffffffff816ae120>] ? early_idt_handlers+0x120/0x120
 [<ffffffff816ae419>] ? x86_64_start_kernel+0x154/0x163
Code:  Bad RIP value.
RIP  [<0000000078bce331>] 0x78bce330
 RSP <ffffffff81601d28>
CR2: 000000effd870020
---[ end trace ead828934fef5eab ]---

Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-01-18 09:43:43 +00:00
Greg Kroah-Hartman ed408f7c0f Merge 3.9-rc4 into driver-core-next
This is to fix up a build problem with a wireless driver due to the
dynamic-debug patches in this branch.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17 19:48:18 -08:00
Bjorn Helgaas 708b59bfe1 Merge branch 'pci/rafael-set-root-bridge-handle' into next
* pci/rafael-set-root-bridge-handle:
  ACPI / PCI: Set root bridge ACPI handle in advance
2013-01-17 16:00:36 -07:00
Jacob Pan 29c6fb7be1 x86/nmi: export local_touch_nmi() symbol for modules
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2013-01-17 22:25:57 +08:00
Andrew Cooper 9174adbee4 xen: Fix stack corruption in xen_failsafe_callback for 32bit PVOPS guests.
This fixes CVE-2013-0190 / XSA-40

There has been an error on the xen_failsafe_callback path for failed
iret, which causes the stack pointer to be wrong when entering the
iret_exc error path.  This can result in the kernel crashing.

In the classic kernel case, the relevant code looked a little like:

        popl %eax      # Error code from hypervisor
        jz 5f
        addl $16,%esp
        jmp iret_exc   # Hypervisor said iret fault
5:      addl $16,%esp
                       # Hypervisor said segment selector fault

Here, there are two identical addls on either option of a branch which
appears to have been optimised by hoisting it above the jz, and
converting it to an lea, which leaves the flags register unaffected.

In the PVOPS case, the code looks like:

        popl_cfi %eax         # Error from the hypervisor
        lea 16(%esp),%esp     # Add $16 before choosing fault path
        CFI_ADJUST_CFA_OFFSET -16
        jz 5f
        addl $16,%esp         # Incorrectly adjust %esp again
        jmp iret_exc

It is possible unprivileged userspace applications to cause this
behaviour, for example by loading an LDT code selector, then changing
the code selector to be not-present.  At this point, there is a race
condition where it is possible for the hypervisor to return back to
userspace from an interrupt, fault on its own iret, and inject a
failsafe_callback into the kernel.

This bug has been present since the introduction of Xen PVOPS support
in commit 5ead97c84 (xen: Core Xen implementation), in 2.6.23.

Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-16 16:17:42 -05:00
Linus Torvalds 2409c873be Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "This is mainly a workaround for a bug in Sandy Bridge graphics which
  causes corruption of certain memory pages."

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/Sandy Bridge: Sandy Bridge workaround depends on CONFIG_PCI
  x86/Sandy Bridge: mark arrays in __init functions as __initconst
  x86/Sandy Bridge: reserve pages when integrated graphics is present
  x86, efi: correct precedence of operators in setup_efi_pci
2013-01-16 09:11:50 -08:00
Konrad Rzeszutek Wilk d55bf532d7 Revert "xen/smp: Fix CPU online/offline bug triggering a BUG: scheduling while atomic."
This reverts commit 41bd956de3.

The fix is incorrect and not appropiate for the latest kernels.
In fact it _causes_ the BUG: scheduling while atomic while
doing vCPU hotplug.

Suggested-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-15 22:41:27 -05:00
John Stultz e90c83f757 x86: Select HAS_PERSISTENT_CLOCK on x86
Select HAS_PERSISTENT_CLOCK on x86 to simplify RTC options
and allow the compiler to remove unused code.

Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-01-15 18:16:09 -08:00
Bernd Faust 2353b47bff Round the calculated scale factor in set_cyc2ns_scale()
During some experiments with an external clock (in a FPGA), we saw that
the TSC clock drifted approx. 2.5ms per second.

This drift was caused by the current way of calculating the scale.
In our case cpu_khz had a value of 3292725. This resulted in a scale
value of 310. But when doing the calculation by hand it shows that the
actual value is 310.9886188491, so a value of 311 would be more precise.

With this change the value is rounded.

Signed-off-by: Bernd Faust <berndfaust@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-01-15 18:16:07 -08:00
Konrad Rzeszutek Wilk 7bcc1ec077 Linux 3.7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJQxqj1AAoJEHm+PkMAQRiG9MQH/j21UwP2QGpdpXbWAnFMjtlv
 uE/yCFhPoqR1QjjE6oRlO6MHFA41xGDbr5RQki9Ik2AfSYiastt4ZWYvtSJKVTCr
 O0Lj+Cdt/2qBkGiARHqVEBZ4S/l/cw4/EHPb5StFyu3ggnPPQhoPIP7oAmRn0+mh
 NNb5CEcJOLqIaJSteqMP71Q899ncbLayBnimYCaC2f6r00beqNXIqxSHipcPlUsf
 ehNxqCX+5z5Q788EL33EL8GpBcy4Ueevu6nvnuVI8qIEnBnrBVngsiaQ4Hti+2eK
 A//4DYoF2N1wLjQv7hFeiwMURQ16OlxXoc/Z66sv2QQRwPxOIQlxdhWuey4KebA=
 =7LYr
 -----END PGP SIGNATURE-----

Merge tag 'v3.7' into stable/for-linus-3.8

Linux 3.7

* tag 'v3.7': (833 commits)
  Linux 3.7
  Input: matrix-keymap - provide proper module license
  Revert "revert "Revert "mm: remove __GFP_NO_KSWAPD""" and associated damage
  ipv4: ip_check_defrag must not modify skb before unsharing
  Revert "mm: avoid waking kswapd for THP allocations when compaction is deferred or contended"
  inet_diag: validate port comparison byte code to prevent unsafe reads
  inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run()
  inet_diag: validate byte code to prevent oops in inet_diag_bc_run()
  inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state
  mm: vmscan: fix inappropriate zone congestion clearing
  vfs: fix O_DIRECT read past end of block device
  net: gro: fix possible panic in skb_gro_receive()
  tcp: bug fix Fast Open client retransmission
  tmpfs: fix shared mempolicy leak
  mm: vmscan: do not keep kswapd looping forever due to individual uncompactable zones
  mm: compaction: validate pfn range passed to isolate_freepages_block
  mmc: sh-mmcif: avoid oops on spurious interrupts (second try)
  Revert misapplied "mmc: sh-mmcif: avoid oops on spurious interrupts"
  mmc: sdhci-s3c: fix missing clock for gpio card-detect
  lib/Makefile: Fix oid_registry build dependency
  ...

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Conflicts:
	arch/arm/xen/enlighten.c
	drivers/xen/Makefile

[We need to have the v3.7 base as the 'for-3.8' was based off v3.7-rc3
and there are some patches in v3.7-rc6 that we to have in our branch]
2013-01-15 15:58:25 -05:00
Takuya Yoshikawa 6b81b05e44 KVM: MMU: Conditionally reschedule when kvm_mmu_slot_remove_write_access() takes a long time
If the userspace starts dirty logging for a large slot, say 64GB of
memory, kvm_mmu_slot_remove_write_access() needs to hold mmu_lock for
a long time such as tens of milliseconds.  This patch controls the lock
hold time by asking the scheduler if we need to reschedule for others.

One penalty for this is that we need to flush TLBs before releasing
mmu_lock.  But since holding mmu_lock for a long time does affect not
only the guest, vCPU threads in other words, but also the host as a
whole, we should pay for that.

In practice, the cost will not be so high because we can protect a fair
amount of memory before being rescheduled: on my test environment,
cond_resched_lock() was called only once for protecting 12GB of memory
even without THP.  We can also revisit Avi's "unlocked TLB flush" work
later for completely suppressing extra TLB flushes if needed.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14 11:14:28 +02:00
Takuya Yoshikawa 9d1beefb71 KVM: Make kvm_mmu_slot_remove_write_access() take mmu_lock by itself
Better to place mmu_lock handling and TLB flushing code together since
this is a self-contained function.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14 11:14:17 +02:00
Takuya Yoshikawa b34cb590fb KVM: Make kvm_mmu_change_mmu_pages() take mmu_lock by itself
No reason to make callers take mmu_lock since we do not need to protect
kvm_mmu_change_mmu_pages() and kvm_mmu_slot_remove_write_access()
together by mmu_lock in kvm_arch_commit_memory_region(): the former
calls kvm_mmu_commit_zap_page() and flushes TLBs by itself.

Note: we do not need to protect kvm->arch.n_requested_mmu_pages by
mmu_lock as can be seen from the fact that it is read locklessly.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14 11:14:09 +02:00
Takuya Yoshikawa e12091ce7b KVM: Remove unused slot_bitmap from kvm_mmu_page
Not needed any more.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14 11:13:58 +02:00
Takuya Yoshikawa b99db1d352 KVM: MMU: Make kvm_mmu_slot_remove_write_access() rmap based
This makes it possible to release mmu_lock and reschedule conditionally
in a later patch.  Although this may increase the time needed to protect
the whole slot when we start dirty logging, the kernel should not allow
the userspace to trigger something that will hold a spinlock for such a
long time as tens of milliseconds: actually there is no limit since it
is roughly proportional to the number of guest pages.

Another point to note is that this patch removes the only user of
slot_bitmap which will cause some problems when we increase the number
of slots further.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14 11:13:47 +02:00
Takuya Yoshikawa 245c3912ea KVM: MMU: Remove unused parameter level from __rmap_write_protect()
No longer need to care about the mapping level in this function.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14 11:13:31 +02:00
Takuya Yoshikawa c972f3b125 KVM: Write protect the updated slot only when dirty logging is enabled
Calling kvm_mmu_slot_remove_write_access() for a deleted slot does
nothing but search for non-existent mmu pages which have mappings to
that deleted memory; this is safe but a waste of time.

Since we want to make the function rmap based in a later patch, in a
manner which makes it unsafe to be called for a deleted slot, we makes
the caller see if the slot is non-zero and being dirty logged.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14 11:13:15 +02:00
H. Peter Anvin e43b3cec71 x86/Sandy Bridge: Sandy Bridge workaround depends on CONFIG_PCI
early_pci_allowed() and read_pci_config_16() are only available if
CONFIG_PCI is defined.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2013-01-13 20:58:57 -08:00
H. Peter Anvin ab3cd8670e x86/Sandy Bridge: mark arrays in __init functions as __initconst
Mark static arrays as __initconst so they get removed when the init
sections are flushed.

Reported-by: Mathias Krause <minipli@googlemail.com>
Link: http://lkml.kernel.org/r/75F4BEE6-CB0E-4426-B40B-697451677738@googlemail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-13 20:36:39 -08:00
Rafael J. Wysocki 6c0cc950ae ACPI / PCI: Set root bridge ACPI handle in advance
The ACPI handles of PCI root bridges need to be known to
acpi_bind_one(), so that it can create the appropriate
"firmware_node" and "physical_node" files for them, but currently
the way it gets to know those handles is not exactly straightforward
(to put it lightly).

This is how it works, roughly:

  1. acpi_bus_scan() finds the handle of a PCI root bridge,
     creates a struct acpi_device object for it and passes that
     object to acpi_pci_root_add().

  2. acpi_pci_root_add() creates a struct acpi_pci_root object,
     populates its "device" field with its argument's address
     (device->handle is the ACPI handle found in step 1).

  3. The struct acpi_pci_root object created in step 2 is passed
     to pci_acpi_scan_root() and used to get resources that are
     passed to pci_create_root_bus().

  4. pci_create_root_bus() creates a struct pci_host_bridge object
     and passes its "dev" member to device_register().

  5. platform_notify(), which for systems with ACPI is set to
     acpi_platform_notify(), is called.

So far, so good.  Now it starts to be "interesting".

  6. acpi_find_bridge_device() is used to find the ACPI handle of
     the given device (which is the PCI root bridge) and executes
     acpi_pci_find_root_bridge(), among other things, for the
     given device object.

  7. acpi_pci_find_root_bridge() uses the name (sic!) of the given
     device object to extract the segment and bus numbers of the PCI
     root bridge and passes them to acpi_get_pci_rootbridge_handle().

  8. acpi_get_pci_rootbridge_handle() browses the list of ACPI PCI
     root bridges and finds the one that matches the given segment
     and bus numbers.  Its handle is then used to initialize the
     ACPI handle of the PCI root bridge's device object by
     acpi_bind_one().  However, this is *exactly* the ACPI handle we
     started with in step 1.

Needless to say, this is quite embarassing, but it may be avoided
thanks to commit f3fd0c8 (ACPI: Allow ACPI handles of devices to be
initialized in advance), which makes it possible to initialize the
ACPI handle of a device before passing it to device_register().

Accordingly, add a new __weak routine, pcibios_root_bridge_prepare(),
defaulting to an empty implementation that can be replaced by the
interested architecutres (x86 and ia64 at the moment) with functions
that will set the root bridge's ACPI handle before its dev member is
passed to device_register().  Make both x86 and ia64 provide such
implementations of pcibios_root_bridge_prepare() and remove
acpi_pci_find_root_bridge() and acpi_get_pci_rootbridge_handle() that
aren't necessary any more.

Included is a fix for breakage on systems with non-ACPI PCI host
bridges from Bjorn Helgaas.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-01-13 17:14:28 -07:00
Jesse Barnes a9acc5365d x86/Sandy Bridge: reserve pages when integrated graphics is present
SNB graphics devices have a bug that prevent them from accessing certain
memory ranges, namely anything below 1M and in the pages listed in the
table.  So reserve those at boot if set detect a SNB gfx device on the
CPU to avoid GPU hangs.

Stephane Marchesin had a similar patch to the page allocator awhile
back, but rather than reserving pages up front, it leaked them at
allocation time.

[ hpa: made a number of stylistic changes, marked arrays as static
  const, and made less verbose; use "memblock=debug" for full
  verbosity. ]

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-11 14:26:38 -08:00
Kees Cook 01b35ab723 arch/x86/um: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: Jeff Dike <jdike@addtoit.com>
CC: Richard Weinberger <richard@nod.at>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Richard Weinberger <richard@nod.at>
2013-01-11 11:38:05 -08:00
Kees Cook 6ea3038648 arch/x86: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
2013-01-11 11:38:04 -08:00
Xiao Guangrong 7751babd3c KVM: MMU: fix infinite fault access retry
We have two issues in current code:
- if target gfn is used as its page table, guest will refault then kvm will use
  small page size to map it. We need two #PF to fix its shadow page table

- sometimes, say a exception is triggered during vm-exit caused by #PF
  (see handle_exception() in vmx.c), we remove all the shadow pages shadowed
  by the target gfn before go into page fault path, it will cause infinite
  loop:
  delete shadow pages shadowed by the gfn -> try to use large page size to map
  the gfn -> retry the access ->...

To fix these, we can adjust page size early if the target gfn is used as page
table

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-10 15:28:30 -02:00
Xiao Guangrong c22885050e KVM: MMU: fix Dirty bit missed if CR0.WP = 0
If the write-fault access is from supervisor and CR0.WP is not set on the
vcpu, kvm will fix it by adjusting pte access - it sets the W bit on pte
and clears U bit. This is the chance that kvm can change pte access from
readonly to writable

Unfortunately, the pte access is the access of 'direct' shadow page table,
means direct sp.role.access = pte_access, then we will create a writable
spte entry on the readonly shadow page table. It will cause Dirty bit is
not tracked when two guest ptes point to the same large page. Note, it
does not have other impact except Dirty bit since cr0.wp is encoded into
sp.role

It can be fixed by adjusting pte access before establishing shadow page
table. Also, after that, no mmu specified code exists in the common function
and drop two parameters in set_spte

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-10 15:28:08 -02:00
Linus Torvalds ccae663cd4 Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM bugfixes from Marcelo Tosatti.

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: use dynamic percpu allocations for shared msrs area
  KVM: PPC: Book3S HV: Fix compilation without CONFIG_PPC_POWERNV
  powerpc: Corrected include header path in kvm_para.h
  Add rcu user eqs exception hooks for async page fault
2013-01-10 09:05:18 -08:00
Daniel J Blueman 8b84c8df38 x86, AMD, NB: Use u16 for northbridge IDs in amd_get_nb_id
Change amd_get_nb_id to return u16 to support >255 memory controllers,
and related consistency fixes.

Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Link: http://lkml.kernel.org/r/1353997932-8475-2-git-send-email-daniel@numascale-asia.com
Signed-off-by: Borislav Petkov <bp@alien8.de>
2013-01-10 16:17:58 +01:00
Daniel J Blueman 772c3ff385 x86, AMD, NB: Add multi-domain support
Fix get_node_id to match northbridge IDs from the array of detected
ones, allowing multi-server support such as with Numascale's
NumaConnect, renaming to 'amd_get_node_id' for consistency.

Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Link: http://lkml.kernel.org/r/1353997932-8475-1-git-send-email-daniel@numascale-asia.com
[Boris: shorten lines to fit 80 cols]
Signed-off-by: Borislav Petkov <bp@alien8.de>
2013-01-10 16:17:58 +01:00
David Ahern a706d965dc perf x86: revert 20b279 - require exclude_guest to use PEBS - kernel side
This patch is brought to you by the letter 'H'.

Commit 20b279 breaks compatiblity with older perf binaries when run with
precise modifier (:p or :pp) by requiring the exclude_guest attribute to be
set. Older binaries default exclude_guest to 0 (ie., wanting guest-based
samples) unless host only profiling is requested (:H modifier). The workaround
for older binaries is to add H to the modifier list (e.g., -e cycles:ppH -
toggles exclude_guest to 1). This was deemed unacceptable by Linus:

https://lkml.org/lkml/2012/12/12/570

Between family in town and the fresh snow in Breckenridge there is no time left
to be working on the proper fix for this over the holidays. In the New Year I
have more pressing problems to resolve -- like some memory leaks in perf which
are proving to be elusive -- although the aforementioned snow is probably why
they are proving to be elusive. Either way I do not have any spare time to work
on this and from the time I have managed to spend on it the solution is more
difficult than just moving to a new exclude_guest flag (does not work) or
flipping the logic to include_guest (which is not as trivial as one would
think).

So, two options: silently force exclude_guest on as suggested by Gleb which
means no impact to older perf binaries or revert the original patch which
caused the breakage.

This patch does the latter -- reverts the original patch that introduced the
regression. The problem can be revisited in the future as time allows.

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Link: http://lkml.kernel.org/r/1356749767-17322-1-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-10 09:21:19 -03:00
Lv Zheng 0947c6dee3 ACPICA: Update compilation environment settings.
This patch does not affect the generation of the Linux binary.
This patch decreases 300 lines of 20121018 divergence.diff.

This patch updates architecture specific environment settings for compiling
ACPICA as such enhancement already has been done in ACPICA.

Note that the appended compiler default settings in the
<acpi/platform/acenv.h> will deprecate some of the macros defined in the
architecture specific <asm/acpi.h>. Thus two of the <asm/acpi.h> headers
have been cleaned up in this patch accordingly.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-10 12:36:17 +01:00
Avi Kivity fb864fbc72 KVM: x86 emulator: convert basic ALU ops to fastop
Opcodes:
	TEST
	CMP
	ADD
	ADC
	SUB
	SBB
	XOR
	OR
	AND

Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-09 17:39:30 -02:00
Avi Kivity f7857f35db KVM: x86 emulator: add macros for defining 2-operand fastop emulation
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-09 17:39:28 -02:00
Avi Kivity 45a1467d7e KVM: x86 emulator: convert NOT, NEG to fastop
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-09 17:39:25 -02:00
Avi Kivity 75f728456f KVM: x86 emulator: mark CMP, CMPS, SCAS, TEST as NoWrite
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-09 17:39:21 -02:00
Avi Kivity b6744dc3fb KVM: x86 emulator: introduce NoWrite flag
Instead of disabling writeback via OP_NONE, just specify NoWrite.

Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-09 17:39:18 -02:00
Avi Kivity b7d491e7f0 KVM: x86 emulator: Support for declaring single operand fastops
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-09 17:39:17 -02:00
Avi Kivity e28bbd44da KVM: x86 emulator: framework for streamlining arithmetic opcodes
We emulate arithmetic opcodes by executing a "similar" (same operation,
different operands) on the cpu.  This ensures accurate emulation, esp. wrt.
eflags.  However, the prologue and epilogue around the opcode is fairly long,
consisting of a switch (for the operand size) and code to load and save the
operands.  This is repeated for every opcode.

This patch introduces an alternative way to emulate arithmetic opcodes.
Instead of the above, we have four (three on i386) functions consisting
of just the opcode and a ret; one for each operand size.  For example:

   .align 8
   em_notb:
	not %al
	ret

   .align 8
   em_notw:
	not %ax
	ret

   .align 8
   em_notl:
	not %eax
	ret

   .align 8
   em_notq:
	not %rax
	ret

The prologue and epilogue are shared across all opcodes.  Note the functions
use a special calling convention; notably eflags is an input/output parameter
and is not clobbered.  Rather than dispatching the four functions through a
jump table, the functions are declared as a constant size (8) so their address
can be calculated.

Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-09 17:39:17 -02:00
Bjorn Helgaas e84813c0cb Merge branch 'pci/yinghai-survey-resources' into next
* pci/yinghai-survey-resources:
  x86/PCI: Implement pcibios_resource_survey_bus()
  PCI/ACPI: Reserve firmware-allocated resources for hot-added root buses
  x86/PCI: Keep resource allocation functions after boot
  x86/PCI: Don't track firmware-assigned BAR values for hot-added devices
  x86/PCI: Factor out pcibios_allocate_dev_rom_resource()
  x86/PCI: Allocate resources on a per-bus basis for hot-adding root buses
  x86/PCI: Factor out pcibios_allocate_dev_resources()
  x86/PCI: Factor out pcibios_allocate_bridge_resources()
2013-01-09 11:37:16 -07:00
Borislav Petkov f51bde6f0d x86, MCE: Retract most UAPI exports
Retract back most macro definitions which went into the
user-visible mce.h header. Even though those bits are mostly
hardware-defined/-architectural, their naming is not. If we export them
to userspace, any kernel unification/renaming/cleanup cannot be done
anymore since those are effectively cast in stone. Besides, if userspace
wants those definitions, they can write their own defines and go crazy.

Signed-off-by: Borislav Petkov <bp@suse.de>
2013-01-09 14:49:02 +01:00
Marcelo Tosatti b09408d00f KVM: VMX: fix incorrect cached cpl value with real/v8086 modes
CPL is always 0 when in real mode, and always 3 when virtual 8086 mode.

Using values other than those can cause failures on operations that
check CPL.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-08 17:25:35 -02:00
Gleb Natapov b0cfeb5ded KVM: x86: remove unused variable from walk_addr_generic()
Fix compilation warning.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-08 17:23:39 -02:00
Marcelo Tosatti 013f6a5d3d KVM: x86: use dynamic percpu allocations for shared msrs area
Use dynamic percpu allocations for the shared msrs structure,
to avoid using the limited reserved percpu space.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-08 12:51:56 -02:00
Jussi Kivilinna 0024dc5371 crypto: aesni-intel - remove rfc3686(ctr(aes)), utilize rfc3686 from ctr-module instead
rfc3686 in CTR module is now able of using asynchronous ctr(aes) from
aesni-intel, so rfc3686(ctr(aes)) in aesni-intel is no longer needed.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-01-08 07:04:47 +01:00
Yinghai Lu b3e65e1f91 x86/PCI: Implement pcibios_resource_survey_bus()
During testing remove/rescan root bus 00, found
[  338.142574] bus: 'pci': really_probe: probing driver ata_piix with device 0000:00:01.1
[  338.146788] ata_piix 0000:00:01.1: device not available (can't reserve [io  0x01f0-0x01f7])
[  338.150565] ata_piix: probe of 0000:00:01.1 failed with error -22

because that fixed resource is not claimed.
For bootint path it is claimed in  from
        arch/x86/pci/i386.c::pcibios_allocate_resources()

Claim those resources, so on the remove/rescan will still use old
resources.

It is some kind honoring FW setting in the registers during hot add.
esp root-bus hot add is through acpi, BIOS has chance to set some registers
before handing over.

[bhelgaas: move weak definition to patch that uses it]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-01-07 15:58:48 -07:00
Yinghai Lu b95168e010 x86/PCI: Keep resource allocation functions after boot
The PCI resource allocation functions will be used for hot-added
devices, so keep them around.

[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-01-07 15:58:48 -07:00
Yinghai Lu 745216025d x86/PCI: Don't track firmware-assigned BAR values for hot-added devices
The BIOS doesn't assign BAR values for hot-added devices, so don't
bother saving the original values when we enumerate these devices.

[bhelgaas: changelog, return constant 0 in pcibios_retrieve_fw_addr]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-01-07 15:58:43 -07:00
Yinghai Lu dc2f56fa84 x86/PCI: Factor out pcibios_allocate_dev_rom_resource()
Factor pcibios_allocate_rom_resources() and
pcibios_allocate_dev_rom_resource() out of pcibios_assign_resources().
This will allow us to allocate ROM resources for hot-added root buses.

[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-01-07 15:58:34 -07:00
Yinghai Lu 83edc87ce8 x86/PCI: Allocate resources on a per-bus basis for hot-adding root buses
Previously pcibios_allocate_resources() allocated resources at boot-time
for all PCI devices using for_each_pci_dev().  This patch changes
pcibios_allocate_resources() so we can specify a bus, so we can do
similar allocation when hot-adding a root bus.

[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-01-07 15:58:16 -07:00
Gleb Natapov 908e7d7999 KVM: MMU: simplify folding of dirty bit into accessed_dirty
MMU code tries to avoid if()s HW is not able to predict reliably by using
bitwise operation to streamline code execution, but in case of a dirty bit
folding this gives us nothing since write_fault is checked right before
the folding code. Lets just piggyback onto the if() to make code more clear.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-07 20:31:35 -02:00
Gleb Natapov ee04e0cea8 KVM: mmu: remove unused trace event
trace_kvm_mmu_delay_free_pages() is no longer used.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-07 19:54:50 -02:00
Yinghai Lu c7f4bbc92f x86/PCI: Factor out pcibios_allocate_dev_resources()
Factor pcibios_allocate_dev_resources() out of
pcibios_allocate_resources().  Currently we only allocate these
resources at boot-time with a for_each_pci_dev() loop.  Eventually
we'll use pcibios_allocate_dev_resources() for hot-added devices, too.

[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-01-07 14:47:16 -07:00
Yinghai Lu f7ac356dc3 x86/PCI: Factor out pcibios_allocate_bridge_resources()
Thus pcibios_allocate_bus_resources() could more simple and clean.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-01-07 12:18:36 -07:00
Greg Kroah-Hartman a18e3690a5 X86: drivers: remove __dev* attributes.
CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
markings need to be removed.

This change removes the use of __devinit, __devexit_p, __devinitconst,
and __devexit from these drivers.

Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.

Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Drake <dsd@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-03 15:57:04 -08:00
Bjorn Helgaas b7869ba17c x86/PCI: Remove unused pci_root_bus
pci_root_bus is unused, so remove all references to it.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-01-03 15:28:34 -07:00
Jorrit Schippers d82603c6da treewide: Replace incomming with incoming in all comments and strings
Signed-off-by: Jorrit Schippers <jorrit@ncode.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-01-03 16:15:49 +01:00
Gleb Natapov 0ca1b4f4ba KVM: VMX: handle IO when emulation is due to #GP in real mode.
With emulate_invalid_guest_state=0 if a vcpu is in real mode VMX can
enter the vcpu with smaller segment limit than guest configured.  If the
guest tries to access pass this limit it will get #GP at which point
instruction will be emulated with correct segment limit applied. If
during the emulation IO is detected it is not handled correctly. Vcpu
thread should exit to userspace to serve the IO, but it returns to the
guest instead.  Since emulation is not completed till userspace completes
the IO the faulty instruction is re-executed ad infinitum.

The patch fixes that by exiting to userspace if IO happens during
instruction emulation.

Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-02 19:36:31 -02:00
Gleb Natapov d54d07b2ca KVM: VMX: Do not fix segment register during vcpu initialization.
Segment registers will be fixed according to current emulation policy
during switching to real mode for the first time.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-02 19:36:30 -02:00
Gleb Natapov d99e415275 KVM: VMX: fix emulation of invalid guest state.
Currently when emulation of invalid guest state is enable
(emulate_invalid_guest_state=1) segment registers are still fixed for
entry to vm86 mode some times. Segment register fixing is avoided in
enter_rmode(), but vmx_set_segment() still does it unconditionally.
The patch fixes it.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-02 19:36:29 -02:00
Gleb Natapov 89efbed02c KVM: VMX: make rmode_segment_valid() more strict.
Currently it allows entering vm86 mode if segment limit is greater than
0xffff and db bit is set. Both of those can cause incorrect execution of
instruction by cpu since in vm86 mode limit will be set to 0xffff and db
will be forced to 0.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-02 19:36:28 -02:00
Gleb Natapov 045a282ca4 KVM: emulator: implement fninit, fnstsw, fnstcw
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-02 19:36:27 -02:00
Gleb Natapov 3a78a4f463 KVM: emulator: drop RPL check from linearize() function
According to Intel SDM Vol3 Section 5.5 "Privilege Levels" and 5.6
"Privilege Level Checking When Accessing Data Segments" RPL checking is
done during loading of a segment selector, not during data access. We
already do checking during segment selector loading, so drop the check
during data access. Checking RPL during data access triggers #GP if
after transition from real mode to protected mode RPL bits in a segment
selector are set.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-02 19:36:26 -02:00
Jesse Larrew 11393a077d x86: kvm_para: fix typo in hypercall comments
Correct a typo in the comment explaining hypercalls.

Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-02 16:02:25 -02:00
Tejun Heo 4d899be584 x86/mce: don't use [delayed_]work_pending()
There's no need to test whether a (delayed) work item in pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from x86/mce.  Only compile tested.

v2: Local var work removed from mce_schedule_work() as suggested by
    Borislav.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac@vger.kernel.org
2012-12-28 13:40:16 -08:00
Myron Stowe 1278998f8f PCI: Work around Stratus ftServer broken PCIe hierarchy (fix DMI check)
Commit 284f5f9 was intended to disable the "only_one_child()" optimization
on Stratus ftServer systems, but its DMI check is wrong.  It looks for
DMI_SYS_VENDOR that contains "ftServer", when it should look for
DMI_SYS_VENDOR containing "Stratus" and DMI_PRODUCT_NAME containing
"ftServer".

Tested on Stratus ftServer 6400.

Reported-by: Fadeeva Marina <astarta@rat.ru>
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=51331
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org	# v3.5+
2012-12-26 10:39:23 -07:00
Gleb Natapov f924d66d27 KVM: VMX: remove unneeded temporary variable from vmx_set_segment()
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-23 14:02:00 +02:00
Gleb Natapov 1ecd50a947 KVM: VMX: clean-up vmx_set_segment()
Move all vm86_active logic into one place.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-23 14:01:49 +02:00
Gleb Natapov 39dcfb95de KVM: VMX: remove redundant code from vmx_set_segment()
Segment descriptor's base is fixed by call to fix_rmode_seg(). Not need
to do it twice.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-23 14:01:37 +02:00
Gleb Natapov beb853ffec KVM: VMX: use fix_rmode_seg() to fix all code/data segments
The code for SS and CS does the same thing fix_rmode_seg() is doing.
Use it instead of hand crafted code.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-23 14:01:18 +02:00
Gleb Natapov c6ad115348 KVM: VMX: return correct segment limit and flags for CS/SS registers in real mode
VMX without unrestricted mode cannot virtualize real mode, so if
emulate_invalid_guest_state=0 kvm uses vm86 mode to approximate
it. Sometimes, when guest moves from protected mode to real mode, it
leaves segment descriptors in a state not suitable for use by vm86 mode
virtualization, so we keep shadow copy of segment descriptors for internal
use and load fake register to VMCS for guest entry to succeed. Till
now we kept shadow for all segments except SS and CS (for SS and CS we
returned parameters directly from VMCS), but since commit a5625189f6
emulator enforces segment limits in real mode. This causes #GP during move
from protected mode to real mode when emulator fetches first instruction
after moving to real mode since it uses incorrect CS base and limit to
linearize the %rip. Fix by keeping shadow for SS and CS too.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-23 14:01:03 +02:00
Gleb Natapov 0647f4aa8c KVM: VMX: relax check for CS register in rmode_segment_valid()
rmode_segment_valid() checks if segment descriptor can be used to enter
vm86 mode. VMX spec mandates that in vm86 mode CS register will be of
type data, not code. Lets allow guest entry with vm86 mode if the only
problem with CS register is incorrect type. Otherwise entire real mode
will be emulated.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-23 14:00:47 +02:00
Gleb Natapov 07f42f5f25 KVM: VMX: cleanup rmode_segment_valid()
Set segment fields explicitly instead of using  binary operations.

No behaviour changes.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-23 14:00:36 +02:00
Linus Torvalds 54d46ea993 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull signal handling cleanups from Al Viro:
 "sigaltstack infrastructure + conversion for x86, alpha and um,
  COMPAT_SYSCALL_DEFINE infrastructure.

  Note that there are several conflicts between "unify
  SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline;
  resolution is trivial - just remove definitions of SS_ONSTACK and
  SS_DISABLED from arch/*/uapi/asm/signal.h; they are all identical and
  include/uapi/linux/signal.h contains the unified variant."

Fixed up conflicts as per Al.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  alpha: switch to generic sigaltstack
  new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those
  generic compat_sys_sigaltstack()
  introduce generic sys_sigaltstack(), switch x86 and um to it
  new helper: compat_user_stack_pointer()
  new helper: restore_altstack()
  unify SS_ONSTACK/SS_DISABLE definitions
  new helper: current_user_stack_pointer()
  missing user_stack_pointer() instances
  Bury the conditionals from kernel_thread/kernel_execve series
  COMPAT_SYSCALL_DEFINE: infrastructure
2012-12-20 18:05:28 -08:00
David Woodhouse ffee0de411 x86: Default to ARCH=x86 to avoid overriding CONFIG_64BIT
It is easy to waste a bunch of time when one takes a 32-bit .config
from a test machine and try to build it on a faster 64-bit system, and
its existing setting of CONFIG_64BIT=n gets *changed* to match the
build host.  Similarly, if one has an existing build tree it is easy
to trash an entire build tree that way.

This is because the default setting for $ARCH when discovered from
'uname' is one of the legacy pre-x86-merge values (i386 or x86_64),
which effectively force the setting of CONFIG_64BIT to match. We should
default to ARCH=x86 instead, finally completing the merge that we
started so long ago.

This patch preserves the behaviour of the legacy ARCH settings for commands
such as:

   make ARCH=x86_64 randconfig
   make ARCH=i386 randconfig

... since making the value of CONFIG_64BIT actually random in that situation
is not desirable.

In time, perhaps we can retire this legacy use of the old ARCH= values.
We already have a way to override values for *any* config option, using
$KCONFIG_ALLCONFIG, so it could be argued that we don't necessarily need
to keep ARCH={i386,x86_64} around as a special case just for overriding
CONFIG_64BIT.

We'd probably at least want to add a way to override config options from
the command line ('make CONFIG_FOO=y oldconfig') before we talk about doing
that though.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Link: http://lkml.kernel.org/r/1356040315.3198.51.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-12-20 14:37:18 -08:00
Sasha Levin 8f170faeb4 x86, apb_timer: remove unused variable percpu_timer
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Link: http://lkml.kernel.org/r/1356030701-16284-28-git-send-email-sasha.levin@oracle.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-12-20 11:49:39 -08:00
Sasha Levin 7be0b06520 um: don't compare a pointer to 0
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Link: http://lkml.kernel.org/r/1356030701-16284-27-git-send-email-sasha.levin@oracle.com
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-12-20 11:49:39 -08:00
Sasha Levin 6444174548 arch/x86/platform/uv: use ARRAY_SIZE where possible
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Link: http://lkml.kernel.org/r/1356030701-16284-26-git-send-email-sasha.levin@oracle.com
Cc: Cliff Wickman <cpw@sgi.com>
Cc: Alex Shi <alex.shi@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-12-20 11:49:39 -08:00
Sasha Levin 886d751a2e x86, efi: correct precedence of operators in setup_efi_pci
With the current code, the condition in the if() doesn't make much sense due to
precedence of operators.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Link: http://lkml.kernel.org/r/1356030701-16284-25-git-send-email-sasha.levin@oracle.com
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-12-20 11:47:14 -08:00
Linus Torvalds 787314c35f IOMMU Updates for Linux v3.8
A few new features this merge-window. The most important one is
 probably, that dma-debug now warns if a dma-handle is not checked with
 dma_mapping_error by the device driver. This requires minor changes to
 some architectures which make use of dma-debug. Most of these changes
 have the respective Acks by the Arch-Maintainers.
 Besides that there are updates to the AMD IOMMU driver for refactor the
 IOMMU-Groups support and to make sure it does not trigger a hardware
 erratum.
 The OMAP changes (for which I pulled in a branch from Tony Lindgren's
 tree) have a conflict in linux-next with the arm-soc tree. The conflict
 is in the file arch/arm/mach-omap2/clock44xx_data.c which is deleted in
 the arm-soc tree. It is safe to delete the file too so solve the
 conflict. Similar changes are done in the arm-soc tree in the common
 clock framework migration. A missing hunk from the patch in the IOMMU
 tree will be submitted as a seperate patch when the merge-window is
 closed.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQzbQQAAoJECvwRC2XARrjXCIP/2RxBzbVOiaPOorl+ZWbsZ41
 lzWiXsCHJkh4BK4/qGsVeKhiNd9LcbQUlhywnBbhWxym3spzmjGtvU2Hcg8QiO/M
 R83r9S4e8Z6DnF9Gcats1Ns9BufgpyhLXg3XoXPxtyHOgRS59fvYi6xXOxyX30Dy
 uhbj+WL6UD0zvOMNztEnM1p6UhX+XlpvzKDTR5+G5xKdVPkcgeiaKSwqz739caTn
 QE2NpqIh+8Mwuu1nIapk8h07xhUYU5eGMXa38u1LvDwSHsrsCMLC+lXIjtInn7Gw
 Bv+XcCHgtOaoPQwwk/xd2HVwJQxO9HNb5YX51EIjwP0C5S/3yW9Ji1RgqFb6Ewqq
 jIkF6ckwUheLWsBGkw5UknI/f7RX3MDiTWkziYLIniYKKewm+ymGfgIqPt2TzLIO
 tMZZiIssKvy7wOXQ5JjpYJg5Xmrau6opNwdEguC8pWkJT7qsn+3SeLjMt0Lh9IoY
 +37DOgOLb3O3/vnZJ3i0KMRZBfVeaRj5HaGmlxFCYUZCNQymIPTih9Jtqm+WuVcu
 YaGQCTtynsQ0JVh8YEekLzSfgd3OODP68fyCg1CQNixEgvUi2hd/toX2/Z1wkkSA
 JC9bZarcoPkSWqaTAA2HvmaaxvRR+0UbhFPopFTQarVV0MVLZWBxoyuKy/nMrmMd
 UgTzrDYy74UKdrSTwIXg
 =pPHZ
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:
 "A few new features this merge-window.  The most important one is
  probably, that dma-debug now warns if a dma-handle is not checked with
  dma_mapping_error by the device driver.  This requires minor changes
  to some architectures which make use of dma-debug.  Most of these
  changes have the respective Acks by the Arch-Maintainers.

  Besides that there are updates to the AMD IOMMU driver for refactor
  the IOMMU-Groups support and to make sure it does not trigger a
  hardware erratum.

  The OMAP changes (for which I pulled in a branch from Tony Lindgren's
  tree) have a conflict in linux-next with the arm-soc tree.  The
  conflict is in the file arch/arm/mach-omap2/clock44xx_data.c which is
  deleted in the arm-soc tree.  It is safe to delete the file too so
  solve the conflict.  Similar changes are done in the arm-soc tree in
  the common clock framework migration.  A missing hunk from the patch
  in the IOMMU tree will be submitted as a seperate patch when the
  merge-window is closed."

* tag 'iommu-updates-v3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (29 commits)
  ARM: dma-mapping: support debug_dma_mapping_error
  ARM: OMAP4: hwmod data: ipu and dsp to use parent clocks instead of leaf clocks
  iommu/omap: Adapt to runtime pm
  iommu/omap: Migrate to hwmod framework
  iommu/omap: Keep mmu enabled when requested
  iommu/omap: Remove redundant clock handling on ISR
  iommu/amd: Remove obsolete comment
  iommu/amd: Don't use 512GB pages
  iommu/tegra: smmu: Move bus_set_iommu after probe for multi arch
  iommu/tegra: gart: Move bus_set_iommu after probe for multi arch
  iommu/tegra: smmu: Remove unnecessary PTC/TLB flush all
  tile: dma_debug: add debug_dma_mapping_error support
  sh: dma_debug: add debug_dma_mapping_error support
  powerpc: dma_debug: add debug_dma_mapping_error support
  mips: dma_debug: add debug_dma_mapping_error support
  microblaze: dma-mapping: support debug_dma_mapping_error
  ia64: dma_debug: add debug_dma_mapping_error support
  c6x: dma_debug: add debug_dma_mapping_error support
  ARM64: dma_debug: add debug_dma_mapping_error support
  intel-iommu: Prevent devices with RMRRs from being placed into SI Domain
  ...
2012-12-20 10:07:25 -08:00
Al Viro c40702c49f new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those
note that they are relying on access_ok() already checked by caller.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-19 18:07:41 -05:00
Al Viro 9026843952 generic compat_sys_sigaltstack()
Again, conditional on CONFIG_GENERIC_SIGALTSTACK

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-19 18:07:41 -05:00
Al Viro 6bf9adfc90 introduce generic sys_sigaltstack(), switch x86 and um to it
Conditional on CONFIG_GENERIC_SIGALTSTACK; architectures that do not
select it are completely unaffected

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-19 18:07:40 -05:00
Al Viro 9b064fc3f9 new helper: compat_user_stack_pointer()
Compat counterpart of current_user_stack_pointer(); for most of the biarch
architectures those two are identical, but e.g. arm64 and arm use different
registers for stack pointer...

Note that amd64 variants of current_user_stack_pointer/compat_user_stack_pointer
do *not* rely on pt_regs having been through FIXUP_TOP_OF_STACK.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-19 18:07:40 -05:00
Al Viro 031b656698 unify SS_ONSTACK/SS_DISABLE definitions
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-19 18:07:39 -05:00
Al Viro 5208ba24e7 missing user_stack_pointer() instances
for the architectures that have usp in pt_regs and do not have
user_stack_pointer() already defined.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-19 18:07:39 -05:00
Al Viro ae903caae2 Bury the conditionals from kernel_thread/kernel_execve series
All architectures have
	CONFIG_GENERIC_KERNEL_THREAD
	CONFIG_GENERIC_KERNEL_EXECVE
	__ARCH_WANT_SYS_EXECVE
None of them have __ARCH_WANT_KERNEL_EXECVE and there are only two callers
of kernel_execve() (which is a trivial wrapper for do_execve() now) left.
Kill the conditionals and make both callers use do_execve().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-19 18:07:38 -05:00
Linus Torvalds 1bd12c91de Merge branch 'x86/nuke386' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull one final 386 removal patch from Peter Anvin.

IRQ 13 FPU error handling is gone.  That was not one of the proudest
moments in PC history.

* 'x86/nuke386' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, 386 removal: Remove support for IRQ 13 FPU error reporting
2012-12-19 13:02:23 -08:00
Linus Torvalds 7a684c452e Nothing all that exciting; a new module-from-fd syscall for those who want
to verify the source of the module (ChromeOS) and/or use standard IMA on it
 or other security hooks.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ0VKlAAoJENkgDmzRrbjxjuEQALVHpD1cSmryOzVwkNn7rVGP
 PV3KVbUs+qzUCm2c3AafIIlSBm2LOUl+cR3uNC7di8aHarRF3VHkK2OQ4Fx97ECd
 KKBqAyY3R0q1mAKujb/MWwiK0YgosEDIOzGGn2yQhNFsxKqnMB02P4j82IO7+g+w
 Cc3XuDyWHoH2I+ySgz0Q8NHAqufD/DMZUKud7jw2Lsv6PuICJ1Oqgl/Gd/muxort
 4a5tV3tjhRGywHS/8b2fbDUXkybC5NKK0FN+gyoaROmJ/THeHEQDGXZT9bc2vmVx
 HvRy/5k8dzQ6LAJ2mLnPvy0pmv0u7NYMvjxTxxUlUkFMkYuVticikQfwSYDbDPt4
 mbsLxchpgi8z4x8HltEERffCX5tldo/5hz1uemqhqIsMRIrRFnlHkSIgkGjVHf2u
 LXQBLT8uTm6C0VyNQPrI/hUZzIax7WtKbPSoK9lmExNbKqloEFh/mVXvfQxei2kp
 wnUZcnmPIqSvw7b4CWu7HibMYu2VvGBgm3YIfJRi4AQme1mzFYLpZoxF5Pj+Ykbt
 T//Hb1EsNQTTFCg7MZhnJSAw/EVUvNDUoullORClyqw6+xxjVKqWpPJgYDRfWOlJ
 Xa+s7DNrL+Oo1WWR8l5ruoQszbR8szIyeyPKKxRUcQj2zsqghoWuzKAx2saSEw3W
 pNkoJU+dGC7kG/yVAS8N
 =uoJj
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module update from Rusty Russell:
 "Nothing all that exciting; a new module-from-fd syscall for those who
  want to verify the source of the module (ChromeOS) and/or use standard
  IMA on it or other security hooks."

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  MODSIGN: Fix kbuild output when using default extra_certificates
  MODSIGN: Avoid using .incbin in C source
  modules: don't hand 0 to vmalloc.
  module: Remove a extra null character at the top of module->strtab.
  ASN.1: Use the ASN1_LONG_TAG and ASN1_INDEFINITE_LENGTH constants
  ASN.1: Define indefinite length marker constant
  moduleparam: use __UNIQUE_ID()
  __UNIQUE_ID()
  MODSIGN: Add modules_sign make target
  powerpc: add finit_module syscall.
  ima: support new kernel module syscall
  add finit_module syscall to asm-generic
  ARM: add finit_module syscall to ARM
  security: introduce kernel_module_from_file hook
  module: add flags arg to sys_finit_module()
  module: add syscall to load module from fd
2012-12-19 07:55:08 -08:00
Linus Torvalds 673ab8783b Merge branch 'akpm' (more patches from Andrew)
Merge patches from Andrew Morton:
 "Most of the rest of MM, plus a few dribs and drabs.

  I still have quite a few irritating patches left around: ones with
  dubious testing results, lack of review, ones which should have gone
  via maintainer trees but the maintainers are slack, etc.

  I need to be more activist in getting these things wrapped up outside
  the merge window, but they're such a PITA."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (48 commits)
  mm/vmscan.c: avoid possible deadlock caused by too_many_isolated()
  vmscan: comment too_many_isolated()
  mm/kmemleak.c: remove obsolete simple_strtoul
  mm/memory_hotplug.c: improve comments
  mm/hugetlb: create hugetlb cgroup file in hugetlb_init
  mm/mprotect.c: coding-style cleanups
  Documentation: ABI: /sys/devices/system/node/
  slub: drop mutex before deleting sysfs entry
  memcg: add comments clarifying aspects of cache attribute propagation
  kmem: add slab-specific documentation about the kmem controller
  slub: slub-specific propagation changes
  slab: propagate tunable values
  memcg: aggregate memcg cache values in slabinfo
  memcg/sl[au]b: shrink dead caches
  memcg/sl[au]b: track all the memcg children of a kmem_cache
  memcg: destroy memcg caches
  sl[au]b: allocate objects from memcg cache
  sl[au]b: always get the cache from its page in kmem_cache_free()
  memcg: skip memcg kmem allocations in specified code regions
  memcg: infrastructure to match an allocation to the right cache
  ...
2012-12-18 15:08:12 -08:00
Shérab 88d67ee3ec arch/x86/platform/iris/iris.c: register a platform device and a platform driver
This makes the iris driver use the platform API, so it is properly exposed
in /sys.

[akpm@linux-foundation.org: remove commented-out code, add missing space to printk, clean up code layout]
Signed-off-by: Shérab <Sebastien.Hinderer@ens-lyon.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 15:02:11 -08:00
Linus Torvalds 6842d98de7 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull powertool update from Len Brown:
 "This updates the tree w/ the latest version of turbostat, which
  reports temperature and - on SNB and later - Watts."

Fix up semantic merge conflict as per Len.

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools: Allow tools to be installed in a user specified location
  tools/power: turbostat: make Makefile a bit more capable
  tools/power x86_energy_perf_policy: close /proc/stat in for_every_cpu()
  tools/power turbostat: v3.0: monitor Watts and Temperature
  tools/power turbostat: fix output buffering issue
  tools/power turbostat: prevent infinite loop on migration error path
  x86 power: define RAPL MSRs
  tools/power/x86/turbostat: share kernel MSR #defines
2012-12-18 12:34:29 -08:00
Linus Torvalds 224394ad75 Bugfixes:
* Fix to bootup regression introduced by 'x86-bsp-hotplug-for-linus' tip branch.
  * Fix to vcpu hotplug code.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJQz/sjAAoJEFjIrFwIi8fJNy4H/1FT8SUpCPnVm5mHpPFQdE0X
 DgkjQuNuUUEpi+1fOaIl4CVu4B6uRqY2K6C1pOMgf2SDUdgvtv7Tk+jR1wuNIG9r
 Q4yslc9LcCy5916hT9t/7+THmKqfibbocvRAtcjrOHfcdcMnYYBrCP8YeeNARfe9
 oduzs8+BC8xCThS6rbhe+PHtsfXucf4+aRdXYg7w1c6EeA7RCY/8o5FF8vVOFbcf
 mFOeKzMD7zHwoV7i8iYMmydhLOkmXj0QfQcHtV5kZ2m43FQ4nCUYMtqJa9Q6RXzH
 4tUr4gYu8QE4t7gusP3e3kYCtJLDxtiCa1s3mp0tWT7S5LZsVlyWa0n30YW30W8=
 =U02+
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.8-rc0-bugfix-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen bugfixes from Konrad Rzeszutek Wilk:
 "Two fixes.  One of them is caused by the recent change introduced by
  the 'x86-bsp-hotplug-for-linus' tip tree that inhibited bootup (old
  function does not do what it used to do).  The other one is just a
  vanilla bug.

   - Fix to bootup regression introduced by 'x86-bsp-hotplug-for-linus'
     tip branch.
   - Fix to vcpu hotplug code."

* tag 'stable/for-linus-3.8-rc0-bugfix-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/vcpu: Fix vcpu restore path.
  xen: Add EVTCHNOP_reset in Xen interface header files.
  xen/smp: Use smp_store_boot_cpu_info() to store cpu info for BSP during boot time.
2012-12-18 12:26:54 -08:00
David Rientjes c36e0501ee x86, paravirt: fix build error when thp is disabled
With CONFIG_PARAVIRT=y and CONFIG_TRANSPARENT_HUGEPAGE=n, the build breaks
because set_pmd_at() is undeclared:

  mm/memory.c: In function 'do_pmd_numa_page':
  mm/memory.c:3520: error: implicit declaration of function 'set_pmd_at'
  mm/mprotect.c: In function 'change_pmd_protnuma':
  mm/mprotect.c:120: error: implicit declaration of function 'set_pmd_at'

This is because paravirt defines set_pmd_at() only when
CONFIG_TRANSPARENT_HUGEPAGE=y and such a restriction is unneeded.  The
fix is to define it for all CONFIG_PARAVIRT configurations.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 09:49:03 -08:00
Linus Torvalds ea88eeac0c md update for 3.8
Mostly just little fixes.  Probably biggest part is
 AVX accelerated RAID6 calculations.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIVAwUAUM/w2Dnsnt1WYoG5AQKXlg/9F5juv4CjRkRRFLqZgOPBLmn/s/2Vspgh
 2Kv8Jcyixd8jUQNbobZv0ahlJH/iSU61kpOE8QjLbKi5Y42vAbM0ZU2aHJ6nqGZy
 HiTI8K+7kTvCK3ZXLcUQ+4oPPBNTcoTZbLWaEOmIqB1ruLddoIR7M9fG3PspVeG0
 jijnXR8IfL6mr4YDXnJkEhFrneTysVik05RkKYZKyM/9r3stAoMJ9o0/EFy3OFxb
 lO6mLEtvjVArXcnuf1RMCw2YKgki9Y4r73HCplgQsVFvcxcpsya4gFF+lRR5j7cO
 /eMYbSQ89iWEYKh1dJ9u1nofc8fX5ia71QQyO1fkO4GXRHXPVIyBgKSbe7SaL6iG
 JUMm7idUV2rZGeq3ln3k8Yor4QqHvN1n7pRKKUF+ZdsPoQ1B/TABu+qpsAdo5ZhP
 fxDsULsHrzEaxgetd4V8F2Uptca9ni43sMI8mwsvVlA0p6SOzMIyoJLC9xAZpx11
 b3H3+7Oje/fasmszBoq5B9uAlSt9XXVN4DDn2q6cX+S96JSX6jcsN1c6cJBO+ZxB
 OU6a6P5mnU6HuxU02rspe7G8BeU+ybaonErOW+GdyC4r7M/cImC0dSp0NGHK2211
 oqu0xBx/Q/ddTFwKQqa4HzR2ws09+LhKbjdqYIhCEKttIbLIAjf73ARZ19XPSRRX
 pDR/ey2CB6E=
 =uK52
 -----END PGP SIGNATURE-----

Merge tag 'md-3.8' of git://neil.brown.name/md

Pull md update from Neil Brown:
 "Mostly just little fixes.  Probably biggest part is AVX accelerated
  RAID6 calculations."

* tag 'md-3.8' of git://neil.brown.name/md:
  md/raid5: add blktrace calls
  md/raid5: use async_tx_quiesce() instead of open-coding it.
  md: Use ->curr_resync as last completed request when cleanly aborting resync.
  lib/raid6: build proper files on corresponding arch
  lib/raid6: Add AVX2 optimized gen_syndrome functions
  lib/raid6: Add AVX2 optimized recovery functions
  md: Update checkpoint of resync/recovery based on time.
  md:Add place to update ->recovery_cp.
  md.c: re-indent various 'switch' statements.
  md: close race between removing and adding a device.
  md: removed unused variable in calc_sb_1_csm.
2012-12-18 09:32:44 -08:00
Li Zhong 9b132fbe54 Add rcu user eqs exception hooks for async page fault
This patch adds user eqs exception hooks for async page fault page not
present code path, to exit the user eqs and re-enter it as necessary.

Async page fault is different from other exceptions that it may be
triggered from idle process, so we still need rcu_irq_enter() and
rcu_irq_exit() to exit cpu idle eqs when needed, to protect the code
that needs use rcu.

As Frederic pointed out it would be safest and simplest to protect the
whole kvm_async_pf_task_wait(). Otherwise, "we need to check all the
code there deeply for potential RCU uses and ensure it will never be
extended later to use RCU.".

However, We'd better re-enter the cpu idle eqs if we get the exception
in cpu idle eqs, by calling rcu_irq_exit() before native_safe_halt().

So the patch does what Frederic suggested for rcu_irq_*() API usage
here, except that I moved the rcu_irq_*() pair originally in
do_async_page_fault() into kvm_async_pf_task_wait().

That's because, I think it's better to have rcu_irq_*() pairs to be in
one function ( rcu_irq_exit() after rcu_irq_enter() ), especially here,
kvm_async_pf_task_wait() has other callers, which might cause
rcu_irq_exit() be called without a matching rcu_irq_enter() before it,
which is illegal if the cpu happens to be in rcu idle state.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-18 15:15:41 +02:00
Nickolai Zeldovich d4b06c2d4c kvm: fix i8254 counter 0 wraparound
The kvm i8254 emulation for counter 0 (but not for counters 1 and 2)
has at least two bugs in mode 0:

1. The OUT bit, computed by pit_get_out(), is never set high.

2. The counter value, computed by pit_get_count(), wraps back around to
   the initial counter value, rather than wrapping back to 0xFFFF
   (which is the behavior described in the comment in __kpit_elapsed,
   the behavior implemented by qemu, and the behavior observed on AMD
   hardware).

The bug stems from __kpit_elapsed computing the elapsed time mod the
initial counter value (stored as nanoseconds in ps->period).  This is both
unnecessary (none of the callers of kpit_elapsed expect the value to be
at most the initial counter value) and incorrect (it causes pit_get_count
to appear to wrap around to the initial counter value rather than 0xFFFF).
Removing this mod from __kpit_elapsed fixes both of the above bugs.

Signed-off-by: Nickolai Zeldovich <nickolai@csail.mit.edu>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-18 11:12:38 +02:00
Wei Liu 9d328a948f xen/vcpu: Fix vcpu restore path.
The runstate of vcpu should be restored for all possible cpus, as well as the
vcpu info placement.

Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-12-17 21:58:09 -05:00
Konrad Rzeszutek Wilk 06d0b5d9ed xen/smp: Use smp_store_boot_cpu_info() to store cpu info for BSP during boot time.
Git commit 30106c1743
("x86, hotplug: Support functions for CPU0 online/offline") alters what
the call to smp_store_cpu_info() does. For BSP we should use the
smp_store_boot_cpu_info() and for secondary CPU's the old
variant of smp_store_cpu_info() should be used. This fixes
the regression introduced by said commit.

Reported-and-Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-12-17 21:56:35 -05:00
Andrew Morton d7124073ad create non-empty arch/x86/include/uapi/asm/ files
patch(1) doesn't create zero-length files, so my kernel didn't compile.

Put something in these files so patch(1) actually creates them.

Cc: David Howells <dhowells@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17 17:15:11 -08:00
H. Peter Anvin bc3eba6068 x86, 386 removal: Remove support for IRQ 13 FPU error reporting
Remove support for FPU error reporting via IRQ 13, as opposed to
exception 16 (#MF).  One last remnant of i386 gone.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Alan Cox <alan@linux.intel.com>
2012-12-17 11:42:40 -08:00
Linus Torvalds 2a74dbb9a8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "A quiet cycle for the security subsystem with just a few maintenance
  updates."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  Smack: create a sysfs mount point for smackfs
  Smack: use select not depends in Kconfig
  Yama: remove locking from delete path
  Yama: add RCU to drop read locking
  drivers/char/tpm: remove tasklet and cleanup
  KEYS: Use keyring_alloc() to create special keyrings
  KEYS: Reduce initial permissions on keys
  KEYS: Make the session and process keyrings per-thread
  seccomp: Make syscall skipping and nr changes more consistent
  key: Fix resource leak
  keys: Fix unreachable code
  KEYS: Add payload preparsing opportunity prior to key instantiate or update
2012-12-16 15:40:50 -08:00
Linus Torvalds 3d59eebc5e Automatic NUMA Balancing V11
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQIcBAABAgAGBQJQx0kQAAoJEHzG/DNEskfi4fQP/R5PRovayroZALBMLnVJDaLD
 Ttr9p40VNXbiJ+MfRgatJjSSJZ4Jl+fC3NEqBhcwVZhckZZb9R2s0WtrSQo5+ZbB
 vdRfiuKoCaKM4cSZ08C12uTvsF6xjhjd27CTUlMkyOcDoKxMEFKelv0hocSxe4Wo
 xqlv3eF+VsY7kE1BNbgBP06SX4tDpIHRxXfqJPMHaSKQmre+cU0xG2GcEu3QGbHT
 DEDTI788YSaWLmBfMC+kWoaQl1+bV/FYvavIAS8/o4K9IKvgR42VzrXmaFaqrbgb
 72ksa6xfAi57yTmZHqyGmts06qYeBbPpKI+yIhCMInxA9CY3lPbvHppRf0RQOyzj
 YOi4hovGEMJKE+BCILukhJcZ9jCTtS3zut6v1rdvR88f4y7uhR9RfmRfsxuW7PNj
 3Rmh191+n0lVWDmhOs2psXuCLJr3LEiA0dFffN1z8REUTtTAZMsj8Rz+SvBNAZDR
 hsJhERVeXB6X5uQ5rkLDzbn1Zic60LjVw7LIp6SF2OYf/YKaF8vhyWOA8dyCEu8W
 CGo7AoG0BO8tIIr8+LvFe8CweypysZImx4AjCfIs4u9pu/v11zmBvO9NO5yfuObF
 BreEERYgTes/UITxn1qdIW4/q+Nr0iKO3CTqsmu6L1GfCz3/XzPGs3U26fUhllqi
 Ka0JKgnWvsa6ez6FSzKI
 =ivQa
 -----END PGP SIGNATURE-----

Merge tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma

Pull Automatic NUMA Balancing bare-bones from Mel Gorman:
 "There are three implementations for NUMA balancing, this tree
  (balancenuma), numacore which has been developed in tip/master and
  autonuma which is in aa.git.

  In almost all respects balancenuma is the dumbest of the three because
  its main impact is on the VM side with no attempt to be smart about
  scheduling.  In the interest of getting the ball rolling, it would be
  desirable to see this much merged for 3.8 with the view to building
  scheduler smarts on top and adapting the VM where required for 3.9.

  The most recent set of comparisons available from different people are

    mel:    https://lkml.org/lkml/2012/12/9/108
    mingo:  https://lkml.org/lkml/2012/12/7/331
    tglx:   https://lkml.org/lkml/2012/12/10/437
    srikar: https://lkml.org/lkml/2012/12/10/397

  The results are a mixed bag.  In my own tests, balancenuma does
  reasonably well.  It's dumb as rocks and does not regress against
  mainline.  On the other hand, Ingo's tests shows that balancenuma is
  incapable of converging for this workloads driven by perf which is bad
  but is potentially explained by the lack of scheduler smarts.  Thomas'
  results show balancenuma improves on mainline but falls far short of
  numacore or autonuma.  Srikar's results indicate we all suffer on a
  large machine with imbalanced node sizes.

  My own testing showed that recent numacore results have improved
  dramatically, particularly in the last week but not universally.
  We've butted heads heavily on system CPU usage and high levels of
  migration even when it shows that overall performance is better.
  There are also cases where it regresses.  Of interest is that for
  specjbb in some configurations it will regress for lower numbers of
  warehouses and show gains for higher numbers which is not reported by
  the tool by default and sometimes missed in treports.  Recently I
  reported for numacore that the JVM was crashing with
  NullPointerExceptions but currently it's unclear what the source of
  this problem is.  Initially I thought it was in how numacore batch
  handles PTEs but I'm no longer think this is the case.  It's possible
  numacore is just able to trigger it due to higher rates of migration.

  These reports were quite late in the cycle so I/we would like to start
  with this tree as it contains much of the code we can agree on and has
  not changed significantly over the last 2-3 weeks."

* tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma: (50 commits)
  mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable
  mm/rmap: Convert the struct anon_vma::mutex to an rwsem
  mm: migrate: Account a transhuge page properly when rate limiting
  mm: numa: Account for failed allocations and isolations as migration failures
  mm: numa: Add THP migration for the NUMA working set scanning fault case build fix
  mm: numa: Add THP migration for the NUMA working set scanning fault case.
  mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node
  mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG
  mm: sched: numa: Control enabling and disabling of NUMA balancing
  mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate
  mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships
  mm: numa: migrate: Set last_nid on newly allocated page
  mm: numa: split_huge_page: Transfer last_nid on tail page
  mm: numa: Introduce last_nid to the page frame
  sched: numa: Slowly increase the scanning period as NUMA faults are handled
  mm: numa: Rate limit setting of pte_numa if node is saturated
  mm: numa: Rate limit the amount of memory that is migrated between nodes
  mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting
  mm: numa: Migrate pages handled during a pmd_numa hinting fault
  mm: numa: Migrate on reference policy
  ...
2012-12-16 15:18:08 -08:00
Joerg Roedel 9c6ecf6a3a Merge branches 'iommu/fixes', 'dma-debug', 'x86/amd', 'x86/vt-d', 'arm/tegra' and 'arm/omap' into next 2012-12-16 12:24:09 +01:00
Linus Torvalds 11520e5e7c Revert "x86-64/efi: Use EFI to deal with platform wall clock (again)"
This reverts commit bd52276fa1 ("x86-64/efi: Use EFI to deal with
platform wall clock (again)"), and the two supporting commits:

  da5a108d05b4: "x86/kernel: remove tboot 1:1 page table creation code"

  185034e72d59: "x86, efi: 1:1 pagetable mapping for virtual EFI calls")

as they all depend semantically on commit 53b87cf088 ("x86, mm:
Include the entire kernel memory map in trampoline_pgd") that got
reverted earlier due to the problems it caused.

This was pointed out by Yinghai Lu, and verified by me on my Macbook Air
that uses EFI.

Pointed-out-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-15 15:20:41 -08:00
Linus Torvalds 1ed55eac3b Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:

 - Added aesni/avx/x86_64 implementations for camellia.

 - Optimised AVX code for cast5/serpent/twofish/cast6.

 - Fixed vmac bug with unaligned input.

 - Allow compression algorithms in FIPS mode.

 - Optimised crc32c implementation for Intel.

 - Misc fixes.

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (32 commits)
  crypto: caam - Updated SEC-4.0 device tree binding for ERA information.
  crypto: testmgr - remove superfluous initializers for xts(aes)
  crypto: testmgr - allow compression algs in fips mode
  crypto: testmgr - add larger crc32c test vector to test FPU path in crc32c_intel
  crypto: testmgr - clean alg_test_null entries in alg_test_descs[]
  crypto: testmgr - remove fips_allowed flag from camellia-aesni null-tests
  crypto: cast5/cast6 - move lookup tables to shared module
  padata: use __this_cpu_read per-cpu helper
  crypto: s5p-sss - Fix compilation error
  crypto: picoxcell - Add terminating entry for platform_device_id table
  crypto: omap-aes - select BLKCIPHER2
  crypto: camellia - add AES-NI/AVX/x86_64 assembler implementation of camellia cipher
  crypto: camellia-x86_64 - share common functions and move structures and function definitions to header file
  crypto: tcrypt - add async speed test for camellia cipher
  crypto: tegra-aes - fix error-valued pointer dereference
  crypto: tegra - fix missing unlock on error case
  crypto: cast5/avx - avoid using temporary stack buffers
  crypto: serpent/avx - avoid using temporary stack buffers
  crypto: twofish/avx - avoid using temporary stack buffers
  crypto: cast6/avx - avoid using temporary stack buffers
  ...
2012-12-15 12:35:19 -08:00
Linus Torvalds be354f4081 Revert "x86, mm: Include the entire kernel memory map in trampoline_pgd"
This reverts commit 53b87cf088.

It causes odd bootup problems on x86-64.  Markus Trippelsdorf gets a
repeatable oops, and I see a non-repeatable oops (or constant stream of
messages that scroll off too quickly to read) that seems to go away with
this commit reverted.

So we don't know exactly what is wrong with the commit, but it's
definitely problematic, and worth reverting sooner rather than later.

Bisected-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: H Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-15 12:29:54 -08:00
Gleb Natapov e11ae1a102 KVM: remove unused variable.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-14 23:07:44 -02:00
David Howells af170c5061 UAPI: (Scripted) Disintegrate arch/x86/include/asm
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
2012-12-14 22:37:13 +00:00
Linus Torvalds d42b3a2906 Merge branch 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI update from Peter Anvin:
 "EFI tree, from Matt Fleming.  Most of the patches are the new efivarfs
  filesystem by Matt Garrett & co.  The balance are support for EFI
  wallclock in the absence of a hardware-specific driver, and various
  fixes and cleanups."

* 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  efivarfs: Make efivarfs_fill_super() static
  x86, efi: Check table header length in efi_bgrt_init()
  efivarfs: Use query_variable_info() to limit kmalloc()
  efivarfs: Fix return value of efivarfs_file_write()
  efivarfs: Return a consistent error when efivarfs_get_inode() fails
  efivarfs: Make 'datasize' unsigned long
  efivarfs: Add unique magic number
  efivarfs: Replace magic number with sizeof(attributes)
  efivarfs: Return an error if we fail to read a variable
  efi: Clarify GUID length calculations
  efivarfs: Implement exclusive access for {get,set}_variable
  efivarfs: efivarfs_fill_super() ensure we clean up correctly on error
  efivarfs: efivarfs_fill_super() ensure we free our temporary name
  efivarfs: efivarfs_fill_super() fix inode reference counts
  efivarfs: efivarfs_create() ensure we drop our reference on inode on error
  efivarfs: efivarfs_file_read ensure we free data in error paths
  x86-64/efi: Use EFI to deal with platform wall clock (again)
  x86/kernel: remove tboot 1:1 page table creation code
  x86, efi: 1:1 pagetable mapping for virtual EFI calls
  x86, mm: Include the entire kernel memory map in trampoline_pgd
  ...
2012-12-14 10:08:40 -08:00
Linus Torvalds 18dd0bf22b Merge branch 'x86-acpi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 ACPI update from Peter Anvin:
 "This is a patchset which didn't make the last merge window.  It adds a
  debugging capability to feed ACPI tables via the initramfs.

  On a grander scope, it formalizes using the initramfs protocol for
  feeding arbitrary blobs which need to be accessed early to the kernel:
  they are fed first in the initramfs blob (lots of bootloaders can
  concatenate this at boot time, others can use a single file) in an
  uncompressed cpio archive using filenames starting with "kernel/".

  The ACPI maintainers requested that this patchset be fed via the x86
  tree rather than the ACPI tree as the footprint in the general x86
  code is much bigger than in the ACPI code proper."

* 'x86-acpi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  X86 ACPI: Use #ifdef not #if for CONFIG_X86 check
  ACPI: Fix build when disabled
  ACPI: Document ACPI table overriding via initrd
  ACPI: Create acpi_table_taint() function to avoid code duplication
  ACPI: Implement physical address table override
  ACPI: Store valid ACPI tables passed via early initrd in reserved memblock areas
  x86, acpi: Introduce x86 arch specific arch_reserve_mem_area() for e820 handling
  lib: Add early cpio decoder
2012-12-14 10:03:23 -08:00
Linus Torvalds 2d9c8b5d6a Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS update from Ingo Molnar:
 "Rework all config variables used throughout the MCA code and collect
  them together into a mca_config struct.  This keeps them tightly and
  neatly packed together instead of spilled all over the place.

  Then, convert those which are used as booleans into real booleans and
  save some space.  These bits are exposed via
     /sys/devices/system/machinecheck/machinecheck*/"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, MCA: Finish mca_config conversion
  x86, MCA: Convert the next three variables batch
  x86, MCA: Convert rip_msr, mce_bootlog, monarch_timeout
  x86, MCA: Convert dont_log_ce, banks and tolerant
  drivers/base: Add a DEVICE_BOOL_ATTR macro
2012-12-14 09:59:59 -08:00
Kees Cook 34e1169d99 module: add syscall to load module from fd
As part of the effort to create a stronger boundary between root and
kernel, Chrome OS wants to be able to enforce that kernel modules are
being loaded only from our read-only crypto-hash verified (dm_verity)
root filesystem. Since the init_module syscall hands the kernel a module
as a memory blob, no reasoning about the origin of the blob can be made.

Earlier proposals for appending signatures to kernel modules would not be
useful in Chrome OS, since it would involve adding an additional set of
keys to our kernel and builds for no good reason: we already trust the
contents of our root filesystem. We don't need to verify those kernel
modules a second time. Having to do signature checking on module loading
would slow us down and be redundant. All we need to know is where a
module is coming from so we can say yes/no to loading it.

If a file descriptor is used as the source of a kernel module, many more
things can be reasoned about. In Chrome OS's case, we could enforce that
the module lives on the filesystem we expect it to live on.  In the case
of IMA (or other LSMs), it would be possible, for example, to examine
extended attributes that may contain signatures over the contents of
the module.

This introduces a new syscall (on x86), similar to init_module, that has
only two arguments. The first argument is used as a file descriptor to
the module and the second argument is a pointer to the NULL terminated
string of module arguments.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (merge fixes)
2012-12-14 13:05:22 +10:30
Alex Williamson 0f888f5acd KVM: Increase user memory slots on x86 to 125
With the 3 private slots, this gives us a nice round 128 slots total.
The primary motivation for this is to support more assigned devices.
Each assigned device can theoretically use up to 8 slots (6 MMIO BARs,
1 ROM BAR, 1 spare for a split MSI-X table mapping) though it's far
more typical for a device to use 3-4 slots.  If we assume a typical VM
uses a dozen slots for non-assigned devices purposes, we should always
be able to support 14 worst case assigned devices or 28 to 37 typical
devices.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-13 23:25:25 -02:00
Alex Williamson f82a8cfe93 KVM: struct kvm_memory_slot.user_alloc -> bool
There's no need for this to be an int, it holds a boolean.
Move to the end of the struct for alignment.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-13 23:24:38 -02:00
Alex Williamson 0743247fbf KVM: Make KVM_PRIVATE_MEM_SLOTS optional
Seems like everyone copied x86 and defined 4 private memory slots
that never actually get used.  Even x86 only uses 3 of the 4.  These
aren't exposed so there's no need to add padding.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-13 23:21:58 -02:00
Alex Williamson bbacc0c111 KVM: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS
It's easy to confuse KVM_MEMORY_SLOTS and KVM_MEM_SLOTS_NUM.  One is
the user accessible slots and the other is user + private.  Make this
more obvious.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-13 23:21:57 -02:00
Gleb Natapov f3200d00ea KVM: inject ExtINT interrupt before APIC interrupts
According to Intel SDM Volume 3 Section 10.8.1 "Interrupt Handling with
the Pentium 4 and Intel Xeon Processors" and Section 10.8.2 "Interrupt
Handling with the P6 Family and Pentium Processors" ExtINT interrupts are
sent directly to the processor core for handling. Currently KVM checks
APIC before it considers ExtINT interrupts for injection which is
backwards from the spec. Make code behave according to the SDM.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Acked-by: "Zhang, Yang Z" <yang.z.zhang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-13 23:05:21 -02:00
Nadav Amit 5e2c688351 KVM: x86: fix mov immediate emulation for 64-bit operands
MOV immediate instruction (opcodes 0xB8-0xBF) may take 64-bit operand.
The previous emulation implementation assumes the operand is no longer than 32.
Adding OpImm64 for this matter.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=881579

Signed-off-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-13 22:30:56 -02:00
Gleb Natapov 7f662273e4 KVM: emulator: implement AAD instruction
Windows2000 uses it during boot. This fixes
https://bugzilla.kernel.org/show_bug.cgi?id=50921

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-13 22:27:22 -02:00
Linus Torvalds 66cdd0ceaf Merge tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Marcelo Tosatti:
 "Considerable KVM/PPC work, x86 kvmclock vsyscall support,
  IA32_TSC_ADJUST MSR emulation, amongst others."

Fix up trivial conflict in kernel/sched/core.c due to cross-cpu
migration notifier added next to rq migration call-back.

* tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (156 commits)
  KVM: emulator: fix real mode segment checks in address linearization
  VMX: remove unneeded enable_unrestricted_guest check
  KVM: VMX: fix DPL during entry to protected mode
  x86/kexec: crash_vmclear_local_vmcss needs __rcu
  kvm: Fix irqfd resampler list walk
  KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump
  x86/kexec: VMCLEAR VMCSs loaded on all cpus if necessary
  KVM: MMU: optimize for set_spte
  KVM: PPC: booke: Get/set guest EPCR register using ONE_REG interface
  KVM: PPC: bookehv: Add EPCR support in mtspr/mfspr emulation
  KVM: PPC: bookehv: Add guest computation mode for irq delivery
  KVM: PPC: Make EPCR a valid field for booke64 and bookehv
  KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit
  KVM: PPC: e500: Mask MAS2 EPN high 32-bits in 32/64 tlbwe emulation
  KVM: PPC: Mask ea's high 32-bits in 32/64 instr emulation
  KVM: PPC: e500: Add emulation helper for getting instruction ea
  KVM: PPC: bookehv64: Add support for interrupt handling
  KVM: PPC: bookehv: Remove GET_VCPU macro from exception handler
  KVM: PPC: booke: Fix get_tb() compile error on 64-bit
  KVM: PPC: e500: Silence bogus GCC warning in tlb code
  ...
2012-12-13 15:31:08 -08:00
Linus Torvalds 896ea17d3d Features:
- Add necessary infrastructure to make balloon driver work under ARM.
  - Add /dev/xen/privcmd interfaces to work with ARM and PVH.
  - Improve Xen PCIBack wild-card parsing.
  - Add Xen ACPI PAD (Processor Aggregator) support - so can offline/online
    sockets depending on the power consumption.
  - PVHVM + kexec = use an E820_RESV region for the shared region so we don't
    overwrite said region during kexec reboot.
  - Cleanups, compile fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJQyJaAAAoJEFjIrFwIi8fJ9DoIALAjj3qaGDimykc/RPSu2MLL
 Tfchb1su0WxSu6fP17jBadq39Qna85UzZATMCyN47k8wB3KoSEW13rqwe7JSsdT/
 SEfZDrlbhNK+JAWJETx+6gq7J7dMwi/tFt4CbwPv/zAHb7C7JyzEgKctbi4Q1e89
 FFMXZru2IWDbaqlcJQjJcE/InhWy5vKW3bY5nR/Bz0RBf9lk/WHbcJwLXirsDcKk
 uMVmPy4yiApX6ZCPbYP5BZvsIFkmLKQEfpmwdzbLGDoL7N1onqq/lgYNgZqPJUkE
 XL1GVBbRGpy+NQr++vUS1NiRyR81EChRO3IrDZwzvNEPqKa9GoF5U1CdRh71R5I=
 =uZQZ
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen updates from Konrad Rzeszutek Wilk:
 - Add necessary infrastructure to make balloon driver work under ARM.
 - Add /dev/xen/privcmd interfaces to work with ARM and PVH.
 - Improve Xen PCIBack wild-card parsing.
 - Add Xen ACPI PAD (Processor Aggregator) support - so can offline/
   online sockets depending on the power consumption.
 - PVHVM + kexec = use an E820_RESV region for the shared region so we
   don't overwrite said region during kexec reboot.
 - Cleanups, compile fixes.

Fix up some trivial conflicts due to the balloon driver now working on
ARM, and there were changes next to the previous work-arounds that are
now gone.

* tag 'stable/for-linus-3.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/PVonHVM: fix compile warning in init_hvm_pv_info
  xen: arm: implement remap interfaces needed for privcmd mappings.
  xen: correctly use xen_pfn_t in remap_domain_mfn_range.
  xen: arm: enable balloon driver
  xen: balloon: allow PVMMU interfaces to be compiled out
  xen: privcmd: support autotranslated physmap guests.
  xen: add pages parameter to xen_remap_domain_mfn_range
  xen/acpi: Move the xen_running_on_version_or_later function.
  xen/xenbus: Remove duplicate inclusion of asm/xen/hypervisor.h
  xen/acpi: Fix compile error by missing decleration for xen_domain.
  xen/acpi: revert pad config check in xen_check_mwait
  xen/acpi: ACPI PAD driver
  xen-pciback: reject out of range inputs
  xen-pciback: simplify and tighten parsing of device IDs
  xen PVonHVM: use E820_Reserved area for shared_info
2012-12-13 14:29:16 -08:00
Linus Torvalds f6e858a00a Merge branch 'akpm' (Andrew's patch-bomb)
Merge misc VM changes from Andrew Morton:
 "The rest of most-of-MM.  The other MM bits await a slab merge.

  This patch includes the addition of a huge zero_page.  Not a
  performance boost but it an save large amounts of physical memory in
  some situations.

  Also a bunch of Fujitsu engineers are working on memory hotplug.
  Which, as it turns out, was badly broken.  About half of their patches
  are included here; the remainder are 3.8 material."

However, this merge disables CONFIG_MOVABLE_NODE, which was totally
broken.  We don't add new features with "default y", nor do we add
Kconfig questions that are incomprehensible to most people without any
help text.  Does the feature even make sense without compaction or
memory hotplug?

* akpm: (54 commits)
  mm/bootmem.c: remove unused wrapper function reserve_bootmem_generic()
  mm/memory.c: remove unused code from do_wp_page()
  asm-generic, mm: pgtable: consolidate zero page helpers
  mm/hugetlb.c: fix warning on freeing hwpoisoned hugepage
  hwpoison, hugetlbfs: fix RSS-counter warning
  hwpoison, hugetlbfs: fix "bad pmd" warning in unmapping hwpoisoned hugepage
  mm: protect against concurrent vma expansion
  memcg: do not check for mm in __mem_cgroup_count_vm_event
  tmpfs: support SEEK_DATA and SEEK_HOLE (reprise)
  mm: provide more accurate estimation of pages occupied by memmap
  fs/buffer.c: remove redundant initialization in alloc_page_buffers()
  fs/buffer.c: do not inline exported function
  writeback: fix a typo in comment
  mm: introduce new field "managed_pages" to struct zone
  mm, oom: remove statically defined arch functions of same name
  mm, oom: remove redundant sleep in pagefault oom handler
  mm, oom: cleanup pagefault oom handler
  memory_hotplug: allow online/offline memory to result movable node
  numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
  mm, memcg: avoid unnecessary function call when memcg is disabled
  ...
2012-12-13 13:11:15 -08:00
Linus Torvalds 193c0d6825 PCI changes for the v3.8 merge window:
Host bridge hotplug:
     - Untangle _PRT from struct pci_bus (Bjorn Helgaas)
     - Request _OSC control before scanning root bus (Taku Izumi)
     - Assign resources when adding host bridge (Yinghai Lu)
     - Remove root bus when removing host bridge (Yinghai Lu)
     - Remove _PRT during hot remove (Yinghai Lu)
 
   SRIOV
     - Add sysfs knobs to control numVFs (Don Dutile)
 
   Power management
     - Notify devices when power resource turned on (Huang Ying)
 
   Bug fixes
     - Work around broken _SEG on HP xw9300 (Bjorn Helgaas)
     - Keep runtime PM enabled for unbound PCI devices (Huang Ying)
     - Fix Optimus dual-GPU runtime D3 suspend issue (Dave Airlie)
     - Fix xen frontend shutdown issue (David Vrabel)
     - Work around PLX PCI 9050 BAR alignment erratum (Ian Abbott)
 
   Miscellaneous
     - Add GPL license for drivers/pci/ioapic (Andrew Cooks)
     - Add standard PCI-X, PCIe ASPM register #defines (Bjorn Helgaas)
     - NumaChip remote PCI support (Daniel Blueman)
     - Fix PCIe Link Capabilities Supported Link Speed definition (Jingoo Han)
     - Convert dev_printk() to dev_info(), etc (Joe Perches)
     - Add support for non PCI BAR ROM data (Matthew Garrett)
     - Add x86 support for host bridge translation offset (Mike Yoknis)
     - Report success only when every driver supports AER (Vijay Pandarathil)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJQyKwSAAoJEPGMOI97Hn6zScgQAJZK2VDfCv74mKrgSDNokIzH
 5nVDrc9AHKJm7CUODs6keJK5d4TD/za3Zao68zrYHsJJKes2ni2Z3W34HP2RXKK2
 eOmePXOHYPPZMlimP9r9cVxNu1ZJCyp/yWSBcsPF4zUgWhBWLRaSj85I049gQ0sz
 +05nZYfLjVd3HNiaXsG4CQyMrNF46XEsLhF9vs+Nr2GHPwrpzhfScgYv63oDS86C
 3ICKsjmiRUZcNelxIFYmyxa5u89QdW5XHjzc9eHGQuus24Vxw+TZzsdfc17sUJEE
 HTyXY+RjDpOVhdtwwUjrCEOiyZYvy3g9+3sKxoxgt/76ghdUaR7fxITwB97qVMFD
 T0ESlKjSV/Qv5QYdyy5uP4zwNs/PXCWXkTg/L1m71F30BxKWDa7tgiA6uK7Z7fl5
 1aokKBdk3mtJJJIDJG1YkxPXx/JItTGCNYrx7CcFj49rSjrUWLQdmrYahersRIsB
 3wiD2xTi9e4dXeP/+VGzGOWB/sHk+73jvrvZe/REa1FCnMINDz4+9V9WaGROMqyq
 MQ8kX0KfYcNVNxy1GOXjU5wLpMN/t/QbvI7gwzRP1DAUCJPoOgFy7AjvSTVG3zuy
 8CtdOFttVkUn5dqsbQR0gVbyQVTS3PGSKz5XC/s8kVDWhja0xZTBYwrskM/4zdSD
 Xf48OyYV5EjpC3FYUSiU
 =OE3Q
 -----END PGP SIGNATURE-----

Merge tag 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI update from Bjorn Helgaas:
 "Host bridge hotplug:
   - Untangle _PRT from struct pci_bus (Bjorn Helgaas)
   - Request _OSC control before scanning root bus (Taku Izumi)
   - Assign resources when adding host bridge (Yinghai Lu)
   - Remove root bus when removing host bridge (Yinghai Lu)
   - Remove _PRT during hot remove (Yinghai Lu)

  SRIOV
    - Add sysfs knobs to control numVFs (Don Dutile)

  Power management
   - Notify devices when power resource turned on (Huang Ying)

  Bug fixes
   - Work around broken _SEG on HP xw9300 (Bjorn Helgaas)
   - Keep runtime PM enabled for unbound PCI devices (Huang Ying)
   - Fix Optimus dual-GPU runtime D3 suspend issue (Dave Airlie)
   - Fix xen frontend shutdown issue (David Vrabel)
   - Work around PLX PCI 9050 BAR alignment erratum (Ian Abbott)

  Miscellaneous
   - Add GPL license for drivers/pci/ioapic (Andrew Cooks)
   - Add standard PCI-X, PCIe ASPM register #defines (Bjorn Helgaas)
   - NumaChip remote PCI support (Daniel Blueman)
   - Fix PCIe Link Capabilities Supported Link Speed definition (Jingoo
     Han)
   - Convert dev_printk() to dev_info(), etc (Joe Perches)
   - Add support for non PCI BAR ROM data (Matthew Garrett)
   - Add x86 support for host bridge translation offset (Mike Yoknis)
   - Report success only when every driver supports AER (Vijay
     Pandarathil)"

Fix up trivial conflicts.

* tag 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (48 commits)
  PCI: Use phys_addr_t for physical ROM address
  x86/PCI: Add NumaChip remote PCI support
  ath9k: Use standard #defines for PCIe Capability ASPM fields
  iwlwifi: Use standard #defines for PCIe Capability ASPM fields
  iwlwifi: collapse wrapper for pcie_capability_read_word()
  iwlegacy: Use standard #defines for PCIe Capability ASPM fields
  iwlegacy: collapse wrapper for pcie_capability_read_word()
  cxgb3: Use standard #defines for PCIe Capability ASPM fields
  PCI: Add standard PCIe Capability Link ASPM field names
  PCI/portdrv: Use PCI Express Capability accessors
  PCI: Use standard PCIe Capability Link register field names
  x86: Use PCI setup data
  PCI: Add support for non-BAR ROMs
  PCI: Add pcibios_add_device
  EFI: Stash ROMs if they're not in the PCI BAR
  PCI: Add and use standard PCI-X Capability register names
  PCI/PM: Keep runtime PM enabled for unbound PCI devices
  xen-pcifront: Handle backend CLOSED without CLOSING
  PCI: SRIOV control and status via sysfs (documentation)
  PCI/AER: Report success only when every device has AER-aware driver
  ...
2012-12-13 12:14:47 -08:00
Linus Torvalds a2013a13e6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial branch from Jiri Kosina:
 "Usual stuff -- comment/printk typo fixes, documentation updates, dead
  code elimination."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  HOWTO: fix double words typo
  x86 mtrr: fix comment typo in mtrr_bp_init
  propagate name change to comments in kernel source
  doc: Update the name of profiling based on sysfs
  treewide: Fix typos in various drivers
  treewide: Fix typos in various Kconfig
  wireless: mwifiex: Fix typo in wireless/mwifiex driver
  messages: i2o: Fix typo in messages/i2o
  scripts/kernel-doc: check that non-void fcts describe their return value
  Kernel-doc: Convention: Use a "Return" section to describe return values
  radeon: Fix typo and copy/paste error in comments
  doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
  various: Fix spelling of "asynchronous" in comments.
  Fix misspellings of "whether" in comments.
  eisa: Fix spelling of "asynchronous".
  various: Fix spelling of "registered" in comments.
  doc: fix quite a few typos within Documentation
  target: iscsi: fix comment typos in target/iscsi drivers
  treewide: fix typo of "suport" in various comments and Kconfig
  treewide: fix typo of "suppport" in various comments
  ...
2012-12-13 12:00:02 -08:00
Jim Kukunas 7056741fd9 lib/raid6: Add AVX2 optimized recovery functions
Optimize RAID6 recovery functions to take advantage of
the 256-bit YMM integer instructions introduced in AVX2.

The patch was tested and benchmarked before submission.
However hardware is not yet released so benchmark numbers
cannot be reported.

Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-12-13 16:42:01 +11:00
Linus Torvalds 6be35c700f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David Miller:

1) Allow to dump, monitor, and change the bridge multicast database
   using netlink.  From Cong Wang.

2) RFC 5961 TCP blind data injection attack mitigation, from Eric
   Dumazet.

3) Networking user namespace support from Eric W. Biederman.

4) tuntap/virtio-net multiqueue support by Jason Wang.

5) Support for checksum offload of encapsulated packets (basically,
   tunneled traffic can still be checksummed by HW).  From Joseph
   Gasparakis.

6) Allow BPF filter access to VLAN tags, from Eric Dumazet and
   Daniel Borkmann.

7) Bridge port parameters over netlink and BPDU blocking support
   from Stephen Hemminger.

8) Improve data access patterns during inet socket demux by rearranging
   socket layout, from Eric Dumazet.

9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and
   Jon Maloy.

10) Update TCP socket hash sizing to be more in line with current day
    realities.  The existing heurstics were choosen a decade ago.
    From Eric Dumazet.

11) Fix races, queue bloat, and excessive wakeups in ATM and
    associated drivers, from Krzysztof Mazur and David Woodhouse.

12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions
    in VXLAN driver, from David Stevens.

13) Add "oops_only" mode to netconsole, from Amerigo Wang.

14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also
    allow DCB netlink to work on namespaces other than the initial
    namespace.  From John Fastabend.

15) Support PTP in the Tigon3 driver, from Matt Carlson.

16) tun/vhost zero copy fixes and improvements, plus turn it on
    by default, from Michael S. Tsirkin.

17) Support per-association statistics in SCTP, from Michele
    Baldessari.

And many, many, driver updates, cleanups, and improvements.  Too
numerous to mention individually.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
  net/mlx4_en: Add support for destination MAC in steering rules
  net/mlx4_en: Use generic etherdevice.h functions.
  net: ethtool: Add destination MAC address to flow steering API
  bridge: add support of adding and deleting mdb entries
  bridge: notify mdb changes via netlink
  ndisc: Unexport ndisc_{build,send}_skb().
  uapi: add missing netconf.h to export list
  pkt_sched: avoid requeues if possible
  solos-pci: fix double-free of TX skb in DMA mode
  bnx2: Fix accidental reversions.
  bna: Driver Version Updated to 3.1.2.1
  bna: Firmware update
  bna: Add RX State
  bna: Rx Page Based Allocation
  bna: TX Intr Coalescing Fix
  bna: Tx and Rx Optimizations
  bna: Code Cleanup and Enhancements
  ath9k: check pdata variable before dereferencing it
  ath5k: RX timestamp is reported at end of frame
  ath9k_htc: RX timestamp is reported at end of frame
  ...
2012-12-12 18:07:07 -08:00
David Rientjes c2d23f919b mm, oom: remove statically defined arch functions of same name
out_of_memory() is a globally defined function to call the oom killer.
x86, sh, and powerpc all use a function of the same name within file scope
in their respective fault.c unnecessarily.  Inline the functions into the
pagefault handlers to clean the code up.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
Lai Jiangshan 4b0ef1fe8a page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Since we introduced N_MEMORY, we update the initialization of node_states.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:33 -08:00
Kirill A. Shutemov e180377f1a thp: change split_huge_page_pmd() interface
Pass vma instead of mm and add address parameter.

In most cases we already have vma on the stack. We provides
split_huge_page_pmd_mm() for few cases when we have mm, but not vma.

This change is preparation to huge zero pmd splitting implementation.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:31 -08:00
Linus Torvalds 9977d9b379 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull big execve/kernel_thread/fork unification series from Al Viro:
 "All architectures are converted to new model.  Quite a bit of that
  stuff is actually shared with architecture trees; in such cases it's
  literally shared branch pulled by both, not a cherry-pick.

  A lot of ugliness and black magic is gone (-3KLoC total in this one):

   - kernel_thread()/kernel_execve()/sys_execve() redesign.

     We don't do syscalls from kernel anymore for either kernel_thread()
     or kernel_execve():

     kernel_thread() is essentially clone(2) with callback run before we
     return to userland, the callbacks either never return or do
     successful do_execve() before returning.

     kernel_execve() is a wrapper for do_execve() - it doesn't need to
     do transition to user mode anymore.

     As a result kernel_thread() and kernel_execve() are
     arch-independent now - they live in kernel/fork.c and fs/exec.c
     resp.  sys_execve() is also in fs/exec.c and it's completely
     architecture-independent.

   - daemonize() is gone, along with its parts in fs/*.c

   - struct pt_regs * is no longer passed to do_fork/copy_process/
     copy_thread/do_execve/search_binary_handler/->load_binary/do_coredump.

   - sys_fork()/sys_vfork()/sys_clone() unified; some architectures
     still need wrappers (ones with callee-saved registers not saved in
     pt_regs on syscall entry), but the main part of those suckers is in
     kernel/fork.c now."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (113 commits)
  do_coredump(): get rid of pt_regs argument
  print_fatal_signal(): get rid of pt_regs argument
  ptrace_signal(): get rid of unused arguments
  get rid of ptrace_signal_deliver() arguments
  new helper: signal_pt_regs()
  unify default ptrace_signal_deliver
  flagday: kill pt_regs argument of do_fork()
  death to idle_regs()
  don't pass regs to copy_process()
  flagday: don't pass regs to copy_thread()
  bfin: switch to generic vfork, get rid of pointless wrappers
  xtensa: switch to generic clone()
  openrisc: switch to use of generic fork and clone
  unicore32: switch to generic clone(2)
  score: switch to generic fork/vfork/clone
  c6x: sanitize copy_thread(), get rid of clone(2) wrapper, switch to generic clone()
  take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h
  mn10300: switch to generic fork/vfork/clone
  h8300: switch to generic fork/vfork/clone
  tile: switch to generic clone()
  ...

Conflicts:
	arch/microblaze/include/asm/Kbuild
2012-12-12 12:22:13 -08:00
Linus Torvalds 1ebaf4f4e6 Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 timer update from Ingo Molnar:
 "This tree includes HPET fixes and also implements a calibration-free,
  TSC match driven APIC timer interrupt mode: 'TSC deadline mode'
  supported in SandyBridge and later CPUs."

* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: hpet: Fix inverted return value check in arch_setup_hpet_msi()
  x86: hpet: Fix masking of MSI interrupts
  x86: apic: Use tsc deadline for oneshot when available
2012-12-11 20:01:33 -08:00
Linus Torvalds 743aa456c1 Merge branch 'x86-nuke386-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull "Nuke 386-DX/SX support" from Ingo Molnar:
 "This tree removes ancient-386-CPUs support and thus zaps quite a bit
  of complexity:

    24 files changed, 56 insertions(+), 425 deletions(-)

  ... which complexity has plagued us with extra work whenever we wanted
  to change SMP primitives, for years.

  Unfortunately there's a nostalgic cost: your old original 386 DX33
  system from early 1991 won't be able to boot modern Linux kernels
  anymore.  Sniff."

I'm not sentimental.  Good riddance.

* 'x86-nuke386-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, 386 removal: Document Nx586 as a 386 and thus unsupported
  x86, cleanups: Simplify sync_core() in the case of no CPUID
  x86, 386 removal: Remove CONFIG_X86_POPAD_OK
  x86, 386 removal: Remove CONFIG_X86_WP_WORKS_OK
  x86, 386 removal: Remove CONFIG_INVLPG
  x86, 386 removal: Remove CONFIG_BSWAP
  x86, 386 removal: Remove CONFIG_XADD
  x86, 386 removal: Remove CONFIG_CMPXCHG
  x86, 386 removal: Remove CONFIG_M386 from Kconfig
2012-12-11 19:59:32 -08:00
Linus Torvalds a05a4e24dc Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 topology discovery improvements from Ingo Molnar:
 "These changes improve topology discovery on AMD CPUs.

  Right now this feeds information displayed in
  /sys/devices/system/cpu/cpuX/cache/indexY/* - but in the future we
  could use this to set up a better scheduling topology."

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, cacheinfo: Base cache sharing info on CPUID 0x8000001d on AMD
  x86, cacheinfo: Make use of CPUID 0x8000001d for cache information on AMD
  x86, cacheinfo: Determine number of cache leafs using CPUID 0x8000001d on AMD
  x86: Add cpu_has_topoext
2012-12-11 19:58:29 -08:00
Linus Torvalds e9a5a91971 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Small cleanups."

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Fix the error of using "const" in gen-insn-attr-x86.awk
  x86, apic: Cleanup cfg->domain setup for legacy interrupts
  x86: Remove dead hlt_use_halt code
2012-12-11 19:57:56 -08:00
Linus Torvalds 74b8423345 Merge branch 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 BSP hotplug changes from Ingo Molnar:
 "This tree enables CPU#0 (the boot processor) to be onlined/offlined on
  x86, just like any other CPU.  Enabled on Intel CPUs for now.

  Allowing this required the identification and fixing of latent CPU#0
  assumptions (such as CPU#0 initializations, etc.) in the x86
  architecture code, plus the identification of barriers to
  BSP-offlining, such as active PIC interrupts which can only be
  serviced on the BSP.

  It's behind a default-off option, and there's a debug option that
  allows the automatic testing of this feature.

  The motivation of this feature is to allow and prepare for true
  CPU-hotplug hardware support: recent changes to MCE support enable us
  to detect a deteriorating but not yet hard-failing L1/L2 cache on a
  CPU that could be soft-unplugged - or a failing L3 cache on a
  multi-socket system.

  Note that true hardware hot-plug is not yet fully enabled by this,
  because that requires a special platform wakeup sequence to be sent to
  the freshly powered up CPU#0.  Future patches for this are planned,
  once such a platform exists.  Chicken and egg"

* 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, topology: Debug CPU0 hotplug
  x86/i387.c: Initialize thread xstate only on CPU0 only once
  x86, hotplug: Handle retrigger irq by the first available CPU
  x86, hotplug: The first online processor saves the MTRR state
  x86, hotplug: During CPU0 online, enable x2apic, set_numa_node.
  x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI
  x86-32, hotplug: Add start_cpu0() entry point to head_32.S
  x86-64, hotplug: Add start_cpu0() entry point to head_64.S
  kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback
  x86, hotplug, suspend: Online CPU0 for suspend or hibernate
  x86, hotplug: Support functions for CPU0 online/offline
  x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it
  x86, Kconfig: Add config switch for CPU0 hotplug
  doc: Add x86 CPU0 online/offline feature
2012-12-11 19:56:33 -08:00
Linus Torvalds 5074474737 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot changes from Ingo Molnar:
 "Two small changes: a cleanup and allow CONFIG_X86_MPPARSE to be turned
  off on SFI as well."

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arch/x86/Kconfig: Allow turning off CONFIG_X86_MPPARSE when either ACPI or SFI is present
  x86/boot/doc: Fix grammar and typo in boot.txt
2012-12-11 19:55:58 -08:00
Linus Torvalds 0019fab355 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm changes from Ingo Molnar:
 "Two fixlets and a cleanup."

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86_32: Return actual stack when requesting sp from regs
  x86: Don't clobber top of pt_regs in nested NMI
  x86/asm: Clean up copy_page_*() comments and code
2012-12-11 19:55:20 -08:00
Linus Torvalds 090f8ccba3 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Lots of activity:

   211 files changed, 8328 insertions(+), 4116 deletions(-)

  most of it on the tooling side.

  Main changes:

   * ftrace enhancements and fixes from Steve Rostedt.

   * uprobes fixes, cleanups and preparation for the ARM port from Oleg
     Nesterov.

   * UAPI fixes, from David Howels - prepares the arch/x86 UAPI
     transition

   * Separate perf tests into multiple objects, one per test, from Jiri
     Olsa.

   * Make hardware event translations available in sysfs, from Jiri
     Olsa.

   * Fixes to /proc/pid/maps parsing, preparatory to supporting data
     maps, from Namhyung Kim

   * Implement ui_progress for GTK, from Namhyung Kim

   * Add framework for automated perf_event_attr tests, where tools with
     different command line options will be run from a 'perf test', via
     python glue, and the perf syscall will be intercepted to verify
     that the perf_event_attr fields set by the tool are those expected,
     from Jiri Olsa

   * Add a 'link' method for hists, so that we can have the leader with
     buckets for all the entries in all the hists.  This new method is
     now used in the default 'diff' output, making the sum of the
     'baseline' column be 100%, eliminating blind spots.

   * libtraceevent fixes for compiler warnings trying to make perf it
     build on some distros, like fedora 14, 32-bit, some of the warnings
     really pointed to real bugs.

   * Add a browser for 'perf script' and make it available from the
     report and annotate browsers.  It does filtering to find the
     scripts that handle events found in the perf.data file used.  From
     Feng Tang

   * perf inject changes to allow showing where a task sleeps, from
     Andrew Vagin.

   * Makefile improvements from Namhyung Kim.

   * Add --pre and --post command hooks in 'stat', from Peter Zijlstra.

   * Don't stop synthesizing threads when one vanishes, this is for the
     existing threads when we start a tool like trace.

   * Use sched:sched_stat_runtime to provide a thread summary, this
     produces the same output as the 'trace summary' subcommand of
     tglx's original "trace" tool.

   * Support interrupted syscalls in 'trace'

   * Add an event duration column and filter in 'trace'.

   * There are references to the man pages in some tools, so try to
     build Documentation when installing, warning the user if that is
     not possible, from Borislav Petkov.

   * Give user better message if precise is not supported, from David
     Ahern.

   * Try to find cross-built objdump path by using the session
     environment information in the perf.data file header, from Irina
     Tirdea, original patch and idea by Namhyung Kim.

   * Diplays more output on features check for make V=1, so that one can
     figure out what is happening by looking at gcc output, etc.  From
     Jiri Olsa.

   * Add on_exit implementation for systems without one, e.g.  Android,
     from Bernhard Rosenkraenzer.

   * Only process events for vcpus of interest, helps handling large
     number of events, from David Ahern.

   * Cross compilation fixes for Android, from Irina Tirdea.

   * Add documentation on compiling for Android, from Irina Tirdea.

   * perf diff improvements from Jiri Olsa.

   * Target (task/user/cpu/syswide) handling improvements, from Namhyung
     Kim.

   * Add support in 'trace' for tracing workload given by command line,
     from Namhyung Kim.

   * ... and much more."

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (194 commits)
  uprobes: Use percpu_rw_semaphore to fix register/unregister vs dup_mmap() race
  perf evsel: Introduce is_group_member method
  perf powerpc: Use uapi/unistd.h to fix build error
  tools: Pass the target in descend
  tools: Honour the O= flag when tool build called from a higher Makefile
  tools: Define a Makefile function to do subdir processing
  perf ui: Always compile browser setup code
  perf ui: Add ui_progress__finish()
  perf ui gtk: Implement ui_progress functions
  perf ui: Introduce generic ui_progress helper
  perf ui tui: Move progress.c under ui/tui directory
  perf tools: Add basic event modifier sanity check
  perf tools: Omit group members from perf_evlist__disable/enable
  perf tools: Ensure single disable call per event in record comand
  perf tools: Fix 'disabled' attribute config for record command
  perf tools: Fix attributes for '{}' defined event groups
  perf tools: Use sscanf for parsing /proc/pid/maps
  perf tools: Add gtk.<command> config option for launching GTK browser
  perf tools: Fix compile error on NO_NEWT=1 build
  perf hists: Initialize all of he->stat with zeroes
  ...
2012-12-11 18:14:31 -08:00
Linus Torvalds 37ea95a959 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU update from Ingo Molnar:
 "The major features of this tree are:

     1. A first version of no-callbacks CPUs.  This version prohibits
        offlining CPU 0, but only when enabled via CONFIG_RCU_NOCB_CPU=y.
        Relaxing this constraint is in progress, but not yet ready
        for prime time.  These commits were posted to LKML at
        https://lkml.org/lkml/2012/10/30/724.

     2. Changes to SRCU that allows statically initialized srcu_struct
        structures.  These commits were posted to LKML at
        https://lkml.org/lkml/2012/10/30/296.

     3. Restructuring of RCU's debugfs output.  These commits were posted
        to LKML at https://lkml.org/lkml/2012/10/30/341.

     4. Additional CPU-hotplug/RCU improvements, posted to LKML at
        https://lkml.org/lkml/2012/10/30/327.
        Note that the commit eliminating __stop_machine() was judged to
        be too-high of risk, so is deferred to 3.9.

     5. Changes to RCU's idle interface, most notably a new module
        parameter that redirects normal grace-period operations to
        their expedited equivalents.  These were posted to LKML at
        https://lkml.org/lkml/2012/10/30/739.

     6. Additional diagnostics for RCU's CPU stall warning facility,
        posted to LKML at https://lkml.org/lkml/2012/10/30/315.
        The most notable change reduces the
        default RCU CPU stall-warning time from 60 seconds to 21 seconds,
        so that it once again happens sooner than the softlockup timeout.

     7. Documentation updates, which were posted to LKML at
        https://lkml.org/lkml/2012/10/30/280.
        A couple of late-breaking changes were posted at
        https://lkml.org/lkml/2012/11/16/634 and
        https://lkml.org/lkml/2012/11/16/547.

     8. Miscellaneous fixes, which were posted to LKML at
        https://lkml.org/lkml/2012/10/30/309.

     9. Finally, a fix for an lockdep-RCU splat was posted to LKML
        at https://lkml.org/lkml/2012/11/7/486."

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
  context_tracking: New context tracking susbsystem
  sched: Mark RCU reader in sched_show_task()
  rcu: Separate accounting of callbacks from callback-free CPUs
  rcu: Add callback-free CPUs
  rcu: Add documentation for the new rcuexp debugfs trace file
  rcu: Update documentation for TREE_RCU debugfs tracing
  rcu: Reduce default RCU CPU stall warning timeout
  rcu: Fix TINY_RCU rcu_is_cpu_rrupt_from_idle check
  rcu: Clarify memory-ordering properties of grace-period primitives
  rcu: Add new rcutorture module parameters to start/end test messages
  rcu: Remove list_for_each_continue_rcu()
  rcu: Fix batch-limit size problem
  rcu: Add tracing for synchronize_sched_expedited()
  rcu: Remove old debugfs interfaces and also RCU flavor name
  rcu: split 'rcuhier' to each flavor
  rcu: split 'rcugp' to each flavor
  rcu: split 'rcuboost' to each flavor
  rcu: split 'rcubarrier' to each flavor
  rcu: Fix tracing formatting
  rcu: Remove the interface "rcudata.csv"
  ...
2012-12-11 18:10:49 -08:00
Linus Torvalds 608ff1a210 Merge branch 'akpm' (Andrew's patchbomb)
Merge misc updates from Andrew Morton:
 "About half of most of MM.  Going very early this time due to
  uncertainty over the coreautounifiednumasched things.  I'll send the
  other half of most of MM tomorrow.  The rest of MM awaits a slab merge
  from Pekka."

* emailed patches from Andrew Morton: (71 commits)
  memory_hotplug: ensure every online node has NORMAL memory
  memory_hotplug: handle empty zone when online_movable/online_kernel
  mm, memory-hotplug: dynamic configure movable memory and portion memory
  drivers/base/node.c: cleanup node_state_attr[]
  bootmem: fix wrong call parameter for free_bootmem()
  avr32, kconfig: remove HAVE_ARCH_BOOTMEM
  mm: cma: remove watermark hacks
  mm: cma: skip watermarks check for already isolated blocks in split_free_page()
  mm, oom: fix race when specifying a thread as the oom origin
  mm, oom: change type of oom_score_adj to short
  mm: cleanup register_node()
  mm, mempolicy: remove duplicate code
  mm/vmscan.c: try_to_freeze() returns boolean
  mm: introduce putback_movable_pages()
  virtio_balloon: introduce migration primitives to balloon pages
  mm: introduce compaction and migration for ballooned pages
  mm: introduce a common interface for balloon pages mobility
  mm: redefine address_space.assoc_mapping
  mm: adjust address_space_operations.migratepage() return code
  arch/sparc/kernel/sys_sparc_64.c: s/COLOUR/COLOR/
  ...
2012-12-11 18:05:37 -08:00
Michel Lespinasse cdc1734495 mm: use vm_unmapped_area() in hugetlbfs on i386 architecture
Update the i386 hugetlb_get_unmapped_area function to make use of
vm_unmapped_area() instead of implementing a brute force search.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:25 -08:00
Michel Lespinasse 7d02505965 mm: fix cache coloring on x86_64 architecture
Fix the x86-64 cache alignment code to take pgoff into account.  Use the
x86 and MIPS cache alignment code as the basis for a generic cache
alignment function.

The old x86 code will always align the mmap to aliasing boundaries,
even if the program mmaps the file with a non-zero pgoff.

If program A mmaps the file with pgoff 0, and program B mmaps the file
with pgoff 1.  The old code would align the mmaps, resulting in misaligned
pages:

  A:  0123
  B:  123

After this patch, they are aligned so the pages line up:

  A: 0123
  B:  123

Proposed by Rik van Riel.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:25 -08:00