Commit Graph

13447 Commits (819791319becde19e32788a34cc2556aef9f9e6d)

Author SHA1 Message Date
Geert Uytterhoeven fffe52e86b ps3av: misc updates
ps3av:
  - Move the definition of struct ps3av to ps3av.c, as it's locally used only.
  - Kill ps3av.sem, use the existing ps3av.mutex instead.
  - Make the 512-byte buffer in ps3av_do_pkt() static to reduce stack usage.
    Its use is protected by a semaphore anyway.

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-04 17:59:08 -07:00
Geert Uytterhoeven 5caf5db887 ps3av: thread updates
ps3av: Replace the kernel_thread and the ping pong semaphores by a singlethread
workqueue and a completion.

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-04 17:59:08 -07:00
Geert Uytterhoeven 254f9c5cd2 Convert non-highmem kmap_atomic() to static inline function
Convert kmap_atomic() in the non-highmem case from a macro to a static
inline function, for better type-checking and the ability to pass void
pointers instead of struct page pointers.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-04 17:59:08 -07:00
Finn Thain f877958879 NuBus header update
Sync the nubus defines with the latest code in the mac68k repo. Some of these
are needed for DP8390 driver update in the next patch.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-04 17:59:07 -07:00
Finn Thain df7e7d6a89 m68k: remove unused adb.h
The asm-m68k/adb.h header is unused. Some definitions are wrong and the rest
are duplicated in linux/adb.h. Remove it.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-04 17:59:07 -07:00
Roman Zippel b3e2fd9ceb lockdep: Add missing disable/enable irq variant
Add missing disable/enable irq variant

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-04 17:59:06 -07:00
Michael Schmitz c04cb856e2 m68k: Atari keyboard and mouse support.
Atari keyboard and mouse support.
(reformating and Kconfig fixes by Roman Zippel)

Signed-off-by: Michael Schmitz <schmitz@debian.org>
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-04 17:59:05 -07:00
Linus Torvalds 8d41f0e8d5 Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6: (44 commits)
  i2c-s3c2410: Fix bug in releasing driver
  i2c-s3c2410: Fix I2C SDA to SCL setup time
  i2c: New i2c-tiny-usb bus driver
  i2c: Documentation update
  i2c: SPIN_LOCK_UNLOCKED cleanup
  i2c: Obsolete i2c-ixp2000, i2c-ixp4xx and scx200_i2c
  i2c: New Simtec I2C bus driver
  i2c: Bitbanging I2C bus driver using the GPIO API
  Use menuconfig objects - I2C
  i2c: Restore i2c_smbus_read_block_data
  i2c-pxa: Clean transaction stop
  i2c-algo-bit: Improve debugging
  i2c-algo-bit: Implement a 50/50 SCL duty cycle
  i2c-omap: Switch to static adapter numbering
  i2c: Blackfin Two Wire Interface driver
  i2c-algo-sgi: Comment and whitespace cleanups
  i2c: Make i2c_del_driver a void function
  i2c: Move i2c-isa-only exported symbol declarations
  i2c: Document i2c_new_device()
  i2c: Add i2c_new_probed_device()
  ...

Fixed trivial conflict in Documentation/feature-removal-schedule.txt manually.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-04 17:46:27 -07:00
Linus Torvalds ded1504dfa Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Report the number of processors in PowerNow-k8 correctly
  [CPUFREQ] do not declare undefined functions
  [CPUFREQ] cleanup kconfig options
  [CPUFREQ] Longhaul - Revert Longhaul ver. 2
  [CPUFREQ] Remove deprecated /proc/acpi/processor/performance write support
  [CPUFREQ] Fix limited cpufreq when booted on battery
  Fix preemption warnings in speedstep-centrino.c
  [CPUFREQ] Longhaul - Correct PCI code
  [CPUFREQ] p4-clockmod: switch to rdmsr_on_cpu/wrmsr_on_cpu
2007-05-04 17:38:48 -07:00
Linus Torvalds 98b96173c7 Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/agpgart
* master.kernel.org:/pub/scm/linux/kernel/git/davej/agpgart:
  [AGPGART] sworks-agp: Switch to PCI ref counting APIs
  [AGPGART] Nvidia AGP: Use refcount aware PCI interfaces
  [AGPGART] Fix sparse warning in sgi-agp.c
  [AGPGART] Intel-agp adjustments
  [AGPGART] Move [un]map_page_into_agp into asm/agp.h
  [AGPGART] Add missing calls to global_flush_tlb() to ali-agp
  [AGPGART] prevent probe collision of sis-agp and amd64_agp
2007-05-04 17:38:16 -07:00
Vlad Yasevich 07d9396771 [SCTP]: Set assoc_id correctly during INIT collision.
During the INIT/COOKIE-ACK collision cases, it's possible to get
into a situation where the association id is not yet set at the time
of the user event generation.  As a result, user events have an
association id set to 0 which will confuse applications.

This happens if we hit case B of duplicate cookie processing.
In the particular example found and provided by Oscar Isaula
<Oscar.Isaula@motorola.com>, flow looks like this:
A				B
---- INIT------->  (lost)
	    <---------INIT------
---- INIT-ACK--->
	    <------ Cookie ECHO

When the Cookie Echo is received, we end up trying to update the
association that was created on A as a result of the (lost) INIT,
but that association doesn't have the ID set yet.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-04 13:55:27 -07:00
Sridhar Samudrala 827bf12236 [SCTP]: Re-order SCTP initializations to avoid race with sctp_rcv()
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-04 13:36:30 -07:00
Jamal Hadi Salim 5a6d34162f [XFRM] SPD info TLV aggregation
Aggregate the SPD info TLVs.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-04 12:55:39 -07:00
Jamal Hadi Salim af11e31609 [XFRM] SAD info TLV aggregationx
Aggregate the SAD info TLVs.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-04 12:55:13 -07:00
Jennifer Hunt 561e036006 [AF_IUCV]: Implementation of a skb backlog queue
With the inital implementation we missed to implement a skb backlog
queue . The result is that socket receive processing tossed packets.
Since AF_IUCV connections are working synchronously it leads to
connection hangs. Problems with read, close and select also occured.

Using a skb backlog queue is fixing all of these problems .

Signed-off-by: Jennifer Hunt <jenhunt@us.ibm.com>
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-04 12:22:07 -07:00
Martin Schwidefsky cf8ba7a955 [S390] add hardware capability support (ELF_HWCAP).
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-05-04 18:48:35 +02:00
Cornelia Huck 52706ec903 [S390] cio: Deprecate read_dev_chars() and read_conf_data{,_lpm}().
These helper functions are a leftover from 2.4 sync I/O and are a
notorious source for bugs. They lead to device driver specific code
creeping into cio, and some issues can't really be fixed at all.

Device drivers can easily implement those functions themselves in a
more robust manner, so let's get rid of them.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-05-04 18:48:25 +02:00
Christoph Hellwig 33464e3b57 [S390] get rid of kprobes notifier call chain.
And here's a port of the powerpc patch to get rid of the notifier
chain completely to s390.  It's ontop of Martins patch as that one
is in mainline already.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-05-04 18:48:24 +02:00
Eric Dumazet db3459d1a7 [IPV6]: Some cleanups in include/net/ipv6.h
1) struct ip6_flowlabel : moves 'users' field to avoid two 32bits
   holes for 64bit arches. Shrinks by 8 bytes sizeof(struct
   ip6_flowlabel)

2) ipv6_addr_cmp() and ipv6_addr_copy() dont need (void *) casts :
   Compiler might take into account natural alignement of in6_addr
   structs to emit better code for memcpy()/memcmp() Casts to (void *)
   force byte accesses.

3) ipv6_addr_prefix() optimization :

Better to clear whole struct, as compiler can emit better code for
memset(addr, 0, 16) (2 stores on x86_64), and avoid some conditional
branches.

# size vmlinux.after vmlinux.before
   text    data     bss     dec     hex filename
5262262  647612  557432 6467306  62aeea vmlinux.after
5262550  647612  557432 6467594  62b00a vmlinux.before

thats 288 bytes saved.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 17:39:04 -07:00
Pavel Emelianov 7562f876cd [NET]: Rework dev_base via list_head (v3)
Cleanup of dev_base list use, with the aim to simplify making device
list per-namespace. In almost every occasion, use of dev_base variable
and dev->next pointer could be easily replaced by for_each_netdev
loop. A few most complicated places were converted to using
first_netdev()/next_netdev().

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 15:13:45 -07:00
Michael Chan 27a005b883 [BNX2]: Add support for 5709 Serdes.
Add PCI ID and code to support the 5709 Serdes PHY.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 13:23:41 -07:00
Michael Chan 427c2196b9 [ETHTOOL]: Add 2.5G bit definitions.
Add 2.5G supported and advertising bit definitions.  2.5G is supported
by the bnx2 driver.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 13:17:25 -07:00
Sascha Hauer ff4bfb2163 [ARM] 4328/1: Move i.MX UART regs to driver
This patch moves the i.MX UART register descriptions from
include/asm-arm/arch-imx/imx-regs.h to the serial driver itself.
This helps using the driver on other architectures like mx31

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-03 20:24:21 +01:00
Sascha Hauer fe7fdb80e9 [ARM] 4329/1: fix position of NETX_SYSTEM_REG
This patch fixes the position of the netx reset control register

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-03 20:22:49 +01:00
Russell King 5559bca8e6 [ARM] ecard: Convert card type enum to a flag
'type' in the struct expansion_card is only used to indicate
whether this card is an EASI card or not.  Therefore, having
it as an enum is wasteful (and introduces additional noise
when we come to remove the enum.)  Convert it to a mere flag
instead.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-03 14:16:56 +01:00
Russell King c0b04d1b2c [ARM] ecard: Move private ecard junk out of asm/ecard.h
Move ecard.c private junk from asm/ecard.h to a local header file.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-03 14:16:56 +01:00
Andrew Victor 7776a94c31 [ARM] 4352/1: AT91: Platform data for LCD and AC97.
Define resources, platform_device and device registration functions for
the LCD and AC97 controllers on the AT91SAM9263.
Also update the AT91SAM9261 to use the common atmel_lcdfb driver.

Signed-off-by: Nicolas Ferre <nicolas.ferre@rfo.atmel.com>
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-03 14:10:22 +01:00
Andrew Victor ce813b97e5 [ARM] 4350/1: AT91: Hardware header for ADC peripheral
Definitions for Analog-to-Digital Converter (ADC) found on the Atmel
AT91SAM9260 processor.

Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-03 14:10:20 +01:00
Dan Williams d2dd8b1fed [ARM] 4342/2: iop13xx: add resource definitions for the tpmi units
The tpmi units interface with the SAS controller on iop348.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-03 14:03:54 +01:00
Dan Williams e90ddd813d [ARM] 4348/4: iop3xx: Give Linux control over PCI initialization
Currently the iop3xx platform support code assumes that RedBoot is the
bootloader and has already initialized the ATU.  Linux should handle this
initialization for three reasons:

1/ The memory map that RedBoot sets up is not optimal (page_to_dma and
virt_to_phys return different addresses).  The effect of this is that using
the dma mapping API for the internal bus dma units generates pci bus
addresses that are incorrect for the internal bus.

2/ Not all iop platforms use RedBoot

3/ If the ATU is already initialized it indicates that the iop is an add-in
card in another host, it does not own the PCI bus, and should not be
re-initialized.

Changelog:
* rather than change nr_controllers to zero, simply do not call
  pci_common_init

Cc: Lennert Buytenhek <kernel@wantstofly.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-03 14:02:48 +01:00
Patrick McHardy fc38582db9 [NETFILTER]: bridge netfilter: consolidate header pushing/pulling code
Consolidate the common push/pull sequences into a few helper functions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:36:16 -07:00
Jorge Boncompte c2a1910b06 [NETFILTER]: nf_nat_proto_gre: do not modify/corrupt GREv0 packets through NAT
While porting some changes of the 2.6.21-rc7 pptp/proto_gre conntrack
and nat modules to a 2.4.32 kernel I noticed that the gre_key function
returns a wrong pointer to the GRE key of a version 0 packet thus
corrupting the packet payload.

The intended behaviour for GREv0 packets is to act like
nf_conntrack_proto_generic/nf_nat_proto_unknown so I have ripped the
offending functions (not used anymore) and modified the
nf_nat_proto_gre modules to not touch version 0 (non PPTP) packets.

Signed-off-by: Jorge Boncompte <jorge@dti2.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:34:42 -07:00
Ilpo Järvinen 0ec96822d5 [TCP]: Use S+L catcher only with SACK for now
TCP has a transitional state when SACK is not in use during
which this invariant is temporarily broken. Without SACK,
tcp_clean_rtx_queue does not decrement sacked_out. Therefore
calls to tcp_sync_left_out before sacked_out is again
corrected by tcp_fastretrans_alert can trigger this trap as
sacked_out still has couple of segments that are already out
of window.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:30:34 -07:00
Patrick McHardy 4e9cac2ba4 [NET]: Add __dev_getfirstbyhwtype
Add __dev_getfirstbyhwtype for callers that don't want a reference but
some data from the device and thus need to take the rtnl anyway.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:28:13 -07:00
Randy Dunlap be52178b9f [NET] skbuff: fix kernel-doc
Fix skbuff.h kernel-doc:
linux-2.6.21-git4//include/linux/skbuff.h:316): No description found for parameter 'transport_header'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:16:20 -07:00
David Howells ef4533f8af [AFS]: Make the match_*() functions take const options.
Make the match_*() functions take a const pointer to the options table
and make strings pointers in the options table const too.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:10:39 -07:00
Eric Dumazet 709525fad8 [IPV6]: Get rid of __HAVE_ARCH_ADDR_SET.
__HAVE_ARCH_ADDR_SET seems unused these days, just get rid of it.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:08:43 -07:00
Avi Kivity 2ff81f70b5 KVM: Remove unused 'instruction_length'
As we no longer emulate in userspace, this is meaningless.  We don't
compute it on SVM anyway.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:32 +03:00
Avi Kivity 02c8320972 KVM: Don't require explicit indication of completion of mmio or pio
It is illegal not to return from a pio or mmio request without completing
it, as mmio or pio is an atomic operation.  Therefore, we can simplify
the userspace interface by avoiding the completion indication.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:32 +03:00
Avi Kivity b8836737d9 KVM: Add fpu get/set operations
These are really helpful when migrating an floating point app to another
machine.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:28 +03:00
Avi Kivity e8207547d2 KVM: Add physical memory aliasing feature
With this, we can specify that accesses to one physical memory range will
be remapped to another.  This is useful for the vga window at 0xa0000 which
is used as a movable window into the (much larger) framebuffer.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:28 +03:00
Avi Kivity 039576c03c KVM: Avoid guest virtual addresses in string pio userspace interface
The current string pio interface communicates using guest virtual addresses,
relying on userspace to translate addresses and to check permissions.  This
interface cannot fully support guest smp, as the check needs to take into
account two pages at one in case an unaligned string transfer straddles a
page boundary.

Change the interface not to communicate guest addresses at all; instead use
a buffer page (mmaped by userspace) and do transfers there.  The kernel
manages the virtual to physical translation and can perform the checks
atomically by taking the appropriate locks.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:25 +03:00
Avi Kivity 07c45a366d KVM: Allow kernel to select size of mmap() buffer
This allows us to store offsets in the kernel/user kvm_run area, and be
sure that userspace has them mapped.  As offsets can be outside the
kvm_run struct, userspace has no way of knowing how much to mmap.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:24 +03:00
Avi Kivity 1961d276c8 KVM: Add guest mode signal mask
Allow a special signal mask to be used while executing in guest mode.  This
allows signals to be used to interrupt a vcpu without requiring signal
delivery to a userspace handler, which is quite expensive.  Userspace still
receives -EINTR and can get the signal via sigwait().

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:24 +03:00
Avi Kivity 1b19f3e61d KVM: Add a special exit reason when exiting due to an interrupt
This is redundant, as we also return -EINTR from the ioctl, but it
allows us to examine the exit_reason field on resume without seeing
old data.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:24 +03:00
Avi Kivity 8eb7d334bd KVM: Fold kvm_run::exit_type into kvm_run::exit_reason
Currently, userspace is told about the nature of the last exit from the
guest using two fields, exit_type and exit_reason, where exit_type has
just two enumerations (and no need for more).  So fold exit_type into
exit_reason, reducing the complexity of determining what really happened.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:24 +03:00
Avi Kivity b4e63f560b KVM: Allow userspace to process hypercalls which have no kernel handler
This is useful for paravirtualized graphics devices, for example.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:24 +03:00
Avi Kivity 5d308f4550 KVM: Add method to check for backwards-compatible API extensions
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:24 +03:00
Avi Kivity 739872c56f KVM: Renumber ioctls
The recent changes have left the ioctl numbers in complete disarray.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:23 +03:00
Avi Kivity 2a4dac3952 KVM: Remove minor wart from KVM_CREATE_VCPU ioctl
That ioctl does not transfer any data, so it should be an _IO rather than an
_IOW.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:23 +03:00
Avi Kivity 106b552b43 KVM: Remove the 'emulated' field from the userspace interface
We no longer emulate single instructions in userspace.  Instead, we service
mmio or pio requests.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:23 +03:00
Avi Kivity 06465c5a3a KVM: Handle cpuid in the kernel instead of punting to userspace
KVM used to handle cpuid by letting userspace decide what values to
return to the guest.  We now handle cpuid completely in the kernel.  We
still let userspace decide which values the guest will see by having
userspace set up the value table beforehand (this is necessary to allow
management software to set the cpu features to the least common denominator,
so that live migration can work).

The motivation for the change is that kvm kernel code can be impacted by
cpuid features, for example the x86 emulator.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:23 +03:00
Avi Kivity 46fc147788 KVM: Do not communicate to userspace through cpu registers during PIO
Currently when passing the a PIO emulation request to userspace, we
rely on userspace updating %rax (on 'in' instructions) and %rsi/%rdi/%rcx
(on string instructions).  This (a) requires two extra ioctls for getting
and setting the registers and (b) is unfriendly to non-x86 archs, when
they get kvm ports.

So fix by doing the register fixups in the kernel and passing to userspace
only an abstract description of the PIO to be done.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:23 +03:00
Avi Kivity 9a2bb7f486 KVM: Use a shared page for kernel/user communication when runing a vcpu
Instead of passing a 'struct kvm_run' back and forth between the kernel and
userspace, allocate a page and allow the user to mmap() it.  This reduces
needless copying and makes the interface expandable by providing lots of
free space.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:23 +03:00
Avi Kivity ff42697436 KVM: Export <linux/kvm.h>
This allows users to actually build prgrams that use kvm without
the entire source tree.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:22 +03:00
Avi Kivity bbe4432e66 KVM: Use own minor number
Use the minor number (232) allocated to kvm by lanana.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03 10:52:22 +03:00
Adrian Bunk ecf36501bc PCI: the overdue removal of pci_module_init()
Unless we finally completely remove it, people will always add new users.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:38 -07:00
Adrian Bunk 5adc55da4a PCI: remove the broken PCI_MULTITHREAD_PROBE option
This patch removes the PCI_MULTITHREAD_PROBE option that had already 
been marked as broken.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:38 -07:00
Michael Ellerman 032de8e2fe MSI: Give archs the option to free all MSI/Xs at once.
This patch introduces an optional function, arch_teardown_msi_irqs(),
which gives an arch the opportunity to do per-device teardown for
MSI/X. If that's not required, the default version simply calls
arch_teardown_msi_irq() for each msi irq required.

arch_teardown_msi_irqs() is simply passed a pdev, attached to the pdev
is a list of msi_descs, it is up to the arch to free the irq associated
with each of these as appropriate.

For archs that _don't_ implement arch_teardown_msi_irqs(), all msi_descs
with irq == 0 are considered unallocated, and the arch teardown routine
is not called on them.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:38 -07:00
Michael Ellerman 9c8313343c MSI: Give archs the option to allocate all MSI/Xs at once.
This patch introduces an optional function, arch_setup_msi_irqs(),
(note the plural) which gives an arch the opportunity to do per-device
setup for MSI/X and then allocate all the requested MSI/Xs at once.

If that's not required by the arch, the default version simply calls
arch_setup_msi_irq() for each MSI irq required.

arch_setup_msi_irqs() is passed a pdev, attached to the pdev is a list
of msi_descs with irq == 0, it is up to the arch to connect these up to
an irq (via set_irq_msi()) or return an error. For convenience the number
of vectors and the type are passed also.

All msi_descs with irq != 0 are considered allocated, and the arch
teardown routine will be called on them when necessary.

The existing semantics of pci_enable_msix() are that if the requested
number of irqs can not be allocated, the maximum number that _could_ be
allocated is returned. To support that, we define that in case of an
error from arch_setup_msi_irqs(), the number of msi_descs with irq != 0
are considered allocated, and are counted toward the "max that could be
allocated".


Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:38 -07:00
Michael Ellerman 314e77b3ee MSI: Remove dev->first_msi_irq
Now that we keep a list of msi descriptors, we don't need first_msi_irq
in the pci dev.

If we somehow have zero MSIs configured list_entry() will give us weird
oopes or nice memory corruption bugs. So be paranoid. Add BUG_ONs and also
a check in pci_msi_check_device() to make sure nvec > 0.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:37 -07:00
Michael Ellerman 4aa9bc955d MSI: Use a list instead of the custom link structure
The msi descriptors are linked together with what looks a lot like
a linked list, but isn't a struct list_head list. Make it one.

The only complication is that previously we walked a list of irqs, and
got the descriptor for each with get_irq_msi(). Now we have a list of
descriptors and need to get the irq out of it, so it needs to be in the
actual struct msi_desc. We use 0 to indicate no irq is setup.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:37 -07:00
Michael Ellerman 65891215e6 PCI: Create alloc_pci_dev(), the one true way to create a struct pci_dev
There are currently several places in the kernel where we kmalloc()
a struct pci_dev and start initialising it. It'd be preferable to
have an allocator so we can ensure the pci_dev is correctly initialised
in one place.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:37 -07:00
Michael Ellerman c9953a73e9 MSI: Add an arch_msi_check_device()
Add an arch_check_device(), which gives archs a chance to check the input
to pci_enable_msi/x. The arch might be interested in the value of nvec so
pass it in. Propagate the error value returned from the arch routine out
to the caller.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:37 -07:00
Sergei Shtylyov 0da0ead901 PCI: define pci_request/release_regions() for CONFIG_PCI=n
Balance declarations of pci_request_regions() and pci_release_regions() with
empty inline definitions for the CONFIG_PCI=n case -- otherwise my patch to
drivers/net/3c59x.c in the -mm tree doesn't compile. :-)

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:35 -07:00
Jean Delvare 6473d160b4 PCI: Cleanup the includes of <linux/pci.h>
I noticed that many source files include <linux/pci.h> while they do
not appear to need it. Here is an attempt to clean it all up.

In order to find all possibly affected files, I searched for all
files including <linux/pci.h> but without any other occurence of "pci"
or "PCI". I removed the include statement from all of these, then I
compiled an allmodconfig kernel on both i386 and x86_64 and fixed the
false positives manually.

My tests covered 66% of the affected files, so there could be false
positives remaining. Untested files are:

arch/alpha/kernel/err_common.c
arch/alpha/kernel/err_ev6.c
arch/alpha/kernel/err_ev7.c
arch/ia64/sn/kernel/huberror.c
arch/ia64/sn/kernel/xpnet.c
arch/m68knommu/kernel/dma.c
arch/mips/lib/iomap.c
arch/powerpc/platforms/pseries/ras.c
arch/ppc/8260_io/enet.c
arch/ppc/8260_io/fcc_enet.c
arch/ppc/8xx_io/enet.c
arch/ppc/syslib/ppc4xx_sgdma.c
arch/sh64/mach-cayman/iomap.c
arch/xtensa/kernel/xtensa_ksyms.c
arch/xtensa/platform-iss/setup.c
drivers/i2c/busses/i2c-at91.c
drivers/i2c/busses/i2c-mpc.c
drivers/media/video/saa711x.c
drivers/misc/hdpuftrs/hdpu_cpustate.c
drivers/misc/hdpuftrs/hdpu_nexus.c
drivers/net/au1000_eth.c
drivers/net/fec_8xx/fec_main.c
drivers/net/fec_8xx/fec_mii.c
drivers/net/fs_enet/fs_enet-main.c
drivers/net/fs_enet/mac-fcc.c
drivers/net/fs_enet/mac-fec.c
drivers/net/fs_enet/mac-scc.c
drivers/net/fs_enet/mii-bitbang.c
drivers/net/fs_enet/mii-fec.c
drivers/net/ibm_emac/ibm_emac_core.c
drivers/net/lasi_82596.c
drivers/parisc/hppb.c
drivers/sbus/sbus.c
drivers/video/g364fb.c
drivers/video/platinumfb.c
drivers/video/stifb.c
drivers/video/valkyriefb.c
include/asm-arm/arch-ixp4xx/dma.h
sound/oss/au1550_ac97.c

I would welcome test reports for these files. I am fine with removing
the untested files from the patch if the general opinion is that these
changes aren't safe. The tested part would still be nice to have.

Note that this patch depends on another header fixup patch I submitted
to LKML yesterday:
  [PATCH] scatterlist.h needs types.h
  http://lkml.org/lkml/2007/3/01/141

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:35 -07:00
Jean Delvare a9dfd281a7 PCI: scatterlist.h needs types.h
Most architectures' scatterlist.h use the type dma_addr_t, but omit to
include <asm/types.h> which defines it.  This could lead to build failures,
so let's add the missing includes.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:34 -07:00
Brian King f7bdd12d23 pci: New PCI-E reset API
Adds a new API which can be used to issue various types
of PCI-E reset, including PCI-E warm reset and PCI-E hot reset.
This is needed for an ipr PCI-E adapter which does not properly
implement BIST. Running BIST on this adapter results in PCI-E
errors. The only reliable reset mechanism that exists on this
hardware is PCI Fundamental reset (warm reset). Since driving
this type of reset is architecture unique, this provides the
necessary hooks for architectures to add this support.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:34 -07:00
Greg Kroah-Hartman 823bccfc40 remove "struct subsystem" as it is no longer needed
We need to work on cleaning up the relationship between kobjects, ksets and
ktypes.  The removal of 'struct subsystem' is the first step of this,
especially as it is not really needed at all.

Thanks to Kay for fixing the bugs in this patch.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 18:57:59 -07:00
Sam Ravnborg dc24f0e708 kbuild: remove dependency on input.h from file2alias
Almost all definitions used by file2alias was already
present in mod_devicetable.h.
Added the last definition and killed the input.h usage.

The errornous include was pointed out
by: Jan Engelhardt <jengelh@linux01.gwdg.de>

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Cc: Deepak Saxena <dsaxena@plexity.net>
2007-05-02 20:58:08 +02:00
Jan Kiszka c41bf8fa5e [PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu
There are two callers of __unlazy_fpu, unlazy_fpu and __switch_to, and
none of them appear to require additional preempt_disable/enable here.
Let's open-code save_init_fpu in __unlazy_fpu to save a few ops.

Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:21 +02:00
Jan Kiszka 02b64dab56 [PATCH] i386: white space fixes in i387.h
Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:21 +02:00
Andi Kleen c5bcb5635a [PATCH] x86: Use RDTSCP for synchronous get_cycles if possible
RDTSCP is already synchronous and doesn't need an explicit CPUID.
This is a little faster and more importantly avoids VMEXITs on Hypervisors.

Original patch from Joerg Roedel, but reworked by AK
Also includes miscompilation fix by Eric Biederman

Cc: "Joerg Roedel" <joerg.roedel@amd.com>

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:21 +02:00
Andi Kleen 9bccb23dc5 [PATCH] i386: Add X86_FEATURE_RDTSCP
Following x86-64
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Andi Kleen 3aefbe0746 [PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386
Syncs up with x86-64.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Andi Kleen e859dc553c [PATCH] i386: Implement alternative_io for i386
Ported from x86-64.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Andi Kleen 3671df8572 [PATCH] i386: Evaluate constant cpu features at runtime
Redefine cpu_has() to evaluate cpu features already checked in early
boot at compile time.  This way the compiler might eliminate some dead code.
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Andi Kleen c7f81c9453 [PATCH] i386: Verify important CPUID bits in real mode
Check some CPUID bits that are needed for compiler generated early in boot.
When the system is still in real mode before changing the VESA BIOS mode
it is possible to still display an visible error message on the screen.

Similar to x86-64.

Includes cleanups from Eric Biederman

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Andi Kleen 05cb007dac [PATCH] x86-64: Use the 32bit wd_ops for 64bit too.
This mainly removes a lot of code, replacing it with calls into the new 32bit
perfctr-watchdog.c

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Andi Kleen 09198e6850 [PATCH] i386: Clean up NMI watchdog code
- Introduce a wd_ops structure
- Convert the various nmi watchdogs over to it
- This allows to split the perfctr reservation from the watchdog
setup cleanly.
- Do perfctr reservation globally as it should have always been
- Remove dead code referenced only by unused EXPORT_SYMBOLs

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Zachary Amsden 9e5e3162b2 [PATCH] i386: pte simplify ops
Add comment and condense code to make use of native_local_ptep_get_and_clear
function.  Also, it turns out the 2-level and 3-level paging definitions were
identical, so move the common definition into pgtable.h

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:19 +02:00
Zachary Amsden 142dd97591 [PATCH] i386: pte xchg optimization
In situations where page table updates need only be made locally, and there is
no cross-processor A/D bit races involved, we need not use the heavyweight
xchg instruction to atomically fetch and clear page table entries.  Instead,
we can just read and clear them directly.

This introduces a neat optimization for non-SMP kernels; drop the atomic xchg
operations from page table updates.

Thanks to Michel Lespinasse for noting this potential optimization.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:19 +02:00
Zachary Amsden c2c1accd4b [PATCH] i386: pte clear optimization
When exiting from an address space, no special hypervisor notification of page
table updates needs to occur; direct page table hypervisors, such as Xen,
switch to another address space first (init_mm) and unprotects the page tables
to avoid the cost of trapping to the hypervisor for each pte_clear.  Shadow
mode hypervisors, such as VMI and lhype don't need to do the extra work of
calling through paravirt-ops, and can just directly clear the page table
entries without notifiying the hypervisor, since all the page tables are about
to be freed.

So introduce native_pte_clear functions which bypass any paravirt-ops
notification.  This results in a significant performance win for VMI and
removes some indirect calls from zap_pte_range.

Note the 3-level paging already had a native_pte_clear function, thus
demanding argument conformance and extra args for the 2-level definition.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:19 +02:00
Andi Kleen 57a4f91ae5 [PATCH] x86-64: Auto compute __NR_syscall_max at compile time
No need to maintain it anymore

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:18 +02:00
Fernando Luis [** ISO-8859-1 charset **] VázquezCao 70ae77f497 [PATCH] x86-64: Use safe_apic_wait_icr_idle in __send_IPI_dest_field - x86_64
Use safe_apic_wait_icr_idle to check ICR idle bit if the vector is
NMI_VECTOR to avoid potential hangups in the event of crash when kdump
tries to stop the other CPUs.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:18 +02:00
Fernando Luis [** ISO-8859-1 charset **] VázquezCao 9062d888aa [PATCH] x86-64: __send_IPI_dest_field - x86_64
Implement __send_IPI_dest_field which can be used to send IPIs when the
"destination shorthand" field of the ICR is set to 00 (destination
field). Use it whenever possible.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:18 +02:00
Fernando Luis VazquezCao 8339e9fba3 [PATCH] x86-64: safe_apic_wait_icr_idle - x86_64
apic_wait_icr_idle looks like this:

static __inline__ void apic_wait_icr_idle(void)
{
  while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
    cpu_relax();
}

The busy loop in this function would not be problematic if the
corresponding status bit in the ICR were always updated, but that does
not seem to be the case under certain crash scenarios. Kdump uses an IPI
to stop the other CPUs in the event of a crash, but when any of the
other CPUs are locked-up inside the NMI handler the CPU that sends the
IPI will end up looping forever in the ICR check, effectively
hard-locking the whole system.

Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3:

"A local APIC unit indicates successful dispatch of an IPI by
resetting the Delivery Status bit in the Interrupt Command
Register (ICR). The operating system polls the delivery status
bit after sending an INIT or STARTUP IPI until the command has
been dispatched.

A period of 20 microseconds should be sufficient for IPI dispatch
to complete under normal operating conditions. If the IPI is not
successfully dispatched, the operating system can abort the
command. Alternatively, the operating system can retry the IPI by
writing the lower 32-bit double word of the ICR. This “time-outâ€
mechanism can be implemented through an external interrupt, if
interrupts are enabled on the processor, or through execution of
an instruction or time-stamp counter spin loop."

Intel's documentation suggests the implementation of a time-out
mechanism, which, by the way, is already being open-coded in some parts
of the kernel that tinker with ICR.

Create a apic_wait_icr_idle replacement that implements the time-out
mechanism and that can be used to solve the aforementioned problem.

AK: moved both functions out of line
AK: Added improved loop from Keith Owens

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:17 +02:00
Fernando Luis VazquezCao f2b218dd61 [PATCH] i386: safe_apic_wait_icr_idle - i386
apic_wait_icr_idle looks like this:

static __inline__ void apic_wait_icr_idle(void)
{
  while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
    cpu_relax();
}

The busy loop in this function would not be problematic if the
corresponding status bit in the ICR were always updated, but that does
not seem to be the case under certain crash scenarios. Kdump uses an IPI
to stop the other CPUs in the event of a crash, but when any of the
other CPUs are locked-up inside the NMI handler the CPU that sends the
IPI will end up looping forever in the ICR check, effectively
hard-locking the whole system.

Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3:

"A local APIC unit indicates successful dispatch of an IPI by
resetting the Delivery Status bit in the Interrupt Command
Register (ICR). The operating system polls the delivery status
bit after sending an INIT or STARTUP IPI until the command has
been dispatched.

A period of 20 microseconds should be sufficient for IPI dispatch
to complete under normal operating conditions. If the IPI is not
successfully dispatched, the operating system can abort the
command. Alternatively, the operating system can retry the IPI by
writing the lower 32-bit double word of the ICR. This “time-outâ€
mechanism can be implemented through an external interrupt, if
interrupts are enabled on the processor, or through execution of
an instruction or time-stamp counter spin loop."

Intel's documentation suggests the implementation of a time-out
mechanism, which, by the way, is already being open-coded in some parts
of the kernel that tinker with ICR.

Create a apic_wait_icr_idle replacement that implements the time-out
mechanism and that can be used to solve the aforementioned problem.

AK: moved both functions out of line
AK: added improved loop from Keith Owens

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:17 +02:00
Bernhard Kaindl de938c51d5 [PATCH] i386: Enable support for fixed-range IORRs to keep RdMem & WrMem in sync
If our copy of the MTRRs of the BSP has RdMem or WrMem set, and
we are running on an AMD64/K8 system, the boot CPU must have had
MtrrFixDramEn and MtrrFixDramModEn set (otherwise our RDMSR would
have copied these bits cleared), so we set them on this CPU as well.

This allows us to keep the AMD64/K8 RdMem and WrMem bits in sync
across the CPUs of SMP systems in order to fullfill the duty of
system software to "initialize and maintain MTRR consistency
across all processors." as written in the AMD and Intel manuals.

If an WRMSR instruction fails because MtrrFixDramModEn is not
set, I expect that also the Intel-style MTRR bits are not updated.

AK: minor cleanup, moved MSR defines around

Signed-off-by: Bernhard Kaindl <bk@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Dave Jones <davej@codemonkey.org.uk>
2007-05-02 19:27:17 +02:00
Bernhard Kaindl 2b1f6278d7 [PATCH] x86: Save the MTRRs of the BSP before booting an AP
Applied fix by Andew Morton:
http://lkml.org/lkml/2007/4/8/88 - Fix `make headers_check'.

AMD and Intel x86 CPU manuals state that it is the responsibility of
system software to initialize and maintain MTRR consistency across
all processors in Multi-Processing Environments.

Quote from page 188 of the AMD64 System Programming manual (Volume 2):

7.6.5 MTRRs in Multi-Processing Environments

"In multi-processing environments, the MTRRs located in all processors must
characterize memory in the same way. Generally, this means that identical
values are written to the MTRRs used by the processors." (short omission here)
"Failure to do so may result in coherency violations or loss of atomicity.
Processor implementations do not check the MTRR settings in other processors
to ensure consistency. It is the responsibility of system software to
initialize and maintain MTRR consistency across all processors."

Current Linux MTRR code already implements the above in the case that the
BIOS does not properly initialize MTRRs on the secondary processors,
but the case where the fixed-range MTRRs of the boot processor are changed
after Linux started to boot, before the initialsation of a secondary
processor, is not handled yet.

In this case, secondary processors are currently initialized by Linux
with MTRRs which the boot processor had very early, when mtrr_bp_init()
did run, but not with the MTRRs which the boot processor uses at the
time when that secondary processors is actually booted,
causing differing MTRR contents on the secondary processors.

Such situation happens on Acer Ferrari 1000 and 5000 notebooks where the
BIOS enables and sets AMD-specific IORR bits in the fixed-range MTRRs
of the boot processor when it transitions the system into ACPI mode.
The SMI handler of the BIOS does this in SMM, entered while Linux ACPI
code runs acpi_enable().

Other occasions where the SMI handler of the BIOS may change bits in
the MTRRs could occur as well. To initialize newly booted secodary
processors with the fixed-range MTRRs which the boot processor uses
at that time, this patch saves the fixed-range MTRRs of the boot
processor before new secondary processors are started. When the
secondary processors run their Linux initialisation code, their
fixed-range MTRRs will be updated with the saved fixed-range MTRRs.

If CONFIG_MTRR is not set, we define mtrr_save_state
as an empty statement because there is nothing to do.

Possible TODOs:

*) CPU-hotplugging outside of SMP suspend/resume is not yet tested
   with this patch.

*) If, even in this case, an AP never runs i386/do_boot_cpu or x86_64/cpu_up,
   then the calls to mtrr_save_state() could be replaced by calls to
   mtrr_save_fixed_ranges(NULL) and  mtrr_save_state() would not be
   needed.

   That would need either verification of the CPU-hotplug code or
   at least a test on a >2 CPU machine.

*) The MTRRs of other running processors are not yet checked at this
   time but it might be interesting to syncronize the MTTRs of all
   processors before booting. That would be an incremental patch,
   but of rather low priority since there is no machine known so
   far which would require this.

AK: moved prototypes on x86-64 around to fix warnings

Signed-off-by: Bernhard Kaindl <bk@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Dave Jones <davej@codemonkey.org.uk>
2007-05-02 19:27:17 +02:00
Bernhard Kaindl 2b3b4835c9 [PATCH] x86: Adds mtrr_save_fixed_ranges() for use in two later patches.
In this current implementation which is used in other patches,
mtrr_save_fixed_ranges() accepts a dummy void pointer because
in the current implementation of one of these patches, this
function may be called from smp_call_function_single() which
requires that this function takes a void pointer argument.

This function calls get_fixed_ranges(), passing mtrr_state.fixed_ranges
which is the element of the static struct which stores our current
backup of the fixed-range MTRR values which all CPUs shall be
using.

Because  mtrr_save_fixed_ranges calls get_fixed_ranges after
kernel initialisation time, __init needs to be removed from
the declaration of get_fixed_ranges().

If CONFIG_MTRR is not set, we define mtrr_save_fixed_ranges
as an empty statement because there is nothing to do.

AK: Moved prototypes for x86-64 around to fix warnings

Signed-off-by: Bernhard Kaindl <bk@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Dave Jones <davej@codemonkey.org.uk>
2007-05-02 19:27:17 +02:00
Andi Kleen 856f44ff4a [PATCH] x86-64: Move mtrr prototypes from proto.h to mtrr.h
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:17 +02:00
Jeremy Fitzhardinge 03df4f6ee9 [PATCH] i386: Clean up ELF note generation
Three cleanups:

1: ELF notes are never mapped, so there's no need to have any access
flags in their phdr.

2: When generating them from asm, tell the assembler to use a SHT_NOTE
section type.  There doesn't seem to be a way to do this from C.

3: Use ANSI rather than traditional cpp behaviour to stringify the
macro argument.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
2007-05-02 19:27:17 +02:00
Jeremy Fitzhardinge 441d40dca0 [PATCH] x86: PARAVIRT: Jeremy Fitzhardinge <jeremy@goop.org>
The other symbols used to delineate the alt-instructions sections have the
form __foo/__foo_end.  Rename parainstructions to match.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:16 +02:00
Zachary Amsden e0bb864397 [PATCH] i386: Convert VMI timer to use clock events
Convert VMI timer to use clock events, making it properly able to use the NO_HZ
infrastructure.  On UP systems, with no local APIC, we just continue to route
these events through the PIT.  On systems with a local APIC, or SMP, we provide
a single source interrupt chip which creates the local timer IRQ.  It actually
gets delivered by the APIC hardware, but we don't want to use the same local
APIC clocksource processing, so we create our own handler here.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
CC: Dan Hecht <dhecht@vmware.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Thomas Gleixner <tglx@linutronix.de>
2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge 57decbda6a [PATCH] x86: update for i386 and x86-64 check_bugs
Remove spurious comments, headers and keywords from x86-64 bugs.[ch].

Use identify_boot_cpu()

AK: merged with other patch

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge c5413fbe89 [PATCH] i386: Fix UP gdt bugs
Fixes two problems with the GDT when compiling for uniprocessor:
 - There's no percpu segment, so trying to load its selector into %fs fails.
   Use a null selector instead.
 - The real gdt needs to be loaded at some point.  Do it in cpu_init().

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge 1956c73bb5 [PATCH] i386: Define per_cpu_offset
Define per_cpu_offset in asm-i386/percpu.h when SMP defined, like
asm-generic/percpu.h does for UP.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge 978c038ec9 [PATCH] i386: cleanups to help using per-cpu variables from asm
This patch does a few small cleanups:
 - use PER_CPU_NAME to generate the names of per-cpu variables
 - use lea to add the per_cpu offset in PER_CPU(), because it doesn't
   affect condition flags
 - add PER_CPU_VAR which allows direct access to pre-cpu variables
   with the %fs: prefix on SMP.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge 7c3576d261 [PATCH] i386: Convert PDA into the percpu section
Currently x86 (similar to x84-64) has a special per-cpu structure
called "i386_pda" which can be easily and efficiently referenced via
the %fs register.  An ELF section is more flexible than a structure,
allowing any piece of code to use this area.  Indeed, such a section
already exists: the per-cpu area.

So this patch:
(1) Removes the PDA and uses per-cpu variables for each current member.
(2) Replaces the __KERNEL_PDA segment with __KERNEL_PERCPU.
(3) Creates a per-cpu mirror of __per_cpu_offset called this_cpu_off, which
    can be used to calculate addresses for this CPU's variables.
(4) Simplifies startup, because %fs doesn't need to be loaded with a
    special segment at early boot; it can be deferred until the first
    percpu area is allocated (or never for UP).

The result is less code and one less x86-specific concept.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge 7a61d35d4b [PATCH] i386: Page-align the GDT
Xen wants a dedicated page for the GDT.  I believe VMI likes it too.
lguest, KVM and native don't care.

Simple transformation to page-aligned "struct gdt_page".

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge 4cdd9c8931 [PATCH] i386: PARAVIRT: drop unused ptep_get_and_clear
In shadow mode hypervisors, ptep_get_and_clear achieves the desired
purpose of keeping the shadows in sync by issuing a native_get_and_clear,
followed by a call to pte_update, which indicates the PTE has been
modified.

Direct mode hypervisors (Xen) have no need for this anyway, and will trap
the update using writable pagetables.

This means no hypervisor makes use of ptep_get_and_clear; there is no
reason to have it in the paravirt-ops structure.  Change confusing
terminology about raw vs. native functions into consistent use of
native_pte_xxx for operations which do not invoke paravirt-ops.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge 1a45b7aaa5 [PATCH] i386: PARAVIRT: Clean up paravirt patchable wrappers
Replace all the open-coded macros for generating calls with a pair of
more general macros (__PVOP_CALL/VCALL), and redefine all the
PVOP_V?CALL[0-4] in terms of them.

[ Andrew, Andi: this should slot in immediately after "Document asm-i386/paravirt.h"
  (paravirt_ops-document-asm-i386-paravirth.patch) ]

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge 4e0fa85602 [PATCH] i386: PARAVIRT: Use enums for paravirt lazy flush modi
Remove #defines, add enum for PARAVIRT_LAZY_FLUSH.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge ce6234b529 [PATCH] i386: PARAVIRT: add kmap_atomic_pte for mapping highpte pages
Xen and VMI both have special requirements when mapping a highmem pte
page into the kernel address space.  These can be dealt with by adding
a new kmap_atomic_pte() function for mapping highptes, and hooking it
into the paravirt_ops infrastructure.

Xen specifically wants to map the pte page RO, so this patch exposes a
helper function, kmap_atomic_prot, which maps the page with the
specified page protections.

This also adds a kmap_flush_unused() function to clear out the cached
kmap mappings.  Xen needs this to clear out any potential stray RW
mappings of pages which will become part of a pagetable.

[ Zach - vmi.c will need some attention after this patch.  It wasn't
  immediately obvious to me what needs to be done. ]

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge a27fe809b8 [PATCH] i386: PARAVIRT: revert map_pt_hook.
Back out the map_pt_hook to clear the way for kmap_atomic_pte.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge d4c104771a [PATCH] i386: PARAVIRT: add flush_tlb_others paravirt_op
This patch adds a pv_op for flush_tlb_others.  Linux running on native
hardware uses cross-CPU IPIs to flush the TLB on any CPU which may
have a particular mm's pagetable entries cached in its TLB.  This is
inefficient in a paravirtualized environment, since the hypervisor
knows which real CPUs actually contain cached mappings, which may be a
small subset of a guest's VCPUs.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge 63f70270cc [PATCH] i386: PARAVIRT: add common patching machinery
Implement the actual patching machinery.  paravirt_patch_default()
contains the logic to automatically patch a callsite based on a few
simple rules:

 - if the paravirt_op function is paravirt_nop, then patch nops
 - if the paravirt_op function is a jmp target, then jmp to it
 - if the paravirt_op function is callable and doesn't clobber too much
    for the callsite, call it directly

paravirt_patch_default is suitable as a default implementation of
paravirt_ops.patch, will remove most of the expensive indirect calls
in favour of either a direct call or a pile of nops.

Backends may implement their own patcher, however.  There are several
helper functions to help with this:

paravirt_patch_nop	nop out a callsite
paravirt_patch_ignore	leave the callsite as-is
paravirt_patch_call	patch a call if the caller and callee
			have compatible clobbers
paravirt_patch_jmp	patch in a jmp
paravirt_patch_insns	patch some literal instructions over
			the callsite, if they fit

This patch also implements more direct patches for the native case, so
that when running on native hardware many common operations are
implemented inline.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Acked-by: Ingo Molnar <mingo@elte.hu>
2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge 294688c028 [PATCH] i386: PARAVIRT: Document asm-i386/paravirt.h
Clean things up, and broadly document:
 - the paravirt_ops functions themselves
 - the patching mechanism

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge f8822f4201 [PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to make them patchable
Wrap a set of interesting paravirt_ops calls in a wrapper which makes
the callsites available for patching.  Unfortunately this is pretty
ugly because there's no way to get gcc to generate a function call,
but also wrap just the callsite itself with the necessary labels.

This patch supports functions with 0-4 arguments, and either void or
returning a value.  64-bit arguments must be split into a pair of
32-bit arguments (lower word first).  Small structures are returned in
registers.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge 42c24fa22e [PATCH] i386: PARAVIRT: Fix patch site clobbers to include return register
Fix a few clobbers to include the return register.  The clobbers set
is the set of all registers modified (or may be modified) by the code
snippet, regardless of whether it was deliberate or accidental.

Also, make sure that callsites which are used in contexts which don't
allow clobbers actually save and restore all clobberable registers.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge d582203578 [PATCH] i386: PARAVIRT: Use patch site IDs computed from offset in paravirt_ops structure
Use patch type identifiers derived from the offset of the operation in
the paravirt_ops structure.  This avoids having to maintain a separate
enum for patch site types.

Also, since the identifier is derived from the offset into
paravirt_ops, the offset can be derived from the identifier.  This is
used to remove replicated information in the various callsite macros,
which has been a source of bugs in the past.

This patch also drops the fused save_fl+cli operation, which doesn't
really add much and makes things more complex - specifically because
it breaks the 1:1 relationship between identifiers and offsets.  If
this operation turns out to be particularly beneficial, then the right
answer is to define a new entrypoint for it.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge 98de032b68 [PATCH] i386: PARAVIRT: rename struct paravirt_patch to paravirt_patch_site for clarity
Rename struct paravirt_patch to paravirt_patch_site, so that it
clearly refers to a callsite, and not the patch which may be applied
to that callsite.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge d6dd61c831 [PATCH] x86: PARAVIRT: add hooks to intercept mm creation and destruction
Add hooks to allow a paravirt implementation to track the lifetime of
an mm.  Paravirtualization requires three hooks, but only two are
needed in common code.  They are:

arch_dup_mmap, which is called when a new mmap is created at fork

arch_exit_mmap, which is called when the last process reference to an
  mm is dropped, which typically happens on exit and exec.

The third hook is activate_mm, which is called from the arch-specific
activate_mm() macro/function, and so doesn't need stub versions for
other architectures.  It's called when an mm is first used.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: linux-arch@vger.kernel.org
Cc: James Bottomley <James.Bottomley@SteelEye.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge 5311ab62cd [PATCH] i386: PARAVIRT: Allow paravirt backend to choose kernel PMD sharing
Normally when running in PAE mode, the 4th PMD maps the kernel address space,
which can be shared among all processes (since they all need the same kernel
mappings).

Xen, however, does not allow guests to have the kernel pmd shared between page
tables, so parameterize pgtable.c to allow both modes of operation.

There are several side-effects of this.  One is that vmalloc will update the
kernel address space mappings, and those updates need to be propagated into
all processes if the kernel mappings are not intrinsically shared.  In the
non-PAE case, this is done by maintaining a pgd_list of all processes; this
list is used when all process pagetables must be updated.  pgd_list is
threaded via otherwise unused entries in the page structure for the pgd, which
means that the pgd must be page-sized for this to work.

Normally the PAE pgd is only 4x64 byte entries large, but Xen requires the PAE
pgd to page aligned anyway, so this patch forces the pgd to be page
aligned+sized when the kernel pmd is unshared, to accomodate both these
requirements.

Also, since there may be several distinct kernel pmds (if the user/kernel
split is below 3G), there's no point in allocating them from a slab cache;
they're just allocated with get_free_page and initialized appropriately.  (Of
course the could be cached if there is just a single kernel pmd - which is the
default with a 3G user/kernel split - but it doesn't seem worthwhile to add
yet another case into this code).

[ Many thanks to wli for review comments. ]

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge 90caccb975 [PATCH] i386: PARAVIRT: Allocate a fixmap slot
Allocate a fixmap slot for use by a paravirt_ops implementation.  This
is intended for early-boot bootstrap mappings.  Once the zones and
allocator have been set up, it would be better to use get_vm_area() to
allocate some virtual space.

Xen uses this to map the hypervisor's shared info page, which doesn't
have a pseudo-physical page number, and therefore can't be mapped
ordinarily.  It is needed early because it contains the vcpu state,
including the interrupt mask.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge b239fb2501 [PATCH] i386: PARAVIRT: Hooks to set up initial pagetable
This patch introduces paravirt_ops hooks to control how the kernel's
initial pagetable is set up.

In the case of a native boot, the very early bootstrap code creates a
simple non-PAE pagetable to map the kernel and physical memory.  When
the VM subsystem is initialized, it creates a proper pagetable which
respects the PAE mode, large pages, etc.

When booting under a hypervisor, there are many possibilities for what
paging environment the hypervisor establishes for the guest kernel, so
the constructon of the kernel's pagetable depends on the hypervisor.

In the case of Xen, the hypervisor boots the kernel with a fully
constructed pagetable, which is already using PAE if necessary.  Also,
Xen requires particular care when constructing pagetables to make sure
all pagetables are always mapped read-only.

In order to make this easier, kernel's initial pagetable construction
has been changed to only allocate and initialize a pagetable page if
there's no page already present in the pagetable.  This allows the Xen
paravirt backend to make a copy of the hypervisor-provided pagetable,
allowing the kernel to establish any more mappings it needs while
keeping the existing ones.

A slightly subtle point which is worth highlighting here is that Xen
requires all kernel mappings to share the same pte_t pages between all
pagetables, so that updating a kernel page's mapping in one pagetable
is reflected in all other pagetables.  This makes it possible to
allocate a page and attach it to a pagetable without having to
explicitly enumerate that page's mapping in all pagetables.

And:

+From: "Eric W. Biederman" <ebiederm@xmission.com>

If we don't set the leaf page table entries it is quite possible that
will inherit and incorrect page table entry from the initial boot
page table setup in head.S.  So we need to redo the effort here,
so we pick up PSE, PGE and the like.

Hypervisors like Xen require that their page tables be read-only,
which is slightly incompatible with our low identity mappings, however
I discussed this with Jeremy he has modified the Xen early set_pte
function to avoid problems in this area.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: William Irwin <bill.irwin@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge 3dc494e86d [PATCH] i386: PARAVIRT: Add pagetable accessors to pack and unpack pagetable entries
Add a set of accessors to pack, unpack and modify page table entries
(at all levels).  This allows a paravirt implementation to control the
contents of pgd/pmd/pte entries.  For example, Xen uses this to
convert the (pseudo-)physical address into a machine address when
populating a pagetable entry, and converting back to pphys address
when an entry is read.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge 4587623360 [PATCH] i386: PARAVIRT: use paravirt_nop to consistently mark no-op operations
Add a _paravirt_nop function for use as a stub for no-op operations,
and paravirt_nop #defined void * version to make using it easier
(since all its uses are as a void *).

This is useful to allow the patcher to automatically identify noop
operations so it can simply nop out the callsite.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
[mingo] but only as a cleanup of the current open-coded (void *) casts.
My problem with this is that it loses the types. Not that there is much
to check for, but still, this adds some assumptions about how function
calls look like
2007-05-02 19:27:13 +02:00
Rusty Russell a75c54f933 [PATCH] i386: i386 separate hardware-defined TSS from Linux additions
On Thu, 2007-03-29 at 13:16 +0200, Andi Kleen wrote:
> Please clean it up properly with two structs.

Not sure about this, now I've done it.  Running it here.

If you like it, I can do x86-64 as well.

==
lguest defines its own TSS struct because the "struct tss_struct"
contains linux-specific additions.  Andi asked me to split the struct
in processor.h.

Unfortunately it makes usage a little awkward.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge d0175ab644 [PATCH] i386: Remove smp_alt_instructions
The .smp_altinstructions section and its corresponding symbols are
completely unused, so remove them.

Also, remove stray #ifdef __KENREL__ in asm-i386/alternative.h

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
2007-05-02 19:27:13 +02:00
H. Peter Anvin 4bc5aa91fb [PATCH] x86: Clean up x86 control register and MSR macros (corrected)
This patch is based on Rusty's recent cleanup of the EFLAGS-related
macros; it extends the same kind of cleanup to control registers and
MSRs.

It also unifies these between i386 and x86-64; at least with regards
to MSRs, the two had definitely gotten out of sync.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:12 +02:00
Andi Kleen f039b75471 [PATCH] x86: Don't use MWAIT on AMD Family 10
It doesn't put the CPU into deeper sleep states, so it's better to use the standard
idle loop to save power. But allow to reenable it anyways for benchmarking.

I also removed the obsolete idle=halt on i386

Cc: andreas.herrmann@amd.com

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge c169859d6d [PATCH] x86-64: Clean up asm-x86_64/bugs.h
Most of asm-x86_64/bugs.h is code which should be in a C file, so put it there.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge 1dbf527c51 [PATCH] i386: Make COMPAT_VDSO runtime selectable.
Now that relocation of the VDSO for COMPAT_VDSO users is done at
runtime rather than compile time, it is possible to enable/disable
compat mode at runtime.

This patch allows you to enable COMPAT_VDSO mode with "vdso=2" on the
kernel command line, or via sysctl.  (Switching on a running system
shouldn't be done lightly; any process which was relying on the compat
VDSO will be upset if it goes away.)

The COMPAT_VDSO config option still exists, but if enabled it just
makes vdso_enabled default to VDSO_COMPAT.

+From: Hugh Dickins <hugh@veritas.com>

Fix oops from i386-make-compat_vdso-runtime-selectable.patch.

Even mingetty at system startup finds it easy to trigger an oops
while reading /proc/PID/maps: though it has a good hold on the mm
itself, that cannot stop exit_mm() from resetting tsk->mm to NULL.

(It is usually show_map()'s call to get_gate_vma() which oopses,
and I expect we could change that to check priv->tail_vma instead;
but no matter, even m_start()'s call just after get_task_mm() is racy.)

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Jan Beulich" <JBeulich@novell.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge d4f7a2c18e [PATCH] i386: Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO
Some versions of libc can't deal with a VDSO which doesn't have its
ELF headers matching its mapped address.  COMPAT_VDSO maps the VDSO at
a specific system-wide fixed address.  Previously this was all done at
build time, on the grounds that the fixed VDSO address is always at
the top of the address space.  However, a hypervisor may reserve some
of that address space, pushing the fixmap address down.

This patch does the adjustment dynamically at runtime, depending on
the runtime location of the VDSO fixmap.

[ Patch has been through several hands: Jan Beulich wrote the orignal
  version; Zach reworked it, and Jeremy converted it to relocate phdrs
  as well as sections. ]

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Jan Beulich" <JBeulich@novell.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge a6c4e076ee [PATCH] i386: clean up identify_cpu
identify_cpu() is used to identify both the boot CPU and secondary
CPUs, but it performs some actions which only apply to the boot CPU.
Those functions are therefore really __init functions, but because
they're called by identify_cpu(), they must be marked __cpuinit.

This patch splits identify_cpu() into identify_boot_cpu() and
identify_secondary_cpu(), and calls the appropriate init functions
from each.  Also, identify_boot_cpu() and all the functions it
dominates are marked __init.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge 1353ebb4b4 [PATCH] i386: Clean up asm-i386/bugs.h
Most of asm-i386/bugs.h is code which should be in a C file, so put it there.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-02 19:27:12 +02:00
Avi Kivity bbf30a1650 [PATCH] x86-64: fix arithmetic in comment
The xmm space on x86_64 is 256 bytes.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:12 +02:00
Andi Kleen 5d02d7ae73 [PATCH] x86-64: Use X86_EFLAGS_IF in x86-64/irqflags.h.
As per i386 patch: move X86_EFLAGS_IF et al out to a new header:
processor-flags.h, so we can include it from irqflags.h and use it in
raw_irqs_disabled_flags().

As a side-effect, we could now use these flags in .S files.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:11 +02:00
Jan Beulich b92e9fac40 [PATCH] x86: fix amd64-agp aperture validation
Under CONFIG_DISCONTIGMEM, assuming that a !pfn_valid() implies all
subsequent pfn-s are also invalid is wrong. Thus replace this by
explicitly checking against the E820 map.

AK: make e820 on x86-64 not initdata

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
2007-05-02 19:27:11 +02:00
Jeremy Fitzhardinge b00742d399 [PATCH] x86-64: Account for module percpu space separately from kernel percpu
Rather than using a single constant PERCPU_ENOUGH_ROOM, compute it as
the sum of kernel_percpu + PERCPU_MODULE_RESERVE.  This is now common
to all architectures; if an architecture wants to set
PERCPU_ENOUGH_ROOM to something special, then it may do so (ia64 is
the only one which does).

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
2007-05-02 19:27:11 +02:00
Jeremy Fitzhardinge 07f3331c6b [PATCH] i386: Add machine_ops interface to abstract halting and rebooting
machine_ops is an interface for the machine_* functions defined in
<linux/reboot.h>.  This is intended to allow hypervisors to intercept
the reboot process, but it could be used to implement other x86
subarchtecture reboots.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:11 +02:00
Jeremy Fitzhardinge 01a2f43556 [PATCH] i386: Add smp_ops interface
Add a smp_ops interface.  This abstracts the API defined by
<linux/smp.h> for use within arch/i386.  The primary intent is that it
be used by a paravirtualizing hypervisor to implement SMP, but it
could also be used by non-APIC-using sub-architectures.

This is related to CONFIG_PARAVIRT, but is implemented unconditionally
since it is simpler that way and not a highly performance-sensitive
interface.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
2007-05-02 19:27:11 +02:00
Rusty Russell 4fbb596881 [PATCH] i386: cleanup GDT Access
Now we have an explicit per-cpu GDT variable, we don't need to keep the
descriptors around to use them to find the GDT: expose cpu_gdt directly.

We could go further and make load_gdt() pack the descriptor for us, or even
assume it means "load the current cpu's GDT" which is what it always does.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:11 +02:00
Adrian Bunk ca906e4231 [PATCH] x86: sys_ioperm() prototype cleanup
- there's no reason for duplicating the prototype from
  include/linux/syscalls.h in include/asm-x86_64/unistd.h
- every file should #include the headers containing the prototypes for
  it's global functions

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:10 +02:00
Christoph Lameter 2bff73830c [PATCH] x86-64: use lru instead of page->index and page->private for pgd lists management.
x86_64 currently simulates a list using the index and private fields of the
page struct.  Seems that the code was inherited from i386.  But x86_64 does
not use the slab to allocate pgds and pmds etc.  So the lru field is not
used by the slab and therefore available.

This patch uses standard list operations on page->lru to realize pgd
tracking.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:10 +02:00
Andi Kleen b4531e863d [PATCH] i386: Use X86_EFLAGS_IF in irqflags.h.
Move X86_EFLAGS_IF et al out to a new header: processor-flags.h, so we
can include it from irqflags.h and use it in raw_irqs_disabled_flags().

As a side-effect, we could now use these flags in .S files.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:10 +02:00
Jan Beulich 6fb14755a6 [PATCH] x86: tighten kernel image page access rights
On x86-64, kernel memory freed after init can be entirely unmapped instead
of just getting 'poisoned' by overwriting with a debug pattern.

On i386 and x86-64 (under CONFIG_DEBUG_RODATA), kernel text and bug table
can also be write-protected.

Compared to the first version, this one prevents re-creating deleted
mappings in the kernel image range on x86-64, if those got removed
previously. This, together with the original changes, prevents temporarily
having inconsistent mappings when cacheability attributes are being
changed on such pages (e.g. from AGP code). While on i386 such duplicate
mappings don't exist, the same change is done there, too, both for
consistency and because checking pte_present() before using various other
pte_XXX functions is a requirement anyway. At once, i386 code gets
adjusted to use pte_huge() instead of open coding this.

AK: split out cpa() changes

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:10 +02:00
Jan Beulich d01ad8dd56 [PATCH] x86: Improve handling of kernel mappings in change_page_attr
Fix various broken corner cases in i386 and x86-64 change_page_attr.

AK: split off from tighten kernel image access rights

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:10 +02:00
Rusty Russell 90a0a06aa8 [PATCH] i386: rationalize paravirt wrappers
paravirt.c used to implement native versions of all low-level
functions.  Far cleaner is to have the native versions exposed in the
headers and as inline native_XXX, and if !CONFIG_PARAVIRT, then simply
#define XXX native_XXX.

There are several nice side effects:

1) write_dt_entry() now takes the correct "struct Xgt_desc_struct *"
   not "void *".

2) load_TLS is reintroduced to the for loop, not manually unrolled
   with a #error in case the bounds ever change.

3) Macros become inlines, with type checking.

4) Access to the native versions is trivial for KVM, lguest, Xen and
   others who might want it.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:10 +02:00
Rusty Russell d2cbcc49e2 [PATCH] i386: clean up cpu_init()
We now have cpu_init() and secondary_cpu_init() doing nothing but calling
_cpu_init() with the same arguments.  Rename _cpu_init() to cpu_init() and use
it as a replcement for secondary_cpu_init().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:10 +02:00
Rusty Russell bf50467204 [PATCH] i386: Use per-cpu GDT immediately upon boot
Now we are no longer dynamically allocating the GDT, we don't need the
"cpu_gdt_table" at all: we can switch straight from "boot_gdt_table" to the
per-cpu GDT.  This means initializing the cpu_gdt array in C.

The boot CPU uses the per-cpu var directly, then in smp_prepare_cpus() it
switches to the per-cpu copy just allocated.  For secondary CPUs, the
early_gdt_descr is set to point directly to their per-cpu copy.

For UP the code is very simple: it keeps using the "per-cpu" GDT as per SMP,
but we never have to move.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:10 +02:00
Rusty Russell ae1ee11be7 [PATCH] i386: Use per-cpu variables for GDT, PDA
Allocating PDA and GDT at boot is a pain.  Using simple per-cpu variables adds
happiness (although we need the GDT page-aligned for Xen, which we do in a
followup patch).

[akpm@linux-foundation.org: build fix]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:10 +02:00
Ian Campbell 79e030114a [PATCH] i386: Allow i386 crash kernels to handle x86_64 dumps
The specific case I am encountering is kdump under Xen with a 64 bit
hypervisor and 32 bit kernel/userspace.  The dump created is 64 bit due to
the hypervisor but the dump kernel is 32 bit for maximum compatibility.

It's possibly less likely to be useful in a purely native scenario but I
see no reason to disallow it.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Ian Campbell <ian.campbell@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Horms <horms@verge.net.au>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:09 +02:00
Rusty Russell eab0c72aec [PATCH] x86-64: Introduce load_TLS to the "for" loop.
GCC (4.1 at least) unrolls it anyway, but I can't believe this code
was ever justifiable.  (I've also submitted a patch which cleans up
i386, which is even uglier).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:09 +02:00
Rusty Russell 692174b97d [PATCH] i386: Initialize esp0 properly all the time
Whenever we schedule, __switch_to calls load_esp0 which does:

	tss->esp0 = thread->esp0;

This is never initialized for the initial thread (ie "swapper"), so when we're
scheduling that, we end up setting esp0 to 0.  This is fine: the swapper never
leaves ring 0, so this field is never used.

lguest, however, gets upset that we're trying to used an unmapped page as our
kernel stack.  Rather than work around it there, let's initialize it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:09 +02:00
David Rientjes 8b8ca80e19 [PATCH] x86-64: configurable fake numa node sizes
Extends the numa=fake x86_64 command-line option to allow for configurable
node sizes.  These nodes can be used in conjunction with cpusets for coarse
memory resource management.

The old command-line option is still supported:
  numa=fake=32	gives 32 fake NUMA nodes, ignoring the NUMA setup of the
		actual machine.

But now you may configure your system for the node sizes of your choice:
  numa=fake=2*512,1024,2*256
		gives two 512M nodes, one 1024M node, two 256M nodes, and
		the rest of system memory to a sixth node.

The existing hash function is maintained to support the various node sizes
that are possible with this implementation.

Each node of the same size receives roughly the same amount of available
pages, regardless of any reserved memory with its address range.  The total
available pages on the system is calculated and divided by the number of equal
nodes to allocate.  These nodes are then dynamically allocated and their
borders extended until such time as their number of available pages reaches
the required size.

Configurable node sizes are recommended when used in conjunction with cpusets
for memory control because it eliminates the overhead associated with scanning
the zonelists of many smaller full nodes on page_alloc().

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:09 +02:00
john stultz 5a90cf205c [PATCH] x86: Log reason why TSC was marked unstable
Change mark_tsc_unstable() so it takes a string argument, which holds the
reason the TSC was marked unstable.

This is then displayed the first time mark_tsc_unstable is called.

This should help us better debug why the TSC was marked unstable on certain
systems and allow us to make sure we're not being overly paranoid when
throwing out this troublesome clocksource.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:08 +02:00
Vivek Goyal 1833d6bc72 [PATCH] i386: modpost apic related warning fixes
o Modpost generates warnings for i386 if compiled with CONFIG_RELOCATABLE=y

WARNING: vmlinux - Section mismatch: reference to .init.text:find_unisys_acpi_oem_table from .text between 'acpi_madt_oem_check' (at offset 0xc0101eda) and 'enable_apic_mode'
WARNING: vmlinux - Section mismatch: reference to .init.text:acpi_get_table_header_early from .text between 'acpi_madt_oem_check' (at offset 0xc0101ef0) and 'enable_apic_mode'
WARNING: vmlinux - Section mismatch: reference to .init.text:parse_unisys_oem from .text between 'acpi_madt_oem_check' (at offset 0xc0101f2e) and 'enable_apic_mode'
WARNING: vmlinux - Section mismatch: reference to .init.text:setup_unisys from .text between 'acpi_madt_oem_check' (at offset 0xc0101f37) and 'enable_apic_mode'WARNING: vmlinux - Section mismatch: reference to .init.text:parse_unisys_oem from .text between 'mps_oem_check' (at offset 0xc0101ec7) and 'acpi_madt_oem_check'
WARNING: vmlinux - Section mismatch: reference to .init.text:es7000_sw_apic from .text between 'enable_apic_mode' (at offset 0xc0101f48) and 'check_apicid_present'

o Some functions which are inline (acpi_madt_oem_check) are not inlined by
  compiler as these functions are accessed using function pointer. These
  functions are put in .text section and they in-turn access __init type
  functions hence modpost generates warnings.

o Do not iniline acpi_madt_oem_check, instead make it __init.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:08 +02:00
Ravikiran G Thirumalai e073ae1b34 [PATCH] x86-64: Set HASHDIST_DEFAULT to 1 for x86_64 NUMA
Enable system hashtable memory to be distributed among nodes on x86_64 NUMA

Forcing the kernel to use node interleaved vmalloc instead of bootmem for
the system hashtable memory (alloc_large_system_hash) reduces the memory
imbalance on node 0 by around 40MB on a 8 node x86_64 NUMA box:

Before the following patch, on bootup of a 8 node box:

Node 0 MemTotal:      3407488 kB
Node 0 MemFree:       3206296 kB
Node 0 MemUsed:        201192 kB
Node 0 Active:           7012 kB
Node 0 Inactive:          512 kB
Node 0 Dirty:               0 kB
Node 0 Writeback:           0 kB
Node 0 FilePages:        1912 kB
Node 0 Mapped:            420 kB
Node 0 AnonPages:        5612 kB
Node 0 PageTables:        468 kB
Node 0 NFS_Unstable:        0 kB
Node 0 Bounce:              0 kB
Node 0 Slab:             5408 kB
Node 0 SReclaimable:      644 kB
Node 0 SUnreclaim:       4764 kB

After the patch (or using hashdist=1 on the kernel command line):

Node 0 MemTotal:      3407488 kB
Node 0 MemFree:       3247608 kB
Node 0 MemUsed:        159880 kB
Node 0 Active:           3012 kB
Node 0 Inactive:          616 kB
Node 0 Dirty:               0 kB
Node 0 Writeback:           0 kB
Node 0 FilePages:        2424 kB
Node 0 Mapped:            380 kB
Node 0 AnonPages:        1200 kB
Node 0 PageTables:        396 kB
Node 0 NFS_Unstable:        0 kB
Node 0 Bounce:              0 kB
Node 0 Slab:             6304 kB
Node 0 SReclaimable:     1596 kB
Node 0 SUnreclaim:       4708 kB

I guess it is a good idea to keep HASHDIST_DEFAULT "on" for x86_64 NUMA
since x86_64 has no dearth of vmalloc space?  Or maybe enable hash
distribution for all 64bit NUMA arches?  The following patch does it only
for x86_64.

I ran a HPC MPI benchmark -- 'Ansys wingsolid', which takes up quite a bit of
memory and uses up tlb entries.  This was on a 4 way, 2 socket
Tyan AMD box (non vsmp), with 8G total memory (4G pernode).

The results with and without hash distribution are:

1. Vanilla - runtime of 1188.000s
2. With hashdist=1 runtime of 1154.000s

Oprofile output for the duration of run is:

1. Vanilla:
PU: AMD64 processors, speed 2411.16 MHz (estimated)
Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit
mask of 0x00 (No unit mask) count 500
samples  %        app name                 symbol name
163054    6.5513  libansys1.so             MultiFront::decompose(int, int,
Elemset *, int *, int, int, int)
162061    6.5114  libansys3.so             blockSaxpy6L_fd
162042    6.5107  libansys3.so             blockInnerProduct6L_fd
156286    6.2794  libansys3.so             maxb33_
87879     3.5309  libansys1.so             elmatrixmultpcg_
84857     3.4095  libansys4.so             saxpy_pcg
58637     2.3560  libansys4.so             .st4560
46612     1.8728  libansys4.so             .st4282
43043     1.7294  vmlinux-t                copy_user_generic_string
41326     1.6604  libansys3.so             blockSaxpyBackSolve6L_fd
41288     1.6589  libansys3.so             blockInnerProductBackSolve6L_fd

2. With hashdist=1
CPU: AMD64 processors, speed 2411.13 MHz (estimated)
Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit
mask of 0x00 (No unit mask) count 500
samples  %        app name                 symbol name
162993    6.9814  libansys1.so             MultiFront::decompose(int, int,
Elemset *, int *, int, int, int)
160799    6.8874  libansys3.so             blockInnerProduct6L_fd
160459    6.8729  libansys3.so             blockSaxpy6L_fd
156018    6.6826  libansys3.so             maxb33_
84700     3.6279  libansys4.so             saxpy_pcg
83434     3.5737  libansys1.so             elmatrixmultpcg_
58074     2.4875  libansys4.so             .st4560
46000     1.9703  libansys4.so             .st4282
41166     1.7632  libansys3.so             blockSaxpyBackSolve6L_fd
41033     1.7575  libansys3.so             blockInnerProductBackSolve6L_fd
35762     1.5318  libansys1.so             inner_product_sub
35591     1.5245  libansys1.so             inner_product_sub2
28259     1.2104  libansys4.so             addVectors

Signed-off-by: Pravin B. Shelar <pravin.shelar@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Christoph Lameter <clameter@engr.sgi.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:08 +02:00
Andrew Morton 184c44d204 [PATCH] x86-64: fix x86_64-mm-sched-clock-share
Fix for the following patch. Provide dummy cpufreq functions when
CPUFREQ is not compiled in.

Cc: Andi Kleen <ak@suse.de>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:08 +02:00
Vivek Goyal 6a50a664ca [PATCH] x86-64: build-time checking
o X86_64 kernel should run from 2MB aligned address for two reasons.
	- Performance.
	- For relocatable kernels, page tables are updated based on difference
	  between compile time address and load time physical address.
	  This difference should be multiple of 2MB as kernel text and data
	  is mapped using 2MB pages and PMD should be pointing to a 2MB
	  aligned address. Life is simpler if both compile time and load time
	  kernel addresses are 2MB aligned.

o Flag the error at compile time if one is trying to build a kernel which
  does not meet alignment restrictions.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:08 +02:00
Vivek Goyal 1ab60e0f72 [PATCH] x86-64: Relocatable Kernel Support
This patch modifies the x86_64 kernel so that it can be loaded and run
at any 2M aligned address, below 512G.  The technique used is to
compile the decompressor with -fPIC and modify it so the decompressor
is fully relocatable.  For the main kernel the page tables are
modified so the kernel remains at the same virtual address.  In
addition a variable phys_base is kept that holds the physical address
the kernel is loaded at.  __pa_symbol is modified to add that when
we take the address of a kernel symbol.

When loaded with a normal bootloader the decompressor will decompress
the kernel to 2M and it will run there.  This both ensures the
relocation code is always working, and makes it easier to use 2M
pages for the kernel and the cpu.

AK: changed to not make RELOCATABLE default in Kconfig

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:07 +02:00
Vivek Goyal 0dbf7028c0 [PATCH] x86: __pa and __pa_symbol address space separation
Currently __pa_symbol is for use with symbols in the kernel address
map and __pa is for use with pointers into the physical memory map.
But the code is implemented so you can usually interchange the two.

__pa which is much more common can be implemented much more cheaply
if it is it doesn't have to worry about any other kernel address
spaces.  This is especially true with a relocatable kernel as
__pa_symbol needs to peform an extra variable read to resolve
the address.

There is a third macro that is added for the vsyscall data
__pa_vsymbol for finding the physical addesses of vsyscall pages.

Most of this patch is simply sorting through the references to
__pa or __pa_symbol and using the proper one.  A little of
it is continuing to use a physical address when we have it
instead of recalculating it several times.

swapper_pgd is now NULL.  leave_mm now uses init_mm.pgd
and init_mm.pgd is initialized at boot (instead of compile time)
to the physmem virtual mapping of init_level4_pgd.  The
physical address changed.

Except for the for EMPTY_ZERO page all of the remaining references
to __pa_symbol appear to be during kernel initialization.  So this
should reduce the cost of __pa in the common case, even on a relocated
kernel.

As this is technically a semantic change we need to be on the lookout
for anything I missed.  But it works for me (tm).

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:07 +02:00
Vivek Goyal cfd243d4af [PATCH] x86-64: Remove the identity mapping as early as possible
With the rewrite of the SMP trampoline and the early page
allocator there is nothing that needs identity mapped pages,
once we start executing C code.

So add zap_identity_mappings into head64.c and remove
zap_low_mappings() from much later in the code.  The functions
 are subtly different thus the name change.

This also kills boot_level4_pgt which was from an earlier
attempt to move the identity mappings as early as possible,
and is now no longer needed.  Essentially I have replaced
boot_level4_pgt with trampoline_level4_pgt in trampoline.S

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:07 +02:00
Vivek Goyal 7db681d7e4 [PATCH] x86-64: wakeup.S rename registers to reflect right names
o Use appropriate names for 64bit regsiters.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:07 +02:00
Vivek Goyal 3c321bceb4 [PATCH] x86-64: Add EFER to the register set saved by save_processor_state
EFER varies like %cr4 depending on the cpu capabilities, and which cpu
capabilities we want to make use of.  So save/restore it make certain
we have the same EFER value when we are done.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:07 +02:00
Vivek Goyal 30f4728954 [PATCH] x86-64: cleanup segments
Move __KERNEL32_CS up into the unused gdt entry.  __KERNEL32_CS is
used when entering the kernel so putting it first is useful when
trying to keep boot gdt sizes to a minimum.

Set the accessed bit on all gdt entries.  We don't care
so there is no need for the cpu to burn the extra cycles,
and it potentially allows the pages to be immutable.  Plus
it is confusing when debugging and your gdt entries mysteriously
change.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:07 +02:00
Vivek Goyal 67dcbb6bc6 [PATCH] x86-64: Clean up the early boot page table
- Merge physmem_pgt and ident_pgt, removing physmem_pgt.  The merge
  is broken as soon as mm/init.c:init_memory_mapping is run.
- As physmem_pgt is gone don't export it in pgtable.h.
- Use defines from pgtable.h for page permissions.
- Fix the physical memory identity mapping so it is at the correct
  address.
- Remove the physical memory mapping from wakeup_level4_pgt it
  is at the wrong address so we can't possibly be usinging it.
- Simply NEXT_PAGE the work to calculate the phys_ alias
  of the labels was very cool.  Unfortuantely it was a brittle
  special purpose hack that makes maitenance more difficult.
  Instead just use label - __START_KERNEL_map like we do
  everywhere else in assembly.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:06 +02:00
Vivek Goyal 9d291e787b [PATCH] x86-64: Assembly safe page.h and pgtable.h
This patch makes pgtable.h and page.h safe to include
in assembly files like head.S.  Allowing us to use
symbolic constants instead of hard coded numbers when
refering to the page tables.

This patch copies asm-sparc64/const.h to asm-x86_64 to
get a definition of _AC() a very convinient macro that
allows us to force the type when we are compiling the
code in C and to drop all of the type information when
we are using the constant in assembly.  Previously this
was done with multiple definition of the same constant.
const.h was modified slightly so that it works when given
CONFIG options as arguments.

This patch adds #ifndef __ASSEMBLY__ ... #endif
and _AC(1,UL) where appropriate so the assembler won't
choke on the header files.  Otherwise nothing
should have changed.

AK: added const.h to exported headers to fix headers_check

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:06 +02:00
Stephen Hemminger e658450455 [PATCH] x86-64: dma_ops as const
The dma_ops structure can be const since it never changes
after boot.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:06 +02:00
Joerg Roedel 6b37f5a20c [PATCH] x86-64: fix cpu MHz reporting on constant_tsc cpus
This patch fixes the reporting of cpu_mhz in /proc/cpuinfo on CPUs with
a constant TSC rate and a kernel with disabled cpufreq.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>

 arch/x86_64/kernel/apic.c     |    2 -
 arch/x86_64/kernel/time.c     |   58 +++++++++++++++++++++++++++++++++++++++---
 arch/x86_64/kernel/tsc.c      |   12 +++++---
 arch/x86_64/kernel/tsc_sync.c |    2 -
 include/asm-x86_64/proto.h    |    1
 5 files changed, 65 insertions(+), 10 deletions(-)
2007-05-02 19:27:06 +02:00
Glauber de Oliveira Costa fbc16f2c2a [PATCH] x86-64: Remove duplicated code for reading control registers
On Tue, Mar 13, 2007 at 05:33:09AM -0700, Randy.Dunlap wrote:
> On Tue, 13 Mar 2007, Glauber de Oliveira Costa wrote:
>
> > Tiny cleanup:
> >
> > In x86_64, the same functions for reading cr3 and writing cr{3,4} are
> > defined in tlbflush.h and system.h, whith just a name change.
> > The only difference is the clobbering of memory, which seems a safe, and
> > even needed change for the write_cr4. This patch removes the duplicate.
> > write_cr3() is moved to system.h for consistency.
>
> missing patch.....
>
thanks. Attached now

--
Glauber de Oliveira Costa
Red Hat Inc.
"Free as in Freedom"

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:06 +02:00
Rusty Russell f9d09645d6 [PATCH] x86-64: Remove unused set_seg_base
The set_seg_base function isn't used anywhere (2.6.21-rc3-git1)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:06 +02:00
Jeremy Fitzhardinge 973efae21b [PATCH] i386: clean up mach_reboot_fixups
The reboot_fixups stuff seems to be a bit of a mess, specifically the
header is in linux/ when its a purely i386-specific piece of code.  I'm
not sure why it has its config option; its only currently needed for
"geode-gx1/cs5530a", so perhaps whatever config option controls that
hardware should enable this?

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:06 +02:00
Aneesh Kumar K.V 6d1c426158 [PATCH] i386: Update __copy_to_user_inatomic linuxdoc description
Explicity specify that the caller should pin the user memory
otherwise the function will sleep

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:06 +02:00
Prarit Bhargava 86c0baf123 [PATCH] i386: Change sysenter_setup to __cpuinit & improve __INIT, __INITDATA
Change sysenter_setup to __cpuinit.
Change __INIT & __INITDATA to be cpu hotplug aware.

Resolve MODPOST warnings similar to:

WARNING: vmlinux - Section mismatch: reference to .init.text:sysenter_setup from
 .text between 'identify_cpu' (at offset 0xc040a380) and 'detect_ht'

and

WARNING: vmlinux - Section mismatch: reference to .init.data:vsyscall_int80_end
from .text between 'sysenter_setup' (at offset 0xc041a269) and 'enable_sep_cpu'
WARNING: vmlinux - Section mismatch: reference to
.init.data:vsyscall_int80_start from .text between 'sysenter_setup' (at offset
0xc041a26e) and 'enable_sep_cpu'
WARNING: vmlinux - Section mismatch: reference to
.init.data:vsyscall_sysenter_end from .text between 'sysenter_setup' (at offset
0xc041a275) and 'enable_sep_cpu'
WARNING: vmlinux - Section mismatch: reference to
.init.data:vsyscall_sysenter_start from .text between 'sysenter_setup' (at
offset 0xc041a27a) and 'enable_sep_cpu'

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:05 +02:00
Andi Kleen 803d80f650 [PATCH] x86-64: Some cleanup in time.c
Move prototypes into header files
Remove unneeded includes.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:05 +02:00
Simon Arlott 0949be3509 [PATCH] i386: Add an option for the VIA C7 which sets appropriate L1 cache
The VIA C7 is a 686 (with TSC) that supports MMX, SSE and SSE2, it also has
a cache line length of 64 according to
http://www.digit-life.com/articles2/cpu/rmma-via-c7.html.  This patch sets
gcc to -march=686 and select s the correct cache shift.

Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:05 +02:00
takada f5e8861583 [PATCH] i386: pit_latch_buggy has no effect
Eliminated the arch/i386/kernel/timers in 2.6.18, use clocksoures instead.
pit_latch_buggy was referred in timers/timer_tsc.c, and currently removed.
Therefore nobody refer it.

Until 2.6.17, MediaGX's TSC works correctly.  after 2.6.18, warned "TSC
appears to be running slowly.  Marking it as unstable".  So marked unstable
TSC when CS55x0.

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:05 +02:00
Jeremy Fitzhardinge f76c392380 [PATCH] i386: No need to use -traditional for processing asm in i386/kernel/
No need to use -traditional for processing asm in i386/kernel/

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:05 +02:00
Jan Beulich 9964cf7d77 [PATCH] x86: consolidate smp_send_stop()
Synchronize i386's smp_send_stop() with x86-64's in only try-locking
the call lock to prevent deadlocks when called from panic().
In both version, disable interrupts before clearing the CPU off the
online map to eliminate races with IRQ handlers inspecting this map.
Also in both versions, save/restore interrupts rather than disabling/
enabling them.
On x86-64, eliminate one function used here by folding it into its
single caller, convert to static, and rename for consistency with i386
(lkcd may like this).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:05 +02:00
Jan Beulich b0354795c9 [PATCH] x86-64: adjust inclusion of asm/vsyscall32.h
Avoid including asm/vsyscall32.h in virtually every source file.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:04 +02:00
Jan Beulich 00f1ea6967 [PATCH] x86: adjust inclusion of asm/fixmap.h
Move inclusion of asm/fixmap.h to where it is really used rather than
where it may have been used long ago (requires a few other adjustments
to includes due to previous implicit dependencies).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:04 +02:00
Ingo Molnar 3c43f03908 [PATCH] x86: default to physical mode on hotplug CPU kernels
Default to physical mode on hotplug CPU kernels.  Furher simplify and clean up
the APIC initialization code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-05-02 19:27:04 +02:00
Ingo Molnar 07c7c47444 [PATCH] x86-64: always use physical delivery mode on > 8 CPUs
Remove clustered APIC mode.  There's little point in the use of clustered APIC
mode, broadcasting is limited to within the cluster only, and chipsets have
bugs in this area as well.  So default to physical APIC mode when the CPU
count is large, and default to logical APIC mode when the CPU count is 8 or
smaller.

(this patch only removes the use of genapic_cluster and cleans up the
resulting genapic.c file - removal of all remaining traces of clustered
mode will be done by another patch.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-05-02 19:27:04 +02:00
Andrew Morton a86f34b49f [PATCH] x86: revert x86_64-mm-fix-the-irqbalance-quirk-for-e7320-e7520-e7525
Obsoleted by Ingo's genapic stuff.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:04 +02:00
Andrew Morton 3dc68d9b58 [PATCH] x86-64: revert x86_64-mm-add-genapic_force
This is obsoleted by new Ingo genapic patches.

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:04 +02:00
Herbert Xu e196d62591 [CRYPTO] api: Add ablkcipher_request_set_tfm
This patch adds ablkcipher_request_set_tfm for those users that need
to manage the memory for ablkcipher requests directly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-05-02 14:38:33 +10:00
Herbert Xu 124b53d020 [CRYPTO] cryptd: Add software async crypto daemon
This patch adds the cryptd module which is a template that takes a
synchronous software crypto algorithm and converts it to an asynchronous
one by executing it in a kernel thread.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-05-02 14:38:32 +10:00
Herbert Xu a73e69965f [CRYPTO] api: Do not remove users unless new algorithm matches
As it is whenever a new algorithm with the same name is registered
users of the old algorithm will be removed so that they can take
advantage of the new algorithm.  This presents a problem when the
new algorithm is not equivalent to the old algorithm.  In particular,
the new algorithm might only function on top of the existing one.

Hence we should not remove users unless they can make use of the
new algorithm.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-05-02 14:38:32 +10:00
Herbert Xu b5b7f08869 [CRYPTO] api: Add async blkcipher type
This patch adds the mid-level interface for asynchronous block ciphers.
It also includes a generic queueing mechanism that can be used by other
asynchronous crypto operations in future.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-05-02 14:38:31 +10:00
Herbert Xu ebc610e5bc [CRYPTO] templates: Pass type/mask when creating instances
This patch passes the type/mask along when constructing instances of
templates.  This is in preparation for templates that may support
multiple types of instances depending on what is requested.  For example,
the planned software async crypto driver will use this construct.

For the moment this allows us to check whether the instance constructed
is of the correct type and avoid returning success if the type does not
match.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-05-02 14:38:31 +10:00
Herbert Xu 32e3983fe5 [CRYPTO] api: Add async block cipher interface
This patch adds the frontend interface for asynchronous block ciphers.
In addition to the usual block cipher parameters, there is a callback
function pointer and a data pointer.  The callback will be invoked only
if the encrypt/decrypt handlers return -EINPROGRESS.  In other words,
if the return value of zero the completion handler (or the equivalent
code) needs to be invoked by the caller.

The request structure is allocated and freed by the caller.  Its size
is determined by calling crypto_ablkcipher_reqsize().  The helpers
ablkcipher_request_alloc/ablkcipher_request_free can be used to manage
the memory for a request.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-05-02 14:38:30 +10:00
Haavard Skinnemoen 1c23af90dc i2c: Bitbanging I2C bus driver using the GPIO API
This is a very simple bitbanging I2C bus driver utilizing the new
arch-neutral GPIO API. Useful for chips that don't have a built-in
I2C controller, additional I2C busses, or testing purposes.

To use, include something similar to the following in the
board-specific setup code:

  #include <linux/i2c-gpio.h>

  static struct i2c_gpio_platform_data i2c_gpio_data = {
	.sda_pin	= GPIO_PIN_FOO,
	.scl_pin	= GPIO_PIN_BAR,
  };
  static struct platform_device i2c_gpio_device = {
	.name		= "i2c-gpio",
	.id		= 0,
	.dev		= {
		.platform_data	= &i2c_gpio_data,
	},
  };

Register this platform_device, set up the I2C pins as GPIO if
required and you're ready to go. This will use default values for
udelay and timeout, and will work with GPIO hardware that does not
support open drain mode, but allows sensing of the SDA and SCL lines
even when they are being driven.

Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-01 23:26:34 +02:00
Jean Delvare b86a1bc8e3 i2c: Restore i2c_smbus_read_block_data
Add back the i2c_smbus_read_block_data helper function, it is needed
by the upcoming lm93 hardware monitoring driver and possibly others.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-01 23:26:34 +02:00
Jean Delvare 424ed67c7d i2c-algo-bit: Implement a 50/50 SCL duty cycle
The original i2c-algo-bit implementation uses a 33/66 SCL duty cycle
when bits are being written on the bus. While the I2C specification
doesn't forbid it, this prevents us from driving the I2C bus to its
max speed, limiting us to 66 kbps max on standard I2C busses.

Implementing a 50/50 duty cycle instead lets us max out the bandwidth
up to the theoretical max of 100 kbps on standard I2C busses. This is
particularly important when large amounts of data need to be transfered
over the bus, as is the case with some TV adapters when the firmware is
being uploaded.

In fact this change even allows, at least in theory, fast-mode I2C
support at 125, 166 and 250 kbps. There's no way to reach the
theoretical max of 400 kbps with this implementation. But I don't
think we want to put efforts in that direction anyway: software-driven
I2C is very CPU-intensive and bad for latency.

Other timing changes:
* Don't set SDA high explicitly on error, we're going to issue a stop
  condition before we leave anyway.
* If an error occurs when sending the slave address, yield the CPU
  before retrying, and remove the additional delay after the new start
  condition.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-01 23:26:33 +02:00
Bryan Wu d24ecfcc39 i2c: Blackfin Two Wire Interface driver
The i2c linux driver for blackfin architecture which supports blackfin
on-chip TWI controller i2c operation.

Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-01 23:26:32 +02:00
Jean Delvare b3e820968a i2c: Make i2c_del_driver a void function
Make i2c_del_driver a void function, like all other driver removal
functions. It always returned 0 even when errors occured, and nobody
ever actually checked the return value anyway. And we cannot fail
a module removal anyway.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-01 23:26:32 +02:00
Jean Delvare a97f1ed090 i2c: Move i2c-isa-only exported symbol declarations
Move the declaration of i2c-isa-only exported symbols to i2c-isa
itself, that's the best way to ensure nobody will attempt to use them.
Hopefully we'll get rid of the exports themselves soon anyway.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-01 23:26:32 +02:00
Jean Delvare 12b5053ac5 i2c: Add i2c_new_probed_device()
Add a new helper function to instantiate an i2c device. It is meant as a
replacement for i2c_new_device() when you don't know for sure at which
address your I2C/SMBus device lives. This happens frequently on TV
adapters for example, you know there is a tuner chip on the bus, but
depending on the exact board model and revision, it can live at different
addresses. So, the new i2c_new_probed_device() function will probe the bus
according to a list of addresses, and as soon as one of these addresses
responds, it will call i2c_new_device() on that one address.

This function will make it possible to port the old i2c drivers to the
new model quickly.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-01 23:26:31 +02:00
Jean Delvare 0f3b483852 i2c-algo-bit: Add i2c_bit_add_numbered_bus
Add i2c_bit_add_numbered_bus(), which is equivalent to i2c_bit_add_bus
except that it calls i2c_add_numbered_adapter() at the end instead of
i2c_add_adapter().

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-01 23:26:31 +02:00
David Brownell 6e13e64184 i2c: Add i2c_add_numbered_adapter()
This adds a call, i2c_add_numbered_adapter(), registering an I2C adapter
with a specific bus number and then creating I2C device nodes for any
pre-declared devices on that bus.  It builds on previous patches adding
I2C probe() and remove() support, and that pre-declaration of devices.

This completes the core support for "new style" I2C device drivers.
Those follow the standard driver model for binding devices to drivers
(using probe and remove methods) rather than a legacy model (where the
driver tries to autoconfigure each bus, and registers devices itself).

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-01 23:26:31 +02:00
David Brownell 9c1600eda4 i2c: Add i2c_board_info and i2c_new_device()
This provides partial support for new-style I2C driver binding.  It builds
on "struct i2c_board_info" declarations that identify I2C devices on a given
board.  This is needed on systems with I2C devices that can't be fully probed
and/or autoconfigured, such as many embedded Linux configurations where the
way a given I2C device is wired may affect how it must be used.

There are two models for declaring such devices:

 * LATE -- using a public function i2c_new_device().  This lets modules
   declare I2C devices found *AFTER* a given I2C adapter becomes available.
   
   For example, a PCI card could create adapters giving access to utility
   chips on that card, and this would be used to associate those chips with
   those adapters.

 * EARLY -- from arch_initcall() level code, using a non-exported function
   i2c_register_board_info().  This copies the declarations *BEFORE* such
   an i2c_adapter becomes available, arranging that i2c_new_device() will
   be called later when i2c-core registers the relevant i2c_adapter.

   For example, arch/.../.../board-*.c files would declare the I2C devices
   along with their platform data, and I2C devices would behave much like
   PNPACPI devices.  (That is, both enumerate from board-specific tables.)

To match the exported i2c_new_device(), the previously-private function
i2c_unregister_device() is now exported.

Pending later patches using these new APIs, this is effectively a NOP.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-01 23:26:31 +02:00
David Brownell a1d9e6e49f i2c: i2c stack can remove()
More update for new style driver support:  add a remove() method, and
use it in the relevant code paths.

Again, nothing will use this yet since there's nothing to create devices
feeding this infrastructure.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-01 23:26:30 +02:00
David Brownell 7b4fbc50fa i2c: i2c stack can probe()
One of a series of I2C infrastructure updates to support enumeration using
the standard Linux driver model.

This patch updates probe() and associated hotplug/coldplug support, but
not remove().  Nothing yet _uses_ it to create I2C devices, so those
hotplug/coldplug mechanisms will be the only externally visible change.
This patch will be an overall NOP since the I2C stack doesn't yet create
clients/devices except as part of binding them to legacy drivers.

Some code is moved earlier in the source code, helping group more of the
per-device infrastructure in one place and simplifying handling per-device
attributes.

Terminology being adopted:  "legacy drivers" create devices (i2c_client)
themselves, while "new style" ones follow the driver model (the i2c_client
is handed to the probe routine).  It's an either/or thing; the two models
don't mix, and drivers that try mixing them won't even be registered.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-01 23:26:30 +02:00
Jean Delvare 7c59b6615f i2c: Cleanup the includes of <linux/i2c.h>
Clean up the includes of <linux/i2c.h>. Only include this header file
when we actually need it.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-01 23:26:29 +02:00
Jean Delvare f75803de6a i2c-nforce2: Add support for the MCP61 and MCP65
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Hans-Frieder Vogt <hfvogt@gmx.net>
2007-05-01 23:26:29 +02:00
Jean Delvare 209d27c3b1 i2c: Emulate SMBus block read over I2C
Let the I2C bus drivers emulate the SMBus Block Read and Block Process
Call transactions if they wish. This requires to define a new message
flag, which i2c-core will use to let the underlying I2C bus driver
know that the first received byte will specify the length of the read
message.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-01 23:26:29 +02:00
David Brownell ef2c8321f5 i2c: Rename dev_to_i2c_adapter()
Rename dev_to_i2c_adapter() as to_i2c_adapter(), since the previous
syntax was a surprising and needless difference from normal naming
conventions in Linux.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-01 23:26:28 +02:00
David Brownell 2096b956d2 i2c: Shrink struct i2c_client
This shrinks the size of "struct i2c_client" by 40 bytes:

 - Substantially shrinks the string used to identify the chip type
 - The "flags" don't need to be so big
 - Removes some internal padding

It also adds kerneldoc for that struct, explaining how "name" is really a
chip type identifier; it's otherwise potentially confusing.

Because the I2C_NAME_SIZE symbol was abused for both i2c_client.name
and for i2c_adapter.name, this needed to affect i2c_adapter too.  The
adapters which used that symbol now use the more-obviously-correct
idiom of taking the size of that field.

JD: Shorten i2c_adapter.name from 50 to 48 bytes while we're here, to
avoid wasting space in padding.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-01 23:26:28 +02:00
Jean Delvare b31366f439 i2c: i2c_adapter devices need no driver
Kill i2c_adapter_driver as it doesn't make sense and it prevents
further i2c-core cleanups. i2c_adapter devices are virtual devices
(ex-class devices) and as such they don't need a driver.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-01 23:26:28 +02:00
Jean Delvare fccb56e4d8 i2c: Kill i2c_adapter.class_dev
Kill i2c_adapter.class_dev. Instead, set the class of i2c_adapter.dev
to i2c_adapter_class, so that a symlink will be created for every
i2c_adapter in /sys/class/i2c-adapter.

The same change must be mirrored to i2c-isa as it duplicates some
of the i2c-core functionalities.

User-space tools and libraries might need some adjustments. In
particular, libsensors from lm_sensors 2.10.3 or later is required for
proper discovery of i2c adapter names after this change.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-01 23:26:27 +02:00
Christoph Hellwig f3402a4e52 [VOYAGER] Convert the monitor thread to use the kthread API
full kthread conversion on the voyager power switch handling thread.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2007-05-01 10:09:29 -05:00
Pierre Ossman bd76631261 mmc: remove old card states
Remove card states that no longer make any sense.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-01 16:11:57 +02:00
Philip Langdale 55556da012 MMC: Fix handling of low-voltage cards
Fix handling of low voltage MMC cards.

The latest MMC and SD specs both agree that support for
low-voltage operations is indicated by bit 7 in the OCR.
The MMC spec states that the low voltage range is
1.65-1.95V while the SD spec leaves the actual voltage
range undefined - meaning that there is still no such
thing as a low voltage SD card.

However, an old Sandisk spec implied that bits 7.0
represented voltages below 2.0V in 1V or 0.5V increments,
and the code was accordingly written with that expectation.

This confusion meant that host drivers attempting to support
the typical low voltage (1.8V) would set the wrong bits in
the host OCR mask (usually bits 5 and/or 6) resulting in the
the low voltage mode never being used.

This change corrects the low voltage range and adds sanity
checks on the reserved bits (0-6) and for SD cards that
claim to support low-voltage operations.

Signed-off-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-01 14:14:50 +02:00
Philip Langdale 4be34c99a2 MMC: Consolidate voltage definitions
Consolidate the list of available voltages.

Up until now, a separate set of defines has been
used for host->vdd than that used for the OCR
voltage mask values. Having two sets of defines
allows them to get out of sync and the current
sets are already inconsistent with one claiming
to describe ranges and the other specific voltages.

Only the SDHCI driver uses the host->vdd defines and
it is easily fixed to use the OCR defines.

Signed-off-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-01 13:42:28 +02:00
Pierre Ossman 7ea239d9e6 mmc: add bus handler
Delegate protocol handling to "bus handlers". This allows the core to
just handle the task of arbitrating the bus. Initialisation and
pampering of cards is now done by the different bus handlers.

This design also allows MMC and SD (and later SDIO) to be more cleanly
separated, allowing easier maintenance.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-01 13:41:06 +02:00
Pierre Ossman da7fbe58d2 mmc: Separate out protocol ops
Move protocol operations and definitions into their own files
in an effort to separate protocol handling and bus
arbitration more clearly.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-01 13:04:18 +02:00
Pierre Ossman aaac1b470b mmc: Move core functions to subdir
Create a "core" subdirectory to house the central bus handling
functions.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-01 13:04:18 +02:00
Pierre Ossman b855885e3b mmc: deprecate mmc bus topology
The classic MMC bus was defined as multi card bus
system, which is reflected in the design in the MMC
layer.

When SD showed up, the bus topology was abandoned
and a star topology (one card per host) was mandated.
MMC version 4 has followed this, officially deprecating
the bus topology.

As we do not have any known users of the bus
topology we can remove support for it. This will
simplify the code and rectify some incorrect
assumptions in the newer additions.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-01 13:04:18 +02:00
Pierre Ossman 3b91e5507c mmc: Flush pending detects on host removal
Make sure we kill of any pending detection runs when the host
is removed instead of when it is freed. Also add some debugging
to make sure the driver doesn't queue up more detection after it
has removed the host.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-01 13:04:17 +02:00
Pierre Ossman f74d132cec mmc: Move OCR bit defines
All host drivers were #include:ing mmc/protocol.h just to
get access to the OCR bit defines. Move these to host.h instead.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-01 13:04:16 +02:00
Pierre Ossman 9c2c0af950 mmc: add type field to cards
Split out the type of card into its own field as it hardly
qualifies as a state.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-01 13:04:16 +02:00
Pierre Ossman 85a18ad93e mmc: MMC sector based cards
Support for MMC 4.2 sector based cards. This tweaks the init a
bit and reads a new field out of the EXT_CSD.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-01 13:04:15 +02:00
Alex Dubov 91f8d0118a tifm: layout fixes, small changes to comments and printfs
Cosmetic changes to the code.

Signed-off-by: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-01 13:04:15 +02:00
Alex Dubov 13cdf48ef1 tifm_sd: implement software scatter-gather
It was found that delays associated with issue and completion of the commands
severely limit performance of the new, fast SD cards. To alleviate this issue
scatter-gather emulation in software is implemented for both dma and pio
transfer modes. Non-block aligned and high memory sg entries are accounted
for.

Signed-off-by: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-01 13:04:15 +02:00
Alex Dubov 72dc9d9619 tifm_sd: replace command completion state machine with full checking
State machine used to to track mmc command state was found to be fragile
and unreliable, making many cards unusable. The safer solution is to perform
all needed checks at every card event.

Signed-off-by: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-01 13:04:14 +02:00
Alex Dubov 2428a8fe22 tifm: move common device management tasks from tifm_7xx1 to tifm_core
Some details of the device management (create, add, remove) are really
belong to the tifm_core, as they are not hardware specific.

Signed-off-by: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-01 13:04:13 +02:00
Alex Dubov 6113ed73e6 tifm: move common adapter management tasks from tifm_7xx1 to tifm_core
Some details of the adapter management (create, add, remove) are really
belong to the tifm_core, as they are not hardware specific.

Signed-off-by: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-01 13:04:13 +02:00
Alex Dubov 3540af8ffd tifm: replace per-adapter kthread with freezeable workqueue
Freezeable workqueue makes sure that adapter work items (device insertions
and removals) would be handled after the system is fully resumed. Previously
this was achieved by explicit freezing of the kthread.

Signed-off-by: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-01 13:04:13 +02:00
Alex Dubov e23f2b8a1a tifm: simplify bus match and uevent handlers
Remove code duplicating the kernel functionality and clean up data
structures involved in driver matching.

Signed-off-by: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-01 13:04:13 +02:00
Alex Dubov 8dc4a61eca tifm: use bus methods to handle probe/remove instead of driver ones.
Signed-off-by: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-01 13:04:12 +02:00
Alex Dubov 4552f0cbd4 tifm: hide details of interrupt processing from socket drivers
Instead of passing transformed value of adapter interrupt status to
socket drivers, implement two separate callbacks - one for card events
and another for dma events.

Signed-off-by: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-01 13:04:12 +02:00
David Teigland 72c2be776b [DLM] interface for purge (2/2)
Add code to accept purge commands from userland.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:12 +01:00
Steve Dickson 74dd34e6e8 NFS: Added support to turn off the NFSv3 READDIRPLUS RPC.
READDIRPLUS can be a performance hindrance when the client is working with
large directories. In addition, some servers still have bugs in their
implementations (e.g. Tru64 returns wrong values for the fsid).

Add a mount flag to enable users to turn it off at mount time following the
implementation in Apple's NFS client.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:16 -07:00
Chuck Lever 4c2eaf073f SUNRPC: remove old portmapper
net/sunrpc/pmap_clnt.c has been replaced by net/sunrpc/rpcb_clnt.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:15 -07:00
Chuck Lever a509050bd3 SUNRPC: introduce rpcbind: replacement for in-kernel portmapper
Introduce a replacement for the in-kernel portmapper client that supports
all 3 versions of the rpcbind protocol.  This code is not used yet.

Original code by Groupe Bull updated for the latest kernel, with multiple
bug fixes.

Note that rpcb_clnt.c does not yet support registering via versions 3 and
4 of the rpcbind protocol.  That is planned for a later patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:12 -07:00
Chuck Lever c5a4dd8b7c SUNRPC: Eliminate side effects from rpc_malloc
Currently rpc_malloc sets req->rq_buffer internally.  Make this a more
generic interface:  return a pointer to the new buffer (or NULL) and
make the caller set req->rq_buffer and req->rq_bufsize.  This looks much
more like kmalloc and eliminates the side effects.

To fix a potential deadlock, this patch also replaces GFP_NOFS with
GFP_NOWAIT in rpc_malloc.  This prevents async RPCs from sleeping outside
the RPC's task scheduler while allocating their buffer.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:11 -07:00
Chuck Lever 2bea90d43a SUNRPC: RPC buffer size estimates are too large
The RPC buffer size estimation logic in net/sunrpc/clnt.c always
significantly overestimates the requirements for the buffer size.
A little instrumentation demonstrated that in fact rpc_malloc was never
allocating the buffer from the mempool, but almost always called kmalloc.

To compute the size of the RPC buffer more precisely, split p_bufsiz into
two fields; one for the argument size, and one for the result size.

Then, compute the sum of the exact call and reply header sizes, and split
the RPC buffer precisely between the two.  That should keep almost all RPC
buffers within the 2KiB buffer mempool limit.

And, we can finally be rid of RPC_SLACK_SPACE!

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:10 -07:00
Chuck Lever 511d2e8855 NLM: Shrink the maximum request size of NLM4 requests
NLM version 4 requests estimate the call and reply header sizes rather
conservatively, using the very maximum size allowed in the protocol even
though Linux always uses only a small fraction of the allowable space.

Reduce the size of caller and lock arguments to conserve RPC buffer space
while XDR encoding NLM4 arguments.  Add compile-time checks to ensure the
hostname string won't overflow NLM protocol maximums.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:09 -07:00
Trond Myklebust ca52fec152 NFS: Use pgoff_t in structures and functions that pass page cache offsets
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:09 -07:00
Trond Myklebust 8d5658c949 NFS: Fix a buffer overflow in the allocation of struct nfs_read/writedata
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:07 -07:00
Trond Myklebust c63c7b0513 NFS: Fix a race when doing NFS write coalescing
Currently we do write coalescing in a very inefficient manner: one pass in
generic_writepages() in order to lock the pages for writing, then one pass
in nfs_flush_mapping() and/or nfs_sync_mapping_wait() in order to gather
the locked pages for coalescing into RPC requests of size "wsize".

In fact, it turns out there is actually a deadlock possible here since we
only start I/O on the second pass. If the user signals the process while
we're in nfs_sync_mapping_wait(), for instance, then we may exit before
starting I/O on all the requests that have been queued up.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:06 -07:00
Trond Myklebust 8b09bee308 NFS: Cleanup for nfs_readpages()
Do the coalescing of read requests into block sized requests at start of
I/O as we scan through the pages instead of going through a second pass.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:05 -07:00
Trond Myklebust bcb71bba7e NFS: Another cleanup of the read/write request coalescing code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:04 -07:00
Trond Myklebust d8a5ad75cc NFS: Cleanup the coalescing code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:04 -07:00
Roman Moravcik 84767d00a8 Input: gpio_keys - add support for switches (EV_SW)
Signed-off-by: Roman Moravcik <roman.moravcik@gmail.com>
Signed-off-by: Paul Sokolovsky <pmiscml@gmail.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2007-05-01 00:39:13 -04:00
Dmitry Torokhov bc95f3669f Merge master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/usb/input/Makefile
	drivers/usb/input/gtco.c
2007-05-01 00:24:54 -04:00
David Rientjes 14e38ac823 pm: include EIO from errno-base.h
For backwards compatibility, call_platform_enable_wakeup() can return 0
instead of -EIO since we aren't guaranteed to have errno defined.

Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30 16:40:41 -07:00
Jeremy Fitzhardinge 11443ec7d9 Add kvasprintf()
Add a kvasprintf() function to complement kasprintf().

No in-tree users yet, but I have some coming up.

[akpm@linux-foundation.org: EXPORT it]
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Keir Fraser <keir@xensource.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30 16:40:40 -07:00
Johannes Berg 9684e51cd1 power management: force pm_ops.valid callback to be assigned
This patch changes the docs and behaviour from "all states valid" to "no
states valid" if no .valid callback is assigned.  Users of pm_ops that only
need mem sleep can assign pm_valid_only_mem without any overhead, others
will require more elaborate callbacks.

Now that all users of pm_ops have a .valid callback this is a safe thing to
do and prevents things from getting messy again as they were before.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Looks-okay-to: Rafael J. Wysocki <rjw@sisk.pl>
Cc: <linux-pm@lists.linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30 16:40:40 -07:00
Johannes Berg e8c9c50269 power management: implement pm_ops.valid for everybody
Almost all users of pm_ops only support mem sleep, don't check in .valid and
don't reject any others in .prepare so users can be confused if they check
/sys/power/state, especially when new states are added (these would then
result in s-t-r although they're supposed to be something different).

This patch implements a generic pm_valid_only_mem function that is then
exported for users and puts it to use in almost all existing pm_ops.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: David Brownell <david-b@pacbell.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: linux-pm@lists.linux-foundation.org
Cc: Len Brown <lenb@kernel.org>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30 16:40:40 -07:00
Johannes Berg 11d77d0c01 power management: remove firmware disk mode
This patch removes the firmware disk suspend mode which is the wrong approach,
it is supposed to be used for implementing firmware-based disk suspend but
cannot actually be used for that.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: <linux-pm@lists.linux-foundation.org>
Cc: David Brownell <david-b@pacbell.net>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30 16:40:40 -07:00
Johannes Berg fe0c935a6c rework pm_ops pm_disk_mode, kill misuse
This patch series cleans up some misconceptions about pm_ops.  Some users of
the pm_ops structure attempt to use it to stop the user from entering suspend
to disk, this, however, is not possible since the user can always use
"shutdown" in /sys/power/disk and then the pm_ops are never invoked.  Also,
platforms that don't support suspend to disk simply should not allow
configuring SOFTWARE_SUSPEND (read the help text on it, it only selects
suspend to disk and nothing else, all the other stuff depends on PM).

The pm_ops structure is actually intended to provide a way to enter
platform-defined sleep states (currently supported states are "standby" and
"mem" (suspend to ram)) and additionally (if SOFTWARE_SUSPEND is configured)
allows a platform to support a platform specific way to enter low-power mode
once everything has been saved to disk.  This is currently only used by ACPI
(S4).

This patch:

The pm_ops.pm_disk_mode is used in totally bogus ways since nobody really
seems to understand what it actually does.

This patch clarifies the pm_disk_mode description.

It also removes all the arm and sh users that think they can veto suspend to
disk via pm_ops; not so since the user can always do echo shutdown >
/sys/power/disk, they need to find a better way involving Kconfig or such.

ACPI is the only user left with a non-zero pm_disk_mode.

The patch also sets the default mode to shutdown again, but when a new pm_ops
is registered its pm_disk_mode is selected as default, that way the default
stays for ACPI where it is apparently required.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: David Brownell <david-b@pacbell.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: <linux-pm@lists.linux-foundation.org>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30 16:40:40 -07:00
Robert Peterson 42e380832a Extend print_symbol capability
Today's print_symbol function dumps a kernel symbol with printk.  This
patch extends the functionality of kallsyms.c so that the symbol lookup
function may be used without the printk.  This is useful for modules that
want to dump symbols elsewhere, for example, to debugfs.  I intend to use
the new function call in the GFS2 file system (which will be a separate
patch).

[akpm@linux-foundation.org: build fix]
[clameter@sgi.com: sprint_symbol should return length of string like sprintf]
Signed-off-by: Robert Peterson <rpeterso@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Paulo Marques <pmarques@grupopie.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30 16:40:39 -07:00
Tony Luck d29182534c Pull mem-attribute into release branch 2007-04-30 13:56:17 -07:00
Tony Luck b643b0fdbc Pull percpu-dtc into release branch 2007-04-30 13:56:00 -07:00
Tony Luck e0cc09e295 Pull error-inject into release branch 2007-04-30 13:55:43 -07:00