* 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc:
Revert "proc: revert /proc/uptime to ->read_proc hook"
proc 2/2: remove struct proc_dir_entry::owner
proc 1/2: do PDE usecounting even for ->read_proc, ->write_proc
proc: fix sparse warnings in pagemap_read()
proc: move fs/proc/inode-alloc.txt comment into a source file
Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy
as correctly noted at bug #12454. Someone can lookup entry with NULL
->owner, thus not pinning enything, and release it later resulting
in module refcount underflow.
We can keep ->owner and supply it at registration time like ->proc_fops
and ->data.
But this leaves ->owner as easy-manipulative field (just one C assignment)
and somebody will forget to unpin previous/pin current module when
switching ->owner. ->proc_fops is declared as "const" which should give
some thoughts.
->read_proc/->write_proc were just fixed to not require ->owner for
protection.
rmmod'ed directories will be empty and return "." and ".." -- no harm.
And directories with tricky enough readdir and lookup shouldn't be modular.
We definitely don't want such modular code.
Removing ->owner will also make PDE smaller.
So, let's nuke it.
Kudos to Jeff Layton for reminding about this, let's say, oversight.
http://bugzilla.kernel.org/show_bug.cgi?id=12454
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Everyone defines it, and only one person uses it
(arch/mips/sgi-ip27/ip27-nmi.c). So just open code it there.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-mips@linux-mips.org
* 'percpu-cpumask-x86-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (682 commits)
percpu: fix spurious alignment WARN in legacy SMP percpu allocator
percpu: generalize embedding first chunk setup helper
percpu: more flexibility for @dyn_size of pcpu_setup_first_chunk()
percpu: make x86 addr <-> pcpu ptr conversion macros generic
linker script: define __per_cpu_load on all SMP capable archs
x86: UV: remove uv_flush_tlb_others() WARN_ON
percpu: finer grained locking to break deadlock and allow atomic free
percpu: move fully free chunk reclamation into a work
percpu: move chunk area map extension out of area allocation
percpu: replace pcpu_realloc() with pcpu_mem_alloc() and pcpu_mem_free()
x86, percpu: setup reserved percpu area for x86_64
percpu, module: implement reserved allocation and use it for module percpu variables
percpu: add an indirection ptr for chunk page map access
x86: make embedding percpu allocator return excessive free space
percpu: use negative for auto for pcpu_setup_first_chunk() arguments
percpu: improve first chunk initial area map handling
percpu: cosmetic renames in pcpu_setup_first_chunk()
percpu: clean up percpu constants
x86: un-__init fill_pud/pmd/pte
x86: remove vestigial fix_ioremap prototypes
...
Manually merge conflicts in arch/ia64/kernel/irq_ia64.c
Conflicts:
arch/sparc/kernel/time_64.c
drivers/gpu/drm/drm_proc.c
Manual merge to resolve build warning due to phys_addr_t type change
on x86:
drivers/gpu/drm/drm_info.c
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Due to a different size of ino_t ustat needs a compat handler, but
currently only x86 and mips provide one. Add a generic compat_sys_ustat
and switch all architectures over to it. Instead of doing various
user copy hacks compat_sys_ustat just reimplements sys_ustat as
it's trivial. This was suggested by Arnd Bergmann.
Found by Eric Sandeen when running xfstests/017 on ppc64, which causes
stack smashing warnings on RHEL/Fedora due to the too large amount of
data writen by the syscall.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
arch/ia64/kernel/irq_ia64.c: In function 'ia64_handle_irq':
arch/ia64/kernel/irq_ia64.c:498: error: 'struct kernel_stat' has no member named 'irqs'
arch/ia64/kernel/irq_ia64.c:500: error: 'struct kernel_stat' has no member named 'irqs'
arch/ia64/kernel/irq_ia64.c: In function 'ia64_process_pending_intr':
arch/ia64/kernel/irq_ia64.c:556: error: 'struct kernel_stat' has no member named 'irqs'
arch/ia64/kernel/irq_ia64.c:558: error: 'struct kernel_stat' has no member named 'irqs'
Fix build breakage due to recent kstat_this_cpu changes in:
d7e51e6689
sparseirq: make some func to be used with genirq
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (32 commits)
x86: disable __do_IRQ support
sparseirq, powerpc/cell: fix unused variable warning in interrupt.c
genirq: deprecate obsolete typedefs and defines
genirq: deprecate __do_IRQ
genirq: add doc to struct irqaction
genirq: use kzalloc instead of explicit zero initialization
genirq: make irqreturn_t an enum
genirq: remove redundant if condition
genirq: remove unused hw_irq_controller typedef
irq: export remove_irq() and setup_irq() symbols
irq: match remove_irq() args with setup_irq()
irq: add remove_irq() for freeing of setup_irq() irqs
genirq: assert that irq handlers are indeed running in hardirq context
irq: name 'p' variables a bit better
irq: further clean up the free_irq() code flow
irq: refactor and clean up the free_irq() code flow
irq: clean up manage.c
irq: use GFP_KERNEL for action allocation in request_irq()
kernel/irq: fix sparse warning: make symbol static
irq: optimize init_kstat_irqs/init_copy_kstat_irqs
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1750 commits)
ixgbe: Allow Priority Flow Control settings to survive a device reset
net: core: remove unneeded include in net/core/utils.c.
e1000e: update version number
e1000e: fix close interrupt race
e1000e: fix loss of multicast packets
e1000e: commonize tx cleanup routine to match e1000 & igb
netfilter: fix nf_logger name in ebt_ulog.
netfilter: fix warning in ebt_ulog init function.
netfilter: fix warning about invalid const usage
e1000: fix close race with interrupt
e1000: cleanup clean_tx_irq routine so that it completely cleans ring
e1000: fix tx hang detect logic and address dma mapping issues
bridge: bad error handling when adding invalid ether address
bonding: select current active slave when enslaving device for mode tlb and alb
gianfar: reallocate skb when headroom is not enough for fcb
Bump release date to 25Mar2009 and version to 0.22
r6040: Fix second PHY address
qeth: fix wait_event_timeout handling
qeth: check for completion of a running recovery
qeth: unregister MAC addresses during recovery.
...
Manually fixed up conflicts in:
drivers/infiniband/hw/cxgb3/cxio_hal.h
drivers/infiniband/hw/nes/nes_nic.c
Include the newly introduced msidef.h to solve the build issues.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
For kvm's MSI support, it needs these macros defined in ia64_msi.c, and
to avoid duplicate them, move them to one header file and share with
kvm.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
IRQ injection status is either -1 (if there was no CPU found
that should except the interrupt because IRQ was masked or
ioapic was misconfigured or ...) or >= 0 in that case the
number indicates to how many CPUs interrupt was injected.
If the value is 0 it means that the interrupt was coalesced
and probably should be reinjected.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Account for number of online cpus and use that in loops iterating over
the list of vpus instead of scanning the full array unconditionally.
This patch is a building block to facilitate allowing to bump up
the size of MAX_VCPUS significantly.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Acked-by : Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
For windows 2008, it needs more pal calls to implement for booting.
In addition, also changes the name of set_{sal, pal}_call_result to
get_{sal,pal}_call_result for readability.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Add the support to translate PAL Call's pointer args.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Simply the logic of hash vTLB, and export kvm_gpa_to_mpa.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently KVM has a static routing from GSI numbers to interrupts (namely,
0-15 are mapped 1:1 to both PIC and IOAPIC, and 16:23 are mapped 1:1 to
the IOAPIC). This is insufficient for several reasons:
- HPET requires non 1:1 mapping for the timer interrupt
- MSIs need a new method to assign interrupt numbers and dispatch them
- ACPI APIC mode needs to be able to reassign the PCI LINK interrupts to the
ioapics
This patch implements an interrupt routing table (as a linked list, but this
can be easily changed) and a userspace interface to replace the table. The
routing table is initialized according to the current hardwired mapping.
Signed-off-by: Avi Kivity <avi@redhat.com>
Two KVM archs support irqchips and two don't. Add a Kconfig item to
make selecting between the two models easier.
Signed-off-by: Avi Kivity <avi@redhat.com>
Remove some unnecessary blank lines to accord with Kernel's coding style.
Also remove vcpu_get_itir_on_fault due to no reference to it.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Remove the remaining arch fragments of the old guest debug interface
that now break non-x86 builds.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Implement KVM_IA64_VCPU_[GS]ET_STACK ioctl calls. This is required
for live migrations.
Patch is based on previous implementation that was part of old
GET/SET_REGS ioctl calls.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This rips out the support for KVM_DEBUG_GUEST and introduces a new IOCTL
instead: KVM_SET_GUEST_DEBUG. The IOCTL payload consists of a generic
part, controlling the "main switch" and the single-step feature. The
arch specific part adds an x86 interface for intercepting both types of
debug exceptions separately and re-injecting them when the host was not
interested. Moveover, the foundation for guest debugging via debug
registers is layed.
To signal breakpoint events properly back to userland, an arch-specific
data block is now returned along KVM_EXIT_DEBUG. For x86, the arch block
contains the PC, the debug exception, and relevant debug registers to
tell debug events properly apart.
The availability of this new interface is signaled by
KVM_CAP_SET_GUEST_DEBUG. Empty stubs for not yet supported archs are
provided.
Note that both SVM and VTX are supported, but only the latter was tested
yet. Based on the experience with all those VTX corner case, I would be
fairly surprised if SVM will work out of the box.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Impact: __per_cpu_load available on all SMP capable archs
Percpu now requires three symbols to be defined - __per_cpu_load,
__per_cpu_start and __per_cpu_end. There were three archs which
didn't have it. Update them as follows.
* powerpc: can use generic PERCPU() macro. Compile tested for
powerpc32, compile/boot tested for powerpc64.
* ia64: can use generic PERCPU_VADDR() macro. __phys_per_cpu_start is
identical to __per_cpu_load. Compile tested and symbol table looks
identical after the change except for the additional __per_cpu_load.
* arm: added explicit __per_cpu_load definition. Currently uses
unified .init output section so can't use the generic macro. Dunno
whether the unified .init ouput section is required by arch
peculiarity so I left it alone. Please break it up and use PERCPU()
if possible.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Pat Gefre <pfg@sgi.com>
Cc: Russell King <rmk@arm.linux.org.uk>
We recently discovered a problem with passing of DMA attributes on SN
systems with the older PIC chips.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jeremy Higdon <jeremy@sgi.com>
Cc: <habeck@sgi.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
vi arch/ia64/kernel/iosapic.c +142
static struct iosapic_intr_info {
...
} iosapic_intr_info[NR_IRQS];
But at line 510 we have:
for (i = 0; i <= NR_IRQS; i++) {
s/<=/</
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
static struct {
... :114
unsigned short hash[UNW_HASH_SIZE];
... :2152
for (index = 0; index <= UNW_HASH_SIZE; ++index) {
This is a bug, isn't it?
s/<=/</
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The previous commit which introduced the DMAR_DEFAULT_ON setting in
drivers/pci/dmar.c neglected to add the ability for ia64 to enable
the IOMMU by default. Rectify that mistake, doh!
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
arch/ia64/xen/xen_pv_ops.c:156: error: xen_init_ops causes a section type conflict
arch/ia64/xen/xen_pv_ops.c:340: error: xen_iosapic_ops causes a section type conflict
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch fixes xen related Kconfigs and add default config
file for ia64 xen domU.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
The second call to cpu_clear() is redundant, as we've already removed
the CPU from cpu_online_map before calling migrate_platform_irqs().
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
This reverts commit e7b140365b.
Commit e7b14036 removes the targetted disabled CPU from the
cpu_online_map after calls to migrate_platform_irqs and fixup_irqs.
Paul McKenney states that the reasoning behind the patch was to
prevent irq handlers from running on CPUs marked offline because:
RCU happily ignores CPUs that don't have their bits set in
cpu_online_map, so if there are RCU read-side critical sections
in the irq handlers being run, RCU will ignore them. If the
other CPUs were running, they might sequence through the RCU
state machine, which could result in data structures being
yanked out from under those irq handlers, which in turn could
result in oopses or worse.
Unfortunately, both ia64 functions above look at cpu_online_map to find
a new CPU to migrate interrupts onto. This means we can potentially
migrate an interrupt off ourself back to... ourself. Uh oh.
This causes an oops when we finally try to process pending interrupts on
the CPU we want to disable. The oops results from calling __do_IRQ with
a NULL pt_regs:
Unable to handle kernel NULL pointer dereference (address 0000000000000040)
Call Trace:
[<a000000100016930>] show_stack+0x50/0xa0
sp=e0000009c922fa00 bsp=e0000009c92214d0
[<a0000001000171a0>] show_regs+0x820/0x860
sp=e0000009c922fbd0 bsp=e0000009c9221478
[<a00000010003c700>] die+0x1a0/0x2e0
sp=e0000009c922fbd0 bsp=e0000009c9221438
[<a0000001006e92f0>] ia64_do_page_fault+0x950/0xa80
sp=e0000009c922fbd0 bsp=e0000009c92213d8
[<a00000010000c7a0>] ia64_native_leave_kernel+0x0/0x270
sp=e0000009c922fc60 bsp=e0000009c92213d8
[<a0000001000ecdb0>] profile_tick+0xd0/0x1c0
sp=e0000009c922fe30 bsp=e0000009c9221398
[<a00000010003bb90>] timer_interrupt+0x170/0x3e0
sp=e0000009c922fe30 bsp=e0000009c9221330
[<a00000010013a800>] handle_IRQ_event+0x80/0x120
sp=e0000009c922fe30 bsp=e0000009c92212f8
[<a00000010013aa00>] __do_IRQ+0x160/0x4a0
sp=e0000009c922fe30 bsp=e0000009c9221290
[<a000000100012290>] ia64_process_pending_intr+0x2b0/0x360
sp=e0000009c922fe30 bsp=e0000009c9221208
[<a0000001000112d0>] fixup_irqs+0xf0/0x2a0
sp=e0000009c922fe30 bsp=e0000009c92211a8
[<a00000010005bd80>] __cpu_disable+0x140/0x240
sp=e0000009c922fe30 bsp=e0000009c9221168
[<a0000001006c5870>] take_cpu_down+0x50/0xa0
sp=e0000009c922fe30 bsp=e0000009c9221148
[<a000000100122610>] stop_cpu+0xd0/0x200
sp=e0000009c922fe30 bsp=e0000009c92210f0
[<a0000001000e0440>] kthread+0xc0/0x140
sp=e0000009c922fe30 bsp=e0000009c92210c8
[<a000000100014ab0>] kernel_thread_helper+0xd0/0x100
sp=e0000009c922fe30 bsp=e0000009c92210a0
[<a00000010000a4c0>] start_kernel_thread+0x20/0x40
sp=e0000009c922fe30 bsp=e0000009c92210a0
I don't like this revert because it is fragile. ia64 is getting lucky
because we seem to only ever process timer interrupts in this path, but
if we ever race with an IPI here, we definitely use RCU and have the
potential of hitting an oops that Paul describes above.
Patching ia64's timer_interrupt() to check for NULL pt_regs is
insufficient though, as we still hit the above oops.
As a short term solution, I do think that this revert is the right
answer. The revert hold up under repeated testing (24+ hour test runs)
with this setup:
- 8-way rx6600
- randomly toggling CPU online/offline state every 2 seconds
- running CPU exercisers, memory hog, disk exercisers, and
network stressors
- average system load around ~160
In the long term, we really need to figure out why we set pt_regs = NULL
in ia64_process_pending_intr(). If it turns out that it is unnecessary
to do so, then we could safely re-introduce e7b14036 (along with some
other logic to be smarter about migrating interrupts).
One final note: x86 also removes the disabled CPU from cpu_online_map
and then re-enables interrupts for 1ms, presumably to handle any pending
interrupts:
arch/x86/kernel/irq_32.c (and irq_64.c):
cpu_disable_common:
[remove cpu from cpu_online_map]
fixup_irqs():
for_each_irq:
[break CPU affinities]
local_irq_enable();
mdelay(1);
local_irq_disable();
So they are doing implicitly what ia64 is doing explicitly.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>