* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (134 commits)
powerpc/nvram: Enable use Generic NVRAM driver for different size chips
powerpc/iseries: Fix oops reading from /proc/iSeries/mf/*/cmdline
powerpc/ps3: Workaround for flash memory I/O error
powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 instead
powerpc/perf_counters: Reduce stack usage of power_check_constraints
powerpc: Fix bug where perf_counters breaks oprofile
powerpc/85xx: Fix SMP compile error and allow NULL for smp_ops
powerpc/irq: Improve nanodoc
powerpc: Fix some late PowerMac G5 with PCIe ATI graphics
powerpc/fsl-booke: Use HW PTE format if CONFIG_PTE_64BIT
powerpc/book3e: Add missing page sizes
powerpc/pseries: Fix to handle slb resize across migration
powerpc/powermac: Thermal control turns system off too eagerly
powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan()
powerpc/405ex: support cuImage via included dtb
powerpc/405ex: provide necessary fixup function to support cuImage
powerpc/40x: Add support for the ESTeem 195E (PPC405EP) SBC
powerpc/44x: Add Eiger AMCC (AppliedMicro) PPC460SX evaluation board support.
powerpc/44x: Update Arches defconfig
powerpc/44x: Update Arches dts
...
Fix up conflicts in drivers/char/agp/uninorth-agp.c
* 'agp-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
agp/intel: remove restore in resume
agp: fix uninorth build
intel-agp: Set dma mask for i915
agp: kill phys_to_gart() and gart_to_phys()
intel-agp: fix sglist allocation to avoid vmalloc()
intel-agp: Move repeated sglist free into separate function
agp: Switch agp_{un,}map_page() to take struct page * argument
agp: tidy up handling of scratch pages w.r.t. DMA API
intel_agp: Use PCI DMA API correctly on chipsets new enough to have IOMMU
agp: Add generic support for graphics dma remapping
agp: Switch mask_memory() method to take address argument again, not page
console_print() is an old legacy interface mostly unused in the entire
kernel tree. It's best to clean up its existing use and let developers
use their own implementation of it as they feel fit.
Signed-off-by: Anirban Sinha <asinha@zeugmasystems.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'osync_cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
fsync: wait for data writeout completion before calling ->fsync
vfs: Remove generic_osync_inode() and sync_page_range{_nolock}()
fat: Opencode sync_page_range_nolock()
pohmelfs: Use new syncing helper
xfs: Convert sync_page_range() to simple filemap_write_and_wait_range()
ocfs2: Update syncing after splicing to match generic version
ntfs: Use new syncing helpers and update comments
ext4: Remove syncing logic from ext4_file_write
ext3: Remove syncing logic from ext3_file_write
ext2: Update comment about generic_osync_inode
vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode
vfs: Rename generic_file_aio_write_nolock
ocfs2: Use __generic_file_aio_write instead of generic_file_aio_write_nolock
pohmelfs: Use __generic_file_aio_write instead of generic_file_aio_write_nolock
vfs: Remove syncing from generic_file_direct_write() and generic_file_buffered_write()
vfs: Export __generic_file_aio_write() and add some comments
vfs: Introduce filemap_fdatawait_range
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits)
netxen: update copyright
netxen: fix tx timeout recovery
netxen: fix file firmware leak
netxen: improve pci memory access
netxen: change firmware write size
tg3: Fix return ring size breakage
netxen: build fix for INET=n
cdc-phonet: autoconfigure Phonet address
Phonet: back-end for autoconfigured addresses
Phonet: fix netlink address dump error handling
ipv6: Add IFA_F_DADFAILED flag
net: Add DEVTYPE support for Ethernet based devices
mv643xx_eth.c: remove unused txq_set_wrr()
ucc_geth: Fix hangs after switching from full to half duplex
ucc_geth: Rearrange some code to avoid forward declarations
phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs
drivers/net/phy: introduce missing kfree
drivers/net/wan: introduce missing kfree
net: force bridge module(s) to be GPL
Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded
...
Fixed up trivial conflicts:
- arch/x86/include/asm/socket.h
converted to <asm-generic/socket.h> in the x86 tree. The generic
header has the same new #define's, so that works out fine.
- drivers/net/tun.c
fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that
switched over to using 'tun->socket.sk' instead of the redundantly
available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks
to the TUN driver") which added a new 'tun->sk' use.
Noted in 'next' by Stephen Rothwell.
generic_file_aio_write_nolock() is now used only by block devices and raw
character device. Filesystems should use __generic_file_aio_write() in case
generic_file_aio_write() doesn't suit them. So rename the function to
blkdev_aio_write() and move it to fs/blockdev.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
As early pci resume has already restored config for host
bridge and graphics device, don't need to restore it again,
This removes an original order hack for graphics device restore.
This fixed the resume hang issue found by Alan Stern on 845G,
caused by extra config restore on graphics device.
Cc: Stable Team <stable@kernel.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Dave Airlie <airlied@linux.ie>
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (102 commits)
crypto: sha-s390 - Fix warnings in import function
crypto: vmac - New hash algorithm for intel_txt support
crypto: api - Do not displace newly registered algorithms
crypto: ansi_cprng - Fix module initialization
crypto: xcbc - Fix alignment calculation of xcbc_tfm_ctx
crypto: fips - Depend on ansi_cprng
crypto: blkcipher - Do not use eseqiv on stream ciphers
crypto: ctr - Use chainiv on raw counter mode
Revert crypto: fips - Select CPRNG
crypto: rng - Fix typo
crypto: talitos - add support for 36 bit addressing
crypto: talitos - align locks on cache lines
crypto: talitos - simplify hmac data size calculation
crypto: mv_cesa - Add support for Orion5X crypto engine
crypto: cryptd - Add support to access underlaying shash
crypto: gcm - Use GHASH digest algorithm
crypto: ghash - Add GHASH digest algorithm for GCM
crypto: authenc - Convert to ahash
crypto: api - Fix aligned ctx helper
crypto: hmac - Prehash ipad/opad
...
* 'writeback' of git://git.kernel.dk/linux-2.6-block:
writeback: check for registered bdi in flusher add and inode dirty
writeback: add name to backing_dev_info
writeback: add some debug inode list counters to bdi stats
writeback: get rid of pdflush completely
writeback: switch to per-bdi threads for flushing data
writeback: move dirty inodes from super_block to backing_dev_info
writeback: get rid of generic_sync_sb_inodes() export
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: (54 commits)
[S390] tape: Use pr_xxx instead of dev_xxx in shared driver code
[S390] Wire up page fault events for software perf counters.
[S390] Remove smp_cpu_not_running.
[S390] Get rid of cpuid.h header file.
[S390] Limit cpu detection to 256 physical cpus.
[S390] tape: Fix device online messages
[S390] Enable guest page hinting by default.
[S390] use generic scatterlist.h
[S390] s390dbf: Add description for usage of "%s" in sprintf events
[S390] Initialize __LC_THREAD_INFO early.
[S390] fix recursive locking on page_table_lock
[S390] kvm: use console_initcall() to initialize s390 virtio console
[S390] tape: reversed order of labels
[S390] hypfs: Use "%u" instead of "%d" for unsigned ints in snprintf
[S390] kernel: Print an error message if kernel NSS cannot be defined
[S390] zcrypt: Free ap_device if dev_set_name fails.
[S390] zcrypt: Use spin_lock_bh in suspend callback
[S390] xpram: Remove checksum validation for suspend/resume
[S390] vmur: Invalid allocation sequence for vmur class
[S390] hypfs: remove useless variable qname
...
Don't use kfree directly after device registration started.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This enables us to track who does what and print info. Its main use
is catching dirty inodes on the default_backing_dev_info, so we can
fix that up.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Remove the reliance on a staticly defined NVRAM size, allowing
platforms to support NVRAMs with sizes differing from the standard.
A fall back value is provided for platforms not supporting this extension.
Signed-off-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When probing the device in tpm_tis_init the call request_locality
uses timeout_a, which wasn't being initalized until after
request_locality. This results in request_locality falsely timing
out if the chip is still starting. Move the initialization to before
request_locality.
This probably only matters for embedded cases (ie mine), a BIOS likely
gets the TPM into a state where this code path isn't necessary.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Acked-by: Rajiv Andrade <srajiv@linux.vnet.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel:
agp/intel: support for new chip variant of IGDNG mobile
drm/i915: Unref old_obj on get_fence_reg() error path
drm/i915: increase default latency constant (v2 w/comment)
The whole write-room thing is something that is up to the _caller_ to
worry about, not the pty layer itself. The total buffer space will
still be limited by the buffering routines themselves, so there is no
advantage or need in having pty_write() artificially limit the size
somehow.
And what happened was that the caller (the n_tty line discipline, in
this case) may have verified that there is room for 2 bytes to be
written (for NL -> CRNL expansion), and it used to then do those writes
as two single-byte writes. And if the first byte written (CR) then
caused a new tty buffer to be allocated, pty_space() may have returned
zero when trying to write the second byte (LF), and then incorrectly
failed the write - leading to a lost newline character.
This should finally fix
http://bugzilla.kernel.org/show_bug.cgi?id=14015
Reported-by: Mikael Pettersson <mikpe@it.uu.se>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When translating CR to CRNL in the n_tty line discipline, we did it as
two tty_put_char() calls. Which works, but is stupid, and has caused
problems before too with bad interactions with the write_room() logic.
The generic USB serial driver had that problem, for example.
Now the pty layer had similar issues after being moved to the generic
tty buffering code (in commit d945cb9cce:
"pty: Rework the pty layer to use the normal buffering logic").
So stop doing the silly separate two writes, and do it as a single write
instead. That's what the n_tty layer already does for the space
expansion of tabs (XTABS), and it means that we'll now always have just
a single write for the CRNL to match the single 'tty_write_room()' test,
which hopefully means that the next time somebody screws up buffering,
it won't cause weeks of debugging.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
New variant of IGDNG mobile chip has new host bridge id.
[anholt: Note that this new PCI ID doesn't impact the DRM, which doesn't
care about the PCI ID of the bridge]
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Trivial patch which adds the __init/__exit macros to the module_init/
module_exit functions of char/hvc_vio.c
Please have a look at the small patch and either pull it through
your tree, or please ack' it so Jiri can pull it through the trivial tree.
linux version 2.6.31-rc6 - linus git tree, Do 20. Aug 22:26:06 CEST 2009
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When I rewrote tty ldisc code to use proper reference counts (commits
65b770468e and cbe9352fa0) in order to avoid a race with hangup, the
test-program that Eric Biederman used to trigger the original problem
seems to have exposed another long-standing bug: the hangup code did the
'tty_ldisc_halt()' to stop any buffer flushing activity, but unlike the
other call sites it never actually flushed any pending work.
As a result, if you get just the right timing, the pending work may be
just about to execute (ie the timer has already triggered and thus
cancel_delayed_work() was a no-op), when we then re-initialize the ldisc
from under it.
That, in turn, results in various random problems, usually seen as a
NULL pointer dereference in run_timer_softirq() or a BUG() in
worker_thread (but it can be almost anything).
Fix it by adding the required 'flush_scheduled_work()' after doing the
tty_ldisc_halt() (this also requires us to move the ldisc halt to before
taking the ldisc mutex in order to avoid a deadlock with the workqueue
executing do_tty_hangup, which requires the mutex).
The locking should be cleaned up one day (the requirement to do this
outside the ldisc_mutex is very annoying, and weakens the lock), but
that's a larger and separate undertaking.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Tested-by: Xiaotian Feng <xtfeng@gmail.com>
Tested-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Tested-by: Dave Young <hidave.darkstar@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Check whether index is within bounds prior to calculating a
possibly-invalid address.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Bernd Petrovitsch <bernd@firmix.at>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Map the GART table uncached, so we don't always need to flush the CPU caches
explicitly after updates.
Signed-off-by: Michel Dänzer <daenzer@vmware.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Using the radeon KMS test functionality, I verified that the AGP bridge of the
Intrepid2 chipset in my PowerBook supports aperture sizes up to 256M. So allow
aperture sizes up to 256M on pre-U3 bridges as well, and bump the default size
to 256M. It's possible that older revisions only support smaller sizes, but
it'll be easy to verify that with the raden KMS test functionality. Also,
there's only a problem on an actual attempt to access the aperture beyond the
maximum size supported by the hardware, and non-KMS X still defaults to using
only 32M.
Also use ARRAY_SIZE for the aperture size arrays.
Signed-off-by: Michel Dänzer <daenzer@vmware.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The result of container_of should not be NULL. In particular, in this case
the argument to the enclosing function has passed though INIT_WORK, which
dereferences it, implying that its container cannot be NULL.
A simplified version of the semantic patch that makes this change is as
follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
identifier fn,work,x,fld;
type T;
expression E1,E2;
statement S;
@@
static fn(struct work_struct *work) {
... when != work = E1
x = container_of(work,T,fld)
... when != x = E2
- if (x == NULL) S
...
}
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit d945cb9cc ("pty: Rework the pty layer to use the normal buffering
logic") dropped the test for 'tty->stopped' in pty_write_room(), which
then causes the n_tty line discipline thing to not throttle the data
properly when the tty is stopped.
So instead of pausing the write due to the tty being stopped, the ldisc
layer would go ahead and push it down to the pty. The pty write()
routine would then refuse to take the data (because it _did_ check
'stopped'), and the data wouldn't actually be written.
This whole stopped test should eventually be moved into the tty ldisc
layer rather than have low-level tty drivers care about these things,
but right now the fix is to just re-instate the missing pty 'stopped'
handling.
Reported-and-tested-by: Artur Skawina <art.08.09@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If DMAR is configured in but absent, we really do want to make sure that
the dma mask is set appropriately. Otherwise we get mapping failures on
highmem. Spotted by Zhenyu Wang.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
tty-ldisc: be more careful in 'put_ldisc' locking
tty-ldisc: turn ldisc user count into a proper refcount
tty-ldisc: make refcount be atomic_t 'users' count
Use 'atomic_dec_and_lock()' to make sure that we always hold the
tty_ldisc_lock when the ldisc count goes to zero. That way we can never
race against 'tty_ldisc_try()' increasing the count again.
Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@mail.by>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
By using the user count for the actual lifetime rules, we can get rid of
the silly "wait_for_idle" logic, because any busy ldisc will
automatically stay around until the last user releases it. This avoids
a host of odd issues, and simplifies the code.
So now, when the last ldisc reference is dropped, we just release the
ldisc operations struct reference, and free the ldisc.
It looks obvious enough, and it does work for me, but the counting
_could_ be off. It probably isn't (bad counting in the new version would
generally imply that the old code did something really bad, like free an
ldisc with a non-zero count), but it does need some testing, and
preferably somebody looking at it.
With this change, both 'tty_ldisc_put()' and 'tty_ldisc_deref()' are
just aliases for the new ref-counting 'put_ldisc()'. Both of them
decrement the ldisc user count and free it if it goes down to zero.
They're identical functions, in other words.
But the reason they still exist as sepate functions is that one of them
was exported (tty_ldisc_deref) and had a stupid name (so I don't want to
use it as the main name), and the other one was used in multiple places
(and I didn't want to make the patch larger just to rename the users).
In addition to the refcounting, I did do some minimal cleanup. For
example, now "tty_ldisc_try()" actually returns the ldisc it got under
the lock, rather than returning true/false and then the caller would
look up the ldisc again (now without the protection of the lock).
That said, there's tons of dubious use of 'tty->ldisc' without obviously
proper locking or refcounting left. I expressly did _not_ want to try to
fix it all, keeping the patch minimal. There may or may not be bugs in
that kind of code, but they wouldn't be _new_ bugs.
That said, even if the bugs aren't new, the timing and lifetime will
change. For example, some silly code may depend on the 'tty->ldisc'
pointer not changing because they hold a refcount on the 'ldisc'. And
that's no longer true - if you hold a ref on the ldisc, the 'ldisc'
itself is safe, but tty->ldisc may change.
So the proper locking (remains) to hold tty->ldisc_mutex if you expect
tty->ldisc to be stable. That's not really a _new_ rule, but it's an
example of something that the old code might have unintentionally
depended on and hidden bugs.
Whatever. The patch _looks_ sensible to me. The only users of
ldisc->users are:
- get_ldisc() - atomically increment the count
- put_ldisc() - atomically decrements the count and releases if zero
- tty_ldisc_try_get() - creates the ldisc, and sets the count to 1.
The ldisc should then either be released, or be attached to a tty.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@mail.by>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is pure preparation of changing the ldisc reference counting to be
a true refcount that defines the lifetime of the ldisc. But this is a
purely syntactic change for now to make the next steps easier.
This patch should make no semantic changes at all. But I wanted to make
the ldisc refcount be an atomic (I will be touching it without locks
soon enough), and I wanted to rename it so that there isn't quite as
much confusion between 'ldo->refcount' (ldisk operations refcount) and
'ld->refcount' (ldisc refcount itself) in the same file.
So it's now an atomic 'ld->users' count. It still starts at zero,
despite having a reference from 'tty->ldisc', but that will change once
we turn it into a _real_ refcount.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@mail.by>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When graphics dma remapping engine is active, we must fill
gart table with dma address from dmar engine, as now graphics
device access to graphics memory must go through dma remapping
table to get real physical address.
Add this support to all drivers which use intel_i915_insert_entries()
Signed-off-by: Zhenyu Wang <zhenyu.z.wang@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
New driver hooks for support graphics memory dma remapping
are introduced in this patch. It makes generic code can
tell if current device needs dma remapping, then call driver
provided interfaces for mapping and unmapping. Change has
also been made to handle scratch_page in remapping case.
Signed-off-by: Zhenyu Wang <zhenyu.z.wang@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
In commit 07613ba2 ("agp: switch AGP to use page array instead of
unsigned long array") we switched the mask_memory() method to take a
'struct page *' instead of an address. This is painful, because in some
cases it has to be an IOMMU-mapped virtual bus address (in fact,
shouldn't it _always_ be a dma_addr_t returned from pci_map_xxx(), and
we just happen to get lucky most of the time?)
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
As Andrew noted, my previous patch ("debug lockups: Improve lockup
detection") broke/removed SysRq-L support from architecture that do
not provide a __trigger_all_cpu_backtrace implementation.
Restore a fallback path and clean up the SysRq-L machinery a bit:
- Rename the arch method to arch_trigger_all_cpu_backtrace()
- Simplify the define
- Document the method a bit - in the hope of more architectures
adding support for it.
[ The patch touches Sparc code for the rename. ]
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
LKML-Reference: <20090802140809.7ec4bb6b.akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix those compiler warnings, which indeed point to a bug:
drivers/char/agp/parisc-agp.c:228: warning: initialization from incompatible pointer type
drivers/char/agp/parisc-agp.c:201: warning: 'parisc_agp_page_mask_memory' defined but not used
Signed-off-by: Helge Deller <deller@gmx.de>
When debugging a recent lockup bug i found various deficiencies
in how our current lockup detection helpers work:
- SysRq-L is not very efficient as it uses a workqueue, hence
it cannot punch through hard lockups and cannot see through
most soft lockups either.
- The SysRq-L code depends on the NMI watchdog - which is off
by default.
- We dont print backtraces from the RCU code's built-in
'RCU state machine is stuck' debug code. This debug
code tends to be one of the first (and only) mechanisms
that show that a lockup has occured.
This patch changes the code so taht we:
- Trigger the NMI backtrace code from SysRq-L instead of using
a workqueue (which cannot punch through hard lockups)
- Trigger print-all-CPU-backtraces from the RCU lockup detection
code
Also decouple the backtrace printing code from the NMI watchdog:
- Dont use variable size cpumasks (it might not be initialized
and they are a bit more fragile anyway)
- Trigger an NMI immediately via an IPI, instead of waiting
for the NMI tick to occur. This is a lot faster and can
produce more relevant backtraces. It will also work if the
NMI watchdog is disabled.
- Dont print the 'dazed and confused' message when we print
a backtrace from the NMI
- Do a show_regs() plus a dump_stack() to get maximum info
out of the dump. Worst-case we get two stacktraces - which
is not a big deal. Sometimes, if register content is
corrupted, the precise stack walker in show_regs() wont
give us a full backtrace - in this case dump_stack() will
do it.
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
commit d6580a9f15 ("kexec: sysrq: simplify
sysrq-c handler") changed the behavior of sysrq-c to unconditional
dereference of NULL pointer. So in cases with CONFIG_KEXEC, where
crash_kexec() was directly called from sysrq-c before, now it can be said
that a step of "real oops" was inserted before starting kdump.
However, in contrast to oops via SysRq-c from keyboard which results in
panic due to in_interrupt(), oops via "echo c > /proc/sysrq-trigger" will
not become panic unless panic_on_oops=1. It means that even if dump is
properly configured to be taken on panic, the sysrq-c from proc interface
might not start crashdump while the sysrq-c from keyboard can start
crashdump. This confuses traditional users of kdump, i.e. people who
expect sysrq-c to do common behavior in both of the keyboard and proc
interface.
This patch brings the keyboard and proc interface behavior of sysrq-c in
line, by forcing panic_on_oops=1 before oops in sysrq-c handler.
And some updates in documentation are included, to clarify that there is
no longer dependency with CONFIG_KEXEC, and that now the system can just
crash by sysrq-c if no dump mechanism is configured.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Brayan Arraes <brayan@yack.com.br>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>