Commit Graph

119572 Commits (37787e449b39202afae13b6c47505ae9e977cae3)

Author SHA1 Message Date
Al Viro df6b07949b xen_play_dead() is __cpuinit
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:38 -08:00
Al Viro 37af46efa5 xen_setup_vcpu_info_placement() is not init on x86
... so get xen-ops.h in agreement with xen/smp.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:37 -08:00
Al Viro 23a14b9e9d kvm_setup_secondary_clock() is cpuinit
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:37 -08:00
Al Viro 2236d252e0 enable_IR_x2apic() needs to be __init
calls __init, called only from __init

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:37 -08:00
Al Viro ad04d31e5f pci_setup() is init, not devinit
for fsck sake, it's used only when parsing kernel command line...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:37 -08:00
Al Viro 4bcc17dd8e alpha: pcibios_resource_to_bus() is callable from normal code
pci_enable_rom(), specifically.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:37 -08:00
Al Viro 56d74dd5f7 tricky one: hisax sections
a) hisax_init_pcmcia() needs to be defined only if we have
   CONFIG_HOTPLUG (no PCMCIA support otherwise) and can be declared
   __devinit.

b) HiSax_inithardware() can go __init

c) hisax_register() is passing to checkcard() full-blown hisax_cs_setup_card():
	checkcard(i, id, NULL, hisax_d_if->owner, hisax_cs_setup_card);
   The problem with it is that
	* hisax_cs_setup_card() is __devinit
	* hisax_register() is not
	* hisax_cs_setup_card() is a switch from hell, calling a lot of
	  setup_some_weirdcard() depending on card->typ.  _These_ are also
	  __devinit.

   However, in hisax_register() we have card->typ equal to
   ISDN_CTYPE_DYNAMIC, which reduces hisax_cs_setup_card() to "nevermind
   all that crap, just do nothing and return 2".  So we add a
   trimmed-down callback doing just that and passed to checkcard() by
   hisax_register().  _This_ is non-init (we can stand the impact on
   .text size).

Voila - no section warnings from drivers/isdn

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:37 -08:00
Al Viro 8419641450 cpuinit fixes in kernel/*
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:37 -08:00
Al Viro b0385146bc uninorth-agp section mess
'aperture' is declared devinitdata (the whole word of it) and
is used from ->fetch_size() which can, AFAICS, be used on
!HOTPLUG after init time.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:37 -08:00
Al Viro 37d33d1514 rapidio section noise
functions calling devinit and called only from devinit

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:37 -08:00
Al Viro f57628d76b section errors in smc911x/smc91x
a) ->probe() can be __devinit; no need to put it into .text
 b) calling __init stuff from it, OTOH, is wrong
 c) ->remove() is __devexit fodder

Acked-by: rmk+kernel@arm.linux.org.uk
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:37 -08:00
Al Viro 5bac287ea5 fix the section noise in sparc head.S
usual .text.head trick

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:37 -08:00
Al Viro 1c4567aeed m32r: section noise in head.S
usual "introduce .text.head, put it in front of TEXT_TEXT in vmlinux.lds.S,
make the stuff up to jump to start_kernel live in it", same as on other
targets.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:37 -08:00
Al Viro 8814b5050d section misannotation in ibmtr_cs
ibmtr_resume() is calling ibmtr_probe(), which is devinit.  Whether
that's the right thing to do there is a separate question, but
since it's PCMCIA and thus will never compile without HOTPLUG...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:37 -08:00
Al Viro 43ced651d1 ixgbe section fixes
ixgbe_init_interrupt_scheme() is called from ixgbe_resume().  Build that
with CONFIG_PM and without CONFIG_HOTPLUG and you've got a problem.
Several helpers called by it also are misannotated __devinit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:37 -08:00
Al Viro 31421a6f6e rackmeter section fixes
* rackmeter_remove() reference needs devexit_p
 * rackmeter_setup() is calls devinit and is called only from devinit

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:37 -08:00
Al Viro ced7172ad9 gdth section fixes
PCI side of driver should be devinit, not init

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:36 -08:00
Al Viro e669dae614 of_platform_driver noise on sparce
switch to __init for those; unlike powerpc sparc has no hotplug support
for that stuff and their ->probe() tends to call __init functions while
being declared __devinit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:36 -08:00
Al Viro 30037818f7 advansys fix on ISA-less configs
The code

        if (shost->dma_channel != NO_ISA_DMA)
                free_dma(shost->dma_channel);

in there is triggerable only if we have CONFIG_ISA (we only set ->dma_channel to
something other than NO_ISA_DMA under #ifdef CONFIG_ISA).  OTOH, free_dma() is
not guaranteed to be there in absense of CONFIG_ISA.  IOW, driver runs into
undefined symbols on PCI-but-not-ISA configs (e.g. on frv) and it's a false
positive.

Fix: put the entire if () under #ifdef CONFIG_ISA; behaviour doesn't change and
dependency on free_dma() disappears for !CONFIG_ISA.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:36 -08:00
Al Viro 2fceab0bd8 W1_MASTER_DS1WM should depend on HAVE_CLK
Uses clk_...() a lot

Acked-by: rmk+kernel@arm.linux.org.uk
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:36 -08:00
Al Viro d16d7667f9 icside section warnings
icside_register_v[56] is called from (__devinit) icside_probe

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:36 -08:00
Al Viro 596f103419 fix talitos
talitos_remove() can be called from talitos_probe() on failure
exit path, so it can't be __devexit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:36 -08:00
Al Viro 6005e3eb89 istallion section warnings
stli_findeisabrds() and stli_initbrds() are using __init and called only
from __init.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:36 -08:00
Al Viro 8c29890aef sparc64 trivial section misannotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:35 -08:00
Al Viro 409832f548 sparc32 cpuinit flase positives
All noise since we don't have CPU hotplug there.  However, they
did expose something very odd-looking in there - poke_viking()
does a bunch of identical btfixup each time it's called (i.e.
for each CPU).  That one is left alone for now; just the trivial
misannotation fixes.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:35 -08:00
Al Viro 4ea8fb9c1c powerpc set_huge_psize() false positive
called only from __init, calls __init.  Incidentally, it ought to be static
in file.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:35 -08:00
Al Viro 7d6a8a1c48 false __cpuinit positives on alpha
pure noise - alpha doesn't have CPU hotplug

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:35 -08:00
Al Viro 31168481c3 meminit section warnings
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:35 -08:00
Ingo Molnar af6d596fd6 sched: prevent divide by zero error in cpu_avg_load_per_task, update
Regarding the bug addressed in:

  4cd4262: sched: prevent divide by zero error in cpu_avg_load_per_task

Linus points out that the fix is not complete:

> There's nothing that keeps gcc from deciding not to reload
> rq->nr_running.
>
> Of course, in _practice_, I don't think gcc ever will (if it decides
> that it will spill, gcc is likely going to decide that it will
> literally spill the local variable to the stack rather than decide to
> reload off the pointer), but it's a valid compiler optimization, and
> it even has a name (rematerialization).
>
> So I suspect that your patch does fix the bug, but it still leaves the
> fairly unlikely _potential_ for it to re-appear at some point.
>
> We have ACCESS_ONCE() as a macro to guarantee that the compiler
> doesn't rematerialize a pointer access. That also would clarify
> the fact that we access something unsafe outside a lock.

So make sure our nr_running value is immutable and cannot change
after we check it for nonzero.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-29 20:45:15 +01:00
Ingo Molnar 1583715ddb sched, cpusets: fix warning in kernel/cpuset.c
this warning:

  kernel/cpuset.c: In function ‘generate_sched_domains’:
  kernel/cpuset.c:588: warning: ‘ndoms’ may be used uninitialized in this function

triggers because GCC does not recognize that ndoms stays uninitialized
only if doms is NULL - but that flow is covered at the end of
generate_sched_domains().

Help out GCC by initializing this variable to 0. (that's prudent anyway)

Also, this function needs a splitup and code flow simplification:
with 160 lines length it's clearly too long.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-29 20:39:29 +01:00
Stefan Richter 2642b11295 ieee1394: sbp2: fix race condition in state change
An intermediate transition from _RUNNING to _IN_SHUTDOWN could have been
missed by the former code.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-11-29 17:07:56 +01:00
Stefan Richter e47c1feb17 ieee1394: fix list corruption (reported at module removal)
If there is more than one FireWire controller present, dummy_zero_addr
and dummy_max_addr were added multiple times to different lists, thus
corrupting the lists.  Fix this by allocating them dynamically per host
instead of just once globally.

(Perhaps a better address space allocation algorithm could rid us of the
two dummy address spaces.)

Fixes http://bugzilla.kernel.org/show_bug.cgi?id=10129 .

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2008-11-29 17:07:56 +01:00
Jack Morgenstein 9a5aa622dd mlx4_core: Save/restore default port IB capability mask
Commit 7ff93f8b ("mlx4_core: Multiple port type support") introduced
support for different port types.  As part of that support, SET_PORT
is invoked to set the port type during driver startup.  However, as a
side-effect, for IB ports the invocation of this command also sets the
port's capability mask to zero (losing the default value set by FW).

To fix this, get the default ib port capabilities (via a MAD_IFC Port
Info query) during driver startup, and save them for use in the
mlx4_SET_PORT command when setting the port-type to Infiniband.

This patch fixes problems with subnet manager (SM) failover such as
<https://bugs.openfabrics.org/show_bug.cgi?id=1183>, which occurred
because the IsTrapSupported bit in the capability mask was zeroed.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-11-28 21:29:46 -08:00
Arjan van de Ven 23d0a65cf2 toshiba_acpi: close race in toshiba_acpi driver
the toshiba ACPI driver will, in a failure case, free the rfkill state
before stopping the polling timer that would use this state. More interesting,
in the same failure case handling, it calls the exit function, which also
frees the rfkill state, but after stopping the polling.

If the race happens, a NULL pointer is passed to rfkill_force_state()
which then causes a nice dereference.

Fix the race by just not doing the too-early freeing of the rfkill state.

This appears to be the cause of a hot issue on kerneloops.org; while I
have no solid evidence of that this patch will fix the issue, the race
appears rather real.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-11-28 14:21:53 -05:00
Jean Delvare 7b964f7337 i2c-parport: Fix misplaced parport_release call
We shouldn't release the parallel port until we are actually done with
it.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-11-28 15:24:39 +01:00
Jean Delvare 79b93e1359 i2c: Remove i2c clients in reverse order
i2c clients should be removed in reverse order compared to the probe
(actually: bind) order. This matters when several clients depend on
each other.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
2008-11-28 15:24:38 +01:00
David Brownell d1846b0e7a i2c/isp1301_omap: Build fixes
Build fixes for isp1301_omap; no behavior changes:

  - fix incorrect probe() signature (it changed many months ago)
  - provide missing functions on H3 and H4 boards
  - "sparse" fixes (static, NULL-vs-0)

The H3 build bits subset some of the stuff that was previously in
the OMAP tree but never went to mainline.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-11-28 15:24:38 +01:00
Jan Scholz ee8a1a0a1a HID: Apple ALU wireless keyboards are bluetooth devices
While parsing 'hid_blacklist' in the apple alu wireless keyboard is not found.
This happens because in the blacklist it is declared with HID_USB_DEVICE
although the keyboards are really bluetooth devices.  The same holds for
'apple_devices' list.

This patch fixes it by changing HID_USB_DEVICE to HID_BLUETOOTH_DEVICE in those
two lists.

Signed-off-by: Jan Scholz <Scholz@fias.uni-frankfurt.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2008-11-28 15:09:26 +01:00
Russell King af38d90d6a Merge branch 'for-rmk' of git://git.kernel.org/pub/scm/linux/kernel/git/ycmiao/pxa-linux-2.6 2008-11-27 23:50:31 +00:00
Russell King 487ff32082 Allow architectures to override copy_user_highpage()
With aliasing VIPT cache support, the ARM implementation of
clear_user_page() and copy_user_page() sets up a temporary kernel space
mapping such that we have the same cache colour as the userspace page.
This avoids having to consider any userspace aliases from this operation.

However, when highmem is enabled, kmap_atomic() have to setup mappings.
The copy_user_highpage() and clear_user_highpage() call these functions
before delegating the copies to copy_user_page() and clear_user_page().

The effect of this is that each of the *_user_highpage() functions setup
their own kmap mapping, followed by the *_user_page() functions setting
up another mapping.  This is rather wasteful.

Thankfully, copy_user_highpage() can be overriden by architectures by
defining __HAVE_ARCH_COPY_USER_HIGHPAGE.  However, replacement of
clear_user_highpage() is more difficult because its inline definition
is not conditional.  It seems that you're expected to define
__HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE and provide a replacement
__alloc_zeroed_user_highpage() implementation instead.

The allocation itself is fine, so we don't want to override that.  What
we really want to do is to override clear_user_highpage() with our own
version which doesn't kmap_atomic() unnecessarily.

Other VIPT architectures (PARISC and SH) would also like to override
this function as well.

Acked-by: Hugh Dickins <hugh@veritas.com>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-11-27 23:39:48 +00:00
Jan Kara 52b19ac993 udf: Fix BUG_ON() in destroy_inode()
udf_clear_inode() can leave behind buffers on mapping's i_private list (when
we truncated preallocation). Call invalidate_inode_buffers() so that the list
is properly cleaned-up before we return from udf_clear_inode(). This is ugly
and suggest that we should cleanup preallocation earlier than in clear_inode()
but currently there's no such call available since drop_inode() is called under
inode lock and thus is unusable for disk operations.

Signed-off-by: Jan Kara <jack@suse.cz>
2008-11-27 17:38:28 +01:00
Marek Vasut a730b327ca [ARM] pxa/palmtx: misc fixes to use generic GPIO API
Signed-off-by: Marek Vasut <marek.vasut@gmail.com>
Signed-off-by: Eric Miao <eric.miao@marvell.com>
2008-11-27 22:47:26 +08:00
Joerg Roedel b627c8b17c x86: always define DECLARE_PCI_UNMAP* macros
Impact: fix boot crash on AMD IOMMU if CONFIG_GART_IOMMU is off

Currently these macros evaluate to a no-op except the kernel is compiled
with GART or Calgary support. But we also need these macros when we have
SWIOTLB, VT-d or AMD IOMMU in the kernel. Since we always compile at
least with SWIOTLB we can define these macros always.

This patch is also for stable backport for the same reason the SWIOTLB
default selection patch is.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 12:44:08 +01:00
Russell King 6417a917b5 Merge branch 'omap-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6 2008-11-27 11:13:10 +00:00
Martin Schwidefsky abd942194d [S390] Update default configuration.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-11-27 11:06:58 +01:00
Heiko Carstens 0778dc3a62 [S390] Fix alignment of initial kernel stack.
We need an alignment of 16384 bytes for the initial kernel stack if
the kernel is configured for 16384 bytes stacks but the linker script
currently guarantees only an alignment of 8192 bytes.

So fix this and simply use THREAD_SIZE as alignment value which will
always do the right thing.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-11-27 11:06:58 +01:00
Christian Borntraeger 2944a5c971 [S390] pgtable.h: Fix oops in unmap_vmas for KVM processes
When running several kvm processes with lots of memory overcommitment,
we have seen an oops during process shutdown:
------------[ cut here ]------------
Kernel BUG at 0000000000193434 [verbose debug info unavailable]
addressing exception: 0005 [#1] PREEMPT SMP
Modules linked in: kvm sunrpc qeth_l2 dm_mod qeth ccwgroup
CPU: 10 Not tainted 2.6.28-rc4-kvm-bigiron-00521-g0ccca08-dirty #8
Process kuli (pid: 14460, task: 0000000149822338, ksp: 0000000024f57650)
Krnl PSW : 0704e00180000000 0000000000193434 (unmap_vmas+0x884/0xf10)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 EA:3
Krnl GPRS: 0000000000000002 0000000000000000 000000051008d000 000003e05e6034e0
           00000000001933f6 00000000000001e9 0000000407259e0a 00000002be88c400
           00000200001c1000 0000000407259608 0000000407259e08 0000000024f577f0
           0000000407259e09 0000000000445fa8 00000000001933f6 0000000024f577f0
Krnl Code: 0000000000193426: eb22000c000d sllg %r2,%r2,12
           000000000019342c: a7180000 lhi %r1,0
           0000000000193430: b2290012 iske %r1,%r2
          >0000000000193434: a7110002 tmll %r1,2
           0000000000193438: a7840006 brc 8,193444
           000000000019343c: 9602c000 oi 0(%r12),2
           0000000000193440: 96806000 oi 0(%r6),128
           0000000000193444: a7110004 tmll %r1,4
Call Trace:
([<00000000001933f6>] unmap_vmas+0x846/0xf10)
[<0000000000199680>] exit_mmap+0x210/0x458
[<000000000012a8f8>] mmput+0x54/0xfc
[<000000000012f714>] exit_mm+0x134/0x144
[<000000000013120c>] do_exit+0x240/0x878
[<00000000001318dc>] do_group_exit+0x98/0xc8
[<000000000013e6b0>] get_signal_to_deliver+0x30c/0x358
[<000000000010bee0>] do_signal+0xec/0x860
[<0000000000112e30>] sysc_sigpending+0xe/0x22
[<000002000013198a>] 0x2000013198a
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<00000000001a68d0>] free_swap_and_cache+0x1a0/0x1a4
<4>---[ end trace bc19f1d51ac9db7c ]---

The faulting instruction is the storage key operation (iske) in
ptep_rcp_copy (called by pte_clear, called by unmap_vmas). iske
reads dirty and reference bit information for a physical page and
requires a valid physical address. Since we are in pte_clear, we
cannot rely on the pte containing a valid address. Fortunately we
dont need these information in pte_clear - after all there is no
mapping. The best fix is to remove the needless call to ptep_rcp_copy
that contains the iske.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-11-27 11:06:57 +01:00
Christian Borntraeger 8107d8296b [S390] fix/cleanup sched_clock
CONFIG_PRINTK_TIME reveals that sched_clock has a wrong offset during boot:
..
[    0.000000]   Movable zone: 0 pages used for memmap
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 775679
[    0.000000] Kernel command line: dasd=4b6c root=/dev/dasda1 ro noinitrd
[    0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
[6920575.975232] console [ttyS0] enabled
[6920575.987586] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
[6920575.991404] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
..

The s390 implementation of sched_clock uses the store clock instruction and
subtracts jiffies_timer_cc.
jiffies_timer_cc is a local variable in arch/s390/kernel/time.c and only used
for sched_clock and monotonic clock. For historical reasons there is an offset
on that value. With todays code this offset is unnecessary. By removing that
offset we can get a sched_clock which returns the nanoseconds after time_init.
This improves CONFIG_PRINTK_TIME.

Since sched_clock is the only user, I have also renamed jiffies_timer_cc to
sched_clock_base_cc. In addition, the local variable init_timer_cc is redundant
and can be romved as well.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-11-27 11:06:57 +01:00
Martin Schwidefsky 59da21398e [S390] fix system call parameter functions.
syscall_get_nr() currently returns a valid result only if the call
chain of the traced process includes do_syscall_trace_enter(). But
collect_syscall() can be called for any sleeping task, the result of
syscall_get_nr() in general is completely bogus.

To make syscall_get_nr() work for any sleeping task the traps field
in pt_regs is replace with svcnr - the system call number the process
is executing. If svcnr == 0 the process is not on a system call path.

The syscall_get_arguments and syscall_set_arguments use regs->gprs[2]
for the first system call parameter. This is incorrect since gprs[2]
may have been overwritten with the system call number if the call
chain includes do_syscall_trace_enter. Use regs->orig_gprs2 instead.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-11-27 11:06:56 +01:00
Steven Rostedt 4cd4262034 sched: prevent divide by zero error in cpu_avg_load_per_task
Impact: fix divide by zero crash in scheduler rebalance irq

While testing the branch profiler, I hit this crash:

divide error: 0000 [#1] PREEMPT SMP
[...]
RIP: 0010:[<ffffffff8024a008>]  [<ffffffff8024a008>] cpu_avg_load_per_task+0x50/0x7f
[...]
Call Trace:
 <IRQ> <0> [<ffffffff8024fd43>] find_busiest_group+0x3e5/0xcaa
 [<ffffffff8025da75>] rebalance_domains+0x2da/0xa21
 [<ffffffff80478769>] ? find_next_bit+0x1b2/0x1e6
 [<ffffffff8025e2ce>] run_rebalance_domains+0x112/0x19f
 [<ffffffff8026d7c2>] __do_softirq+0xa8/0x232
 [<ffffffff8020ea7c>] call_softirq+0x1c/0x3e
 [<ffffffff8021047a>] do_softirq+0x94/0x1cd
 [<ffffffff8026d5eb>] irq_exit+0x6b/0x10e
 [<ffffffff8022e6ec>] smp_apic_timer_interrupt+0xd3/0xff
 [<ffffffff8020e4b3>] apic_timer_interrupt+0x13/0x20

The code for cpu_avg_load_per_task has:

	if (rq->nr_running)
		rq->avg_load_per_task = rq->load.weight / rq->nr_running;

The runqueue lock is not held here, and there is nothing that prevents
the rq->nr_running from going to zero after it passes the if condition.

The branch profiler simply made the race window bigger.

This patch saves off the rq->nr_running to a local variable and uses that
for both the condition and the division.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 10:29:52 +01:00