Commit graph

7446 commits

Author SHA1 Message Date
Steven Rostedt
eb4a03780d function-graph: disable when both x86_32 and optimize for size are configured
On x86_32, when optimize for size is set, gcc may align the frame pointer
and make a copy of the the return address inside the stack frame.
The return address that is located in the stack frame may not be
the one used to return to the calling function. This will break the
function graph tracer.

The function graph tracer replaces the return address with a jump to a hook
function that can trace the exit of the function. If it only replaces
a copy, then the hook will not be called when the function returns.
Worse yet, when the parent function returns, the function graph tracer
will return back to the location of the child function which will
easily crash the kernel with weird results.

To see the problem, when i386 is compiled with -Os we get:

c106be03:       57                      push   %edi
c106be04:       8d 7c 24 08             lea    0x8(%esp),%edi
c106be08:       83 e4 e0                and    $0xffffffe0,%esp
c106be0b:       ff 77 fc                pushl  0xfffffffc(%edi)
c106be0e:       55                      push   %ebp
c106be0f:       89 e5                   mov    %esp,%ebp
c106be11:       57                      push   %edi
c106be12:       56                      push   %esi
c106be13:       53                      push   %ebx
c106be14:       81 ec 8c 00 00 00       sub    $0x8c,%esp
c106be1a:       e8 f5 57 fb ff          call   c1021614 <mcount>

When it is compiled with -O2 instead we get:

c10896f0:       55                      push   %ebp
c10896f1:       89 e5                   mov    %esp,%ebp
c10896f3:       83 ec 28                sub    $0x28,%esp
c10896f6:       89 5d f4                mov    %ebx,0xfffffff4(%ebp)
c10896f9:       89 75 f8                mov    %esi,0xfffffff8(%ebp)
c10896fc:       89 7d fc                mov    %edi,0xfffffffc(%ebp)
c10896ff:       e8 d0 08 fa ff          call   c1029fd4 <mcount>

The compile with -Os will align the stack pointer then set up the
frame pointer (%ebp), and it copies the return address back into
the stack frame. The change to the return address in mcount is done
to the copy and not the real place holder of the return address.

Then compile with -O2 sets up the frame pointer first, this makes
the change to the return address by mcount affect where the function
will jump on exit.

Reported-by: Jake Edge <jake@lwn.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-18 18:39:30 -04:00
Steven Rostedt
4b221f0313 ring-buffer: have benchmark test print to trace buffer
Currently the output of the ring buffer benchmark/test prints to
the console. This test runs for ten seconds every ten seconds and
ouputs the result after every iteration. This needlessly fills up
the logs.

This patch makes the ring buffer benchmark/test print to the ftrace
buffer using trace_printk. To view the test results, you must examine
the debug/tracing/trace file.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-17 17:01:09 -04:00
Steven Rostedt
8d707e8eb4 ring-buffer: do not grab locks in nmi
If ftrace_dump_on_oops is set, and an NMI detects a lockup, then it
will need to read from the ring buffer. But the read side of the
ring buffer still takes locks. This patch adds a check on the read
side that if it is in an NMI, then it will disable the ring buffer
and not take any locks.

Reads can still happen on a disabled ring buffer.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-17 14:16:27 -04:00
Steven Rostedt
d47882078f ring-buffer: add locks around rb_per_cpu_empty
The checking of whether the buffer is empty or not needs to be serialized
among the readers. Add the reader spin lock around it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-17 14:16:23 -04:00
Steven Rostedt
5f78abeebb ring-buffer: check for less than two in size allocation
The ring buffer must have at least two pages allocated for the
reader page swap to work.

The page count check will miss the case of a zero size passed in.
Even though a zero size ring buffer would probably fail an allocation,
making the min size check for less than two instead of equal to one makes
the code a bit more robust.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-17 14:16:20 -04:00
Steven Rostedt
0dcd4d6c3e ring-buffer: remove useless compile check for buffer_page size
The original version of the ring buffer had a hack to map the
page struct that held the pages of the buffer to also be the structure
that the ring buffer would keep the pages in a link list.

This overlap of the page struct was very dangerous and that hack was
removed a while ago.

But there was a check to make sure the buffer_page never became bigger
than the page struct, and would fail the compile if it did. The
check was only meaningful when we had the hack. Now that we have separate
allocated descriptors for the buffer pages, we can remove this check.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-17 14:16:07 -04:00
Steven Rostedt
c6a9d7b55e ring-buffer: remove useless warn on check
A check if "write > BUF_PAGE_SIZE" is done right after a

	if (write > BUF_PAGE_SIZE)
		return ...;

Thus the check is actually testing the compiler and not the
kernel. This is useless, remove it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-16 21:19:26 -04:00
Steven Rostedt
22f470f8da ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index
The index of the event is found by masking PAGE_MASK to it and
subtracting the header size. Currently the header size is calculate
by PAGE_SIZE - BUF_PAGE_SIZE, when we already have a macro
BUF_PAGE_HDR_SIZE to define it.

If we want to change BUF_PAGE_SIZE to something less than filling
the rest of the page (this is done for debugging), then we break
the algorithm to find the index.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-16 21:19:23 -04:00
Li Zefan
00e95830a4 tracing/filters: fix race between filter setting and module unload
Module unload is protected by event_mutex, while setting filter is
protected by filter_mutex. This leads to the race:

echo 'bar == 0 || bar == 10' \    |
		> sample/filter   |
                                  |  insmod sample.ko
  add_pred("bar == 0")            |
    -> n_preds == 1               |
  add_pred("bar == 100")          |
    -> n_preds == 2               |
                                  |  rmmod sample.ko
                                  |  insmod sample.ko
  add_pred("&&")                  |
    -> n_preds == 1 (should be 3) |

Now event->filter->preds is corrupted. An then when filter_match_preds()
is called, the WARN_ON() in it will be triggered.

To avoid the race, we remove filter_mutex, and replace it with event_mutex.

[ Impact: prevent corruption of filters by module removing and loading ]

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A375A4D.6000205@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-16 16:25:37 -04:00
Li Zefan
57be88878e tracing/filters: free filter_string in destroy_preds()
filter->filter_string is not freed when unloading a module:

 # insmod trace-events-sample.ko
 # echo "bar < 100" > /mnt/tracing/events/sample/foo_bar/filter
 # rmmod trace-events-sample.ko

[ Impact: fix memory leak when unloading module ]

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A375A30.9060802@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-16 16:25:35 -04:00
Steven Rostedt
fa7439531d ring-buffer: use commit counters for commit pointer accounting
The ring buffer is made up of three sets of pointers.

The head page pointer, which points to the next page for the reader to
get.

The commit pointer and commit index, which points to the page and index
of the last committed write respectively.

The tail pointer and tail index, which points to the page and the index
of the last reserved data respectively (non committed).

The commit pointer is only moved forward by the outer most writer.
If a nested writer comes in, it will not move the pointer forward.

The current implementation has a flaw. It assumes that the outer most
writer successfully reserved data. There's a small race window where
the outer most writer could find the tail pointer, but a nested
writer could come in (via interrupt) and move the tail forward, and
even the commit forward.

The outer writer would not realized the commit moved forward and the
accounting will break.

This patch changes the design to use counters in the per cpu buffers
to keep track of commits. The counters are incremented at the start
of the commit, and decremented at the end. If the end commit counter
is 1, then it moves the commit pointers. A loop is made to check for
races between checking and moving the commit pointers. Only the outer
commit should move the pointers anyway.

The test of knowing if a reserve is equal to the last commit update
is still needed to know for time keeping. The time code is much less
racey than the commit updates.

This change not only solves the mentioned race, but also makes the
code simpler.

[ Impact: fix commit race and simplify code ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-16 16:25:33 -04:00
Steven Rostedt
263294f3e1 ring-buffer: remove unused variable
Fix the compiler error:

kernel/trace/ring_buffer.c: In function 'rb_move_tail':
kernel/trace/ring_buffer.c:1236: warning: unused variable 'event'

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-16 16:24:39 -04:00
Steven Rostedt
9086c7b90a ring-buffer: have benchmark test handle discarded events
With the addition of commit:

  c7b0930857
  ring-buffer: prevent adding write in discarded area

The ring buffer may now add discarded events when a write passes
the end of a buffer page. Before, a discarded event was only added
when the tracer deliberately created one. The ring buffer benchmark
test does not handle discarded events when it reads the buffer and
fails when it encounters one.

Also fix the increment for large data entries (luckily, the test did
not add any yet).

[ Impact: fix false failure of ring buffer self test ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-16 13:48:52 -04:00
Steven Rostedt
c7b0930857 ring-buffer: prevent adding write in discarded area
This a very tight race where an interrupt could come in and not
have enough data to put into the end of a buffer page, and that
it would fail to write and need to go to the next page.

But if this happened when another writer was about to reserver
their data, and that writer has smaller data to reserve, then
it could succeed even though the interrupt moved the tail page.

To pervent that, if we fail to store data, and by subtracting the
amount we reserved we still have room for smaller data, we need
to fill that space with "discarded" data.

[ Impact: prevent race were buffer data may be lost ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-15 11:37:19 -04:00
Li Zefan
0ac2058f68 tracing/filters: strloc should be unsigned short
I forgot to update filter code accordingly in
"tracing/events: change the type of __str_loc_item to unsigned short"
(commt b0aae68cc5)

It can cause system crash:

 # echo 1 > tracing/events/irq/irq_handler_entry/enable
 # echo 'name == eth0' > tracing/events/irq/irq_handler_entry/filter

[ Impact: fix crash while filtering on __string() field ]

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A35B905.3090500@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-15 11:37:18 -04:00
Li Zefan
5e4904cb63 tracing/filters: operand can be negative
This should be a bug:

 # cat format
 name: foo_bar
 ID: 71
 format:
	 ...
         field:int bar;  offset:24;      size:4;
 # echo 'bar < 0' > filter
 # echo 'bar < -1' > filter
 bash: echo: write error: Invalid argument

[ Impact: fix to allow negative operand in filer expr ]

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A35B8DF.60400@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-15 11:37:16 -04:00
Li Zefan
e4f2d10f47 tracing: replace a GFP_ATOMIC with GFP_KERNEL allocation
Atomic allocation is not needed here.

[ Impact: clean up of memory alloction type ]

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A35B898.2050607@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-15 11:37:14 -04:00
Li Zefan
215368e8e5 tracing: fix a typo in tracing_cpumask_write()
It's tracing_cpumask_new that should be kfree()ed.

This causes tracing_cpumask to be freed due to the typo:

 # echo z > tracing_cpumask
 bash: echo: write error: Invalid argument

And subsequent reads/writes to tracing_cpuamsk will access this
already-freed tracing_cpumask, thus may lead to crash.

[ Impact: fix leak and crash when writing invalid val to tracing_cpumask ]

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A35B86A.7070608@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-15 11:37:12 -04:00
Rusty Russell
3f237a79dd cpumask: use new operators in kernel/trace
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <200906122115.30787.rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-15 11:36:42 -04:00
Linus Torvalds
45e3e1935e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (53 commits)
  .gitignore: ignore *.lzma files
  kbuild: add generic --set-str option to scripts/config
  kbuild: simplify argument loop in scripts/config
  kbuild: handle non-existing options in scripts/config
  kallsyms: generalize text region handling
  kallsyms: support kernel symbols in Blackfin on-chip memory
  documentation: make version fix
  kbuild: fix a compile warning
  gitignore: Add GNU GLOBAL files to top .gitignore
  kbuild: fix delay in setlocalversion on readonly source
  README: fix misleading pointer to the defconf directory
  vmlinux.lds.h update
  kernel-doc: cleanup perl script
  Improve vmlinux.lds.h support for arch specific linker scripts
  kbuild: fix headers_exports with boolean expression
  kbuild/headers_check: refine extern check
  kbuild: fix "Argument list too long" error for "make headers_check",
  ignore *.patch files
  Remove bashisms from scripts
  menu: fix embedded menu presentation
  ...
2009-06-14 14:12:18 -07:00
Linus Torvalds
489f7ab6c1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (31 commits)
  trivial: remove the trivial patch monkey's name from SubmittingPatches
  trivial: Fix a typo in comment of addrconf_dad_start()
  trivial: usb: fix missing space typo in doc
  trivial: pci hotplug: adding __init/__exit macros to sgi_hotplug
  trivial: Remove the hyphen from git commands
  trivial: fix ETIMEOUT -> ETIMEDOUT typos
  trivial: Kconfig: .ko is normally not included in module names
  trivial: SubmittingPatches: fix typo
  trivial: Documentation/dell_rbu.txt: fix typos
  trivial: Fix Pavel's address in MAINTAINERS
  trivial: ftrace:fix description of trace directory
  trivial: unnecessary (void*) cast removal in sound/oss/msnd.c
  trivial: input/misc: Fix typo in Kconfig
  trivial: fix grammo in bus_for_each_dev() kerneldoc
  trivial: rbtree.txt: fix rb_entry() parameters in sample code
  trivial: spelling fix in ppc code comments
  trivial: fix typo in bio_alloc kernel doc
  trivial: Documentation/rbtree.txt: cleanup kerneldoc of rbtree.txt
  trivial: Miscellaneous documentation typo fixes
  trivial: fix typo milisecond/millisecond for documentation and source comments.
  ...
2009-06-14 13:46:25 -07:00
Linus Torvalds
a2ee2981ae Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (80 commits)
  x86, mce: Add boot options for corrected errors
  x86, mce: Fix mce printing
  x86, mce: fix for mce counters
  x86, mce: support action-optional machine checks
  x86, mce: define MCE_VECTOR
  x86, mce: rename mce_notify_user to mce_notify_irq
  x86: fix panic with interrupts off (needed for MCE)
  x86, mce: export MCE severities coverage via debugfs
  x86, mce: implement new status bits
  x86, mce: print header/footer only once for multiple MCEs
  x86, mce: default to panic timeout for machine checks
  x86, mce: improve mce_get_rip
  x86, mce: make non Monarch panic message "Fatal machine check" too
  x86, mce: switch x86 machine check handler to Monarch election.
  x86, mce: implement panic synchronization
  x86, mce: implement bootstrapping for machine check wakeups
  x86, mce: check early in exception handler if panic is needed
  x86, mce: add table driven machine check grading
  x86, mce: remove TSC print heuristic
  x86, mce: log corrected errors when panicing
  ...
2009-06-13 13:14:51 -07:00
Linus Torvalds
947ec0b0c1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM: Add empty suspend/resume device irq functions
  PM/Hibernate: Move NVS routines into a seperate file (v2).
  PM/Hibernate: Rename disk.c to hibernate.c
  PM: Separate suspend to RAM functionality from core
  Driver Core: Rework platform suspend/resume, print warning
  PM: Remove device_type suspend()/resume()
  PM/Hibernate: Move memory shrinking to snapshot.c (rev. 2)
  PM/Suspend: Do not shrink memory before suspend
  PM: Remove bus_type suspend_late()/resume_early() V2
  PM core: rename suspend and resume functions
  PM: Rename device_power_down/up()
  PM: Remove unused asm/suspend.h
  x86: unify power/cpu_(32|64).c
  x86: unify power/cpu_(32|64) copyright notes
  x86: unify power/cpu_(32|64) regarding restoring processor state
  x86: unify power/cpu_(32|64) regarding saving processor state
  x86: unify power/cpu_(32|64) global variables
  x86: unify power/cpu_(32|64) headers
  PM: Warn if interrupts are enabled during suspend-resume of sysdevs
  PM/ACPI/x86: Fix sparse warning in arch/x86/kernel/acpi/sleep.c
2009-06-12 13:17:27 -07:00
Linus Torvalds
4ddbac9898 Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf_counter: Start documenting HAVE_PERF_COUNTERS requirements
  perf_counter: Add forward/backward attribute ABI compatibility
  perf record: Explicity program a default counter
  perf_counter: Remove PERF_TYPE_RAW special casing
  perf_counter: PERF_TYPE_HW_CACHE is a hardware counter too
  powerpc, perf_counter: Fix performance counter event types
  perf_counter/x86: Add a quirk for Atom processors
  perf_counter tools: Remove one L1-data alias
2009-06-12 13:16:52 -07:00
Cornelia Huck
fce2b111fa PM/Hibernate: Move NVS routines into a seperate file (v2).
The *_nvs_* routines in swsusp.c make use of the io*map()
functions, which are only provided for HAS_IOMEM, thus
breaking compilation if HAS_IOMEM is not set. Fix this
by moving the *_nvs_* routines into hibernate_nvs.c, which
is only compiled if HAS_IOMEM is set.

[rjw: Change the name of the new file to hibernate_nvs.c, add the
 license line to the header comment.]

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-06-12 21:32:33 +02:00
Rafael J. Wysocki
8b759b84c8 PM/Hibernate: Rename disk.c to hibernate.c
Change the name of kernel/power/disk.c to kernel/power/hibernate.c
in analogy with the file names introduced by the changes that
separated the suspend to RAM and standby funtionality from the
common PM functions.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
2009-06-12 21:32:33 +02:00
Rafael J. Wysocki
a9d7052363 PM: Separate suspend to RAM functionality from core
Move the suspend to RAM and standby code from kernel/power/main.c
to two separate files, kernel/power/suspend.c containing the basic
functions and kernel/power/suspend_test.c containing the automatic
suspend test facility based on the RTC clock alarm.

There are no changes in functionality related to these modifications.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
2009-06-12 21:32:33 +02:00
Rafael J. Wysocki
fe419535d8 PM/Hibernate: Move memory shrinking to snapshot.c (rev. 2)
A future patch is going to modify the memory shrinking code so that
it will make memory allocations to free memory instead of using an
artificial memory shrinking mechanism for that.  For this purpose it
is convenient to move swsusp_shrink_memory() from
kernel/power/swsusp.c to kernel/power/snapshot.c, because the new
memory-shrinking code is going to use things that are local to
kernel/power/snapshot.c .

[rev. 2: Make some functions static and remove their headers from
 kernel/power/power.h]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
2009-06-12 21:32:32 +02:00
Rafael J. Wysocki
c6f37f1219 PM/Suspend: Do not shrink memory before suspend
Remove the shrinking of memory from the suspend-to-RAM code, where
it is not really necessary.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@tuxonice.net>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
2009-06-12 21:32:32 +02:00
Alan Stern
d161630297 PM core: rename suspend and resume functions
This patch (as1241) renames a bunch of functions in the PM core.
Rather than go through a boring list of name changes, suffice it to
say that in the end we have a bunch of pairs of functions:

	device_resume_noirq	dpm_resume_noirq
	device_resume		dpm_resume
	device_complete		dpm_complete
	device_suspend_noirq	dpm_suspend_noirq
	device_suspend		dpm_suspend
	device_prepare		dpm_prepare

in which device_X does the X operation on a single device and dpm_X
invokes device_X for all devices in the dpm_list.

In addition, the old dpm_power_up and device_resume_noirq have been
combined into a single function (dpm_resume_noirq).

Lastly, dpm_suspend_start and dpm_resume_end are the renamed versions
of the former top-level device_suspend and device_resume routines.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-06-12 21:32:31 +02:00
Magnus Damm
e39a71ef80 PM: Rename device_power_down/up()
Rename the functions performing "_noirq" dev_pm_ops
operations from device_power_down() and device_power_up()
to device_suspend_noirq() and device_resume_noirq().

The new function names are chosen to show that the functions
are responsible for calling the _noirq() versions to finalize
the suspend/resume operation. The current function names do
not perform power down/up anymore so the names may be misleading.

Global function renames:
- device_power_down() -> device_suspend_noirq()
- device_power_up() -> device_resume_noirq()

Static function renames:
- suspend_device_noirq() -> __device_suspend_noirq()
- resume_device_noirq() -> __device_resume_noirq()

Signed-off-by: Magnus Damm <damm@igel.co.jp>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Len Brown <lenb@kernel.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-06-12 21:32:31 +02:00
Linus Torvalds
6d21491838 Merge branch 'topic/slab/earlyboot-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6
* 'topic/slab/earlyboot-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  slab: setup cpu caches later on when interrupts are enabled
  slab,slub: don't enable interrupts during early boot
  slab: fix gfp flag in setup_cpu_cache()
  x86: make zap_low_mapping could be used early
  irq: slab alloc for default irq_affinity
  memcg: fix page_cgroup fatal error in FLATMEM
2009-06-12 09:52:30 -07:00
Linus Torvalds
7f3591cfac Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest: (31 commits)
  lguest: add support for indirect ring entries
  lguest: suppress notifications in example Launcher
  lguest: try to batch interrupts on network receive
  lguest: avoid sending interrupts to Guest when no activity occurs.
  lguest: implement deferred interrupts in example Launcher
  lguest: remove obsolete LHREQ_BREAK call
  lguest: have example Launcher service all devices in separate threads
  lguest: use eventfds for device notification
  eventfd: export eventfd_signal and eventfd_fget for lguest
  lguest: allow any process to send interrupts
  lguest: PAE fixes
  lguest: PAE support
  lguest: Add support for kvm_hypercall4()
  lguest: replace hypercall name LHCALL_SET_PMD with LHCALL_SET_PGD
  lguest: use native_set_* macros, which properly handle 64-bit entries when PAE is activated
  lguest: map switcher with executable page table entries
  lguest: fix writev returning short on console output
  lguest: clean up length-used value in example launcher
  lguest: Segment selectors are 16-bit long. Fix lg_cpu.ss1 definition.
  lguest: beyond ARRAY_SIZE of cpu->arch.gdt
  ...
2009-06-12 09:32:26 -07:00
Jean Delvare
3ac49a1c99 trivial: fix ETIMEOUT -> ETIMEDOUT typos
fix ETIMEOUT -> ETIMEDOUT typos

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12 18:01:50 +02:00
Manish Katiyar
1dc492a0a4 trivial: kernel/power/poweroff.c: whitespace fix
Fix coding style whitespace fixes. Patch compile tested
Before :-
total: 1 errors, 0 warnings, 46 lines checked
After
total: 0 errors, 0 warnings, 46 lines checked

Before :-
  text	   data	    bss	    dec	    hex	filename
    107	     48	      0	    155	     9b	kernel/power/poweroff.o
After
   text	   data	    bss	    dec	    hex	filename
    107	     48	      0	    155	     9b	kernel/power/poweroff.o

Signed-off-by: Manish Katiyar <mkatiyar@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12 18:01:46 +02:00
Rusty Russell
b43e352139 sched: export kick_process
lguest needs kick_process: wake_up_process() does nothing if a process
is running, which isn't sufficient (we need it in the kernel).

And lguest support is usually modular.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
2009-06-12 22:27:01 +09:30
Peter Zijlstra
974802eaa1 perf_counter: Add forward/backward attribute ABI compatibility
Provide for means of extending the perf_counter_attr in a 'natural' way.

We allow growing the structure by appending fields at the end by specifying
the full structure size inside it.

When a new kernel sees a smaller (old) structure, it will 0 pad the tail.
When an old kernel sees a larger (new) structure, it will verify the tail
consists of 0s, otherwise fail.

If we fail due to a size-mismatch, we return -E2BIG and write the kernel's
native attribe size back into the provided structure.

Furthermore, add some attribute verification, so that we'll fail counter
creation when unknown bits are present (PERF_SAMPLE, PERF_FORMAT, or in
the __reserved fields).

(This ABI detail is introduced while keeping the existing syscall ABI.)

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-12 14:28:52 +02:00
Peter Zijlstra
081fad8617 perf_counter: Remove PERF_TYPE_RAW special casing
The PERF_TYPE_RAW special case seems superfluous these days. Remove
it and add it to the switch() stmt like the others.

[ Impact: cleanup ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-12 14:28:51 +02:00
Rusty Russell
ad6561dffa module: trim exception table on init free.
It's theoretically possible that there are exception table entries
which point into the (freed) init text of modules.  These could cause
future problems if other modules get loaded into that memory and cause
an exception as we'd see the wrong fixup.  The only case I know of is
kvm-intel.ko (when CONFIG_CC_OPTIMIZE_FOR_SIZE=n).

Amerigo fixed this long-standing FIXME in the x86 version, but this
patch is more general.

This implements trim_init_extable(); most archs are simple since they
use the standard lib/extable.c sort code.  Alpha and IA64 use relative
addresses in their fixups, so thier trimming is a slight variation.

Sparc32 is unique; it doesn't seem to define ARCH_HAS_SORT_EXTABLE,
yet it defines its own sort_extable() which overrides the one in lib.
It doesn't sort, so we have to mark deleted entries instead of
actually trimming them.

Inspired-by: Amerigo Wang <amwang@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-alpha@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
2009-06-12 21:47:04 +09:30
Rusty Russell
fddd520122 module_param: allow 'bool' module_params to be bool, not just int.
Impact: API cleanup

For historical reasons, 'bool' parameters must be an int, not a bool.
But there are around 600 users, so a conversion seems like useless churn.

So we use __same_type() to distinguish, and handle both cases.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 21:46:58 +09:30
Rusty Russell
45fcc70c0b module_param: split perm field into flags and perm
Impact: cleanup

Rather than hack KPARAM_KMALLOCED into the perm field, separate it out.
Since the perm field was 32 bits and only needs 16, we don't add bloat.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 21:46:56 +09:30
Rusty Russell
9a71af2c36 module_param: invbool should take a 'bool', not an 'int'
It takes an 'int' for historical reasons, and there are only two
users: simply switch it over to bool.

The other user (uvesafb.c) will get a (harmless-on-x86) warning until
the next patch is applied.

Cc: Brad Douglas <brad@neruo.com>
Cc: Michal Januszewski <spock@gentoo.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 21:46:56 +09:30
Yinghai Lu
28be225b23 irq: slab alloc for default irq_affinity
Ingo had

[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: at mm/bootmem.c:537 alloc_arch_preferred_bootmem+0x2b/0x71()
[    0.000000] Hardware name: System Product Name
[    0.000000] Modules linked in:
[    0.000000] Pid: 0, comm: swapper Tainted: G        W  2.6.30-tip-03087-g0bb2618-dirty #52506
[    0.000000] Call Trace:
[    0.000000]  [<81032588>] warn_slowpath_common+0x60/0x90
[    0.000000]  [<810325c5>] warn_slowpath_null+0xd/0x10
[    0.000000]  [<819d1bc0>] alloc_arch_preferred_bootmem+0x2b/0x71
[    0.000000]  [<819d1c31>] ___alloc_bootmem_nopanic+0x2b/0x9a
[    0.000000]  [<81050a0a>] ? lock_release+0xac/0xb2
[    0.000000]  [<819d1d4c>] ___alloc_bootmem+0xe/0x2d
[    0.000000]  [<819d1e9f>] __alloc_bootmem+0xa/0xc
[    0.000000]  [<819d7c63>] alloc_bootmem_cpumask_var+0x21/0x26
[    0.000000]  [<819d0cc8>] early_irq_init+0x15/0x10d
[    0.000000]  [<819bb75a>] start_kernel+0x167/0x326
[    0.000000]  [<819bb06b>] __init_begin+0x6b/0x70
[    0.000000] ---[ end trace 4eaa2a86a8e2da23 ]---
[    0.000000] NR_IRQS:2304 nr_irqs:424
[    0.000000] CPU 0 irqstacks, hard=821e6000 soft=821e7000

we need to update init_irq_default_affinity

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-12 13:50:23 +03:00
Alessio Igor Bogani
337eb00a2c Push BKL down into ->remount_fs()
[xfs, btrfs, capifs, shmem don't need BKL, exempt]

Signed-off-by: Alessio Igor Bogani <abogani@texware.it>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:11 -04:00
Al Viro
589ff870ed Switch collect_mounts() to struct path
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:01 -04:00
Ingo Molnar
0d5959723e Merge branch 'linus' into x86/mce3
Conflicts:
	arch/x86/kernel/cpu/mcheck/mce_64.c
	arch/x86/kernel/irq.c

Merge reason: Resolve the conflicts above.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 23:31:52 +02:00
Linus Torvalds
512626a04e Merge branch 'for-linus' of git://linux-arm.org/linux-2.6
* 'for-linus' of git://linux-arm.org/linux-2.6:
  kmemleak: Add the corresponding MAINTAINERS entry
  kmemleak: Simple testing module for kmemleak
  kmemleak: Enable the building of the memory leak detector
  kmemleak: Remove some of the kmemleak false positives
  kmemleak: Add modules support
  kmemleak: Add kmemleak_alloc callback from alloc_large_system_hash
  kmemleak: Add the vmalloc memory allocation/freeing hooks
  kmemleak: Add the slub memory allocation/freeing hooks
  kmemleak: Add the slob memory allocation/freeing hooks
  kmemleak: Add the slab memory allocation/freeing hooks
  kmemleak: Add documentation on the memory leak detector
  kmemleak: Add the base support

Manual conflict resolution (with the slab/earlyboot changes) in:
	drivers/char/vt.c
	init/main.c
	mm/slab.c
2009-06-11 14:15:57 -07:00
Linus Torvalds
8a1ca8cedd Merge branch 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (574 commits)
  perf_counter: Turn off by default
  perf_counter: Add counter->id to the throttle event
  perf_counter: Better align code
  perf_counter: Rename L2 to LL cache
  perf_counter: Standardize event names
  perf_counter: Rename enums
  perf_counter tools: Clean up u64 usage
  perf_counter: Rename perf_counter_limit sysctl
  perf_counter: More paranoia settings
  perf_counter: powerpc: Implement generalized cache events for POWER processors
  perf_counters: powerpc: Add support for POWER7 processors
  perf_counter: Accurate period data
  perf_counter: Introduce struct for sample data
  perf_counter tools: Normalize data using per sample period data
  perf_counter: Annotate exit ctx recursion
  perf_counter tools: Propagate signals properly
  perf_counter tools: Small frequency related fixes
  perf_counter: More aggressive frequency adjustment
  perf_counter/x86: Fix the model number of Intel Core2 processors
  perf_counter, x86: Correct some event and umask values for Intel processors
  ...
2009-06-11 14:01:07 -07:00
Linus Torvalds
b640f042fa Merge branch 'topic/slab/earlyboot' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6
* 'topic/slab/earlyboot' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  vgacon: use slab allocator instead of the bootmem allocator
  irq: use kcalloc() instead of the bootmem allocator
  sched: use slab in cpupri_init()
  sched: use alloc_cpumask_var() instead of alloc_bootmem_cpumask_var()
  memcg: don't use bootmem allocator in setup code
  irq/cpumask: make memoryless node zero happy
  x86: remove some alloc_bootmem_cpumask_var calling
  vt: use kzalloc() instead of the bootmem allocator
  sched: use kzalloc() instead of the bootmem allocator
  init: introduce mm_init()
  vmalloc: use kzalloc() instead of alloc_bootmem()
  slab: setup allocators earlier in the boot sequence
  bootmem: fix slab fallback on numa
  bootmem: use slab if bootmem is no longer available
2009-06-11 12:25:06 -07:00
Oleg Nesterov
b415c49a86 slow_work_thread() should do the exclusive wait
slow_work_thread() sleeps on slow_work_thread_wq without WQ_FLAG_EXCLUSIVE,
this means that slow_work_enqueue()->__wake_up(nr_exclusive => 1) wakes up all
kslowd threads.  This is not what we want, so we change slow_work_thread() to
use prepare_to_wait_exclusive() instead.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-11 11:26:38 -07:00