Commit Graph

349160 Commits (fba7a78220fd969beeaa9abbb97c5f7027a7b4fd)

Author SHA1 Message Date
Steven Rostedt 567cd4da54 ring-buffer: User context bit recursion checking
Using context bit recursion checking, we can help increase the
performance of the ring buffer.

Before this patch:

 # echo function > /debug/tracing/current_tracer
 # for i in `seq 10`; do ./hackbench 50; done
Time: 10.285
Time: 10.407
Time: 10.243
Time: 10.372
Time: 10.380
Time: 10.198
Time: 10.272
Time: 10.354
Time: 10.248
Time: 10.253

(average: 10.3012)

Now we have:

 # echo function > /debug/tracing/current_tracer
 # for i in `seq 10`; do ./hackbench 50; done
Time: 9.712
Time: 9.824
Time: 9.861
Time: 9.827
Time: 9.962
Time: 9.905
Time: 9.886
Time: 10.088
Time: 9.861
Time: 9.834

(average: 9.876)

 a 4% savings!

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-22 23:38:03 -05:00
Steven Rostedt 897f68a48b ftrace: Use only the preempt version of function tracing
The function tracer had two different versions of function tracing.

The disabling of irqs version and the preempt disable version.

As function tracing in very intrusive and can cause nasty recursion
issues, it has its own recursion protection. But the old method to
do this was a flat layer. If it detected that a recursion was happening
then it would just return without recording.

This made the preempt version (much faster than the irq disabling one)
not very useful, because if an interrupt were to occur after the
recursion flag was set, the interrupt would not be traced at all,
because every function that was traced would think it recursed on
itself (due to the context it preempted setting the recursive flag).

Now that we have a recursion flag for every context level, we
no longer need to worry about that. We can disable preemption,
set the current context recursion check bit, and go on. If an
interrupt were to come along, it would check its own context bit
and happily continue to trace.

As the preempt version is faster than the irq disable version,
there's no more reason to keep the preempt version around.
And the irq disable version still had an issue with missing
out on tracing NMI code.

Remove the irq disable function tracer version and have the
preempt disable version be the default (and only version).

Before this patch we had from running:

 # echo function > /debug/tracing/current_tracer
 # for i in `seq 10`; do ./hackbench 50; done
Time: 12.028
Time: 11.945
Time: 11.925
Time: 11.964
Time: 12.002
Time: 11.910
Time: 11.944
Time: 11.929
Time: 11.941
Time: 11.924

(average: 11.9512)

Now we have:

 # echo function > /debug/tracing/current_tracer
 # for i in `seq 10`; do ./hackbench 50; done
Time: 10.285
Time: 10.407
Time: 10.243
Time: 10.372
Time: 10.380
Time: 10.198
Time: 10.272
Time: 10.354
Time: 10.248
Time: 10.253

(average: 10.3012)

 a 13.8% savings!

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-22 23:38:02 -05:00
Steven Rostedt edc15cafcb tracing: Avoid unnecessary multiple recursion checks
When function tracing occurs, the following steps are made:
  If arch does not support a ftrace feature:
   call internal function (uses INTERNAL bits) which calls...
  If callback is registered to the "global" list, the list
   function is called and recursion checks the GLOBAL bits.
   then this function calls...
  The function callback, which can use the FTRACE bits to
   check for recursion.

Now if the arch does not suppport a feature, and it calls
the global list function which calls the ftrace callback
all three of these steps will do a recursion protection.
There's no reason to do one if the previous caller already
did. The recursion that we are protecting against will
go through the same steps again.

To prevent the multiple recursion checks, if a recursion
bit is set that is higher than the MAX bit of the current
check, then we know that the check was made by the previous
caller, and we can skip the current check.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-22 23:38:01 -05:00
Steven Rostedt e46cbf75c6 tracing: Make the trace recursion bits into enums
Convert the bits into enums which makes the code a little easier
to maintain.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-22 23:38:00 -05:00
Steven Rostedt c29f122cd7 ftrace: Add context level recursion bit checking
Currently for recursion checking in the function tracer, ftrace
tests a task_struct bit to determine if the function tracer had
recursed or not. If it has, then it will will return without going
further.

But this leads to races. If an interrupt came in after the bit
was set, the functions being traced would see that bit set and
think that the function tracer recursed on itself, and would return.

Instead add a bit for each context (normal, softirq, irq and nmi).

A check of which context the task is in is made before testing the
associated bit. Now if an interrupt preempts the function tracer
after the previous context has been set, the interrupt functions
can still be traced.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-22 23:38:00 -05:00
Steven Rostedt 0a016409e4 ftrace: Optimize the function tracer list loop
There is lots of places that perform:

       op = rcu_dereference_raw(ftrace_control_list);
       while (op != &ftrace_list_end) {

Add a helper macro to do this, and also optimize for a single
entity. That is, gcc will optimize a loop for either no iterations
or more than one iteration. But usually only a single callback
is registered to the function tracer, thus the optimized case
should be a single pass. to do this we now do:

	op = rcu_dereference_raw(list);
	do {
		[...]
	} while (likely(op = rcu_dereference_raw((op)->next)) &&
	       unlikely((op) != &ftrace_list_end));

An op is always registered (ftrace_list_end when no callbacks is
registered), thus when a single callback is registered, the link
list looks like:

 top => callback => ftrace_list_end => NULL.

The likely(op = op->next) still must be performed due to the race
of removing the callback, where the first op assignment could
equal ftrace_list_end. In that case, the op->next would be NULL.
But this is unlikely (only happens in a race condition when
removing the callback).

But it is very likely that the next op would be ftrace_list_end,
unless more than one callback has been registered. This tells
gcc what the most common case is and makes the fast path with
the least amount of branches.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-22 23:37:59 -05:00
Steven Rostedt 9640388b63 ftrace: Fix function tracing recursion self test
The function tracing recursion self test should not crash
the machine if the resursion test fails. If it detects that
the function tracing is recursing when it should not be, then
bail, don't go into an infinite recursive loop.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-22 23:37:58 -05:00
Steven Rostedt 6350379452 ftrace: Fix global function tracers that are not recursion safe
If one of the function tracers set by the global ops is not recursion
safe, it can still be called directly without the added recursion
supplied by the ftrace infrastructure.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-22 23:37:57 -05:00
Steven Rostedt 05cbbf643b tracing: Fix selftest function recursion accounting
The test that checks function recursion does things differently
if the arch does not support all ftrace features. But that really
doesn't make a difference with how the test runs, and either way
the count variable should be 2 at the end.

Currently the test wrongly fails for archs that don't support all
the ftrace features.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-22 23:35:11 -05:00
Steven Rostedt 34600f0e9c tracing: Fix race with max_tr and changing tracers
There's a race condition between the setting of a new tracer and
the update of the max trace buffers (the swap). When a new tracer
is added, it sets current_trace to nop_trace before disabling
the old tracer. At this moment, if the old tracer uses update_max_tr(),
the update may trigger the warning against !current_trace->use_max-tr,
as nop_trace doesn't have that set.

As update_max_tr() requires that interrupts be disabled, we can
add a check to see if current_trace == nop_trace and bail if it
does. Then when disabling the current_trace, set it to nop_trace
and run synchronize_sched(). This will make sure all calls to
update_max_tr() have completed (it was called with interrupts disabled).

As a clean up, this commit also removes shrinking and recreating
the max_tr buffer if the old and new tracers both have use_max_tr set.
The old way use to always shrink the buffer, and then expand it
for the next tracer. This is a waste of time.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-22 23:33:07 -05:00
Luciano Coelho a7e2ca1703 Revert "drivers/misc/ti-st: remove gpio handling"
This reverts commit eccf2979b2.

The reason is that it broke TI WiLink shared transport on Panda.
Also, callback functions should not be added to board files anymore,
so revert to implementing the power functions in the driver itself.

Additionally, changed a variable name ('status' to 'err') so that this
revert compiles properly.

Cc: stable <stable@vger.kernel.org> [3.7]
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-22 17:22:47 -08:00
Linus Torvalds 1d85490853 PCI updates for v3.8:
Hotplug
       PCI: pciehp: Use per-slot workqueues to avoid deadlock
       PCI: shpchp: Make shpchp_wq non-ordered
       PCI: shpchp: Handle push button event asynchronously
       PCI: shpchp: Use per-slot workqueues to avoid deadlock
   Power management
       PCI: Allow pcie_aspm=force even when FADT indicates it is unsupported
   Misc
       PCI/AER: pci_get_domain_bus_and_slot() call missing required pci_dev_put()
       PCI: remove depends on CONFIG_EXPERIMENTAL
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ/vtQAAoJEPGMOI97Hn6zR+EP/0yvgixQauWTJPmcxdfLeklo
 TfgrjntnsWvUM0ltQxQwq4yi+kBQLp2KDK4/A8w50vJTdN3DeXJ5u27pafpgkac0
 TsySRuYMYKqE/rzGDG3QSVKO3NCCn99AfGAweM5gqzMClPfEcyKZr3wFV1BdEC4e
 tTbALJJp4hAZwrS+Flhfpkx26kEQgXSb8k4IqoFvAeZn7zZcdeYjl+vZhAQUTcjK
 QNBkOvzNEGDkfluVKcfFfgJhmeaDBUUf6o+f0W6XOAflGhyE8TxfsUJA9Mxc4r2S
 1jOVdR/emJ0rC8UvPYROfGWgPTH+4Cv1bKdFHxh3FpHeWb3QmzeXEtoxLahdW3zM
 kPA6PLjZ/VU3ZCoPl6KO4QaBlc21ouK6R3kHZ7FssoUN10o9tQAJpaMdPkGFMGbB
 rZaCupngUOr3ildknrNgncewpeuFAbMKa8xTSVF/j75dhUFijapLV7Tw9HHDQlio
 WNfkmL7OnyDibpSu+nHgZII0UvOQreKdrQvtGxbbZhgeyavYobkAlagjtbYuNX0e
 5dw5bLpoFJgSnFNgatwQhFSReYGhpqKs6287ZPfJItCtwZzbBHxstERjL/J2wGL1
 45/q1U6u/HOSfqXgXkHpiGBNdzA5oLD2v0uFGgJNbeVRGlFIe1f050TA8MjzI3xP
 WmIRrYQCEaRlbpGTHA9B
 =lhvi
 -----END PGP SIGNATURE-----

Merge tag '3.8-pci-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "The most important is a fix for a pciehp deadlock that occurs when
  unplugging a Thunderbolt adapter.  We also applied the same fix to
  shpchp, removed CONFIG_EXPERIMENTAL dependencies, fixed a
  pcie_aspm=force problem, and fixed a refcount leak.

  Details:

   - Hotplug
      PCI: pciehp: Use per-slot workqueues to avoid deadlock
      PCI: shpchp: Make shpchp_wq non-ordered
      PCI: shpchp: Handle push button event asynchronously
      PCI: shpchp: Use per-slot workqueues to avoid deadlock

   - Power management
      PCI: Allow pcie_aspm=force even when FADT indicates it is unsupported

   - Misc
      PCI/AER: pci_get_domain_bus_and_slot() call missing required pci_dev_put()
      PCI: remove depends on CONFIG_EXPERIMENTAL"

* tag '3.8-pci-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: remove depends on CONFIG_EXPERIMENTAL
  PCI: Allow pcie_aspm=force even when FADT indicates it is unsupported
  PCI: shpchp: Use per-slot workqueues to avoid deadlock
  PCI: shpchp: Handle push button event asynchronously
  PCI: shpchp: Make shpchp_wq non-ordered
  PCI/AER: pci_get_domain_bus_and_slot() call missing required pci_dev_put()
  PCI: pciehp: Use per-slot workqueues to avoid deadlock
2013-01-22 16:36:23 -08:00
Tejun Heo f56c3196f2 async: fix __lowest_in_progress()
Commit 083b804c4d ("async: use workqueue for worker pool") made it
possible that async jobs are moved from pending to running out-of-order.
While pending async jobs will be queued and dispatched for execution in
the same order, nothing guarantees they'll enter "1) move self to the
running queue" of async_run_entry_fn() in the same order.

Before the conversion, async implemented its own worker pool.  An async
worker, upon being woken up, fetches the first item from the pending
list, which kept the executing lists sorted.  The conversion to
workqueue was done by adding work_struct to each async_entry and async
just schedules the work item.  The queueing and dispatching of such work
items are still in order but now each worker thread is associated with a
specific async_entry and moves that specific async_entry to the
executing list.  So, depending on which worker reaches that point
earlier, which is non-deterministic, we may end up moving an async_entry
with larger cookie before one with smaller one.

This broke __lowest_in_progress().  running->domain may not be properly
sorted and is not guaranteed to contain lower cookies than pending list
when not empty.  Fix it by ensuring sort-inserting to the running list
and always looking at both pending and running when trying to determine
the lowest cookie.

Over time, the async synchronization implementation became quite messy.
We better restructure it such that each async_entry is linked to two
lists - one global and one per domain - and not move it when execution
starts.  There's no reason to distinguish pending and running.  They
behave the same for synchronization purposes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-22 16:21:24 -08:00
Linus Torvalds ed06ef318a perf/urgent fixes:
. revert 20b279 - require exclude_guest to use PEBS - kernel side,
   now older binaries will continue working for things like cycles:pp
   without needing to pass extra modifiers, from David Ahern.
 
 . Fix building from 'make perf-*-src-pkg' tarballs, broken by UAPI, from
   Sebastian Andrzej Siewior
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ7yQhAAoJENZQFvNTUqpAjj4P/RR+8WeXpV02yzndhry+Yjav
 L/WQH0CkRRmyUA5akTcpgFrohJCEOi+auHl7ivmDX4XWFavcAkX3H1Yz1FytCOkb
 Cbb9Lv1rGdlTno8fTVUn8mNyTG64AlWrAw3ICixWw6q6I/k6SO7EkKigCxPhmY+2
 BE2EvkZmOY/PEUXgM6HtUdifORatX48p1toS7p3CDQ31cxBN5OVNZUXa1FakJpyH
 7R+1imKLsjuyi/G7Bt061LyWQkOh7L/ITWN+5Rx4RsUwRRT3vm1H9nlqUBsPS0PW
 qfkktkCmn/cFpKbfBipuglnt16jHPMfI/pghKvzx8n2uJMNEGXbfFDJefpzcdih9
 wIRgB6a5bvA8VF6Xpcn0I5JhqLAcnWTer07JgjZevjqYCdZStpbJjvE5131JjTLw
 Dnm7UshE+VFBcA3iXNX64p/X7WDJSk+SIDsJDuNe57dktFVLw76Ibb55XG18Ex7e
 c9QcIEhD1P19VzOniDZQZNEJhqnu5Vjle/eG+JRVRCm/BgoQFyJuD3EooKPN7hHR
 Op4oqf5RhDf7XNH0+Y4rOdRMZRiumdfcEl6kdcQGPJPycxpD7xCJNzBLWK/BvQgT
 Kl0AEkRC0KE2c5LyFttW+g1Byu1rctlMz2TVJDTskTm0XGOQ9mTzsQBP5rgBVt+b
 hQsMMWNVSI+jfm8bTJwx
 =LTjZ
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

 . revert 20b279 - require exclude_guest to use PEBS - kernel side, now
   older binaries will continue working for things like cycles:pp
   without needing to pass extra modifiers, from David Ahern.

 . Fix building from 'make perf-*-src-pkg' tarballs, broken by UAPI,
   from Sebastian Andrzej Siewior

[ Pulling directly, Ingo would normally pull but has been unresponsive ]

* tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf tools: Fix building from 'make perf-*-src-pkg' tarballs
  perf x86: revert 20b279 - require exclude_guest to use PEBS - kernel side
2013-01-22 14:32:07 -08:00
Linus Torvalds 343391b1d1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
 "Improve the stability of the linux kernel on the parisc architecture"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: sigaltstack doesn't round ss.ss_sp as required
  parisc: improve ptrace support for gdb single-step
  parisc: don't claim cpu irqs more than once
  parisc: avoid undefined shift in cnv_float.h
2013-01-22 14:30:35 -08:00
Linus Torvalds 262060ea46 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
 "This contain a bugfix for CUSE and miscellaneous small fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: remove unused variable in fuse_try_move_page()
  fuse: make fuse_file_fallocate() static
  fuse: Move CUSE Kconfig entry from fs/Kconfig into fs/fuse/Kconfig
  cuse: fix uninitialized variable warnings
  cuse: do not register multiple devices with identical names
  cuse: use mutex as registration lock instead of spinlocks
2013-01-22 11:53:19 -08:00
Linus Torvalds b75b25b00d GPIO fixes for the v3.8 series:
- Remove a bad #include from the Samsung driver
 - Some Kconfig hazzle for the Samsungs
 - Skip gpiolib registration on EXYNOS5440
 - Don't free the MVEBU label
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ/ltzAAoJEEEQszewGV1z/IwP/0caelXLbBS7vC2cjLHpQuXA
 AdmXJUGcOSRaQ31CzZnGpBp0EvQliz52YtKWTbMBsQHAXs0pMqBshiDXDIczTY18
 chlI7sTHyI6Kkeu6pRwbbFrY5rjFxXyv2MY3hTqithKDt+Ma+uGfoDPzFU2hLtkf
 m1N1TFXq1WxEnjQuQhIywSvVihnSz6F25S9UrUlZi/QNiyHn5wmS2UsJfyVYXAsD
 33M/+epzj9x4OdopSZJUq7zffegOTDBspYtdE9/K3SqLz7GusmArcx5sR+fRm/TV
 N3/8y2rpdvh5m+ftIjtl1L5PDZST4eSwWFdsie0vXhuAOJ/PsPYhk7OTstB4qO0R
 BgxcaN/GpFZABhzMkcJZnqx+c2YM/mVjmBKSGy1pSnEoLvEFrZ3sr2PVYtbk6I2h
 dgLXcruLzmpg04nTbjCAK5gvT3Inr7z381AHYiIs/Fl8fkWNNyQV6QvEX8zeKkkv
 tiE/FVo7ESeYxRkrW+fWJ5ADnCuWPhWDRH0iTFEDoxCoJVjuxzNXdpiGaCNVyR6M
 YThRKnc19pUtuiuTivNLUvEyfyFHKg1dmQ0fiQ8qa47SXwm/FPv1N0xbawZQLqgp
 2K0Go3rn1NGHahjf0aI7YVVRaFDtu7WXAB9N2xm5WpFJOWJo/f8qQ1FdRDDGA5oV
 h5vor6GrJgaq9fA9XTjx
 =N1rq
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-v3.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
 "Here are some GPIO fixes I stacked up in my GPIO tree:

   - Remove a bad #include from the Samsung driver
   - Some Kconfig hazzle for the Samsungs
   - Skip gpiolib registration on EXYNOS5440
   - Don't free the MVEBU label"

* tag 'fixes-for-v3.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: mvebu: Don't free chip label memory
  gpio: samsung: skip gpio lib registration for EXYNOS5440
  gpio: samsung: silent build warning for EXYNOS5 SoCs
  gpio: samsung: fix pinctrl condition for exynos and exynos5440
  gpio: samsung: remove inclusion <mach/regs-clock.h>
2013-01-22 11:52:23 -08:00
Linus Torvalds 05c2cf35c3 f2fs fixes for v3.8-rc5
o Support swap file and link generic_file_remap_pages
 o Enhance the bio streaming flow and free section control
 o Major bug fix on recovery routine
 o Minor bug/warning fixes and code cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ/fpIAAoJEEAUqH6CSFDS6BAP/jz/JIPoSu4NunWgVH68tzA4
 xx4lsmJQJQG+1241uA9v4MMkcTzxE0QVNTOX3BpUXBhCMc00aPOPWlWjULridLI9
 0vRE6LpG+NntkyKF+M5T1ydGuQoDlEvqoGs6c5p3yaI6PxzbzmBmipsmqXUA8fyu
 280+OWJoAcALEMJiQ8JsHcDmvM9wdQ+BV/j3BNCm4dqBUA4dYPfDzRKUJYfwqiig
 qmVRseJJaekxrQ2lHG/K/WPAXa8aRcV6khP9tv/BPGRMt+I/fli/J4sWEFT6c73B
 +qYmhrf/RLNJ1O13dePlo3URwMu083PL8QN355GAKUJbMaX/UPjEnq0DLbBOkCS2
 KBQI5O1eiFauEE6YU7p7GuvnLeVkukcXSQNVnRrnWzTUA9CMThZZ2mAgb2lz+iEP
 oZWNirRwnxdcTQRXjPyTCtpPTCgJi4GO1WS5s6HeLP8G8Muo4PzDzcEwMX0Mw5ih
 s4n4wpQ+Zp3h53cc/DxdFAK15uM3XYtUfb92kJwqaEG5VmBy6KvliXDRnNXg7WMI
 imCb08c0Fr0M8ZHOd+UCveICcndFj25jkjx8w7PoE1KBbGJkKf+cjpZ3OhOAliux
 sGtH3EZLV6jx1MjV79OpDXQGoVsvktesCbRcaxc7BbqljXYVNiap8QbzjPnmAt6z
 KKN0GU32eolGgK4Zd/iF
 =vJQc
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-3.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs fixes from Jaegeuk Kim:
 o Support swap file and link generic_file_remap_pages
 o Enhance the bio streaming flow and free section control
 o Major bug fix on recovery routine
 o Minor bug/warning fixes and code cleanups

* tag 'f2fs-for-3.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (22 commits)
  f2fs: use _safe() version of list_for_each
  f2fs: add comments of start_bidx_of_node
  f2fs: avoid issuing small bios due to several dirty node pages
  f2fs: support swapfile
  f2fs: add remap_pages as generic_file_remap_pages
  f2fs: add __init to functions in init_f2fs_fs
  f2fs: fix the debugfs entry creation path
  f2fs: add global mutex_lock to protect f2fs_stat_list
  f2fs: remove the blk_plug usage in f2fs_write_data_pages
  f2fs: avoid redundant time update for parent directory in f2fs_delete_entry
  f2fs: remove redundant call to set_blocksize in f2fs_fill_super
  f2fs: move f2fs_balance_fs to punch_hole
  f2fs: add f2fs_balance_fs in several interfaces
  f2fs: revisit the f2fs_gc flow
  f2fs: check return value during recovery
  f2fs: avoid null dereference in f2fs_acl_from_disk
  f2fs: initialize newly allocated dnode structure
  f2fs: update f2fs partition info about SIT/NAT layout
  f2fs: update f2fs document to reflect SIT/NAT layout correctly
  f2fs: remove unneeded INIT_LIST_HEAD at few places
  ...
2013-01-22 10:33:17 -08:00
Linus Torvalds 3c2a9f84e9 vfio fixes for v3.8-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ/cZWAAoJECObm247sIsibDUP/jyFstqk8Nhg99qJvecoI7RA
 9x7IpKpV1xDQfOeQZWf+5NXQwr/OBBApEYN7VNun3rBZJxuGNKzJSnMaAJWy+8qW
 J1rNzw/O49fejECVfYBDUvEV1WC9F/zUIcGOXYcyU0ZDT/3lAC7ZzVKY1l4hWgqx
 dL3BOhPPpvgQN3otUQnGK/7sVjptjl4ks8js7GFlbmQnv86GyqjB0DBzy1/xZrFO
 E59LK8FZup3Vs3UrkI8A3Op3E6YkNiZKBkCrjduHNFHfFwjcv8vRZHYB2BJCm69j
 2HmcQWLVjZPdlkyjLgiJ10S7ZBTc/bjXZeq30cuwJalI0kcdKaD6f00qG12qZOce
 p15oXI68hXsK9JTYUHBQrbjGhIKAfkcmJQB0Q+JDF56Zi1x9LMzmQ/BK/5yq8uWQ
 Y3jm8krAuoU/Y49MiExEM2MuVh06ftHOcTICuZj2M2d2gR8mCGE7l7F7BgHmFaDc
 MdyUELJ1L6kTvqMREI8i14ZG77KEfMRnu6KI0w0kIyjUt4CApAwCfe/HpNicfP/N
 nSMbRonwm8myuEX3uPKauTxRrKklRGLlxS7LrrlawsVMki+FhxNq8vZkZfkjuFaW
 fZxgt+dj0+JsOYej6y+6+CmYA7TXcXt0svOAqCCthEVgPAkWA2RraBA3YsSW1Ste
 GnnnFaWcddPo4gB9Z70g
 =nRaP
 -----END PGP SIGNATURE-----

Merge tag 'vfio-for-v3.8-rc5' of git://github.com/awilliam/linux-vfio

Pull vfio fix from Alex Williamson.
 "vfio-pci: Fix buffer overfill"

* tag 'vfio-for-v3.8-rc5' of git://github.com/awilliam/linux-vfio:
  vfio-pci: Fix buffer overfill
2013-01-22 10:31:57 -08:00
Linus Torvalds d26d45253b Kprobes now uses the function tracer if it can. That is, if a probe
is placed on a function mcount/nop location, and the arch supports it,
 instead of adding a breakpoint, kprobes will register a function callback
 as that is much more efficient.
 
 The function tracer requires to update modules before they run, and
 uses the module notifier to do so. But if something else in the module
 notifiers registers a kprobe at one of these locations, before ftrace
 can get to it, then the system could fail.
 
 The function tracer must be initialized early, otherwise module notifiers
 that probe will only work by chance.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJQ/ZaaAAoJEOdOSU1xswtMQsoH/02S5ngidMgIXwYCGzwxR6Ym
 f2krSz76rBkFn9ivFPJFfq2Wi05UhhVfFr4gpKzoK+/rBqwVkV5tRIQ4AhJ1/Q3d
 wCXB2mcRI6ky/kB/ts8q6gjj+rlJ18NgZf79TA5y5q7FEYc6DvB2dZ+vE+rvCnXk
 q/Bx6q4phEdLbVqet+Ga36qzi9u+pW0P+ntZmi0EQLVnz9p+mtdVaRz32qKp0FYi
 XhHtPMHQDZoOu8utPNtPcfSOZp1sNN8tXqjEAJ4/Ba54badUfg4WJ+RXGPsYH+4T
 lOwpVv7Xp0Sp2b1HivEVfCs1bx2RNzluZisPahBEwdBv+XssvqMbAfG+X3EVk3w=
 =6EYI
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.8-rc4-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull ftrace fix from Steven Rostedt:
 "Kprobes now uses the function tracer if it can.  That is, if a probe
  is placed on a function mcount/nop location, and the arch supports it,
  instead of adding a breakpoint, kprobes will register a function
  callback as that is much more efficient.

  The function tracer requires to update modules before they run, and
  uses the module notifier to do so.  But if something else in the
  module notifiers registers a kprobe at one of these locations, before
  ftrace can get to it, then the system could fail.

  The function tracer must be initialized early, otherwise module
  notifiers that probe will only work by chance."

* tag 'trace-3.8-rc4-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Be first to run code modification on modules
2013-01-22 10:30:49 -08:00
Linus Torvalds 0944c0a034 1) ahci: Fix typo that caused erronenous error handling.
Thought:  I wonder if sparse could have caught this, somehow.
 
 2) ahci: support a slightly odd Enmotus variant
 
 3) core: fix a drive detection problem by correcting the logic
    by which the DevSlp timing variables are obtained and used.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUP2a/CWzCDIBeCsvAQJBhA/+JowfM2CKII3+k4Kq2UbaR+uJHALABtxh
 rRWBana8ESxoXuY6Usylkdms5fQamp3r/YB29ZzUw3nsHP4u9Y4L2+NIYryXPYll
 xNeg9tI/y2ypC8wNnXQIrbvYGH303BqyBpUTQ7qfQ+zrKq9qSg9vTBC9rbGE6YdM
 sR8gcz1wuY3eFeEw2FXHyWWnXTmB4/UmgISjJL13c9rJoQpK4k1svhUnPAqSM8RZ
 IB4z4NUne1REvpDdhyIIi5hGeOFtULMtEdjIxNuHNmQXgi6Q225qDNS4afhEQLVu
 lf6VLMYmMziNYNhyQDl1Vw82/zics01u9HOnCpfs/De+6DE4DeH9z0wG9Hz35YHZ
 6ZcSQPLVk8/Jfp2jSsovc3h51o6W3DavuJxvs9EC1gyhDpPnYPPBOgiVPkb/km1n
 hoBBIKqwFrV8MzpbUgSqlrWRlQ4Mco95l/Nwnr9mWe1CV81dYRXQ0cH2WYjveigy
 tpbKgpDiEMpmUHT9rUX5AXMdv6JczMY6mP+FGmOeBio8hG0uozaZYCczbQAX9PD4
 UFkqms0C9C/SH1k3LJWkSZaxwupCZpvEyXu1RezcJsv0qObfrbnGxtSG4lBdv0x/
 SlSMLQRG7cE9la9amEwzqhUQXbS/rPcpZ8GK+2Qjhb36qO5GJpgUUEp+izfHcy4v
 ATS8puL+GhQ=
 =7WDW
 -----END PGP SIGNATURE-----

Merge tag 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev

Pull libata fixes from Jeff Garzik:

 1) ahci: Fix typo that caused erronenous error handling.

    Thought: I wonder if sparse could have caught this, somehow.

 2) ahci: support a slightly odd Enmotus variant

 3) core: fix a drive detection problem by correcting the logic by which
    the DevSlp timing variables are obtained and used.

* tag 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  [libata] replace sata_settings with devslp_timing
  [libata] ahci: Add support for Enmotus Bobcat device.
  [libata] ahci: Fix lack of command retry after a success error handler.
2013-01-22 10:10:34 -08:00
Linus Torvalds a7ed6c4320 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem bugfixes from James Morris.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  security/device_cgroup: lock assert fails in dev_exception_clean()
  evm: checking if removexattr is not a NULL
2013-01-22 10:10:10 -08:00
Oleg Nesterov 9067ac85d5 wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task
wake_up_process() should never wakeup a TASK_STOPPED/TRACED task.
Change it to use TASK_NORMAL and add the WARN_ON().

TASK_ALL has no other users, probably can be killed.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-22 10:08:17 -08:00
Oleg Nesterov 9899d11f65 ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL
putreg() assumes that the tracee is not running and pt_regs_access() can
safely play with its stack.  However a killed tracee can return from
ptrace_stop() to the low-level asm code and do RESTORE_REST, this means
that debugger can actually read/modify the kernel stack until the tracee
does SAVE_REST again.

set_task_blockstep() can race with SIGKILL too and in some sense this
race is even worse, the very fact the tracee can be woken up breaks the
logic.

As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
call, this ensures that nobody can ever wakeup the tracee while the
debugger looks at it.  Not only this fixes the mentioned problems, we
can do some cleanups/simplifications in arch_ptrace() paths.

Probably ptrace_unfreeze_traced() needs more callers, for example it
makes sense to make the tracee killable for oom-killer before
access_process_vm().

While at it, add the comment into may_ptrace_stop() to explain why
ptrace_stop() still can't rely on SIGKILL and signal_pending_state().

Reported-by: Salman Qazi <sqazi@google.com>
Reported-by: Suleiman Souhlal <suleiman@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-22 10:08:00 -08:00
Will Deacon f1b99392ca arm64: makefile: fix uname munging when setting ARCH on native machine
By popular demand, arch/aarch64 is now known as arch/arm64. However,
uname -m (and indeed the GNU triplet) still use aarch64 as the machine
string.

This patch fixes native builds of both the kernel and perf tools by
updating the relevant Makefiles to munge the output of uname -m and
set the ARCH variable appropriately.

Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-01-22 17:51:00 +00:00
Will Deacon 9cf2b72b25 arm64: elf: fix core dumping to match what glibc expects
The kernel's internal definition of ELF_NGREG uses struct pt_regs, which
means that we disagree with userspace on the size of coredumps since
glibc correctly uses the user-visible struct user_pt_regs.

This patch fixes our ELF_NGREG definition to use struct user_pt_regs
and introduces our own ELF_CORE_COPY_REGS to convert between the user
and kernel structure definitions.

Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-01-22 17:50:59 +00:00
Alan Stern 9debc1793b USB: EHCI: add a name for the platform-private field
This patch (as1642) adds an ehci->priv field for private use by EHCI
platform drivers.  The space was provided some time ago, but it didn't
have a name.

Until now none of the platform drivers has used this private space,
but that's about to change in the next patch of this series.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-22 09:22:13 -08:00
Alan Stern 9ce45ef86c USB: EHCI: fix incorrect configuration test
This patch (as1641) fixes a minor bug in ehci-hcd left over from when
the Chipidea driver was converted to the "ehci-hcd is a library"
scheme.  The test for whether the Chipidea platform driver is active
should be IS_ENABLED(), not defined().

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-22 09:21:23 -08:00
Roger Quadros 9ec6e9d3cb USB: EHCI: Move definition of EHCI_STATS to ehci.h
Without this, platform drivers e.g. ehci-omap.c will see a
different version of struct ehci_hcd than ehci-hcd.c and
break reference to 'debug_dir' and 'priv' members when
CONFIG_USB_DEBUG is enabled.

Signed-off-by: Roger Quadros <rogerq@ti.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-22 09:21:23 -08:00
Steven Rostedt 0a71e4c6d7 tracing: Remove trace.h header from trace_clock.c
As trace_clock is used by other things besides tracing, and it
does not require anything from trace.h, it is best not to include
the header file in trace_clock.c.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-22 12:06:56 -05:00
Alan Stern 0f815a0a70 USB: UHCI: fix IRQ race during initialization
This patch (as1644) fixes a race that occurs during startup in
uhci-hcd.  If the IRQ line is shared with other devices, it's possible
for the handler routine to be called before the data structures are
fully initialized.

The problem is fixed by adding a check to the IRQ handler routine.  If
the initialization hasn't finished yet, the routine will return
immediately.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Don Zickus <dzickus@redhat.com>
Tested-by: "Huang, Adrian (ISS Linux TW)" <adrian.huang@hp.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-22 08:55:13 -08:00
Oleg Nesterov 910ffdb18a ptrace: introduce signal_wake_up_state() and ptrace_signal_wake_up()
Cleanup and preparation for the next change.

signal_wake_up(resume => true) is overused. None of ptrace/jctl callers
actually want to wakeup a TASK_WAKEKILL task, but they can't specify the
necessary mask.

Turn signal_wake_up() into signal_wake_up_state(state), reintroduce
signal_wake_up() as a trivial helper, and add ptrace_signal_wake_up()
which adds __TASK_TRACED.

This way ptrace_signal_wake_up() can work "inside" ptrace_request()
even if the tracee doesn't have the TASK_WAKEKILL bit set.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-22 08:50:08 -08:00
Steven Rostedt b000c8065a tracing: Remove the extra 4 bytes of padding in events
Due to a userspace issue with PowerTop v2beta, which hardcoded
the offset of event fields that it was using, it broke when
we removed the Big Kernel Lock counter from the event header.

 (commit e6e1e2593 "tracing: Remove lock_depth from event entry")

Because this broke userspace, it was determined that we must
keep those 4 bytes around.

 (commit a3a4a5acd "Regression: partial revert "tracing: Remove lock_depth from event entry"")

This unfortunately wastes space in the ring buffer. 4 bytes per
event, where a lot of events are just 24 bytes. That's 16% of the
buffer wasted. A million events will add 4 megs of white space
into the buffer.

It was later noticed that PowerTop v2beta could not work on systems
where the kernel was 64 bit but the userspace was 32 bits.
The reason was because the offsets are different between the
two and the hard coded offset of one would not work with the other.

With PowerTop v2 final, it implemented the same interface that both
perf and trace-cmd use. That is, it reads the format file of
the event to find the offsets of the fields it needs. This fixes
the problem with running powertop on a 32 bit userspace running
on a 64 bit kernel. It also no longer requires the 4 byte padding.

As PowerTop v2 has been out for a while, and is included in all
major distributions, it is time that we can safely remove the
4 bytes of padding. Users of PowerTop v2beta should upgrade to
PowerTop v2 final.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21 21:05:41 -05:00
Dan Carpenter d8b79b2f94 f2fs: use _safe() version of list_for_each
This is calling list_del() inside a loop which is a problem when we try
move to the next item on the list.  I've converted it to use the _safe
version.  And also, as a cleanup, I've converted it to use
list_for_each_entry instead of list_for_each.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-22 10:49:00 +09:00
Jaegeuk Kim 9af45ef5ab f2fs: add comments of start_bidx_of_node
The caller of start_bidx_of_node() should give proper node offsets which
point only direct node blocks. Otherwise, it is a caller's bug.
This patch adds comments to make it clear.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-22 10:48:59 +09:00
Jaegeuk Kim a7fdffbd3e f2fs: avoid issuing small bios due to several dirty node pages
If some small bios of dirty node pages are supposed to be issued during the
sequential data writes, there-in well-produced consecutive data bios are able
to be split by the small node bios, resulting in performance degradation.
So, let's collect a number of dirty node pages until reaching a threshold.
And, by default, I set the threshold as 2MB, a segment size.

This improves sequential write performance on i5, 512GB SSD (830 w/ SATA2) as
follows.
Before: 231 MB/s -> After: 255 MB/s

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
2013-01-22 10:48:59 +09:00
Jaegeuk Kim c01e54b770 f2fs: support swapfile
This patch adds f2fs_bmap operation to the data address space.
This enables f2fs to support swapfile.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-22 10:48:58 +09:00
Jaegeuk Kim 692bb55d1a f2fs: add remap_pages as generic_file_remap_pages
This was added for all the file systems before.

See the following commit.

commit id: 0b173bc4da

[PATCH] mm: kill vma flag VM_CAN_NONLINEAR

This patch moves actual ptes filling for non-linear file mappings
into special vma operation: ->remap_pages().

File system must implement this method to get non-linear mappings support,
if it uses filemap_fault() then generic_file_remap_pages() can be used.

Now device drivers can implement this method and obtain nonlinear vma support."

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-22 10:48:58 +09:00
Namjae Jeon 6e6093a8f1 f2fs: add __init to functions in init_f2fs_fs
Add __init to functions in init_f2fs_fs for code consistency.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-22 10:48:38 +09:00
Greg Kroah-Hartman ad2e632966 usb: fixes for v3.8-rc5
Finally we have a build fix for fsl-mxc-udc UDC driver.
 
 We also have a fix for ep0 maxburst setting on DWC3
 which could confuse the HW if we tell it we had way
 too many streams on that endpoint when it _has_ to be
 only one.
 
 cppi_dma support for MUSB got a fix when running as a
 module. By dropping the wrong __init annotation, the
 function will be available even when we're modules and
 we're done with .init.text section.
 
 Last, but not least, we have a fix on FunctionFS which
 was causing a bug on our option parsing algorithm.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ+T2OAAoJEIaOsuA1yqRE/eEP/jJTOGu8LJegFohGFSVY80RK
 IuMhPCRa3uE26YFaP6OQOFHwVSrzMYfITcptQQsrQKv6zE25m+ZYOpD7pL8MQV8q
 YNhvBS44G7j42akQVoE0kqN00PCWyQgjGxx93+NSOik8IOUy4s9Em8yTkAsXwGa6
 CHId1KmEgEq3IbT/+XJAEjMvV7FHONF7p08LK1riPRtIC7sb6BODjdAPKj5Hv5r6
 o2xta0LslSNHMIs8CUpeK6qol9n6n9Oz7NRj2yEFiC/bebLjQ0X0Eo44QfwkLtqj
 u3hFL1BPfv39U47fmAugZqeelKfmPgpohY+CUrCc2Wqrvo1aae7nEJrU4xQV02WH
 kFj1I8EVtcSmq9TvnXaAW75MNPRUtKAJsxWTTVqTIzPBRYvVv0i+8whDDSiEj+xN
 FlqzpHd1ojKnqk+BS9nI9XXPGFV022xecPGj5rg9gTEuoutYndHl2LDBB+u6lraP
 FnhUHkQ4UtJH3yzuZ3Xv7ugrGK9FQX3zJKtXJXS85QTNtFlL80cK3Ph12d6xOCs1
 oGadH0wdCOE4lEjeTloaaC+/tMuiSWzLPQBcIUC5IAXrH9zLXuFnkB6RFEqspSJH
 ebFeJRNKodOaZxge4YbO2/kaY2MQa1hJfzlJtXexZNCKxK7iA+fL9a70KgmWJUs+
 DwPlhoa+JFwJLSYVnOow
 =4n68
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-v3.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-linus

Felipe writes:

  usb: fixes for v3.8-rc5

  Finally we have a build fix for fsl-mxc-udc UDC driver.

  We also have a fix for ep0 maxburst setting on DWC3
  which could confuse the HW if we tell it we had way
  too many streams on that endpoint when it _has_ to be
  only one.

  cppi_dma support for MUSB got a fix when running as a
  module. By dropping the wrong __init annotation, the
  function will be available even when we're modules and
  we're done with .init.text section.

  Last, but not least, we have a fix on FunctionFS which
  was causing a bug on our option parsing algorithm.
2013-01-21 11:37:57 -08:00
Masami Hiramatsu f684199f5d kprobes/x86: Move kprobes stuff under arch/x86/kernel/kprobes/
Move arch-dep kprobes stuff under arch/x86/kernel/kprobes.

Link: http://lkml.kernel.org/r/20120928081522.3560.75469.stgit@ltc138.sdl.hitachi.co.jp

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
[ fixed whitespace and s/__attribute__((packed))/__packed/ ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21 13:22:37 -05:00
Masami Hiramatsu e7dbfe349d kprobes/x86: Move ftrace-based kprobe code into kprobes-ftrace.c
Split ftrace-based kprobes code from kprobes, and introduce
CONFIG_(HAVE_)KPROBES_ON_FTRACE Kconfig flags.
For the cleanup reason, this also moves kprobe_ftrace check
into skip_singlestep.

Link: http://lkml.kernel.org/r/20120928081520.3560.25624.stgit@ltc138.sdl.hitachi.co.jp

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21 13:22:36 -05:00
Masami Hiramatsu 06aeaaeabf ftrace: Move ARCH_SUPPORTS_FTRACE_SAVE_REGS in Kconfig
Move SAVE_REGS support flag into Kconfig and rename
it to CONFIG_DYNAMIC_FTRACE_WITH_REGS. This also introduces
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS which indicates
the architecture depending part of ftrace has a code
that saves full registers.
On the other hand, CONFIG_DYNAMIC_FTRACE_WITH_REGS indicates
the code is enabled.

Link: http://lkml.kernel.org/r/20120928081516.3560.72534.stgit@ltc138.sdl.hitachi.co.jp

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21 13:22:35 -05:00
Steven Rostedt 8741db532e tracing/fgraph: Add max_graph_depth to limit function_graph depth
Add the file max_graph_depth to the debug tracing directory that lets
the user define the depth of the function graph.

A very useful operation is to set the depth to 1. Then it traces only
the first function that is called when entering the kernel. This can
be used to determine what system operations interrupt a process.

For example, to work on NOHZ processes (single tasks running without
a timer tick), if any interrupt goes off and preempts that task, this
code will show it happening.

  # cd /sys/kernel/debug/tracing
  # echo 1 > max_graph_depth
  # echo function_graph > current_tracer
  # cat per_cpu/cpu/<cpu-of-process>/trace

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21 13:22:34 -05:00
Steven Rostedt 0f1ac8fd25 tracing/lockdep: Disable lockdep first in entering NMI
When function tracing with either debug locks enabled or tracing
preempt disabled, the add_preempt_count() is traced. This is an
issue with lockdep and function tracing. As function tracing
can disable interrupts, and lockdep records that change,
lockdep may not be able to handle this recursion if it happens from
an NMI context.

The first thing that an NMI does is:

 #define nmi_enter()					\
	do {							\
		ftrace_nmi_enter();				\
		BUG_ON(in_nmi());				\
		add_preempt_count(NMI_OFFSET + HARDIRQ_OFFSET);	\
		lockdep_off();					\
		rcu_nmi_enter();				\
		trace_hardirq_enter();				\
	} while (0)

When the add_preempt_count() is traced, and the tracing callback
disables interrupts, it will jump into the lockdep code. There's
some places in lockdep that can't handle this re-entrance, and
causes lockdep to fail.

As the lockdep_off() (and lockdep_on) is a simple:

void lockdep_off(void)
{
	current->lockdep_recursion++;
}

and is never traced, it can be called first in nmi_enter()
and lockdep_on() last in nmi_exit().

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21 13:22:34 -05:00
Steven Rostedt 84c6cf0db6 tracing: Remove unneeded check of max_tr->buffer before tracing_reset
There's now a check in tracing_reset_online_cpus() if the buffer is
allocated or NULL. No need to do a check before calling it with max_tr.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21 13:22:33 -05:00
Hiraku Toyooka a54164114b tracing: Add checks if tr->buffer is NULL in tracing_reset{_online_cpus}
max_tr->buffer could be NULL in the tracing_reset{_online_cpus}. In this
case, a NULL pointer dereference happens, so we should return immediately
from these functions.

Note, the current code does not call tracing_reset*() with max_tr when
its buffer is NULL, but future code will. This patch is needed to prevent
the future code from crashing.

Link: http://lkml.kernel.org/r/20121219070234.31200.93863.stgit@liselsia

Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21 13:22:32 -05:00
Fengguang Wu 6aea49cb5f tracing/syscalls: Make local functions static
Some functions in the syscall tracing is used only locally to
the file, but they are labeled global. Convert them to static functions.

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21 13:22:31 -05:00
Jovi Zhang d24d7dbf3c tracing: Verify target file before registering a uprobe event
Without this patch, we can register a uprobe event for a directory.
Enabling such a uprobe event would anyway fail.

Example:
$ echo 'p /bin:0x4245c0' > /sys/kernel/debug/tracing/uprobe_events

However dirctories cannot be valid targets for uprobe.
Hence verify if the target is a regular file during the probe
registration.

Link: http://lkml.kernel.org/r/20130103004212.690763002@goodmis.org

Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jovi Zhang <bookjovi@gmail.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
[ cleaned up whitespace and removed redundant IS_DIR() check ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21 13:22:31 -05:00
Shan Wei d8a0349c0c tracing: Use this_cpu_ptr per-cpu helper
typeof(&buffer) is a pointer to array of 1024 char, or char (*)[1024].
But, typeof(&buffer[0]) is a pointer to char which match the return type of get_trace_buf().
As well-known, the value of &buffer is equal to &buffer[0].
so return this_cpu_ptr(&percpu_buffer->buffer[0]) can avoid type cast.

Link: http://lkml.kernel.org/r/50A1A800.3020102@gmail.com

Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Shan Wei <davidshan@tencent.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21 13:22:30 -05:00