Commit graph

2499 commits

Author SHA1 Message Date
Becky Bruce
738ef42e32 powerpc: Change archdata dma_data to a union
Sometimes this is used to hold a simple offset, and sometimes
it is used to hold a pointer.  This patch changes it to a union containing
void * and dma_addr_t.  get/set accessors are also provided, because it was
getting a bit ugly to get to the actual data.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24 15:31:43 +10:00
Becky Bruce
1cebd7a0f6 powerpc: Rename get_dma_direct_offset get_dma_offset
The former is no longer really accurate with the swiotlb case now
a possibility.  I also move it into dma-mapping.h - it no longer
needs to be in dma.c, and there are about to be some more accessors
that should all end up in the same place.  A comment is added to
indicate that this function is not used in configs where there is no
simple dma offset, such as the iommu case.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24 15:31:43 +10:00
Huang Weiyi
5c8f382c0b powerpc/book3e-64: Remove duplicated #include
Remove duplicated #include('s) in
  arch/powerpc/kernel/exceptions-64e.S

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24 15:31:41 +10:00
roel kluin
0f3372741f powerpc: kmalloc failure ignored in vio_build_iommu_table()
Prevent NULL dereference if kmalloc() fails.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24 15:31:38 +10:00
Linus Torvalds
94a8d5caba Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (39 commits)
  cpumask: Move deprecated functions to end of header.
  cpumask: remove unused deprecated functions, avoid accusations of insanity
  cpumask: use new-style cpumask ops in mm/quicklist.
  cpumask: use mm_cpumask() wrapper: x86
  cpumask: use mm_cpumask() wrapper: um
  cpumask: use mm_cpumask() wrapper: mips
  cpumask: use mm_cpumask() wrapper: mn10300
  cpumask: use mm_cpumask() wrapper: m32r
  cpumask: use mm_cpumask() wrapper: arm
  cpumask: Use accessors for cpu_*_mask: um
  cpumask: Use accessors for cpu_*_mask: powerpc
  cpumask: Use accessors for cpu_*_mask: mips
  cpumask: Use accessors for cpu_*_mask: m32r
  cpumask: remove arch_send_call_function_ipi
  cpumask: arch_send_call_function_ipi_mask: s390
  cpumask: arch_send_call_function_ipi_mask: powerpc
  cpumask: arch_send_call_function_ipi_mask: mips
  cpumask: arch_send_call_function_ipi_mask: m32r
  cpumask: arch_send_call_function_ipi_mask: alpha
  cpumask: remove obsolete topology_core_siblings and topology_thread_siblings: ia64
  ...
2009-09-23 18:14:11 -07:00
Alexey Dobriyan
2bcd57ab61 headers: utsname.h redux
* remove asm/atomic.h inclusion from linux/utsname.h --
   not needed after kref conversion
 * remove linux/utsname.h inclusion from files which do not need it

NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however
due to some personality stuff it _is_ needed -- cowardly leave ELF-related
headers and files alone.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 18:13:10 -07:00
Rusty Russell
ea0f1cab6e cpumask: Use accessors for cpu_*_mask: powerpc
Use the accessors rather than frobbing bits directly (the new versions
are const).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
2009-09-24 09:34:48 +09:30
Rusty Russell
f063ea02fb cpumask: arch_send_call_function_ipi_mask: powerpc
We're weaning the core code off handing cpumask's around on-stack.
This introduces arch_send_call_function_ipi_mask(), and by defining
it, the old arch_send_call_function_ipi is defined by the core code.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-24 09:34:45 +09:30
Linus Torvalds
c37efa9325 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (30 commits)
  Use macros for .data.page_aligned section.
  Use macros for .bss.page_aligned section.
  Use new __init_task_data macro in arch init_task.c files.
  kbuild: Don't define ALIGN and ENTRY when preprocessing linker scripts.
  arm, cris, mips, sparc, powerpc, um, xtensa: fix build with bash 4.0
  kbuild: add static to prototypes
  kbuild: fail build if recordmcount.pl fails
  kbuild: set -fconserve-stack option for gcc 4.5
  kbuild: echo the record_mcount command
  gconfig: disable "typeahead find" search in treeviews
  kbuild: fix cc1 options check to ensure we do not use -fPIC when compiling
  checkincludes.pl: add option to remove duplicates in place
  markup_oops: use modinfo to avoid confusion with underscored module names
  checkincludes.pl: provide usage helper
  checkincludes.pl: close file as soon as we're done with it
  ctags: usability fix
  kernel hacking: move STRIP_ASM_SYMS from General
  gitignore usr/initramfs_data.cpio.bz2 and usr/initramfs_data.cpio.lzma
  kbuild: Check if linker supports the -X option
  kbuild: introduce ld-option
  ...

Fix trivial conflict in scripts/basic/fixdep.c
2009-09-23 15:37:02 -07:00
Linus Torvalds
31bbb9b58d Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  itimers: Add tracepoints for itimer
  hrtimer: Add tracepoint for hrtimers
  timers: Add tracepoints for timer_list timers
  cputime: Optimize jiffies_to_cputime(1)
  itimers: Simplify arm_timer() code a bit
  itimers: Fix periodic tics precision
  itimers: Merge ITIMER_VIRT and ITIMER_PROF

Trivial header file include conflicts in kernel/fork.c
2009-09-23 09:46:15 -07:00
James Morris
88e9d34c72 seq_file: constify seq_operations
Make all seq_operations structs const, to help mitigate against
revectoring user-triggerable function pointers.

This is derived from the grsecurity patch, although generated from scratch
because it's simpler than extracting the changes from there.

Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 07:39:29 -07:00
Linus Torvalds
7fa07729e4 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf_event, powerpc: Fix compilation after big perf_counter rename
2009-09-22 08:11:04 -07:00
Linus Torvalds
342ff1a1b5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
  trivial: fix typo in aic7xxx comment
  trivial: fix comment typo in drivers/ata/pata_hpt37x.c
  trivial: typo in kernel-parameters.txt
  trivial: fix typo in tracing documentation
  trivial: add __init/__exit macros in drivers/gpio/bt8xxgpio.c
  trivial: add __init macro/ fix of __exit macro location in ipmi_poweroff.c
  trivial: remove unnecessary semicolons
  trivial: Fix duplicated word "options" in comment
  trivial: kbuild: remove extraneous blank line after declaration of usage()
  trivial: improve help text for mm debug config options
  trivial: doc: hpfall: accept disk device to unload as argument
  trivial: doc: hpfall: reduce risk that hpfall can do harm
  trivial: SubmittingPatches: Fix reference to renumbered step
  trivial: fix typos "man[ae]g?ment" -> "management"
  trivial: media/video/cx88: add __init/__exit macros to cx88 drivers
  trivial: fix typo in CONFIG_DEBUG_FS in gcov doc
  trivial: fix missing printk space in amd_k7_smp_check
  trivial: fix typo s/ketymap/keymap/ in comment
  trivial: fix typo "to to" in multiple files
  trivial: fix typos in comments s/DGBU/DBGU/
  ...
2009-09-22 07:51:45 -07:00
Paul Mackerras
a8f90e9067 perf_event, powerpc: Fix compilation after big perf_counter rename
This fixes two places in the powerpc perf_event (perf_counter) code
where 'list_entry' needs to be changed to 'group_entry', but were
missed in commit 65abc865 ("perf_counter: Rename list_entry ->
group_entry, counter_list -> group_list").

This also changes 'event' back to 'counter' in a couple of
contexts:

* Field and function names that deal with the limited-function
  counters: it's really the hardware counters whose function is
  limited, not the events that they count.  Hence:

  MAX_LIMITED_HWEVENTS -> MAX_LIMITED_HWCOUNTERS
  limited_event -> limited_counter
  freeze/thaw_limited_events -> freeze/thaw_limited_counters

* The machine-specific PMU description struct (struct power_pmu): this
  renames 'n_event' back to 'n_counter' since it really describes how
  many hardware counters the machine has.  (Renaming this back avoids
  a compile error in each of the machine-specific PMU back-ends where
  they initialize their power_pmu struct.)

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@ozlabs.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <19128.4280.813369.589704@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-22 09:30:40 +02:00
Anand Gadiyar
411c940385 trivial: fix typo "for for" in multiple files
trivial: fix typo "for for" in multiple files

Signed-off-by: Anand Gadiyar <gadiyar@ti.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-09-21 15:14:54 +02:00
Ingo Molnar
57c0c15b52 perf: Tidy up after the big rename
- provide compatibility Kconfig entry for existing PERF_COUNTERS .config's

 - provide courtesy copy of old perf_counter.h, for user-space projects

 - small indentation fixups

 - fix up MAINTAINERS

 - fix small x86 printout fallout

 - fix up small PowerPC comment fallout (use 'counter' as in register)

Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-21 14:34:11 +02:00
Ingo Molnar
cdd6c482c9 perf: Do the big rename: Performance Counters -> Performance Events
Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.

Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.

All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)

The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.

Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.

User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)

This patch has been generated via the following script:

  FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

  sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

  for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
  done

  FILES=$(find . -name perf_event.*)

  sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.

Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.

( NOTE: 'counters' are still the proper terminology when we deal
  with hardware registers - and these sed scripts are a bit
  over-eager in renaming them. I've undone some of that, but
  in case there's something left where 'counter' would be
  better than 'event' we can undo that on an individual basis
  instead of touching an otherwise nicely automated patch. )

Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-21 14:28:04 +02:00
Ingo Molnar
ae82bfd61c Merge branch 'linus' into perfcounters/rename
Merge reason: pull in all the latest code before doing the rename.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-21 12:51:42 +02:00
Paul Mackerras
cd74c86bdf perf_counter, powerpc, sparc: Fix compilation after perf_counter_overflow() change
Commit 5622f295 ("x86, perf_counter, bts: Optimize BTS overflow
handling") removed the regs field from struct perf_sample_data and
added a regs parameter to perf_counter_overflow().  This breaks the
build on powerpc (and Sparc) as reported by Sachin Sant:

  arch/powerpc/kernel/perf_counter.c: In function 'record_and_restart':
  arch/powerpc/kernel/perf_counter.c:1165: error: unknown field 'regs' specified in initializer

This adjusts arch/powerpc/kernel/perf_counter.c to correspond with the
new struct perf_sample_data and perf_counter_overflow().

[ v2: also fix Sparc, Markus Metzger <markus.t.metzger@intel.com> ]

Reported-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Markus Metzger <markus.t.metzger@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: benh@kernel.crashing.org
Cc: linuxppc-dev@ozlabs.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <19127.8400.376239.586120@drongo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-21 09:28:40 +02:00
Tim Abbott
abe1ee3a22 Use macros for .data.page_aligned section.
This patch changes the remaining direct references to
.data.page_aligned in C and assembly code to use the macros in
include/linux/linkage.h.

Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2009-09-21 06:27:08 +02:00
Joe Perches
d200c922bc Use new __init_task_data macro in arch init_task.c files.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2009-09-21 06:27:08 +02:00
Sam Ravnborg
f86fd30660 kbuild: rename ld-option to cc-ldoption
ld-option is misnamed as it test options to gcc, not to ld.
Renamed it to reflect this.

Cc: Andi Kleen <andi@firstfloor.org>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2009-09-20 12:27:42 +02:00
Linus Torvalds
a03fdb7612 Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (34 commits)
  time: Prevent 32 bit overflow with set_normalized_timespec()
  clocksource: Delay clocksource down rating to late boot
  clocksource: clocksource_select must be called with mutex locked
  clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash
  timers: Drop a function prototype
  clocksource: Resolve cpu hotplug dead lock with TSC unstable
  timer.c: Fix S/390 comments
  timekeeping: Fix invalid getboottime() value
  timekeeping: Fix up read_persistent_clock() breakage on sh
  timekeeping: Increase granularity of read_persistent_clock(), build fix
  time: Introduce CLOCK_REALTIME_COARSE
  x86: Do not unregister PIT clocksource on PIT oneshot setup/shutdown
  clocksource: Avoid clocksource watchdog circular locking dependency
  clocksource: Protect the watchdog rating changes with clocksource_mutex
  clocksource: Call clocksource_change_rating() outside of watchdog_lock
  timekeeping: Introduce read_boot_clock
  timekeeping: Increase granularity of read_persistent_clock()
  timekeeping: Update clocksource with stop_machine
  timekeeping: Add timekeeper read_clock helper functions
  timekeeping: Move NTP adjusted clock multiplier to struct timekeeper
  ...

Fix trivial conflict due to MIPS lemote -> loongson renaming.
2009-09-18 09:15:24 -07:00
Linus Torvalds
4406c56d0a Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (75 commits)
  PCI hotplug: clean up acpi_run_hpp()
  PCI hotplug: acpiphp: use generic pci_configure_slot()
  PCI hotplug: shpchp: use generic pci_configure_slot()
  PCI hotplug: pciehp: use generic pci_configure_slot()
  PCI hotplug: add pci_configure_slot()
  PCI hotplug: clean up acpi_get_hp_params_from_firmware() interface
  PCI hotplug: acpiphp: don't cache hotplug_params in acpiphp_bridge
  PCI hotplug: acpiphp: remove superfluous _HPP/_HPX evaluation
  PCI: Clear saved_state after the state has been restored
  PCI PM: Return error codes from pci_pm_resume()
  PCI: use dev_printk in quirk messages
  PCI / PCIe portdrv: Fix pcie_portdrv_slot_reset()
  PCI Hotplug: convert acpi_pci_detect_ejectable() to take an acpi_handle
  PCI Hotplug: acpiphp: find bridges the easy way
  PCI: pcie portdrv: remove unused variable
  PCI / ACPI PM: Propagate wake-up enable for devices w/o ACPI support
  ACPI PM: Replace wakeup.prepared with reference counter
  PCI PM: Introduce device flag wakeup_prepared
  PCI / ACPI PM: Rework some debug messages
  PCI PM: Simplify PCI wake-up code
  ...

Fixed up conflict in arch/powerpc/kernel/pci_64.c due to OF device tree
scanning having been moved and merged for the 32- and 64-bit cases.  The
'needs_freset' initialization added in 6e19314cc ("PCI/powerpc: support
PCIe fundamental reset") is now in arch/powerpc/kernel/pci_of_scan.c.
2009-09-16 07:49:54 -07:00
Linus Torvalds
723e9db7a4 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (134 commits)
  powerpc/nvram: Enable use Generic NVRAM driver for different size chips
  powerpc/iseries: Fix oops reading from /proc/iSeries/mf/*/cmdline
  powerpc/ps3: Workaround for flash memory I/O error
  powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 instead
  powerpc/perf_counters: Reduce stack usage of power_check_constraints
  powerpc: Fix bug where perf_counters breaks oprofile
  powerpc/85xx: Fix SMP compile error and allow NULL for smp_ops
  powerpc/irq: Improve nanodoc
  powerpc: Fix some late PowerMac G5 with PCIe ATI graphics
  powerpc/fsl-booke: Use HW PTE format if CONFIG_PTE_64BIT
  powerpc/book3e: Add missing page sizes
  powerpc/pseries: Fix to handle slb resize across migration
  powerpc/powermac: Thermal control turns system off too eagerly
  powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan()
  powerpc/405ex: support cuImage via included dtb
  powerpc/405ex: provide necessary fixup function to support cuImage
  powerpc/40x: Add support for the ESTeem 195E (PPC405EP) SBC
  powerpc/44x: Add Eiger AMCC (AppliedMicro) PPC460SX evaluation board support.
  powerpc/44x: Update Arches defconfig
  powerpc/44x: Update Arches dts
  ...

Fix up conflicts in drivers/char/agp/uninorth-agp.c
2009-09-15 09:51:09 -07:00
Linus Torvalds
ada3fa1505 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
  powerpc64: convert to dynamic percpu allocator
  sparc64: use embedding percpu first chunk allocator
  percpu: kill lpage first chunk allocator
  x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
  percpu: update embedding first chunk allocator to handle sparse units
  percpu: use group information to allocate vmap areas sparsely
  vmalloc: implement pcpu_get_vm_areas()
  vmalloc: separate out insert_vmalloc_vm()
  percpu: add chunk->base_addr
  percpu: add pcpu_unit_offsets[]
  percpu: introduce pcpu_alloc_info and pcpu_group_info
  percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
  percpu: add @align to pcpu_fc_alloc_fn_t
  percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
  percpu: drop @static_size from first chunk allocators
  percpu: generalize first chunk allocator selection
  percpu: build first chunk allocators selectively
  percpu: rename 4k first chunk allocator to page
  percpu: improve boot messages
  percpu: fix pcpu_reclaim() locking
  ...

Fix trivial conflict as by Tejun Heo in kernel/sched.c
2009-09-15 09:39:44 -07:00
Linus Torvalds
4f0ac85416 Merge branch 'perfcounters-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits)
  perf tools: Avoid unnecessary work in directory lookups
  perf stat: Clean up statistics calculations a bit more
  perf stat: More advanced variance computation
  perf stat: Use stddev_mean in stead of stddev
  perf stat: Remove the limit on repeat
  perf stat: Change noise calculation to use stddev
  x86, perf_counter, bts: Do not allow kernel BTS tracing for now
  x86, perf_counter, bts: Correct pointer-to-u64 casts
  x86, perf_counter, bts: Fail if BTS is not available
  perf_counter: Fix output-sharing error path
  perf trace: Fix read_string()
  perf trace: Print out in nanoseconds
  perf tools: Seek to the end of the header area
  perf trace: Fix parsing of perf.data
  perf trace: Sample timestamps as well
  perf_counter: Introduce new (non-)paranoia level to allow raw tracepoint access
  perf trace: Sample the CPU too
  perf tools: Work around strict aliasing related warnings
  perf tools: Clean up warnings list in the Makefile
  perf tools: Complete support for dynamic strings
  ...
2009-09-11 13:22:43 -07:00
Linus Torvalds
a66a50054e Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (59 commits)
  x86/gart: Do not select AGP for GART_IOMMU
  x86/amd-iommu: Initialize passthrough mode when requested
  x86/amd-iommu: Don't detach device from pt domain on driver unbind
  x86/amd-iommu: Make sure a device is assigned in passthrough mode
  x86/amd-iommu: Align locking between attach_device and detach_device
  x86/amd-iommu: Fix device table write order
  x86/amd-iommu: Add passthrough mode initialization functions
  x86/amd-iommu: Add core functions for pd allocation/freeing
  x86/dma: Mark iommu_pass_through as __read_mostly
  x86/amd-iommu: Change iommu_map_page to support multiple page sizes
  x86/amd-iommu: Support higher level PTEs in iommu_page_unmap
  x86/amd-iommu: Remove old page table handling macros
  x86/amd-iommu: Use 2-level page tables for dma_ops domains
  x86/amd-iommu: Remove bus_addr check in iommu_map_page
  x86/amd-iommu: Remove last usages of IOMMU_PTE_L0_INDEX
  x86/amd-iommu: Change alloc_pte to support 64 bit address space
  x86/amd-iommu: Introduce increase_address_space function
  x86/amd-iommu: Flush domains if address space size was increased
  x86/amd-iommu: Introduce set_dte_entry function
  x86/amd-iommu: Add a gneric version of amd_iommu_flush_all_devices
  ...
2009-09-11 13:16:37 -07:00
Martyn Welch
d331d8305c powerpc/nvram: Enable use Generic NVRAM driver for different size chips
Remove the reliance on a staticly defined NVRAM size, allowing
platforms to support NVRAMs with sizes differing from the standard.

A fall back value is provided for platforms not supporting this extension.

Signed-off-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-11 16:02:11 +10:00
Benjamin Herrenschmidt
c6c9eacef0 powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 instead
Also remove a duplicate setting of it in the context switch path
on BookE.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-11 11:27:59 +10:00
Paul Mackerras
e51ee31e8a powerpc/perf_counters: Reduce stack usage of power_check_constraints
Michael Ellerman reported stack-frame size warnings being produced
for power_check_constraints(), which uses an 8*8 array of u64 and
two 8*8 arrays of unsigned long, which are currently allocated on the
stack, along with some other smaller variables.  These arrays come
to 1.5kB on 64-bit or 1kB on 32-bit, which is a bit too much for the
stack.

This fixes the problem by putting these arrays in the existing
per-cpu cpu_hw_counters struct.  This is OK because two of the call
sites have interrupts disabled already; for the third call site we
use get_cpu_var, which disables preemption, so we know we won't
get a context switch while we're in power_check_constraints().
Note that power_check_constraints() can be called during context
switch but is not called from interrupts.

Reported-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: <stable@kernel.org)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-11 11:27:59 +10:00
Paul Mackerras
a6dbf93a2a powerpc: Fix bug where perf_counters breaks oprofile
Currently there is a bug where if you use oprofile on a pSeries
machine, then use perf_counters, then use oprofile again, oprofile
will not work correctly; it will lose the PMU configuration the next
time the hypervisor does a partition context switch, and thereafter
won't count anything.

Maynard Johnson identified the sequence causing the problem:
- oprofile setup calls ppc_enable_pmcs(), which calls
  pseries_lpar_enable_pmcs, which tells the hypervisor that we want
  to use the PMU, and sets the "PMU in use" flag in the lppaca.
  This flag tells the hypervisor whether it needs to save and restore
  the PMU config.
- The perf_counter code sets and clears the "PMU in use" flag directly
  as it context-switches the PMU between tasks, and leaves it clear
  when it finishes.
- oprofile setup, called for a new oprofile run, calls ppc_enable_pmcs,
  which does nothing because it has already been called.  In particular
  it doesn't set the "PMU in use" flag.

This fixes the problem by arranging for ppc_enable_pmcs to always set
the "PMU in use" flag.  It makes the perf_counter code call
ppc_enable_pmcs also rather than calling the lower-level function
directly, and removes the setting of the "PMU in use" flag from
pseries_lpar_enable_pmcs, since that is now done in its caller.

This also removes the declaration of pasemi_enable_pmcs because it
isn't defined anywhere.

Reported-by: Maynard Johnson <mpjohn@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: <stable@kernel.org)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-11 11:27:58 +10:00
Kumar Gala
757cbd46d1 powerpc/85xx: Fix SMP compile error and allow NULL for smp_ops
The following commit introduced a compile error since it removed
the implementation of smp_85xx_basic_setup:

commit 77c0a700c1
Author: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date:   Fri Aug 28 14:25:04 2009 +1000

    powerpc: Properly start decrementer on BookE secondary CPUs

Make it so that smp_ops probe() and setup_cpu() can be set to NULL.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-11 11:27:57 +10:00
Mike Mason
6e19314cc9 PCI/powerpc: support PCIe fundamental reset
By default, the EEH framework on powerpc does what's known as a "hot
reset" during recovery of a PCI Express device.  We've found a case
where the device needs a "fundamental reset" to recover properly.  The
current PCI error recovery and EEH frameworks do not support this
distinction.

The attached patch makes changes to EEH to utilize the new bit field.

Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
Signed-off-by: Richard Lary <rlary@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:29:41 -07:00
Ingo Molnar
695a461296 Merge branch 'amd-iommu/2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into core/iommu 2009-09-04 14:44:16 +02:00
Paul Mackerras
a3df6f7d30 perf_counter/powerpc: Fix cache event codes for POWER7
I had the codes for L1 D-cache load accesses and misses swapped
around, and the wrong codes for LL-cache accesses and misses.
This corrects them.

Reported-by: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
LKML-Reference: <19103.8514.709300.585484@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-03 08:41:53 +02:00
Kumar Gala
76acc2c1a7 powerpc/fsl-booke: Use HW PTE format if CONFIG_PTE_64BIT
Switch to using the Power ISA defined PTE format when we have a 64-bit
PTE.  This makes the code handling between fsl-booke and book3e-64
similiar for TLB faults.

Additionally this lets use take advantage of the page size encodings and
full permissions that the HW PTE defines.

Also defined _PMD_PRESENT, _PMD_PRESENT_MASK, and _PMD_BAD since the
32-bit ppc arch code expects them.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-02 16:20:41 +10:00
Brian King
46db2f86a3 powerpc/pseries: Fix to handle slb resize across migration
The SLB can change sizes across a live migration, which was not
being handled, resulting in possible machine crashes during
migration if migrating to a machine which has a smaller max SLB
size than the source machine. Fix this by first reducing the
SLB size to the minimum possible value, which is 32, prior to
migration. Then during the device tree update which occurs after
migration, we make the call to ensure the SLB gets updated. Also
add the slb_size to the lparcfg output so that the migration
tools can check to make sure the kernel has this capability
before allowing migration in scenarios where the SLB size will change.

BenH: Fixed #include <asm/mmu-hash64.h> -> <asm/mmu.h> to avoid
      breaking ppc32 build

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-02 16:19:01 +10:00
Grant Likely
0ed2c722c6 powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan()
The two versions are doing almost exactly the same thing.  No need to
maintain them as separate files.  This patch also has the side effect
of making the PCI device tree scanning code available to 32 bit powerpc
machines, but no board ports actually make use of this feature at this
point.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-02 15:45:53 +10:00
Thomas Gleixner
f71bb0ac5e Merge branch 'timers/posixtimers' into timers/tracing
Merge reason: timer tracepoint patches depend on both branches

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-29 10:34:29 +02:00
Benjamin Herrenschmidt
77c0a700c1 powerpc: Properly start decrementer on BookE secondary CPUs
This moves the code to start the decrementer on 40x and BookE into
a separate function which is now called from time_init() and
secondary_time_init(), before the respective clock sources are
registered. We also remove the 85xx specific code for doing it
from the platform code.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-28 14:25:04 +10:00
Kumar Gala
89c2dd62a3 powerpc/pci: Pull ppc32 PCI features into common
Some of the PCI features we have in ppc32 we will need on ppc64
platforms in the future.  These include support for:

* ppc_md.pci_exclude_device
* indirect config cycles
* early config cycles

We also simplified the logic in fake_pci_bus() to assume it will always
get a valid pci_controller.  Since all current callers seem to pass it
one.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-28 14:24:15 +10:00
Grant Likely
fbe6544719 powerpc/pci: move pci_64.c device tree scanning code into pci-common.c
The PCI device tree scanning code in pci_64.c is some useful functionality.
It allows PCI devices to be described in the device tree instead of being
probed for, which in turn allows pci devices to use all of the device tree
facilities to describe complex PCI bus architectures like GPIO and IRQ
routing (perhaps not a common situation for desktop or server systems,
but useful for embedded systems with on-board PCI devices).

This patch moves the device tree scanning into pci-common.c so it is
available for 32-bit powerpc machines too.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-28 14:24:15 +10:00
Grant Likely
ae14e13a4c powerpc/pci: Remove dead checks for CONFIG_PPC_OF
PPC_OF is always selected for arch/powerpc.  This patch removes the stale
#defines

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-28 14:24:14 +10:00
Kumar Gala
bb1af71ecb powerpc/book3e-64: Add support to initial_tlb_book3e for non-HES TLB
We now search through TLBnCFG looking for the first array that has IPROT
support (we assume that there is only one).  If that TLB has hardware
entry select (HES) support we use the existing code and with the proper
TLB select (the HES code still needs to clean up bolted entries from
firmware).  The non-HES code is pretty similiar to the 32-bit FSL Book-E
code but does make some new assumtions (like that we have tlbilx) and
simplifies things down a bit.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-28 14:24:14 +10:00
Kumar Gala
4b98d9e713 powerpc/book3e-64: Add helper function to setup IVORs
Not all 64-bit Book-3E parts will have fixed IVORs so add a function that
cpusetup code can call to setup the base IVORs (0..15) to match the fixed
offsets.  We need to 'or' part of interrupt_base_book3e into the IVORs
since on parts that have them the IVPR doesn't extend as far down.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-28 14:24:13 +10:00
Kumar Gala
6c188829d2 powerpc/book3e-64: Wait til generic_calibrate_decr to enable decrementer
Match what we do on 32-bit Book-E processors and enable the decrementer
in generic_calibrate_decr.  We need to make sure we disable the
decrementer early in boot since we currently use lazy (soft) interrupt
on 64-bit Book-E and possible get a decrementer exception before we
are ready for it.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-28 14:24:13 +10:00
Kumar Gala
f45c4486f7 powerpc/book3e-64: Move the default cpu table entry
Move the default cpu entry table for CONFIG_PPC_BOOK3E_64 to the
very end since we will probably want to support both 32-bit and
64-bit kernels for some processors that are higher up in the list.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-28 14:24:13 +10:00
FUJITA Tomonori
80d3e8abb7 powerpc: Add CONFIG_DMA_API_DEBUG support
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-28 14:24:11 +10:00
FUJITA Tomonori
4a9a6bfe70 powerpc: Handle SWIOTLB mapping error properly
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-28 14:24:11 +10:00
FUJITA Tomonori
45223c5492 powerpc: use dma_map_ops struct
This converts uses dma_map_ops struct (in include/linux/dma-mapping.h)
instead of POWERPC homegrown dma_mapping_ops.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-28 14:24:10 +10:00
FUJITA Tomonori
3702977fa7 powerpc: Remove swiotlb_pci_dma_ops
Now swiotlb_pci_dma_ops is identical to swiotlb_dma_ops; we can use
swiotlb_dma_ops with any devices. This removes swiotlb_pci_dma_ops.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-28 14:24:09 +10:00
FUJITA Tomonori
762afb7317 powerpc: Remove addr_needs_map in struct dma_mapping_ops
This patch adds max_direct_dma_addr to struct dev_archdata to remove
addr_needs_map in struct dma_mapping_ops. It also converts
dma_capable() to use max_direct_dma_addr.

max_direct_dma_addr is initialized in pci_dma_dev_setup_swiotlb(),
called via ppc_md.pci_dma_dev_setup hook.

For further information:
http://marc.info/?t=124719060200001&r=1&w=2

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-28 14:24:09 +10:00
Benjamin Herrenschmidt
2864697cef Merge commit 'tip/iommu-for-powerpc' into next 2009-08-28 14:23:06 +10:00
Gautham R Shenoy
6776426320 powerpc/pseries: Reduce the polling interval in __cpu_up()
Time time taken for a single cpu online operation on a pseries machine
is as follows:
Dedicated LPAR (POWER6): ~220ms.
Shared LPAR (POWER5)   : ~240ms.

Of this time, approximately 200ms is taken up by __cpu_up(). This is because
we poll every 200ms to check if the new cpu has notified it's presence
through the cpu_callin_map. We repeat this operation until the new cpu sets
the value in cpu_callin_map or 5 seconds elapse, whichever comes earlier.

However, using completion_structs instead of polling loops,
the time taken by the new processor to indicate it's presence has
found to be less than 1ms on pseries. This method however may not
work on all powerpc platforms due to the time-base synchronization code.

Keeping this in mind, we could reduce msleep polling interval from
200ms to 1ms while retaining the 5 second timeout.

With this, the time taken for a cpu online operation changes as follows:
Dedicated LPAR (POWER6): 20-25ms.
Shared LPAR (POWER5)   : 60-80ms.

In both these cases, it was found that the code polls through the loop
only once indicating that 1ms is a reasonable value, atleast on pseries.

The code needs testing on other powerpc platforms.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Joel Schopp <jschopp@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-27 13:12:54 +10:00
Josh Boyer
14d757520a powerpc: Fix __flush_icache_range on 44x
The ptrace POKETEXT interface allows a process to modify the text pages of
a child process being ptraced, usually to insert breakpoints via trap
instructions.  The kernel eventually calls copy_to_user_page, which in turn
calls __flush_icache_range to invalidate the icache lines for the child
process.

However, this function does not work on 44x due to the icache being virtually
indexed.  This was noticed by a breakpoint being triggered after it had been
cleared by ltrace on a 440EPx board.  The convenient solution is to do a
flash invalidate of the icache in the __flush_icache_range function.

Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-27 13:12:52 +10:00
Benjamin Herrenschmidt
ea3cc330ac powerpc/mm: Cleanup handling of execute permission
This is an attempt at cleaning up a bit the way we handle execute
permission on powerpc. _PAGE_HWEXEC is gone, _PAGE_EXEC is now only
defined by CPUs that can do something with it, and the myriad of
#ifdef's in the I$/D$ coherency code is reduced to 2 cases that
hopefully should cover everything.

The logic on BookE is a little bit different than what it was though
not by much. Since now, _PAGE_EXEC will be set by the generic code
for executable pages, we need to filter out if they are unclean and
recover it. However, I don't expect the code to be more bloated than
it already was in that area due to that change.

I could boast that this brings proper enforcing of per-page execute
permissions to all BookE and 40x but in fact, we've had that now for
some time as a side effect of my previous rework in that area (and
I didn't even know it :-) We would only enable execute permission if
the page was cache clean and we would only cache clean it if we took
and exec fault. Since we now enforce that the later only work if
VM_EXEC is part of the VMA flags, we de-fact already enforce per-page
execute permissions... Unless I missed something

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-27 13:12:51 +10:00
Martin Schwidefsky
d90246cd8e timekeeping: Increase granularity of read_persistent_clock(), build fix
Fix the following build problem on powerpc:

  arch/powerpc/kernel/time.c: In function 'read_persistent_clock':
  arch/powerpc/kernel/time.c:788: error: 'return' with a value, in function returning void
  arch/powerpc/kernel/time.c:791: error: 'return' with a value, in function returning void

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: dwalker@fifo99.com
Cc: johnstul@us.ibm.com
LKML-Reference: <20090822222313.74b9619c@skybase>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-23 10:49:48 +02:00
Benjamin Herrenschmidt
4f0dbc2781 Merge commit 'paulus-perf/master' into next 2009-08-20 11:07:56 +10:00
Michael Ellerman
903444e429 powerpc/vmlinux.lds: Move _edata down
Currently _edata does not include several data sections, this causes
the kernel's report of memory usage at boot to not match reality, and
also prevents kmemleak from working - because it scan between _sdata
and _edata for pointers to allocated memory.

This mirrors a similar change made recently to the x86 linker script.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:29:29 +10:00
Michael Ellerman
a15098c90d powerpc: Enable GCOV
Make it possible to enable GCOV code coverage measurement on powerpc.

Lightly tested on 64-bit, seems to work as expected.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:29:28 +10:00
Julia Lawall
14ea58ad79 powerpc: Use DIV_ROUND_CLOSEST in time init code
The kernel.h macro DIV_ROUND_CLOSEST performs the computation (x + d/2)/d
but is perhaps more readable.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@haskernel@
@@

#include <linux/kernel.h>

@depends on haskernel@
expression x,__divisor;
@@

- (((x) + ((__divisor) / 2)) / (__divisor))
+ DIV_ROUND_CLOSEST(x,__divisor)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:29:26 +10:00
Benjamin Krill
cf68787b68 powerpc/prom_init: Evaluate mem kernel parameter for early allocation
Evaluate mem kernel parameter for early memory allocations. If mem is set
no allocation in the region above the given boundary is allowed. The current
code doesn't take care about this and allocate memory above the given mem
boundary.

Signed-off-by: Benjamin Krill <ben@codiert.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:29:25 +10:00
Stefan Roese
20d70345f1 powerpc: Add AMCC 460EX/460GT Rev. B support to cputable.c
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:25:18 +10:00
Benjamin Herrenschmidt
2d27cfd328 powerpc: Remaining 64-bit Book3E support
This contains all the bits that didn't fit in previous patches :-) This
includes the actual exception handlers assembly, the changes to the
kernel entry, other misc bits and wiring it all up in Kconfig.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:25:11 +10:00
Benjamin Herrenschmidt
25d21ad6e7 powerpc: Add TLB management code for 64-bit Book3E
This adds the TLB miss handler assembly, the low level TLB flush routines
along with the necessary hook for dealing with our virtual page tables
or indirect TLB entries that need to be flushes when PTE pages are freed.

There is currently no support for hugetlbfs

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:25:09 +10:00
Benjamin Herrenschmidt
dce6670aaa powerpc: Add PACA fields specific to 64-bit Book3E processors
This adds various fields in the PACA that are for use specifically
by Book3E processors, such as exception save areas, current pgd
pointer, special exceptions kernel stacks etc...

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:25:08 +10:00
Benjamin Herrenschmidt
57e2a99f74 powerpc: Add memory management headers for new 64-bit BookE
This adds the PTE and pgtable format definitions, along with changes
to the kernel memory map and other definitions related to implementing
support for 64-bit Book3E. This also shields some asm-offset bits that
are currently only relevant on 32-bit

We also move the definition of the "linux" page size constants to
the common mmu.h file and add a few sizes that are relevant to
embedded processors.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:25:06 +10:00
Benjamin Herrenschmidt
cf54dc7cd4 powerpc: Move definitions of secondary CPU spinloop to header file
Those definitions are currently declared extern in the .c file where
they are used, move them to a header file instead.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:44 +10:00
Benjamin Herrenschmidt
747bea91b7 powerpc: Clean ifdef usage in copy_thread()
Currently, a single ifdef covers SLB related bits and more generic ppc64
related bits, split this in two separate ifdef's since 64-bit BookE will
need one but not the other.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:43 +10:00
Benjamin Herrenschmidt
6f0ef0f505 powerpc/mm: Call mmu_context_init() from ppc64
Our 64-bit hash context handling has no init function, but 64-bit Book3E
will use the common mmu_context_nohash.c code which does, so define an
empty inline mmu_context_init() for 64-bit server and call it from
our 64-bit setup_arch()

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
2009-08-20 10:12:42 +10:00
Benjamin Herrenschmidt
6c1719942e powerpc/of: Remove useless register save/restore when calling OF back
enter_prom() used to save and restore registers such as CTR, XER etc..
which are volatile, or SRR0,1... which we don't care about. This
removes a bunch of useless code and while at it turns an mtmsrd into
an MTMSRD macro which will be useful to Book3E.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:36 +10:00
Benjamin Herrenschmidt
dd90bbd5fb powerpc: Add compat_sys_truncate
The truncate syscall has a signed long parameter, so when using a 32-
bit userspace with a 64-bit kernel the argument is zero-extended
instead of sign-extended. Adding the compat_sys_truncate function
fixes the issue.

This was noticed during an LSB truncate test failure. The test was
checking for the correct error number set when truncate is called with
a length of -1. The test can be found at:

http://bzr.linuxfoundation.org/lsb/devel/runtime-test?cmd=inventory;rev=stewb%40linux-foundation.org-20090626205411-sfb23cc0tjj7jzgm;path=modules/vsx-pcts/tset/POSIX.os/files/truncate/

BenH: Added compat_sys_ftruncate() as well, same issue.

Signed-off-by: Chase Douglas <cndougla@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:34 +10:00
Benjamin Herrenschmidt
c5a8c0c99f powerpc: Remove use of a second scratch SPRG in STAB code
The STAB code used on Power3 and RS/64 uses a second scratch SPRG to
save a GPR in order to decide whether to go to do_stab_bolted_* or
to handle a normal data access exception.

This prevents our scheme of freeing SPRG3 which is user visible for
user uses since we cannot use SPRG0 which, on RS/64, seems to be
read-only for supervisor mode (like POWER4).

This reworks the STAB exception entry to use the PACA as temporary
storage instead.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:28 +10:00
Benjamin Herrenschmidt
ee43eb788b powerpc: Use names rather than numbers for SPRGs (v2)
The kernel uses SPRG registers for various purposes, typically in
low level assembly code as scratch registers or to hold per-cpu
global infos such as the PACA or the current thread_info pointer.

We want to be able to easily shuffle the usage of those registers
as some implementations have specific constraints realted to some
of them, for example, some have userspace readable aliases, etc..
and the current choice isn't always the best.

This patch should not change any code generation, and replaces the
usage of SPRN_SPRGn everywhere in the kernel with a named replacement
and adds documentation next to the definition of the names as to
what those are used for on each processor family.

The only parts that still use the original numbers are bits of KVM
or suspend/resume code that just blindly needs to save/restore all
the SPRGs.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:27 +10:00
Benjamin Herrenschmidt
8aa34ab8b2 powerpc: Rename exception.h to exception-64s.h
The file include/asm/exception.h contains definitions
that are specific to exception handling on 64-bit server
type processors.

This renames the file to exception-64s.h to reflect that
fact and avoid confusion.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:26 +10:00
Anton Blanchard
30d0b36828 powerpc: Move 64bit VDSO to improve context switch performance
On 64bit applications the VDSO is the only thing in segment 0. Since the VDSO
is position independent we can remove the hint and let get_unmapped_area pick
an area. This will mean the vdso will be near other mmaps and will share
an SLB entry:

10000000-10001000 r-xp 00000000 08:06 5778459        /root/context_switch_64
10010000-10011000 r--p 00000000 08:06 5778459        /root/context_switch_64
10011000-10012000 rw-p 00001000 08:06 5778459        /root/context_switch_64
fffa92ae000-fffa92b0000 rw-p 00000000 00:00 0
fffa92b0000-fffa9453000 r-xp 00000000 08:06 4334051  /lib64/power6/libc-2.9.so
fffa9453000-fffa9462000 ---p 001a3000 08:06 4334051  /lib64/power6/libc-2.9.so
fffa9462000-fffa9466000 r--p 001a2000 08:06 4334051  /lib64/power6/libc-2.9.so
fffa9466000-fffa947c000 rw-p 001a6000 08:06 4334051  /lib64/power6/libc-2.9.so
fffa947c000-fffa9480000 rw-p 00000000 00:00 0
fffa9480000-fffa94a8000 r-xp 00000000 08:06 4333852  /lib64/ld-2.9.so
fffa94b3000-fffa94b4000 rw-p 00000000 00:00 0

fffa94b4000-fffa94b7000 r-xp 00000000 00:00 0        [vdso] <----- here I am

fffa94b7000-fffa94b8000 r--p 00027000 08:06 4333852  /lib64/ld-2.9.so
fffa94b8000-fffa94bb000 rw-p 00028000 08:06 4333852  /lib64/ld-2.9.so
fffa94bb000-fffa94bc000 rw-p 00000000 00:00 0
fffe4c10000-fffe4c25000 rw-p 00000000 00:00 0        [stack]

On a microbenchmark that bounces a token between two 64bit processes over pipes
and calls gettimeofday each iteration (to access the VDSO), our context switch
rate goes from 268k to 277k ctx switches/sec (tested on a 4GHz POWER6).

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:24 +10:00
Paul Mackerras
20002ded4d perf_counter: powerpc: Add callchain support
This adds support for tracing callchains for powerpc, both 32-bit
and 64-bit, and both in the kernel and userspace, from PMU interrupt
context.

The first three entries stored for each callchain are the NIP (next
instruction pointer), LR (link register), and the contents of the LR
save area in the second stack frame (the first is ignored because the
ABI convention on powerpc is that functions save their return address
in their caller's stack frame).  Because leaf functions don't have to
save their return address (LR value) and don't have to establish a
stack frame, it's possible for either or both of LR and the second
stack frame's LR save area to have valid return addresses in them.
This is basically impossible to disambiguate without either reading
the code or looking at auxiliary information such as CFI tables.
Since we don't want to do either of those things at interrupt time,
we store both LR and the second stack frame's LR save area.

Once we get past the second stack frame, there is no ambiguity; all
return addresses we get are reliable.

For kernel traces, we check whether they are valid kernel instruction
addresses and store zero instead if they are not (rather than
omitting them, which would make it impossible for userspace to know
which was which).  We also store zero instead of the second stack
frame's LR save area value if it is the same as LR.

For kernel traces, we check for interrupt frames, and for user traces,
we check for signal frames.  In each case, since we're starting a new
trace, we store a PERF_CONTEXT_KERNEL/USER marker so that userspace
knows that the next three entries are NIP, LR and the second stack frame
for the interrupted context.

We read user memory with __get_user_inatomic.  On 64-bit, if this
PMU interrupt occurred while interrupts are soft-disabled, and
there is no MMU hash table entry for the page, we will get an
-EFAULT return from __get_user_inatomic even if there is a valid
Linux PTE for the page, since hash_page isn't reentrant.  Thus we
have code here to read the Linux PTE and access the page via the
kernel linear mapping.  Since 64-bit doesn't use (or need) highmem
there is no need to do kmap_atomic.  On 32-bit, we don't do soft
interrupt disabling, so this complication doesn't occur and there
is no need to fall back to reading the Linux PTE, since hash_page
(or the TLB miss handler) will get called automatically if necessary.

Note that we cannot get PMU interrupts in the interval during
context switch between switch_mm (which switches the user address
space) and switch_to (which actually changes current to the new
process).  On 64-bit this is because interrupts are hard-disabled
in switch_mm and stay hard-disabled until they are soft-enabled
later, after switch_to has returned.  So there is no possibility
of trying to do a user stack trace when the user address space is
not current's address space.

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-08-18 14:48:47 +10:00
Paul Mackerras
9c1e105238 powerpc: Allow perf_counters to access user memory at interrupt time
This provides a mechanism to allow the perf_counters code to access
user memory in a PMU interrupt routine.  Such an access can cause
various kinds of interrupt: SLB miss, MMU hash table miss, segment
table miss, or TLB miss, depending on the processor.  This commit
only deals with 64-bit classic/server processors, which use an MMU
hash table.  32-bit processors are already able to access user memory
at interrupt time.  Since we don't soft-disable on 32-bit, we avoid
the possibility of reentering hash_page or the TLB miss handlers,
since they run with interrupts disabled.

On 64-bit processors, an SLB miss interrupt on a user address will
update the slb_cache and slb_cache_ptr fields in the paca.  This is
OK except in the case where a PMU interrupt occurs in switch_slb,
which also accesses those fields.  To prevent this, we hard-disable
interrupts in switch_slb.  Interrupts are already soft-disabled at
this point, and will get hard-enabled when they get soft-enabled
later.

This also reworks slb_flush_and_rebolt: to avoid hard-disabling twice,
and to make sure that it clears the slb_cache_ptr when called from
other callers than switch_slb, the existing routine is renamed to
__slb_flush_and_rebolt, which is called by switch_slb and the new
version of slb_flush_and_rebolt.

Similarly, switch_stab (used on POWER3 and RS64 processors) gets a
hard_irq_disable() to protect the per-cpu variables used there and
in ste_allocate.

If a MMU hashtable miss interrupt occurs, normally we would call
hash_page to look up the Linux PTE for the address and create a HPTE.
However, hash_page is fairly complex and takes some locks, so to
avoid the possibility of deadlock, we check the preemption count
to see if we are in a (pseudo-)NMI handler, and if so, we don't call
hash_page but instead treat it like a bad access that will get
reported up through the exception table mechanism.  An interrupt
whose handler runs even though the interrupt occurred when
soft-disabled (such as the PMU interrupt) is considered a pseudo-NMI
handler, which should use nmi_enter()/nmi_exit() rather than
irq_enter()/irq_exit().

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-08-18 14:48:43 +10:00
Martin Schwidefsky
d4f587c67f timekeeping: Increase granularity of read_persistent_clock()
The persistent clock of some architectures (e.g. s390) have a
better granularity than seconds. To reduce the delta between the
host clock and the guest clock in a virtualized system change the 
read_persistent_clock function to return a struct timespec.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Daniel Walker <dwalker@fifo99.com>
LKML-Reference: <20090814134811.013873340@de.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-15 10:55:46 +02:00
Tejun Heo
c2a7e81801 powerpc64: convert to dynamic percpu allocator
Now that percpu allows arbitrary embedding of the first chunk,
powerpc64 can easily be converted to dynamic percpu allocator.
Convert it.  powerpc supports several large page sizes.  Cap atom_size
at 1M.  There isn't much to gain by going above that anyway.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-14 15:00:53 +09:00
Tejun Heo
384be2b18a Merge branch 'percpu-for-linus' into percpu-for-next
Conflicts:
	arch/sparc/kernel/smp_64.c
	arch/x86/kernel/cpu/perf_counter.c
	arch/x86/kernel/setup_percpu.c
	drivers/cpufreq/cpufreq_ondemand.c
	mm/percpu.c

Conflicts in core and arch percpu codes are mostly from commit
ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many
num_possible_cpus() with nr_cpu_ids.  As for-next branch has moved all
the first chunk allocators into mm/percpu.c, the changes are moved
from arch code to mm/percpu.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-08-14 14:45:31 +09:00
Linus Torvalds
d00aa6695b Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
  perf_counter: Zero dead bytes from ftrace raw samples size alignment
  perf_counter: Subtract the buffer size field from the event record size
  perf_counter: Require CAP_SYS_ADMIN for raw tracepoint data
  perf_counter: Correct PERF_SAMPLE_RAW output
  perf tools: callchain: Fix bad rounding of minimum rate
  perf_counter tools: Fix libbfd detection for systems with libz dependency
  perf: "Longum est iter per praecepta, breve et efficax per exempla"
  perf_counter: Fix a race on perf_counter_ctx
  perf_counter: Fix tracepoint sampling to be part of generic sampling
  perf_counter: Work around gcc warning by initializing tracepoint record unconditionally
  perf tools: callchain: Fix sum of percentages to be 100% by displaying amount of ignored chains in fractal mode
  perf tools: callchain: Fix 'perf report' display to be callchain by default
  perf tools: callchain: Fix spurious 'perf report' warnings: ignore empty callchains
  perf record: Fix the -A UI for empty or non-existent perf.data
  perf util: Fix do_read() to fail on EOF instead of busy-looping
  perf list: Fix the output to not include tracepoints without an id
  perf_counter/powerpc: Fix oops on cpus without perf_counter hardware support
  perf stat: Fix tool option consistency: rename -S/--scale to -c/--scale
  perf report: Add debug help for the finding of symbol bugs - show the symtab origin (DSO, build-id, kernel, etc)
  perf report: Fix per task mult-counter stat reporting
  ...
2009-08-10 11:48:51 -07:00
Benjamin Herrenschmidt
b2f2e8fee3 powerpc/dma: pci_set_dma_mask() shouldn't fail if mask fits in RAM
On an iMac G5, the b43 driver is failing to initialise because trying to
set the dma mask to 30-bit fails. Even though there's only 512MiB of RAM
in the machine anyway:
	https://bugzilla.redhat.com/show_bug.cgi?id=514787

We should probably let it succeed if the available RAM in the system
doesn't exceed the requested limit.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-10 16:36:38 +10:00
Paul Mackerras
f36a1a133a perf_counter/powerpc: Fix oops on cpus without perf_counter hardware support
If we have the powerpc perf_counter backend compiled in, but
the cpu we are running on is one where we don't support the
PMU, we currently oops in hw_perf_group_sched_in if we try to
use any counters, because ppmu is NULL in that case, and we
unconditionally dereference ppmu.

This fixes the problem by adding a check if ppmu is NULL at the
beginning of hw_perf_group_sched_in, and also at the beginning
of the other functions that get called from the perf_counter
core, i.e. hw_perf_disable, hw_perf_enable, and
hw_perf_counter_setup.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: benh@kernel.crashing.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-09 12:54:37 +02:00
Benjamin Herrenschmidt
e0d82a0a4e perf_counter/powerpc: Check oprofile_cpu_type for NULL before using it
If the current CPU doesn't support performance counters,
cur_cpu_spec->oprofile_cpu_type can be NULL. The current
perf_counter modules don't test for that case and would thus
crash at boot time.

Bug reported by David Woodhouse.

Reported-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul Mackerras <paulus@samba.org>
LKML-Reference: <19066.48028.446975.501454@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-06 13:55:09 +02:00
Stanislaw Gruszka
a42548a188 cputime: Optimize jiffies_to_cputime(1)
For powerpc with CONFIG_VIRT_CPU_ACCOUNTING
jiffies_to_cputime(1) is not compile time constant and run time
calculations are quite expensive. To optimize we use
precomputed value. For all other architectures is is
preprocessor definition.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
LKML-Reference: <1248862529-6063-5-git-send-email-sgruszka@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-03 14:48:36 +02:00
FUJITA Tomonori
8ab7ff42c9 powerpc: remove unused swiotlb_phys_to_bus() and swiotlb_bus_to_phys()
phys_to_dma() and dma_to_phys() are used instead of
swiotlb_phys_to_bus() and swiotlb_bus_to_phys().

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
2009-07-28 14:19:20 +09:00
FUJITA Tomonori
30945fdf6a powerpc: remove unncesary swiotlb_arch_address_needs_mapping
swiotlb doesn't use swiotlb_arch_address_needs_mapping(); it uses
dma_capalbe(). We can remove unnecessary
swiotlb_arch_address_needs_mapping().

We can remove swiotlb_addr_needs_map() and is_buffer_dma_capable() in
swiotlb_pci_addr_needs_map() too; dma_capable() handles the features
that both provide.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
2009-07-28 14:19:19 +09:00
FUJITA Tomonori
02ca646e73 swiotlb: remove unnecessary swiotlb_bus_to_virt
swiotlb_bus_to_virt is unncessary; we can use swiotlb_bus_to_phys and
phys_to_virt instead.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
2009-07-28 14:19:18 +09:00
Andreas Schwab
0115cb544b powerpc: Fix another bug in move of altivec code to vector.S
When moving load_up_altivec to vector.S a typo in a comment caused a
thinko setting the wrong variable.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-15 17:41:46 +10:00
Dave Kleikamp
28477fb1ed powerpc: Fix booke user_disable_single_step()
On booke processors, gdb is seeing spurious SIGTRAPs when setting a
watchpoint.

user_disable_single_step() simply quits when the DAC is non-zero.  It should
be clearing the DBCR0_IC and DBCR0_BT bits from the dbcr0 register and
TIF_SINGLESTEP from the thread flag.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-15 17:41:45 +10:00
Alexey Dobriyan
405f55712d headers: smp_lock.h redux
* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
  It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

  This will make hardirq.h inclusion cheaper for every PREEMPT=n config
  (which includes allmodconfig/allyesconfig, BTW)

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-12 12:22:34 -07:00
Linus Torvalds
85be928c41 Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
  perf report: Add "Fractal" mode output - support callchains with relative overhead rate
  perf_counter tools: callchains: Manage the cumul hits on the fly
  perf report: Change default callchain parameters
  perf report: Use a modifiable string for default callchain options
  perf report: Warn on callchain output request from non-callchain file
  x86: atomic64: Inline atomic64_read() again
  x86: atomic64: Clean up atomic64_sub_and_test() and atomic64_add_negative()
  x86: atomic64: Improve atomic64_xchg()
  x86: atomic64: Export APIs to modules
  x86: atomic64: Improve atomic64_read()
  x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP
  x86: atomic64: Fix unclean type use in atomic64_xchg()
  x86: atomic64: Make atomic_read() type-safe
  x86: atomic64: Reduce size of functions
  x86: atomic64: Improve atomic64_add_return()
  x86: atomic64: Improve cmpxchg8b()
  x86: atomic64: Improve atomic64_read()
  x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file
  x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too
  perf report: Annotate variable initialization
  ...
2009-07-10 14:25:03 -07:00
Tejun Heo
023bf6f1b8 linker script: unify usage of discard definition
Discarded sections in different archs share some commonality but have
considerable differences.  This led to linker script for each arch
implementing its own /DISCARD/ definition, which makes maintaining
tedious and adding new entries error-prone.

This patch makes all linker scripts to move discard definitions to the
end of the linker script and use the common DISCARDS macro.  As ld
uses the first matching section definition, archs can include default
discarded sections by including them earlier in the linker script.

ia64 is notable because it first throws away some ia64 specific
subsections and then include the rest of the sections into the final
image, so those sections must be discarded before the inclusion.

defconfig compile tested for x86, x86-64, powerpc, powerpc64, ia64,
alpha, sparc, sparc64 and s390.  Michal Simek tested microblaze.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Tested-by: Michal Simek <monstr@monstr.eu>
Cc: linux-arch@vger.kernel.org
Cc: Michal Simek <monstr@monstr.eu>
Cc: microblaze-uclinux@itee.uq.edu.au
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Tony Luck <tony.luck@intel.com>
2009-07-09 11:27:40 +09:00
Huang Weiyi
3665ee36fa powerpc/perf_counter: Remove duplicated #include
Remove duplicated #include('s) in
  arch/powerpc/kernel/mpc7450-pmu.c
  arch/powerpc/kernel/ppc970-pmu.c

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-08 13:50:22 +10:00
Tejun Heo
c43768cbb7 Merge branch 'master' into for-next
Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix
changes.  As alpha in percpu tree uses 'weak' attribute instead of
inline assembly, there's no need for __used attribute.

Conflicts:
	arch/alpha/include/asm/percpu.h
	arch/mn10300/kernel/vmlinux.lds.S
	include/linux/percpu-defs.h
2009-07-04 07:13:18 +09:00
Anton Blanchard
0a456fc58f powerpc/perf_counter: Enable alternate PR/HV bits for POWER7
POWER7 has the same PR/HV bit layout as POWER6, so set the flag.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: a.p.zijlstra@chello.nl
Cc: benh@kernel.crashing.org
LKML-Reference: <20090701030701.GI3563@kryten>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 10:20:28 +02:00
Benjamin Herrenschmidt
f694cda892 powerpc/440: Fix warning early debug code
The function udbg_44x_as1_flush() has the wrong prototype causing
a warning when enabling 440 early debug.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-26 16:55:35 +10:00
Benjamin Herrenschmidt
03c01aa740 powerpc/of: Fix usage of dev_set_name() in of_device_alloc()
dev_set_name() takes a format string, so use it properly and avoid
a warning with recent gcc's

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-26 16:55:35 +10:00
Benjamin Herrenschmidt
c4007a2fbf powerpc: Use one common impl. of RTAS timebase sync and use raw spinlock
Several platforms use their own copy of what is essentially the same code,
using RTAS to synchronize the timebases when bringing up new CPUs. This
moves it all into a single common implementation and additionally
turns the spinlock into a raw spinlock since the former can rely on
the timebase not being frozen when spinlock debugging is enabled, and finally
masks interrupts while the timebase is disabled.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-26 16:55:25 +10:00
Benjamin Herrenschmidt
f97bb36f70 powerpc/rtas: Turn rtas lock into a raw spinlock
RTAS currently uses a normal spinlock. However it can be called from
contexts where this is not necessarily a good idea. For example, it
can be called while syncing timebases, with the core timebase being
frozen. Unfortunately, that will deadlock in case of lock contention
when spinlock debugging is enabled as the spin lock debugging code
will try to use __delay() which ... relies on the timebase being
enabled.

Also RTAS can be used in some low level IRQ handling code path so it
may as well be a raw spinlock for -rt sake.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-26 14:37:27 +10:00
Benjamin Herrenschmidt
5d38902c48 powerpc: Add irqtrace support for 32-bit powerpc
Based on initial work from: Dale Farnsworth <dale@farnsworth.org>

Add the low level irq tracing hooks for 32-bit powerpc needed
to enable full lockdep functionality.

The approach taken to deal with the code in entry_32.S is that
we don't trace all the transitions of MSR:EE when we just turn
it off to peek at TI_FLAGS without races. Only when we are
calling into C code or returning from exceptions with a state
that have changed from what lockdep thinks.

There's a little bugger though: If we take an exception that
keeps interrupts enabled (such as an alignment exception) while
interrupts are enabled, we will call trace_hardirqs_on() on the
way back spurriously. Not a big deal, but to get rid of it would
require remembering in pt_regs that the exception was one of the
type that kept interrupts enabled which we don't know at this
stage. (Well, we could test all cases for regs->trap but that
sucks too much).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Kumar Gala <galak@kernel.crashing.org>
2009-06-26 14:37:27 +10:00
Benjamin Herrenschmidt
4a5cbf17c4 powerpc: Map more memory early on 601 processors
The 32-bit kernel relies on some memory being mapped covering
the kernel text,data and bss at least, early during boot before
the full MMU setup is done. On 32-bit "classic" processors, this
is done using BAT registers.

On 601, the size of BATs is limited to 8M and we use 2 of them
for that initial mapping. This can become quite tight when enabling
features like lockdep, so let's use a 3rd one to bump that mapping
from 16M to 24M. We keep the 4th BAT free as it can be useful for
debugging early boot code to map things like serial ports.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-26 14:37:25 +10:00
Kumar Gala
a236719418 powerpc: Fix output from show_regs
For some reason we've had an explicit KERN_INFO for GPR dumps.  With
recent changes we get output like:

<6>GPR00: 00000000 ef855eb0 ef858000 00000001 000000d0 f1000000 ffbc8000 ffffffff

The KERN_INFO is causing the <6>.  Don't see any reason to keep it
around.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-26 14:37:24 +10:00
Benjamin Herrenschmidt
7ccbe504b5 powerpc/pmac: Fix issues with PowerMac "PowerSurge" SMP
The old PowerSurge SMP (ie, dual or quad 604 machines) code has
numerous issues in modern world.

One is cpu_possible_map is set too late (the device-tree is bogus)
so we fail to allocate the interrupt stacks and crash. Another
problem is the fact the timebase is frozen by the bringup of the
second CPU so the delays in the generic code will hang, we need
to move some of the calling procedure to inside the powermac code.

This makes it boot again for me

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-26 14:37:24 +10:00
Tejun Heo
405d967dc7 linker script: throw away .discard section
x86 throws away .discard section but no other archs do.  Also,
.discard is not thrown away while linking modules.  Make every arch
and module linking throw it away.  This will be used to define dummy
variables for percpu declarations and definitions.

This patch is based on Ivan Kokshaysky's alpha percpu patch.

[ Impact: always throw away everything in .discard ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Bryan Wu <cooloney@kernel.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
2009-06-24 15:13:38 +09:00
Linus Torvalds
12e24f34cb Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
  perfcounter: Handle some IO return values
  perf_counter: Push perf_sample_data through the swcounter code
  perf_counter tools: Define and use our own u64, s64 etc. definitions
  perf_counter: Close race in perf_lock_task_context()
  perf_counter, x86: Improve interactions with fast-gup
  perf_counter: Simplify and fix task migration counting
  perf_counter tools: Add a data file header
  perf_counter: Update userspace callchain sampling uses
  perf_counter: Make callchain samples extensible
  perf report: Filter to parent set by default
  perf_counter tools: Handle lost events
  perf_counter: Add event overlow handling
  fs: Provide empty .set_page_dirty() aop for anon inodes
  perf_counter: tools: Makefile tweaks for 64-bit powerpc
  perf_counter: powerpc: Add processor back-end for MPC7450 family
  perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels
  perf_counter: powerpc: Change how processor-specific back-ends get selected
  perf_counter: powerpc: Use unsigned long for register and constraint values
  perf_counter: powerpc: Enable use of software counters on 32-bit powerpc
  perf_counter tools: Add and use isprint()
  ...
2009-06-20 11:29:32 -07:00
Linus Torvalds
b0b7065b64 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits)
  tracing/urgent: warn in case of ftrace_start_up inbalance
  tracing/urgent: fix unbalanced ftrace_start_up
  function-graph: add stack frame test
  function-graph: disable when both x86_32 and optimize for size are configured
  ring-buffer: have benchmark test print to trace buffer
  ring-buffer: do not grab locks in nmi
  ring-buffer: add locks around rb_per_cpu_empty
  ring-buffer: check for less than two in size allocation
  ring-buffer: remove useless compile check for buffer_page size
  ring-buffer: remove useless warn on check
  ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index
  tracing: update sample event documentation
  tracing/filters: fix race between filter setting and module unload
  tracing/filters: free filter_string in destroy_preds()
  ring-buffer: use commit counters for commit pointer accounting
  ring-buffer: remove unused variable
  ring-buffer: have benchmark test handle discarded events
  ring-buffer: prevent adding write in discarded area
  tracing/filters: strloc should be unsigned short
  tracing/filters: operand can be negative
  ...

Fix up kmemcheck-induced conflict in kernel/trace/ring_buffer.c manually
2009-06-20 10:56:46 -07:00
Linus Torvalds
773d7a09e1 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (35 commits)
  powerpc/5121: make clock debug output more readable
  powerpc/5xxx: Add common mpc5xxx_get_bus_frequency() function
  powerpc/5200: Update pcm030.dts to add i2c eeprom and delete cruft
  powerpc/5200: convert mpc52xx_psc_spi to use cs_control callback
  fbdev/xilinxfb: Fix improper casting and tighen up probe path
  usb/ps3: Add missing annotations
  powerpc: Add memory clobber to mtspr()
  powerpc: Fix invalid construct in our CPU selection Kconfig
  ps3rom: Use ps3_system_bus_[gs]et_drvdata() instead of direct access
  powerpc: Add configurable -Werror for arch/powerpc
  of_serial: Add UPF_FIXED_TYPE flag
  drivers/hvc: Add missing __devexit_p()
  net/ps3: gelic - Add missing annotations
  powerpc: Introduce macro spin_event_timeout()
  powerpc/warp: Fix ISA_DMA_THRESHOLD default
  powerpc/bootwrapper: Custom build options for XPedite52xx targets
  powerpc/85xx: Add defconfig for X-ES MPC85xx boards
  powerpc/85xx: Add dts files for X-ES MPC85xx boards
  powerpc/85xx: Add platform support for X-ES MPC85xx boards
  83xx: add support for the kmeter1 board.
  ...
2009-06-19 17:40:40 -07:00
Steven Rostedt
71e308a239 function-graph: add stack frame test
In case gcc does something funny with the stack frames, or the return
from function code, we would like to detect that.

An arch may implement passing of a variable that is unique to the
function and can be saved on entering a function and can be tested
when exiting the function. Usually the frame pointer can be used for
this purpose.

This patch also implements this for x86. Where it passes in the stack
frame of the parent function, and will test that frame on exit.

There was a case in x86_32 with optimize for size (-Os) where, for a
few functions, gcc would align the stack frame and place a copy of the
return address into it. The function graph tracer modified the copy and
not the actual return address. On return from the funtion, it did not go
to the tracer hook, but returned to the parent. This broke the function
graph tracer, because the return of the parent (where gcc did not do
this funky manipulation) returned to the location that the child function
was suppose to. This caused strange kernel crashes.

This test detected the problem and pointed out where the issue was.

This modifies the parameters of one of the functions that the arch
specific code calls, so it includes changes to arch code to accommodate
the new prototype.

Note, I notice that the parsic arch implements its own push_return_trace.
This is now a generic function and the ftrace_push_return_trace should be
used instead. This patch does not touch that code.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-18 18:40:18 -04:00
Harry Ciao
8f101a051e edac: cpc925 MC platform device setup
Fix up the number of cells for the values of CPC925 Memory Controller,
and setup related platform device during system booting up, against
which CPC925 Memory Controller EDAC driver would be matched.

Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@gate.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:57 -07:00
Paul Mackerras
7325927e5a perf_counter: powerpc: Add processor back-end for MPC7450 family
This adds support for the performance monitor hardware on the
MPC7450 family of processors (7450, 7451, 7455, 7447/7457, 7447A,
7448), used in the later Apple G4 powermacs/powerbooks and other
machines.  These machines have 6 hardware counters with a unique
set of events which can be counted on each counter, with some
events being available on multiple counters.

Raw event codes for these processors are (PMC << 8) + PMCSEL.
If PMC is non-zero then the event is that selected by the given
PMCSEL value for that PMC (hardware counter).  If PMC is zero
then the event selected is one of the low-numbered ones that are
common to several PMCs.  In this case PMCSEL must be <= 22 and
the event is what that PMCSEL value would select on PMC1 (but
it may be placed any other PMC that has the same event for that
PMCSEL value).

For events that count cycles or occurrences that exceed a threshold,
the threshold requested can be specified in the 0x3f000 bits of the
raw event codes.  If the event uses the threshold multiplier bit
and that bit should be set, that is indicated with the 0x40000 bit
of the raw event code.

This fills in some of the generic cache events.  Unfortunately there
are quite a few blank spaces in the table, partly because these
processors tend to count cache hits rather than cache accesses.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linuxppc-dev@ozlabs.org
Cc: benh@kernel.crashing.org
LKML-Reference: <19000.55631.802122.696927@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-18 11:11:46 +02:00
Paul Mackerras
98fb1807b9 perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels
This abstracts a few things in arch/powerpc/kernel/perf_counter.c
that are specific to 64-bit kernels, and provides definitions for
32-bit kernels.  In particular,

* Only 64-bit has MMCRA and the bits in it that give information
  about a PMU interrupt (sampled PR, HV, slot number etc.)
* Only 64-bit has the lppaca and the lppaca->pmcregs_in_use field
* Use of SDAR is confined to 64-bit for now
* Only 64-bit has soft/lazy interrupt disable and therefore
  pseudo-NMIs (interrupts that occur while interrupts are soft-disabled)
* Only 64-bit has PMC7 and PMC8
* Only 64-bit has the MSR_HV bit.

This also fixes the types used in a couple of places, where we were
using long types for things that need to be 64-bit.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linuxppc-dev@ozlabs.org
Cc: benh@kernel.crashing.org
LKML-Reference: <19000.55590.634126.876084@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-18 11:11:46 +02:00
Paul Mackerras
079b3c569c perf_counter: powerpc: Change how processor-specific back-ends get selected
At present, the powerpc generic (processor-independent) perf_counter
code has list of processor back-end modules, and at initialization,
it looks at the PVR (processor version register) and has a switch
statement to select a suitable processor-specific back-end.

This is going to become inconvenient as we add more processor-specific
back-ends, so this inverts the order: now each back-end checks whether
it applies to the current processor, and registers itself if so.
Furthermore, instead of looking at the PVR, back-ends now check the
cur_cpu_spec->oprofile_cpu_type string and match on that.

Lastly, each back-end now specifies a name for itself so the core can
print a nice message when a back-end registers itself.

This doesn't provide any support for unregistering back-ends, but that
wouldn't be hard to do and would allow back-ends to be modules.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linuxppc-dev@ozlabs.org
Cc: benh@kernel.crashing.org
LKML-Reference: <19000.55529.762227.518531@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-18 11:11:45 +02:00
Paul Mackerras
448d64f8f4 perf_counter: powerpc: Use unsigned long for register and constraint values
This changes the powerpc perf_counter back-end to use unsigned long
types for hardware register values and for the value/mask pairs used
in checking whether a given set of events fit within the hardware
constraints.  This is in preparation for adding support for the PMU
on some 32-bit powerpc processors.  On 32-bit processors the hardware
registers are only 32 bits wide, and the PMU structure is generally
simpler, so 32 bits should be ample for expressing the hardware
constraints.  On 64-bit processors, unsigned long is 64 bits wide,
so using unsigned long vs. u64 (unsigned long long) makes no actual
difference.

This makes some other very minor changes: adjusting whitespace to line
things up in initialized structures, and simplifying some code in
hw_perf_disable().

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linuxppc-dev@ozlabs.org
Cc: benh@kernel.crashing.org
LKML-Reference: <19000.55473.26174.331511@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-18 11:11:45 +02:00
Paul Mackerras
105988c015 perf_counter: powerpc: Enable use of software counters on 32-bit powerpc
This enables the perf_counter subsystem on 32-bit powerpc.  Since we
don't have any support for hardware counters on 32-bit powerpc yet,
only software counters can be used.

Besides selecting HAVE_PERF_COUNTERS for 32-bit powerpc as well as
64-bit, the main thing this does is add an implementation of
set_perf_counter_pending().  This needs to arrange for
perf_counter_do_pending() to be called when interrupts are enabled.
Rather than add code to local_irq_restore as 64-bit does, the 32-bit
set_perf_counter_pending() generates an interrupt by setting the
decrementer to 1 so that a decrementer interrupt will become pending
in 1 or 2 timebase ticks (if a decrementer interrupt isn't already
pending).  When interrupts are enabled, timer_interrupt() will be
called, and some new code in there calls perf_counter_do_pending().
We use a per-cpu array of flags to indicate whether we need to call
perf_counter_do_pending() or not.

This introduces a couple of new Kconfig symbols: PPC_HAVE_PMU_SUPPORT,
which is selected by processor families for which we have hardware PMU
support (currently only PPC64), and PPC_PERF_CTRS, which enables the
powerpc-specific perf_counter back-end.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linuxppc-dev@ozlabs.org
Cc: benh@kernel.crashing.org
LKML-Reference: <19000.55404.103840.393470@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-18 11:11:44 +02:00
Benjamin Herrenschmidt
4b337c5f24 Merge commit 'origin/master' into next 2009-06-18 11:16:55 +10:00
Ingo Molnar
a3d06cc6aa Merge branch 'linus' into perfcounters/core
Conflicts:
	arch/x86/include/asm/kmap_types.h
	include/linux/mm.h

	include/asm-generic/kmap_types.h

Merge reason: We crossed changes with kmap_types.h cleanups in mainline.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 13:06:17 +02:00
Linus Torvalds
517d08699b Merge branch 'akpm'
* akpm: (182 commits)
  fbdev: bf54x-lq043fb: use kzalloc over kmalloc/memset
  fbdev: *bfin*: fix __dev{init,exit} markings
  fbdev: *bfin*: drop unnecessary calls to memset
  fbdev: bfin-t350mcqb-fb: drop unused local variables
  fbdev: blackfin has __raw I/O accessors, so use them in fb.h
  fbdev: s1d13xxxfb: add accelerated bitblt functions
  tcx: use standard fields for framebuffer physical address and length
  fbdev: add support for handoff from firmware to hw framebuffers
  intelfb: fix a bug when changing video timing
  fbdev: use framebuffer_release() for freeing fb_info structures
  radeon: P2G2CLK_ALWAYS_ONb tested twice, should 2nd be P2G2CLK_DAC_ALWAYS_ONb?
  s3c-fb: CPUFREQ frequency scaling support
  s3c-fb: fix resource releasing on error during probing
  carminefb: fix possible access beyond end of carmine_modedb[]
  acornfb: remove fb_mmap function
  mb862xxfb: use CONFIG_OF instead of CONFIG_PPC_OF
  mb862xxfb: restrict compliation of platform driver to PPC
  Samsung SoC Framebuffer driver: add Alpha Channel support
  atmel-lcdc: fix pixclock upper bound detection
  offb: use framebuffer_alloc() to allocate fb_info struct
  ...

Manually fix up conflicts due to kmemcheck in mm/slab.c
2009-06-16 19:50:13 -07:00
Geert Uytterhoeven
ae52bb2384 fbdev: move logo externs to header file
Now we have __initconst, we can finally move the external declarations for
the various Linux logo structures to <linux/linux_logo.h>.

James' ack dates back to the previous submission (way to long ago), when the
logos were still __initdata, which caused failures on some platforms with some
toolchain versions.

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: James Simmons <jsimmons@infradead.org>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16 19:47:57 -07:00
Alexey Dobriyan
bb1f17b037 mm: consolidate init_mm definition
* create mm/init-mm.c, move init_mm there
* remove INIT_MM, initialize init_mm with C99 initializer
* unexport init_mm on all arches:

  init_mm is already unexported on x86.

  One strange place is some OMAP driver (drivers/video/omap/) which
  won't build modular, but it's already wants get_vm_area() export.
  Somebody should look there.

[akpm@linux-foundation.org: add missing #includes]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16 19:47:28 -07:00
Michael Ellerman
ba55bd7436 powerpc: Add configurable -Werror for arch/powerpc
Add the option to build the code under arch/powerpc with -Werror.

The intention is to make it harder for people to inadvertantly introduce
warnings in the arch/powerpc code. It needs to be configurable so that
if a warning is introduced, people can easily work around it while it's
being fixed.

The option is a negative, ie. don't enable -Werror, so that it will be
turned on for allyes and allmodconfig builds.

The default is n, in the hope that developers will build with -Werror,
that will probably lead to some build breaks, I am prepared to be flamed.

It's not enabled for math-emu, which is a steaming pile of warnings.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-16 14:15:45 +10:00
Gerhard Pircher
f1f8b4948d powerpc: Enable additional BAT registers in setup_745x_specifics()
Currently the kernel expects the additional four IBAT and DBAT registers
to be available, but doesn't enable these registers on 745x CPUs, which
have them disabled after reset. Thus set the HIGH_BAT_EN bit in HID0
register, if the corresponding MMU feature is defined.

Signed-off-by: Gerhard Pircher <gerhard_pircher@gmx.net>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-06-15 21:45:31 -05:00
Nate Case
cab888e678 powerpc/fsl-booke: Enable L1 cache on e500v1/e500v2/e500mc CPUs
Some boot loaders may not enable L1 instruction/data cache.  Check if
data and instruction caches are enabled, and enable them if needed.

Signed-off-by: Nate Case <ncase@xes-inc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-06-15 21:45:30 -05:00
Linus Torvalds
0fa213310c Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (103 commits)
  powerpc: Fix bug in move of altivec code to vector.S
  powerpc: Add support for swiotlb on 32-bit
  powerpc/spufs: Remove unused error path
  powerpc: Fix warning when printing a resource_size_t
  powerpc/xmon: Remove unused variable in xmon.c
  powerpc/pseries: Fix warnings when printing resource_size_t
  powerpc: Shield code specific to 64-bit server processors
  powerpc: Separate PACA fields for server CPUs
  powerpc: Split exception handling out of head_64.S
  powerpc: Introduce CONFIG_PPC_BOOK3S
  powerpc: Move VMX and VSX asm code to vector.S
  powerpc: Set init_bootmem_done on NUMA platforms as well
  powerpc/mm: Fix a AB->BA deadlock scenario with nohash MMU context lock
  powerpc/mm: Fix some SMP issues with MMU context handling
  powerpc: Add PTRACE_SINGLEBLOCK support
  fbdev: Add PLB support and cleanup DCR in xilinxfb driver.
  powerpc/virtex: Add ml510 reference design device tree
  powerpc/virtex: Add Xilinx ML510 reference design support
  powerpc/virtex: refactor intc driver and add support for i8259 cascading
  powerpc/virtex: Add support for Xilinx PCI host bridge
  ...
2009-06-15 09:32:52 -07:00
Paul Mackerras
90c8f95453 perf_counter: powerpc: Fix two compile warnings
This fixes a couple of compile warnings that crept into the powerpc
perf_counter code recently:

   CC      arch/powerpc/kernel/perf_counter.o
 arch/powerpc/kernel/perf_counter.c: In function 'record_and_restart':
 arch/powerpc/kernel/perf_counter.c:1016: warning: unused variable 'addr'
 arch/powerpc/kernel/perf_counter.c: In function 'hw_perf_counter_init':
 arch/powerpc/kernel/perf_counter.c:891: warning: 'ev' may be used uninitialized in this function

Stephen Rothwell reported this against linux-next as well.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <18998.12884.787039.22202@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-15 16:12:25 +02:00
Michael Ellerman
7719ed7ce8 powerpc: Only build prom_init.o when CONFIG_PPC_OF_BOOT_TRAMPOLINE=y
Commit 28794d34 ("powerpc/kconfig: Kill PPC_MULTIPLATFORM"), added
CONFIG_PPC_OF_BOOT_TRAMPOLINE to control the buliding of prom_init.o

However the Makefile still unconditionally builds prom_init_check,
the script that checks prom_init.o for symbol usage, and so in turn
prom_init.o is still always being built. (it's not linked though)

So surround all the prom_init_check logic with an ifeq block testing
if CONFIG_PPC_OF_BOOT_TRAMPOLINE is set.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-15 13:27:36 +10:00
Michael Ellerman
e468455e58 powerpc: Fix warning in setup_64.c when CONFIG_RELOCATABLE=y
When CONFIG_RELOCATABLE is enabled, PHYSICAL_START is actually a
variable of type phys_addr_t. That means to print it we need to
cast to unsigned long long and use llx.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-15 13:26:21 +10:00
Benjamin Herrenschmidt
177996e6e2 powerpc: Don't do generic calibrate_delay()
Currently we are wasting time calling the generic calibrate_delay()
function. We don't need it since our implementation of __delay() is
based on the CPU timebase. So instead, we use our own small
implementation that initializes loops_per_jiffy to something sensible
to make the few users like spinlock debug be happy

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-15 13:26:17 +10:00
Benjamin Herrenschmidt
7dafd239ab Merge commit 'origin/master' into next 2009-06-15 10:36:54 +10:00
Linus Torvalds
4ddbac9898 Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf_counter: Start documenting HAVE_PERF_COUNTERS requirements
  perf_counter: Add forward/backward attribute ABI compatibility
  perf record: Explicity program a default counter
  perf_counter: Remove PERF_TYPE_RAW special casing
  perf_counter: PERF_TYPE_HW_CACHE is a hardware counter too
  powerpc, perf_counter: Fix performance counter event types
  perf_counter/x86: Add a quirk for Atom processors
  perf_counter tools: Remove one L1-data alias
2009-06-12 13:16:52 -07:00
Jaswinder Singh Rajput
4c921126fe powerpc, perf_counter: Fix performance counter event types
Sachin Sant reported these compiler errors:

 CC      arch/powerpc/kernel/power7-pmu.o
arch/powerpc/kernel/power7-pmu.c:297: error: PERF_COUNT_CPU_CYCLES undeclared here (not in a function)

Which happened because a last-minute rename of symbols crossed with
the Power7 support patch.

Fix this by using the new symbol names.

Reported-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@ozlabs.org
LKML-Reference: <1244788494.5554.1.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-12 14:21:11 +02:00
Rusty Russell
5933048c69 module: cleanup FIXME comments about trimming exception table entries.
Everyone cut and paste this comment from my original one.  We now do
it generically, so cut the comments.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Amerigo Wang <amwang@redhat.com>
2009-06-12 21:47:05 +09:30
Benjamin Herrenschmidt
bc47ab0241 Merge commit 'origin/master' into next
Manual merge of:
	arch/powerpc/kernel/asm-offsets.c
2009-06-12 16:53:38 +10:00
Benjamin Herrenschmidt
37f9ef553b powerpc: Fix bug in move of altivec code to vector.S
The patch that moved to vector.S and made common between 32 and 64-bit the
altivec code had a nasty bug on 32-bit (did I really test that ?) which
causes the kernel to blr back into userspace ... oops :-)

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-12 16:51:41 +10:00
Stephen Rothwell
e14112d1bd perfcounters: remove powerpc definitions of perf_counter_do_pending
Commit 925d519ab8 ("perf_counter:
unify and fix delayed counter wakeup") added global definitions.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-11 20:03:13 -07:00
Peter Zijlstra
8be6e8f3c3 perf_counter: Rename L2 to LL cache
The top (fastest) and last level (biggest) caches are the most
interesting ones, performance wise.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
[ Fixed the Nehalem LL table to LLC Reference/Miss events ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 17:54:17 +02:00
Peter Zijlstra
f4dbfa8f31 perf_counter: Standardize event names
Pure renames only, to PERF_COUNT_HW_* and PERF_COUNT_SW_*.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 17:54:15 +02:00
Paul Mackerras
106b506c3a perf_counter: powerpc: Implement generalized cache events for POWER processors
This adds tables of event codes for the generalized cache events for
all the currently supported powerpc processors: POWER{4,5,5+,6,7} and
PPC970*, plus powerpc-specific code to use these tables when a
generalized cache event is requested.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <18992.36430.933526.742969@drongo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 16:48:37 +02:00
Paul Mackerras
4da52960fd perf_counters: powerpc: Add support for POWER7 processors
This adds the back-end for the PMU on POWER7 processors.  POWER7
has 4 fully-programmable counters and two fixed-function counters
(which do respect the freeze conditions, can generate interrupts,
and are writable, unlike PMC5/6 on POWER5+/6).

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <18992.36329.189378.17992@drongo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 16:48:37 +02:00
Peter Zijlstra
9e350de37a perf_counter: Accurate period data
We currently log hw.sample_period for PERF_SAMPLE_PERIOD, however this is
incorrect. When we adjust the period, it will only take effect the next
cycle but report it for the current cycle. So when we adjust the period
for every cycle, we're always wrong.

Solve this by keeping track of the last_period.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 02:39:02 +02:00
Peter Zijlstra
df1a132bf3 perf_counter: Introduce struct for sample data
For easy extension of the sample data, put it in a structure.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 02:39:02 +02:00
Becky Bruce
ec3cf2ece2 powerpc: Add support for swiotlb on 32-bit
This patch includes the basic infrastructure to use swiotlb
bounce buffering on 32-bit powerpc.  It is not yet enabled on
any platforms.  Probably the most interesting bit is the
addition of addr_needs_map to dma_ops - we need this as
a dma_op because the decision of whether or not an addr
can be mapped by a device is device-specific.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-09 16:49:18 +10:00
Stephen Rothwell
bcba0778ee powerpc: Fix warning when printing a resource_size_t
resource_size_t is 64 bits on PowerPC 64.

Gets rid of this warning:

arch/powerpc/kernel/pci_64.c: In function 'pcibios_map_io_space':
arch/powerpc/kernel/pci_64.c:504: warning: format '%016lx' expects type 'long unsigned int', but argument 2 has type 'resource_size_t'

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-09 16:47:41 +10:00
Benjamin Herrenschmidt
944916858a powerpc: Shield code specific to 64-bit server processors
This is a random collection of added ifdef's around portions of
code that only mak sense on server processors. Using either
CONFIG_PPC_STD_MMU_64 or CONFIG_PPC_BOOK3S as seems appropriate.

This is meant to make the future merging of Book3E 64-bit support
easier.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-09 16:47:38 +10:00
Benjamin Herrenschmidt
91c60b5b82 powerpc: Separate PACA fields for server CPUs
This patch has no effect other than re-ordering PACA fields on
current server CPUs. It however is a pre-requisite for future
support of BookE 64-bit processors. Various parts of the PACA
struct are now moved under some ifdef's, either the new
CONFIG_PPC_BOOK3S or CONFIG_PPC_STD_MMU_64, whatever seems more
appropriate.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.craashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-09 16:47:38 +10:00
Benjamin Herrenschmidt
0ebc4cdaa3 powerpc: Split exception handling out of head_64.S
To prepare for future support of Book3E 64-bit PowerPC processors,
which use a completely different exception handling, we move that
code to a new exceptions-64s.S file.

This file is #included from head_64.S due to some of the absolute
address requirements which can currently only be fulfilled from
within that file.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-09 16:47:37 +10:00
Benjamin Herrenschmidt
e821ea70f3 powerpc: Move VMX and VSX asm code to vector.S
Currently, load_up_altivec and give_up_altivec are duplicated
in 32-bit and 64-bit. This creates a common implementation that
is moved away from head_32.S, head_64.S and misc_64.S and into
vector.S, using the same macros we already use for our common
implementation of load_up_fpu.

I also moved the VSX code over to vector.S though in that case
I didn't make it build on 32-bit (yet).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-09 16:46:25 +10:00
Roland McGrath
ec097c84df powerpc: Add PTRACE_SINGLEBLOCK support
Reworked by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

This adds block-step support on powerpc, including a PTRACE_SINGLEBLOCK
request for ptrace.

The BookE implementation is tweaked to fire a single step after a
block step in order to mimmic the server behaviour.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-09 13:29:25 +10:00
Ingo Molnar
a21ca2cac5 perf_counter: Separate out attr->type from attr->config
Counter type is a frequently used value and we do a lot of
bit juggling by encoding and decoding it from attr->config.

Clean this up by creating a separate attr->type field.

Also clean up the various similarly complex user-space bits
all around counter attribute management.

The net improvement is significant, and it will be easier
to add a new major type (which is what triggered this cleanup).

(This changes the ABI, all tools are adapted.)
(PowerPC build-tested.)

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-06 11:37:22 +02:00
Paul Mackerras
1b58c2515b perf_counter: powerpc: Use new identifier names in powerpc-specific code
Commit b23f3325 ("perf_counter: Rename various fields") fixed up
most of the uses of the renamed fields, but missed one instance
of "record_type" in powerpc-specific code which needs to be changed
to "sample_type", and a "PERF_RECORD_ADDR" in the same statement that
needs to be changed to "PERF_SAMPLE_ADDR", causing compilation
errors on powerpc.  This fixes it.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <18983.3111.770392.800486@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-04 13:20:11 +02:00
Paul Mackerras
dcd945e0d8 perf_counter: powerpc: Fix race causing "oops trying to read PMC0" errors
When using interrupting counters and limited (non-interrupting)
counters at the same time, it's possible that we get an
interrupt in write_mmcr0() after writing MMCR0 but before we
have set up the counters using limited PMCs.  What happens then
is that we get into perf_counter_interrupt() with
counter->hw.idx = 0 for the limited counters, leading to the
"oops trying to read PMC0" error message being printed.

This fixes the problem by making perf_counter_interrupt()
robust against counter->hw.idx being zero (the counter is just
ignored in that case) and also by changing write_mmcr0() to
write MMCR0 initially with the counter overflow interrupt
enable bits masked (set to 0).  If the MMCR0 value requested by
the caller has either of those bits set, we write MMCR0 again
with the requested value of those bits after setting up the
limited counters properly.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Stephane Eranian <eranian@googlemail.com>
LKML-Reference: <18982.17684.138182.954599@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-03 11:49:53 +02:00
Paul Mackerras
6984efb692 perf_counter: powerpc: Fix event alternative code generation on POWER5/5+
Commit ef923214 ("perf_counter: powerpc: use u64 for event
codes internally") introduced a bug where the return value from
function find_alternative_bdecode gets put into a u64 variable
and later tested to see if it is < 0.  The effect is that we
get extra, bogus event code alternatives on POWER5 and POWER5+,
leading to error messages such as "oops compute_mmcr failed"
being printed and counters not counting properly.

This fixes it by using s64 for the return type of
find_alternative_bdecode and for the local variable that the
caller puts the value in.  It also makes the event argument a
u64 on POWER5+ for consistency.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Stephane Eranian <eranian@googlemail.com>
LKML-Reference: <18982.17586.666132.90983@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-03 11:49:52 +02:00
Peter Zijlstra
0d48696f87 perf_counter: Rename perf_counter_hw_event => perf_counter_attr
The structure isn't hw only and when I read event, I think about those
things that fall out the other end. Rename the thing.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Stephane Eranian <eranian@googlemail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-02 21:45:33 +02:00
Peter Zijlstra
b23f3325ed perf_counter: Rename various fields
A few renames:

  s/irq_period/sample_period/
  s/irq_freq/sample_freq/
  s/PERF_RECORD_/PERF_SAMPLE_/
  s/record_type/sample_type/

And change both the new sample_type and read_format to u64.

Reported-by: Stephane Eranian <eranian@googlemail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-02 21:45:30 +02:00
Stephen Rothwell
baf75b0a42 powerpc/pci: Fix annotation of pcibios_claim_one_bus
It was __devinit, but it is also within a CONFIG_HOTPLUG guarded section
of code, so the __devinit does nothing but cause the following warning:

WARNING: vmlinux.o(.text+0x107a8): Section mismatch in reference from the function pcibios_finish_adding_to_bus() to the function .devinit.text:pcibios_claim_one_bus()
The function pcibios_finish_adding_to_bus() references
the function __devinit pcibios_claim_one_bus().
This is often because pcibios_finish_adding_to_bus lacks a __devinit
annotation or the annotation of pcibios_claim_one_bus is wrong.

It is also only (externally) used in arch/powerpc/kernel/of_platform.c
which cannot be built as a module so don't export it.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-02 11:09:12 +10:00
Michael Ellerman
92e02a5125 powerpc/ftrace: Use PPC_INST_NOP directly
There's no need to wrap PPC_INST_NOP in a static inline.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-02 10:36:53 +10:00
Michael Ellerman
898b160fe9 powerpc/ftrace: Remove unused macros
These macros were used in the original port, but since commit
e4486fe316 (ftrace, use probe_kernel API to modify code) they
are unused.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-02 10:36:46 +10:00
Michael Ellerman
4a9e3f8e94 powerpc/ftrace: Use ppc_function_entry() instead of GET_ADDR
Use ppc_function_entry() from code-patching.h.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-02 10:36:32 +10:00
Nathan Fontenot
f03cdb3a66 powerpc: Display processor virtualization resource allocs in lparcfg
This patch updates the output from /proc/ppc64/lparcfg to display the
processor virtualization resource allocations for a shared processor
partition.

This information is already gathered via the h_get_ppp call, we just
have to make sure that the ibm,partition-performance-parameters-level
property is >= 1 to ensure that the information is valid.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-02 10:36:10 +10:00
Ingo Molnar
23db9f430b Merge branch 'linus' into perfcounters/core
Merge reason: merge almost-rc8 into perfcounters/core, which was -rc6
              based - to pick up the latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-01 10:01:39 +02:00
Benjamin Herrenschmidt
435462c6e6 Merge branch 'merge' into next 2009-05-29 13:54:52 +10:00
Benjamin Herrenschmidt
8b31e49d1d powerpc: Fix up dma_alloc_coherent() on platforms without cache coherency.
The implementation we just revived has issues, such as using a
Kconfig-defined virtual address area in kernel space that nothing
actually carves out (and thus will overlap whatever is there),
or having some dependencies on being self contained in a single
PTE page which adds unnecessary constraints on the kernel virtual
address space.

This fixes it by using more classic PTE accessors and automatically
locating the area for consistent memory, carving an appropriate hole
in the kernel virtual address space, leaving only the size of that
area as a Kconfig option. It also brings some dma-mask related fixes
from the ARM implementation which was almost identical initially but
grew its own fixes.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-27 16:33:59 +10:00
Paul Mackerras
8a7b8cb91f perf_counter: powerpc: Implement interrupt throttling
This implements interrupt throttling on powerpc.  Since we don't have
individual count enable/disable or interrupt enable/disable controls
per counter, this simply sets the hardware counter to 0, meaning that
it will not interrupt again until it has counted 2^31 counts, which
will take at least 2^30 cycles assuming a maximum of 2 counts per
cycle.  Also, we set counter->hw.period_left to the maximum possible
value (2^63 - 1), so we won't report overflows for this counter for
the forseeable future.

The unthrottle operation restores counter->hw.period_left and the
hardware counter so that we will once again report a counter overflow
after counter->hw.irq_period counts.

[ Impact: new perfcounters robustness feature on PowerPC ]

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <18971.35823.643362.446774@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-26 09:43:59 +02:00
Geert Uytterhoeven
80947e7c99 powerpc: Keep track of emulated instructions
If CONFIG_PPC_EMULATED_STATS is enabled, make available counters for the
various classes of emulated instructions under
/sys/kernel/debug/powerpc/emulated_instructions/ (assumed debugfs is mounted on
/sys/kernel/debug).  Optionally (controlled by
/sys/kernel/debug/powerpc/emulated_instructions/do_warn), rate-limited warnings
can be printed to the console when instructions are emulated.

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21 15:44:26 +10:00
Anton Blanchard
8d165db107 powerpc: Improve decrementer accuracy
I have been looking at sources of OS jitter and notice that after a long
NO_HZ idle period we wakeup too early:

relative time (us)    event
                      timer irq exit
    999946.405        timer irq entry
         4.835        timer irq exit
        21.685        timer irq entry
         3.540          timer (tick_sched_timer) entry

Here we slept for just under a second then took a timer interrupt that did
nothing. 21.685 us later we wake up again and do the work.

We set a rather low shift value of 16 for the decrementer clockevent, which I
think is causing this issue. On this box we have a 207MHz decrementer and see:

clockevent: decrementer mult[3501] shift[16] cpu[0]

For calculations of large intervals this mult/shift combination could be
off by a significant amount. I notice the sparc code has a loop that iterates
to find a mult/shift combination that maximises the shift value while
keeping mult under 32bit. With the patch below we get:

clockevent: decrementer mult[35015c20] shift[32] cpu[15]

And we no longer see the spurious wakeups.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21 15:44:24 +10:00
Kumar Gala
9aa4e7b169 powerpc/pci: Remove redundant pcnet32 fixup
The pcnet32 driver has had in its device id table for some time a match
against the "broken" vendor ID.  No need to fake out the vendor ID
with an explicit fixup.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21 15:44:24 +10:00
Kumar Gala
cf6692c07a powerpc/pci: Cleanup some minor cruft
* Removed building setup-irq on ppc32, we don't use it anymore
* Remove duplicate prototype for setup_grackle() code that needs it
  gets it from <asm/grackle.h>
* Removed gratuitous pci_io_size type differences between ppc32/ppc64

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21 15:44:24 +10:00
Kumar Gala
2eb4afb69f powerpc/pci: Move pseries code into pseries platform specific area
There doesn't appear to be any specific reason that we need to setup the
pseries specific notifier in generic arch pci code.  Move it into pseries
land.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21 15:44:24 +10:00
Kumar Gala
2f52297665 powerpc/pci: Clean up direct access to sysdata by RTAS
We shouldn't directly access sysdata to get the device node but call
pci_bus_to_OF_node() for this purpose.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21 15:44:23 +10:00
Milton Miller
60dbf43851 powerpc: Add 2.06 tlbie mnemonics
This adds the PowerPC 2.06 tlbie mnemonics and keeps backwards
compatibilty for CPUs before 2.06.

Only useful for bare metal systems.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21 15:44:21 +10:00
Michael Ellerman
835363e67d powerpc/irq: Remove fallback to __do_IRQ()
We should no longer have any irq code that needs __do_IRQ(), so
remove the fallback to __do_IRQ().

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21 15:44:20 +10:00
Michael Ellerman
9b647a30cb powerpc/irq: Move get_irq() comment into header
The guts of do_IRQ() isn't really the right place to be documenting
the ppc_md.get_irq() interface. So move the comment into machdep.h

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21 15:43:59 +10:00
Michael Ellerman
d7cb10d6d2 powerpc/irq: Move stack overflow check into a separate function
Makes do_IRQ() shorter and clearer.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21 15:43:58 +10:00
Michael Ellerman
f2694ba568 powerpc/irq: Move #ifdef'ed body of do_IRQ() into a separate function
Rather than a giant ifdef in the body of do_IRQ(), including a
dangling else, move the irq stack logic into a separate routine and
do the ifdef there.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21 15:43:58 +10:00
Paul Mackerras
c0daaf3f1f perf_counter: powerpc: initialize cpuhw pointer before use
Commit 9e35ad38 ("perf_counter: Rework the perf counter
disable/enable") added code to the powerpc hw_perf_enable (renamed
from hw_perf_restore) to test cpuhw->disabled and return immediately
if it is not set (i.e. if the PMU is already enabled).

Unfortunately the test got added before cpuhw was initialized,
resulting in an oops the first time hw_perf_enable got called.
This fixes it by moving the initialization of cpuhw to before
cpuhw->disabled is tested.

[ Impact: fix oops-causing bug on powerpc ]

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <18960.56772.869734.304631@drongo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-18 07:38:42 +02:00
Ingo Molnar
dc3f81b129 Merge commit 'v2.6.30-rc6' into perfcounters/core
Merge reason: this branch was on an -rc4 base, merge it up to -rc6
              to get the latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-18 07:37:49 +02:00
Benjamin Herrenschmidt
0e337b42d6 powerpc: Explicit alignment for .data.cacheline_aligned
I don't think anything guarantees that the objects in data.page_aligned
are a multiple of PAGE_SIZE, thus the section may end on any boundary.

So the following section, .data.cacheline_aligned needs an explicit
alignment.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-18 15:19:05 +10:00
Steven Rostedt
c3cf8667ed powerpc/ftrace: Fix constraint to be early clobber
After upgrading my distcc boxes from gcc 4.2.2 to 4.4.0, the function
graph tracer broke. This was discovered on my x86 boxes.

The issue is that gcc used the same register for an output as it did for
an input in an asm statement. I first thought this was a bug in gcc and
reported it. I was notified that gcc was correct and that the output had
to be flagged as an "early clobber".

I noticed that powerpc had the same issue and this patch fixes it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-18 15:19:05 +10:00
Michael Ellerman
021376a3b6 powerpc/ftrace: Use pr_devel() in ftrace.c
pr_debug() can now result in code being generated even when #DEBUG
is not defined. That's not really desirable in the ftrace code
which we want to be snappy.

With CONFIG_DYNAMIC_DEBUG=y:

size before:
   text	   data	    bss	    dec	    hex	filename
   3334	    672	      4	   4010	    faa	arch/powerpc/kernel/ftrace.o

size after:
   text	   data	    bss	    dec	    hex	filename
   2616	    360	      4	   2980	    ba4	arch/powerpc/kernel/ftrace.o

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-18 15:19:04 +10:00
Paul Mackerras
0bbd0d4be8 perf_counter: powerpc: supply more precise information on counter overflow events
This uses values from the MMCRA, SIAR and SDAR registers on
powerpc to supply more precise information for overflow events,
including a data address when PERF_RECORD_ADDR is specified.

Since POWER6 uses different bit positions in MMCRA from earlier
processors, this converts the struct power_pmu limited_pmc5_6
field, which only had 0/1 values, into a flags field and
defines bit values for its previous use (PPMU_LIMITED_PMC5_6)
and a new flag (PPMU_ALT_SIPR) to indicate that the processor
uses the POWER6 bit positions rather than the earlier
positions.  It also adds definitions in reg.h for the new and
old positions of the bit that indicates that the SIAR and SDAR
values come from the same instruction.

For the data address, the SDAR value is supplied if we are not
doing instruction sampling.  In that case there is no guarantee
that the address given in the PERF_RECORD_ADDR subrecord will
correspond to the instruction whose address is given in the
PERF_RECORD_IP subrecord.

If instruction sampling is enabled (e.g. because this counter
is counting a marked instruction event), then we only supply
the SDAR value for the PERF_RECORD_ADDR subrecord if it
corresponds to the instruction whose address is in the
PERF_RECORD_IP subrecord.  Otherwise we supply 0.

[ Impact: support more PMU hardware features on PowerPC ]

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <18955.37028.48861.555309@drongo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15 16:38:57 +02:00
Paul Mackerras
ef923214a4 perf_counter: powerpc: use u64 for event codes internally
Although the perf_counter API allows 63-bit raw event codes,
internally in the powerpc back-end we had been using 32-bit
event codes.  This expands them to 64 bits so that we can add
bits for specifying threshold start/stop events and instruction
sampling modes later.

This also corrects the return value of can_go_on_limited_pmc;
we were returning an event code rather than just a 0/1 value in
some circumstances. That didn't particularly matter while event
codes were 32-bit, but now that event codes are 64-bit it
might, so this fixes it.

[ Impact: extend PowerPC perfcounter interfaces from u32 to u64 ]

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <18955.36874.472452.353104@drongo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15 16:38:55 +02:00
Peter Zijlstra
60db5e09c1 perf_counter: frequency based adaptive irq_period
Instead of specifying the irq_period for a counter, provide a target interrupt
frequency and dynamically adapt the irq_period to match this frequency.

[ Impact: new perf-counter attribute/feature ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <20090515132018.646195868@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15 15:26:56 +02:00
Peter Zijlstra
9e35ad388b perf_counter: Rework the perf counter disable/enable
The current disable/enable mechanism is:

	token = hw_perf_save_disable();
	...
	/* do bits */
	...
	hw_perf_restore(token);

This works well, provided that the use nests properly. Except we don't.

x86 NMI/INT throttling has non-nested use of this, breaking things. Therefore
provide a reference counter disable/enable interface, where the first disable
disables the hardware, and the last enable enables the hardware again.

[ Impact: refactor, simplify the PMU disable/enable logic ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15 09:47:02 +02:00
Benjamin Herrenschmidt
ad892a63f6 powerpc: Fix PCI ROM access
A couple of issues crept in since about 2.6.27 related to accessing PCI
device ROMs on various powerpc machines.

First, historically, we don't allocate the ROM resource in the resource
tree. I'm not entirely certain of why, I susepct they often contained
garbage on x86 but it's hard to tell. This causes the current generic
code to always call pci_assign_resource() when trying to access the said
ROM from sysfs, which will try to re-assign some new address regardless
of what the ROM BAR was already set to at boot time. This can be a
problem on hypervisor platforms like pSeries where we aren't supposed
to move PCI devices around (and in fact probably can't).

Second, our code that generates the PCI tree from the OF device-tree
(instead of doing config space probing) which we mostly use on pseries
at the moment, didn't set the (new) flag IORESOURCE_SIZEALIGN on any
resource. That means that any attempt at re-assigning such a resource
with pci_assign_resource() would fail due to resource_alignment()
returning 0.

This fixes this by doing these two things:

 - The code that calculates resource flags based on the OF device-node
is improved to set IORESOURCE_SIZEALIGN on any valid BAR, and while at
it also set IORESOURCE_READONLY for ROMs since we were lacking that too

 - We now allocate ROM resources as part of the resource tree. However
to limit the chances of nasty conflicts due to busted firmwares, we
only do it on the second pass of our two-passes allocation scheme,
so that all valid and enabled BARs get precedence.

This brings pSeries back the ability to access PCI ROMs via sysfs (and
thus initialize various video cards from X etc...).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-15 16:43:42 +10:00
Benjamin Herrenschmidt
b173f03d7c powerpc/pseries: Really fix the oprofile CPU type on pseries
My previous pach for fixing the oprofile CPU type got somewhat mismerged
(by my fault) when it collided with another related patch. This should
finally (fingers crossed) fix the whole thing.

We make sure we keep the -old- oprofile type and CPU type whenever
one of them was specified in the first pass through the function.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-15 16:43:42 +10:00
Becky Bruce
49a8496525 powerpc: Allow mem=x cmdline to work with 4G+
We're currently choking on mem=4g (and above) due to memory_limit
being specified as an unsigned long. Make memory_limit
phys_addr_t to fix this.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-15 16:43:41 +10:00
Benjamin Herrenschmidt
0203d6ec4e powerpc: Fix setting of oprofile cpu type
commit 2657dd4e30 introduced a
bug where we would now always override the "real" oprofile CPU
type with the "compatible" one provided by a pseudo-PVR in the
device-tree which is incorrect and breaks oprofile on all current
configs since the "compatible" ones aren't yet recognized.

This fixes it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-01 15:12:05 +10:00
Michael Wolf
79af6c49a9 powerpc adjust oprofile_cpu_type version 3
Oprofile is changing the naming it is using for the compatibility modes.
Instead of having compat-power<x>, oprofile will go to family naming
convention and use ibm-compat-v<x>.  Currently only ibm-compat-v1 will
be defined.
The notion of compatibility events just started with POWER6. So there is
no way that any other tool could exist that is using these
oprofile_cpu_type strings we want to change.

Signed-off-by: Mike Wolf <mjw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-01 15:12:05 +10:00
Paul Mackerras
ab7ef2e50a perf_counter: powerpc: allow use of limited-function counters
POWER5+ and POWER6 have two hardware counters with limited functionality:
PMC5 counts instructions completed in run state and PMC6 counts cycles
in run state.  (Run state is the state when a hardware RUN bit is 1;
the idle task clears RUN while waiting for work to do and sets it when
there is work to do.)

These counters can't be written to by the kernel, can't generate
interrupts, and don't obey the freeze conditions.  That means we can
only use them for per-task counters (where we know we'll always be in
run state; we can't put a per-task counter on an idle task), and only
if we don't want interrupts and we do want to count in all processor
modes.

Obviously some counters can't go on a limited hardware counter, but there
are also situations where we can only put a counter on a limited hardware
counter - if there are already counters on that exclude some processor
modes and we want to put on a per-task cycle or instruction counter that
doesn't exclude any processor mode, it could go on if it can use a
limited hardware counter.

To keep track of these constraints, this adds a flags argument to the
processor-specific get_alternatives() functions, with three bits defined:
one to say that we can accept alternative event codes that go on limited
counters, one to say we only want alternatives on limited counters, and
one to say that this is a per-task counter and therefore events that are
gated by run state are equivalent to those that aren't (e.g. a "cycles"
event is equivalent to a "cycles in run state" event).  These flags
are computed for each counter and stored in the counter->hw.counter_base
field (slightly wonky name for what it does, but it was an existing
unused field).

Since the limited counters don't freeze when we freeze the other counters,
we need some special handling to avoid getting skew between things counted
on the limited counters and those counted on normal counters.  To minimize
this skew, if we are using any limited counters, we read PMC5 and PMC6
immediately after setting and clearing the freeze bit.  This is done in
a single asm in the new write_mmcr0() function.

The code here is specific to PMC5 and PMC6 being the limited hardware
counters.  Being more general (e.g. having a bitmap of limited hardware
counter numbers) would have meant more complex code to read the limited
counters when freezing and unfreezing the normal counters, with
conditional branches, which would have increased the skew.  Since it
isn't necessary for the code to be more general at this stage, it isn't.

This also extends the back-ends for POWER5+ and POWER6 to be able to
handle up to 6 counters rather than the 4 they previously handled.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <18936.19035.163066.892208@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-29 14:58:35 +02:00
Robert Richter
4aeb0b4239 perfcounters: rename struct hw_perf_counter_ops into struct pmu
This patch renames struct hw_perf_counter_ops into struct pmu. It
introduces a structure to describe a cpu specific pmu (performance
monitoring unit). It may contain ops and data. The new name of the
structure fits better, is shorter, and thus better to handle. Where it
was appropriate, names of function and variable have been changed too.

[ Impact: cleanup ]

Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-7-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-29 14:51:03 +02:00
Ingo Molnar
e7fd5d4b3d Merge branch 'linus' into perfcounters/core
Merge reason: This brach was on -rc1, refresh it to almost-rc4 to pick up
              the latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-29 14:47:05 +02:00
Linus Torvalds
c3310e7766 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  powerpc/ps3: Fix build error on UP
  powerpc/cell: Select PCI for IBM_CELL_BLADE AND CELLEB
  powerpc: ppc32 needs elf_read_implies_exec()
  powerpc/86xx: Add device_type entry to soc for ppc9a
  powerpc/44x: Correct memory size calculation for denali-based boards
  maintainers: Fix PowerPC 4xx git tree
  powerpc: fix for long standing bug noticed by gcc 4.4.0
  Revert "powerpc: Add support for early tlbilx opcode"
2009-04-28 15:55:32 -07:00
Tim Abbott
13beadd91f powerpc: Revert switch to TEXT_TEXT in linker script
Commit edada399 broke the build on 64-bit powerpc because it moved the
__ftr_alt_* sections of a file away from the .text section, causing
link failures due to relative conditional branch targets being too far
away from the branch instructions.  This happens on pretty much all
64-bit powerpc configs.

This change reverts commit edada399 while preserving the update from
the *.refok sections to .ref.text that has happened since.

Signed-off-by: Tim Abbott <tabbott@mit.edu>
Requested-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-28 15:55:14 -07:00
Tim Abbott
edada399e8 powerpc: Use TEXT_TEXT macro in linker script.
Rather than adding .ref.text to the powerpc linker script so that we
can use __REF on the powerpc architecture, it seems simpler to switch
to using the generic TEXT_TEXT macro.

Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-27 19:51:58 -07:00
Tim Abbott
e703984587 powerpc: convert to use __HEAD and HEAD_TEXT macros.
This has the consequence of changing the section name use for head
code from ".text.head" to ".head.text".  Since this commit changes all
users in the architecture, this change should be harmless.

Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-26 09:20:38 -07:00
Linus Torvalds
f8c3301e83 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  powerpc: Fix modular build of ide-pmac when mediabay is built in
  powerpc/pasemi: Fix build error on UP
  powerpc: Make macintosh/mediabay driver depend on CONFIG_BLOCK
  maintainers: Fix PS3 patterns
  powerpc/ps3: Fix CONFIG_PS3_FLASH=n build warning
  powerpc/32: Don't clobber personality flags on exec
  powerpc: Fix crash on CPU hotplug
  powerpc/85xx: Remove defconfigs that mpc85xx_{smp_}defconfig cover
  powerpc/85xx: Added SMP defconfig
  powerpc/85xx: Enabled a bunch of FSL specific drivers/options
  powerpc/85xx: Updated generic mpc85xx_defconfig
  powerpc: don't disable SATA interrupts on Freescale MPC8610 HPCD
  fsl_rio: Pass the proper device to dma mapping routines
  powerpc: Fix of_node_put() exit path in of_irq_map_one()
  powerpc/5200: defconfig updates
  powerpc/5200: Add FLASH nodes to lite5200 device tree
  powerpc/device-tree: Document MTD nodes with multiple "reg" tuples
  powerpc/of-device-tree: Factor MTD physmap bindings out of booting-without-of
  powerpc/5200: Bring the legacy fsl_spi_platform_data hooks back
2009-04-24 07:44:58 -07:00
Kumar Gala
323d23aeac Revert "powerpc: Add support for early tlbilx opcode"
This reverts commit e996557740.  Our HW
guys were able to fix this so it never sees the light of day.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-04-23 08:51:22 -05:00
Paul Mackerras
5bd3ef84d7 Merge branch 'merge' of git://git.secretlab.ca/git/linux-2.6 into merge 2009-04-22 13:02:09 +10:00
Magnus Damm
8e19608e8b clocksource: pass clocksource to read() callback
Pass clocksource pointer to the read() callback for clocksources.  This
allows us to share the callback between multiple instances.

[hugh@veritas.com: fix powerpc build of clocksource pass clocksource mods]
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-21 13:41:47 -07:00
Ilpo Järvinen
6d25b688ec powerpc: Fix of_node_put() exit path in of_irq_map_one()
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2009-04-20 12:18:43 -06:00
Linus Torvalds
a23c218bd3 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  powerpc: pseries/dtl.c should include asm/firmware.h
  powerpc: Fix data-corrupting bug in __futex_atomic_op
  powerpc/pseries: Set error_state to pci_channel_io_normal in eeh_report_reset()
  powerpc: Allow 256kB pages with SHMEM
  powerpc: Document new FSL I2C bindings and cleanup
  powerpc/mm: Fix compile warning
  powerpc/85xx: TQM8548: update defconfig
  powerpc/85xx: TQM8548: use proper phy-handles for enet2 and enet3
  powerpc/85xx: TQM85xx: correct address of LM75 I2C device nodes
  powerpc: Add support for early tlbilx opcode
  powerpc: Fix tlbilx opcode
2009-04-15 08:42:40 -07:00
Paul Mackerras
ca8f2d7f01 perf_counter: powerpc: add nmi_enter/nmi_exit calls
Impact: fix potential deadlocks on powerpc

Now that the core is using in_nmi() (added in e30e08f6, "perf_counter:
fix NMI race in task clock"), we need the powerpc perf_counter_interrupt
to call nmi_enter() and nmi_exit() in those cases where the interrupt
happens when interrupts are soft-disabled.

If interrupts were soft-enabled, we can treat it as a regular interrupt
and do irq_enter/irq_exit around the whole routine. This lets us get rid
of the test_perf_counter_pending() call at the end of
perf_counter_interrupt, thus simplifying things a little.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <18909.31952.873098.336615@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-09 07:56:08 +02:00
Peter Zijlstra
78f13e9525 perf_counter: allow for data addresses to be recorded
Paul suggested we allow for data addresses to be recorded along with
the traditional IPs as power can provide these.

For now, only the software pagefault events provide data addresses,
but in the future power might as well for some events.

x86 doesn't seem capable of providing this atm.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090408130409.394816925@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08 19:05:56 +02:00
Paul Mackerras
f708223d49 perf_counter: powerpc: set sample enable bit for marked instruction events
Impact: enable access to hardware feature

POWER processors have the ability to "mark" a subset of the instructions
and provide more detailed information on what happens to the marked
instructions as they flow through the pipeline.  This marking is
enabled by the "sample enable" bit in MMCRA, and there are
synchronization requirements around setting and clearing the bit.

This adds logic to the processor-specific back-ends so that they know
which events relate to marked instructions and set the sampling enable
bit if any event that we want to put on the PMU is a marked instruction
event.  It also adds logic to the generic powerpc code to do the
necessary synchronization if that bit is set.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <18908.31930.1024.228867@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08 12:39:28 +02:00
Paul Mackerras
dc66270b51 perf_counter: fix powerpc build
Commit 4af4998b ("perf_counter: rework context time") changed struct
perf_counter_context to have a 'time' field instead of a 'time_now'
field, but neglected to fix the place in the powerpc perf_counter.c
where the time_now field was accessed.  This fixes it.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <18908.31922.411398.147810@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08 12:39:27 +02:00
Ingo Molnar
5ea472a77f Merge commit 'v2.6.30-rc1' into perfcounters/core
Conflicts:
	arch/powerpc/include/asm/systbl.h
	arch/powerpc/include/asm/unistd.h
	include/linux/init_task.h

Merge reason: the conflicts are non-trivial: PowerPC placement
              of sys_perf_counter_open has to be mixed with the
	      new preadv/pwrite syscalls.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08 10:35:30 +02:00
Yang Hongyang
284901a90a dma-mapping: replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32)
Replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32)

Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 08:31:11 -07:00
Peter Zijlstra
f6c7d5fe58 perf_counter: theres more to overflow than writing events
Prepare for more generic overflow handling. The new perf_counter_overflow()
method will handle the generic bits of the counter overflow, and can return
a !0 return value, in which case the counter should be (soft) disabled, so
that it won't count until it's properly disabled.

XXX: do powerpc and swcounter

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090406094517.812109629@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07 10:48:56 +02:00
Kumar Gala
e996557740 powerpc: Add support for early tlbilx opcode
During the ISA 2.06 development the opcode for tlbilx changed and some
early implementations used to old opcode.  Add support for a MMU_FTR
fixup to deal with this.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-04-07 01:36:30 -05:00
Michael Ellerman
7ddb7ad11f powerpc/ftrace: Fix printf format warning
'tramp' is an unsigned long, so print it with %lx.

Fixes the following build warning:
arch/powerpc/kernel/ftrace.c:291: error: format ‘%x’ expects type ‘unsigned int’, but argument 2 has type ‘long unsigned int’

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-07 15:19:00 +10:00
Michael Ellerman
f4952f6cbe powerpc/ftrace: Fix #if that should be #ifdef
Commit bb7253403f ("powerpc64,
ftrace: save toc only on modules for function graph"), added an
#if CONFIG_PPC64.  This changes it to #ifdef.

Fixes the following warning on 32-bit builds:
 arch/powerpc/kernel/ftrace.c:562:5: error: "CONFIG_PPC64" is not defined

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-07 15:19:00 +10:00
Michael Neuling
bc826666e4 powerpc: Fix ptrace compat wrapper for FPU register access
The ptrace compat wrapper mishandles access to the fpu registers.  The
PTRACE_PEEKUSR and PTRACE_POKEUSR requests miscalculate the index into
the fpr array due to the broken FPINDEX macro.  The
PPC_PTRACE_PEEKUSR_3264 request needs to use the same formula that the
native ptrace interface uses when operating on the register number (as
opposed to the 4-byte offset).  The PPC_PTRACE_POKEUSR_3264 request
didn't take TS_FPRWIDTH into account.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-07 15:19:00 +10:00
Michael Ellerman
c7d07fdd5a powerpc: Print information about mapping hw irqs to virtual irqs
The irq remapping layer seems to cause some confusion when people
see a different irq number in /proc/interrupts vs the one they
request in their driver or DTS.

So have the irq remapping layer print out a message when we map an
irq. The message is only printed the first time the irq is mapped,
and it's KERN_DEBUG so most people won't see it.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-07 15:19:00 +10:00
Michael Neuling
7e875e9dc8 powerpc: Disable VSX or current process in giveup_fpu/altivec
When we call giveup_fpu, we need to need to turn off VSX for the
current process.  If we don't, on return to userspace it may execute a
VSX instruction before the next FP instruction, and not have its
register state refreshed correctly from the thread_struct.  Ditto for
altivec.

This caused a bug where an unaligned lfs or stfs results in
fix_alignment calling giveup_fpu so it can use the FPRs (in order to
do a single <-> double conversion), and then returning to userspace
with FP off but VSX on.  Then if a VSX instruction is executed, before
another FP instruction, it will proceed without another exception and
hence have the incorrect register state for VSX registers 0-31.

   lfs unaligned   <- alignment exception turns FP off but leaves VSX on

   VSX instruction <- no exception since VSX on, hence we get the
                      wrong VSX register values for VSX registers 0-31,
                      which overlap the FPRs.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-07 15:18:59 +10:00
Anton Blanchard
856cc2f0be powerpc/pseries: Fix ibm,client-architecture comment
We specify a 64MB RMO, but the comment says 128MB.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-07 15:18:59 +10:00
Anton Blanchard
0559f0a761 powerpc/pseries: Add dispatch dispersion statistics
PHYP tells us how often a shared processor dispatch changed physical cpus.
This can highlight performance problems caused by the hypervisor.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-07 15:18:59 +10:00
Anton Blanchard
1f8737aab3 powerpc: Clean up some prom printouts
Make all messages consistent, some have spaces before the "...", some do not.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-07 15:18:59 +10:00
Anton Blanchard
4da727ae2a powerpc: Print progress of ibm,client-architecture method
The ibm,client-architecture method will often cause a reconfiguration reboot.
When this happens the last thing we see is:

	Hypertas detected, assuming LPAR !

Which doesn't explain what just happened.  Wrap the ibm,client-architecture
so it's clear what is going on:

	Calling ibm,client-architecture... done

In order to maintain the law of conservation of screen real estate, downgrade
two other messages to debug.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-07 15:18:58 +10:00
Huang Weiyi
85701e6ac1 powerpc: Remove duplicated #include's
Remove duplicated #include's in
  - arch/powerpc/include/asm/ps3fb.h
  - arch/powerpc/kernel/setup-common.c

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-07 15:18:58 +10:00
Paul Mackerras
d5d2bc0dd0 perf_counter: make it possible for hw_perf_counter_init to return error codes
Impact: better error reporting

At present, if hw_perf_counter_init encounters an error, all it can do
is return NULL, which causes sys_perf_counter_open to return an EINVAL
error to userspace.  This isn't very informative for userspace; it means
that userspace can't tell the difference between "sorry, oprofile is
already using the PMU" and "we don't support this CPU" and "this CPU
doesn't support the requested generic hardware event".

This commit uses the PTR_ERR/ERR_PTR/IS_ERR set of macros to let
hw_perf_counter_init return an error code on error rather than just NULL
if it wishes.  If it does so, that error code will be returned from
sys_perf_counter_open to userspace.  If it returns NULL, an EINVAL
error will be returned to userspace, as before.

This also adapts the powerpc hw_perf_counter_init to make use of this
to return ENXIO, EINVAL, EBUSY, or EOPNOTSUPP as appropriate.  It would
be good to add extra error numbers in future to allow userspace to
distinguish the various errors that are currently reported as EINVAL,
i.e. irq_period < 0, too many events in a group, conflict between
exclude_* settings in a group, and PMU resource conflict in a group.

[ v2: fix a bug pointed out by Corey Ashford where error returns from
      hw_perf_counter_init were not handled correctly in the case of
      raw hardware events.]

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Orig-LKML-Reference: <20090330171023.682428180@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:30:40 +02:00
Paul Mackerras
7595d63b3a perf_counter: powerpc: only reserve PMU hardware when we need it
Impact: cooperate with oprofile

At present, on PowerPC, if you have perf_counters compiled in, oprofile
doesn't work.  There is code to allow the PMU to be shared between
competing subsystems, such as perf_counters and oprofile, but currently
the perf_counter subsystem reserves the PMU for itself at boot time,
and never releases it.

This makes perf_counter play nicely with oprofile.  Now we keep a count
of how many perf_counter instances are counting hardware events, and
reserve the PMU when that count becomes non-zero, and release the PMU
when that count becomes zero.  This means that it is possible to have
perf_counters compiled in and still use oprofile, as long as there are
no hardware perf_counters active.  This also means that if oprofile is
active, sys_perf_counter_open will fail if the hw_event specifies a
hardware event.

To avoid races with other tasks creating and destroying perf_counters,
we use a mutex.  We use atomic_inc_not_zero and atomic_add_unless to
avoid having to take the mutex unless there is a possibility of the
count going between 0 and 1.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Orig-LKML-Reference: <20090330171023.627912475@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:30:39 +02:00
Peter Zijlstra
925d519ab8 perf_counter: unify and fix delayed counter wakeup
While going over the wakeup code I noticed delayed wakeups only work
for hardware counters but basically all software counters rely on
them.

This patch unifies and generalizes the delayed wakeup to fix this
issue.

Since we're dealing with NMI context bits here, use a cmpxchg() based
single link list implementation to track counters that have pending
wakeups.

[ This should really be generic code for delayed wakeups, but since we
  cannot use cmpxchg()/xchg() in generic code, I've let it live in the
  perf_counter code. -- Eric Dumazet could use it to aggregate the
  network wakeups. ]

Furthermore, the x86 method of using TIF flags was flawed in that its
quite possible to end up setting the bit on the idle task, loosing the
wakeup.

The powerpc method uses per-cpu storage and does appear to be
sufficient.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Orig-LKML-Reference: <20090330171023.153932974@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:30:36 +02:00
Paul Mackerras
53cfbf5937 perf_counter: record time running and time enabled for each counter
Impact: new functionality

Currently, if there are more counters enabled than can fit on the CPU,
the kernel will multiplex the counters on to the hardware using
round-robin scheduling.  That isn't too bad for sampling counters, but
for counting counters it means that the value read from a counter
represents some unknown fraction of the true count of events that
occurred while the counter was enabled.

This remedies the situation by keeping track of how long each counter
is enabled for, and how long it is actually on the cpu and counting
events.  These times are recorded in nanoseconds using the task clock
for per-task counters and the cpu clock for per-cpu counters.

These values can be supplied to userspace on a read from the counter.
Userspace requests that they be supplied after the counter value by
setting the PERF_FORMAT_TOTAL_TIME_ENABLED and/or
PERF_FORMAT_TOTAL_TIME_RUNNING bits in the hw_event.read_format field
when creating the counter.  (There is no way to change the read format
after the counter is created, though it would be possible to add some
way to do that.)

Using this information it is possible for userspace to scale the count
it reads from the counter to get an estimate of the true count:

true_count_estimate = count * total_time_enabled / total_time_running

This also lets userspace detect the situation where the counter never
got to go on the cpu: total_time_running == 0.

This functionality has been requested by the PAPI developers, and will
be generally needed for interpreting the count values from counting
counters correctly.

In the implementation, this keeps 5 time values (in nanoseconds) for
each counter: total_time_enabled and total_time_running are used when
the counter is in state OFF or ERROR and for reporting back to
userspace.  When the counter is in state INACTIVE or ACTIVE, it is the
tstamp_enabled, tstamp_running and tstamp_stopped values that are
relevant, and total_time_enabled and total_time_running are determined
from them.  (tstamp_stopped is only used in INACTIVE state.)  The
reason for doing it like this is that it means that only counters
being enabled or disabled at sched-in and sched-out time need to be
updated.  There are no new loops that iterate over all counters to
update total_time_enabled or total_time_running.

This also keeps separate child_total_time_running and
child_total_time_enabled fields that get added in when reporting the
totals to userspace.  They are separate fields so that they can be
atomic.  We don't want to use atomics for total_time_running,
total_time_enabled etc., because then we would have to use atomic
sequences to update them, which are slower than regular arithmetic and
memory accesses.

It is possible to measure total_time_running by adding a task_clock
counter to each group of counters, and total_time_enabled can be
measured approximately with a top-level task_clock counter (though
inaccuracies will creep in if you need to disable and enable groups
since it is not possible in general to disable/enable the top-level
task_clock counter simultaneously with another group).  However, that
adds extra overhead - I measured around 15% increase in the context
switch latency reported by lat_ctx (from lmbench) when a task_clock
counter was added to each of 2 groups, and around 25% increase when a
task_clock counter was added to each of 4 groups.  (In both cases a
top-level task-clock counter was also added.)

In contrast, the code added in this commit gives better information
with no overhead that I could measure (in fact in some cases I
measured lower times with this code, but the differences were all less
than one standard deviation).

[ v2: address review comments by Andrew Morton. ]

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Orig-LKML-Reference: <18890.6578.728637.139402@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:30:36 +02:00
Peter Zijlstra
7b732a7504 perf_counter: new output ABI - part 1
Impact: Rework the perfcounter output ABI

use sys_read() only for instant data and provide mmap() output for all
async overflow data.

The first mmap() determines the size of the output buffer. The mmap()
size must be a PAGE_SIZE multiple of 1+pages, where pages must be a
power of 2 or 0. Further mmap()s of the same fd must have the same
size. Once all maps are gone, you can again mmap() with a new size.

In case of 0 extra pages there is no data output and the first page
only contains meta data.

When there are data pages, a poll() event will be generated for each
full page of data. Furthermore, the output is circular. This means
that although 1 page is a valid configuration, its useless, since
we'll start overwriting it the instant we report a full page.

Future work will focus on the output format (currently maintained)
where we'll likey want each entry denoted by a header which includes a
type and length.

Further future work will allow to splice() the fd, also containing the
async overflow data -- splice() would be mutually exclusive with
mmap() of the data.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Orig-LKML-Reference: <20090323172417.470536358@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:30:27 +02:00
Paul Mackerras
37d8182838 perf_counter: add an mmap method to allow userspace to read hardware counters
Impact: new feature giving performance improvement

This adds the ability for userspace to do an mmap on a hardware counter
fd and get access to a read-only page that contains the information
needed to translate a hardware counter value to the full 64-bit
counter value that would be returned by a read on the fd.  This is
useful on architectures that allow user programs to read the hardware
counters, such as PowerPC.

The mmap will only succeed if the counter is a hardware counter
monitoring the current process.

On my quad 2.5GHz PowerPC 970MP machine, userspace can read a counter
and translate it to the full 64-bit value in about 30ns using the
mmapped page, compared to about 830ns for the read syscall on the
counter, so this does give a significant performance improvement.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Orig-LKML-Reference: <20090323172417.297057964@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:30:26 +02:00
Peter Zijlstra
f4a2deb486 perf_counter: remove the event config bitfields
Since the bitfields turned into a bit of a mess, remove them and rely on
good old masks.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Orig-LKML-Reference: <20090323172417.059499915@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:30:25 +02:00
Paul Mackerras
9aaa131a27 perf_counter: fix type/event_id layout on big-endian systems
Impact: build fix for powerpc

Commit db3a944aca35ae61 ("perf_counter: revamp syscall input ABI")
expanded the hw_event.type field into a union of structs containing
bitfields.  In particular it introduced a type field and a raw_type
field, with the intention that the 1-bit raw_type field should
overlay the most-significant bit of the 8-bit type field, and in fact
perf_counter_alloc() now assumes that (or at least, assumes that
raw_type doesn't overlay any of the bits that are 1 in the values of
PERF_TYPE_{HARDWARE,SOFTWARE,TRACEPOINT}).

Unfortunately this is not true on big-endian systems such as PowerPC,
where bitfields are laid out from left to right, i.e. from most
significant bit to least significant.  This means that setting
hw_event.type = PERF_TYPE_SOFTWARE will set hw_event.raw_type to 1.

This fixes it by making the layout depend on whether or not
__BIG_ENDIAN_BITFIELD is defined.  It's a bit ugly, but that's what
we get for using bitfields in a user/kernel ABI.

Also, that commit didn't fix up some places in arch/powerpc/kernel/
perf_counter.c where hw_event.raw and hw_event.event_id were used.
This fixes them too.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-06 09:30:18 +02:00
Paul Mackerras
db4fb5acf2 perf_counter: powerpc: clean up perc_counter_interrupt
Impact: cleanup

This updates the powerpc perf_counter_interrupt following on from the
"perf_counter: unify irq output code" patch.  Since we now use the
generic perf_counter_output code, which sets the perf_counter_pending
flag directly, we no longer need the need_wakeup variable.

This removes need_wakeup and makes perf_counter_interrupt use
get_perf_counter_pending() instead.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Orig-LKML-Reference: <20090319194234.024464535@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:30:18 +02:00
Peter Zijlstra
0322cd6ec5 perf_counter: unify irq output code
Impact: cleanup

Having 3 slightly different copies of the same code around does nobody
any good. First step in revamping the output format.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Orig-LKML-Reference: <20090319194233.929962222@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:30:17 +02:00
Peter Zijlstra
b8e83514b6 perf_counter: revamp syscall input ABI
Impact: modify ABI

The hardware/software classification in hw_event->type became a little
strained due to the addition of tracepoint tracing.

Instead split up the field and provide a type field to explicitly specify
the counter type, while using the event_id field to specify which event to
use.

Raw counters still work as before, only the raw config now goes into
raw_event.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Orig-LKML-Reference: <20090319194233.836807573@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:30:17 +02:00
Paul Mackerras
b6c5a71da1 perf_counter: abstract wakeup flag setting in core to fix powerpc build
Impact: build fix for powerpc

Commit bd753921015e7905 ("perf_counter: software counter event
infrastructure") introduced a use of TIF_PERF_COUNTERS into the core
perfcounter code.  This breaks the build on powerpc because we use
a flag in a per-cpu area to signal wakeups on powerpc rather than
a thread_info flag, because the thread_info flags have to be
manipulated with atomic operations and are thus slower than per-cpu
flags.

This fixes the by changing the core to use an abstracted
set_perf_counter_pending() function, which is defined on x86 to set
the TIF_PERF_COUNTERS flag and on powerpc to set the per-cpu flag
(paca->perf_counter_pending).  It changes the previous powerpc
definition of set_perf_counter_pending to not take an argument and
adds a clear_perf_counter_pending, so as to simplify the definition
on x86.

On x86, set_perf_counter_pending() is defined as a macro.  Defining
it as a static inline in arch/x86/include/asm/perf_counters.h causes
compile failures because <asm/perf_counters.h> gets included early in
<linux/sched.h>, and the definitions of set_tsk_thread_flag etc. are
therefore not available in <asm/perf_counters.h>.  (On powerpc this
problem is avoided by defining set_perf_counter_pending etc. in
<asm/hw_irq.h>.)

Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-06 09:30:14 +02:00
Ingo Molnar
f541ae326f Merge branch 'linus' into perfcounters/core-v2
Merge reason: we have gathered quite a few conflicts, need to merge upstream

Conflicts:
	arch/powerpc/kernel/Makefile
	arch/x86/ia32/ia32entry.S
	arch/x86/include/asm/hardirq.h
	arch/x86/include/asm/unistd_32.h
	arch/x86/include/asm/unistd_64.h
	arch/x86/kernel/cpu/common.c
	arch/x86/kernel/irq.c
	arch/x86/kernel/syscall_table_32.S
	arch/x86/mm/iomap_32.c
	include/linux/sched.h
	kernel/Makefile

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06 09:02:57 +02:00
Linus Torvalds
714f83d5d9 Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits)
  tracing, net: fix net tree and tracing tree merge interaction
  tracing, powerpc: fix powerpc tree and tracing tree interaction
  ring-buffer: do not remove reader page from list on ring buffer free
  function-graph: allow unregistering twice
  trace: make argument 'mem' of trace_seq_putmem() const
  tracing: add missing 'extern' keywords to trace_output.h
  tracing: provide trace_seq_reserve()
  blktrace: print out BLK_TN_MESSAGE properly
  blktrace: extract duplidate code
  blktrace: fix memory leak when freeing struct blk_io_trace
  blktrace: fix blk_probes_ref chaos
  blktrace: make classic output more classic
  blktrace: fix off-by-one bug
  blktrace: fix the original blktrace
  blktrace: fix a race when creating blk_tree_root in debugfs
  blktrace: fix timestamp in binary output
  tracing, Text Edit Lock: cleanup
  tracing: filter fix for TRACE_EVENT_FORMAT events
  ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release()
  x86: kretprobe-booster interrupt emulation code fix
  ...

Fix up trivial conflicts in
 arch/parisc/include/asm/ftrace.h
 include/linux/memory.h
 kernel/extable.c
 kernel/module.c
2009-04-05 11:04:19 -07:00
Linus Torvalds
bad6a5c08c Merge git://git.kernel.org/pub/scm/linux/kernel/git/kyle/rtc-parisc
* git://git.kernel.org/pub/scm/linux/kernel/git/kyle/rtc-parisc:
  powerpc/ps3: Add rtc-ps3
  powerpc: Hook up rtc-generic, and kill rtc-ppc
  m68k: Hook up rtc-generic
  parisc: rtc: Rename rtc-parisc to rtc-generic
  parisc: rtc: Add missing module alias
  parisc: rtc: platform_driver_probe() fixups
  parisc: rtc: get_rtc_time() returns unsigned int
2009-04-03 09:51:35 -07:00
Alexey Dobriyan
6f2c55b843 Simplify copy_thread()
First argument unused since 2.3.11.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02 19:04:51 -07:00
Jean Delvare
bf6aede712 workqueue: add to_delayed_work() helper function
It is a fairly common operation to have a pointer to a work and to need a
pointer to the delayed work it is contained in.  In particular, all
delayed works which want to rearm themselves will have to do that.  So it
would seem fair to offer a helper function for this operation.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Greg KH <greg@kroah.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02 19:04:50 -07:00
Geert Uytterhoeven
bcd68a70cb powerpc: Hook up rtc-generic, and kill rtc-ppc
PowerPC has been a long time user of the generic RTC abstraction, so hook up
rtc-generic:
  - Create the "rtc-generic" platform device if ppc_md.get_rtc_time is set,
  - Kill rtc-ppc, as rtc-generic offers the same functionality in a more
    generic way, and supports autoloading through udev.

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
2009-04-02 01:05:31 +00:00
Stephen Rothwell
a095bdbb13 tracing, powerpc: fix powerpc tree and tracing tree interaction
Today's linux-next build (powerpc allyesconfig) failed like this:

arch/powerpc/kernel/ftrace.c: In function 'prepare_ftrace_return':
arch/powerpc/kernel/ftrace.c:612: warning: passing argument 3 of 'ftrace_push_return_trace' makes pointer from integer without a cast
arch/powerpc/kernel/ftrace.c:612: error: too many arguments to function 'ftrace_push_return_trace'

Caused by commit 5d1a03dc54
("function-graph: moved the timestamp from arch to generic code") from
the tracing tree which (removed an argument from
ftrace_push_return_trace()) interacting with commit
6794c78243 ("powerpc64: port of the
function graph tracer") from the powerpc tree.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: <linuxppc-dev@ozlabs.org>
LKML-Reference: <20090327230834.93d0221d.sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-02 00:50:24 +02:00
Linus Torvalds
e76e5b2c66 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (88 commits)
  PCI: fix HT MSI mapping fix
  PCI: don't enable too much HT MSI mapping
  x86/PCI: make pci=lastbus=255 work when acpi is on
  PCI: save and restore PCIe 2.0 registers
  PCI: update fakephp for bus_id removal
  PCI: fix kernel oops on bridge removal
  PCI: fix conflict between SR-IOV and config space sizing
  powerpc/PCI: include pci.h in powerpc MSI implementation
  PCI Hotplug: schedule fakephp for feature removal
  PCI Hotplug: rename legacy_fakephp to fakephp
  PCI Hotplug: restore fakephp interface with complete reimplementation
  PCI: Introduce /sys/bus/pci/devices/.../rescan
  PCI: Introduce /sys/bus/pci/devices/.../remove
  PCI: Introduce /sys/bus/pci/rescan
  PCI: Introduce pci_rescan_bus()
  PCI: do not enable bridges more than once
  PCI: do not initialize bridges more than once
  PCI: always scan child buses
  PCI: pci_scan_slot() returns newly found devices
  PCI: don't scan existing devices
  ...

Fix trivial append-only conflict in Documentation/feature-removal-schedule.txt
2009-04-01 09:47:12 -07:00
Alexey Dobriyan
99b7623380 proc 2/2: remove struct proc_dir_entry::owner
Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy
as correctly noted at bug #12454. Someone can lookup entry with NULL
->owner, thus not pinning enything, and release it later resulting
in module refcount underflow.

We can keep ->owner and supply it at registration time like ->proc_fops
and ->data.

But this leaves ->owner as easy-manipulative field (just one C assignment)
and somebody will forget to unpin previous/pin current module when
switching ->owner. ->proc_fops is declared as "const" which should give
some thoughts.

->read_proc/->write_proc were just fixed to not require ->owner for
protection.

rmmod'ed directories will be empty and return "." and ".." -- no harm.
And directories with tricky enough readdir and lookup shouldn't be modular.
We definitely don't want such modular code.

Removing ->owner will also make PDE smaller.

So, let's nuke it.

Kudos to Jeff Layton for reminding about this, let's say, oversight.

http://bugzilla.kernel.org/show_bug.cgi?id=12454

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-03-31 01:14:44 +04:00
Benjamin Herrenschmidt
9ff9a26b78 Merge commit 'origin/master' into next
Manual merge of:
	arch/powerpc/include/asm/elf.h
	drivers/i2c/busses/i2c-mpc.c
2009-03-30 14:04:53 +11:00
Ingo Molnar
6e15cf0486 Merge branch 'core/percpu' into percpu-cpumask-x86-for-linus-2
Conflicts:
	arch/parisc/kernel/irq.c
	arch/x86/include/asm/fixmap_64.h
	arch/x86/include/asm/setup.h
	kernel/irq/handle.c

Semantic merge:
        arch/x86/include/asm/fixmap.h

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-27 17:28:43 +01:00
Benjamin Herrenschmidt
ec78c8ac16 powerpc: Fix bugs introduced by sysfs changes
Rusty's patch to change our sysfs access to various registers
to use smp_call_function_single() introduced a whole bunch of
warnings. This fixes them. This version also fixes an actual
bug in here where it did mtspr instead of mfspr when reading
the files

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-27 16:58:24 +11:00
Josh Boyer
efbda86098 powerpc: Sanitize stack pointer in signal handling code
On powerpc64 machines running 32-bit userspace, we can get garbage bits in the
stack pointer passed into the kernel.  Most places handle this correctly, but
the signal handling code uses the passed value directly for allocating signal
stack frames.

This fixes the issue by introducing a get_clean_sp function that returns a
sanitized stack pointer.  For 32-bit tasks on a 64-bit kernel, the stack
pointer is masked correctly.  In all other cases, the stack pointer is simply
returned.

Additionally, we pass an 'is_32' parameter to get_sigframe now in order to
get the properly sanitized stack.  The callers are know to be 32 or 64-bit
statically.

Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-27 16:58:24 +11:00
Linus Torvalds
a8416961d3 Merge branch 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (32 commits)
  x86: disable __do_IRQ support
  sparseirq, powerpc/cell: fix unused variable warning in interrupt.c
  genirq: deprecate obsolete typedefs and defines
  genirq: deprecate __do_IRQ
  genirq: add doc to struct irqaction
  genirq: use kzalloc instead of explicit zero initialization
  genirq: make irqreturn_t an enum
  genirq: remove redundant if condition
  genirq: remove unused hw_irq_controller typedef
  irq: export remove_irq() and setup_irq() symbols
  irq: match remove_irq() args with setup_irq()
  irq: add remove_irq() for freeing of setup_irq() irqs
  genirq: assert that irq handlers are indeed running in hardirq context
  irq: name 'p' variables a bit better
  irq: further clean up the free_irq() code flow
  irq: refactor and clean up the free_irq() code flow
  irq: clean up manage.c
  irq: use GFP_KERNEL for action allocation in request_irq()
  kernel/irq: fix sparse warning: make symbol static
  irq: optimize init_kstat_irqs/init_copy_kstat_irqs
  ...
2009-03-26 16:06:50 -07:00
Jesse Barnes
ceb93a9ff1 powerpc/PCI: include pci.h in powerpc MSI implementation
This file uses PCI MSI defines and so needs pci.h.

Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-25 08:54:29 -07:00
Hollis Blanchard
366d4b9b9f KVM: ppc: No need to include core-header for KVM in asm-offsets.c currently
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:02:58 +02:00
Benjamin Herrenschmidt
757c74d298 powerpc/mm: Introduce early_init_mmu() on 64-bit
This moves some MMU related init code out of setup_64.c into hash_utils_64.c
and calls it early_init_mmu() and early_init_mmu_secondary(). This will
make it easier to plug in a new MMU type.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:34 +11:00
Kumar Gala
2319f12395 powerpc/mm: e300c2/c3/c4 TLB errata workaround
Complete workaround for DTLB errata in e300c2/c3/c4 processors.

Due to the bug, the hardware-implemented LRU algorythm always goes to way
1 of the TLB. This fix implements the proposed software workaround in
form of a LRW table for chosing the TLB-way.

Based on patch from David Jander <david@protonic.nl>

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:32 +11:00
Kumar Gala
eb3436a013 powerpc/mm: Used free register to save a few cycles in SW TLB miss handling
Now that r0 is free we can keep the value of I/DMISS in r3 and not reload
it before doing the tlbli/d.  This saves us a few cycles in the fast path
case.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:31 +11:00
Kumar Gala
00fcb14703 powerpc/mm: Remove unused register usage in SW TLB miss handling
Long ago we had some code that actually used the CTR in the SW TLB
miss handlers (603/e300).  Since we don't use it no reason to waste
cycles saving it off and restoring it (we actually didn't restore it
in the fast path case).

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:31 +11:00
Kumar Gala
d746286c1f powerpc: setup default archdata for {of_}platform via bus_register_notifier
Since a number of powerpc chips are SoCs we end up having dma-able
devices that are registered as platform or of_platform devices.  We need
to hook the archdata to setup proper dma_ops for these devices.

Rather than having to add a bus_notify to each platform we add a default
one at the highest priority (called first) to set the default dma_ops for
of_platform and platform devices to dma_direct_ops.  This allows platform
code to override the ops by providing their own notifier call back.

In the future to enable >4G DMA support on ppc32 we can hook swiotlb ops.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:30 +11:00
Kumar Gala
32ac57668d powerpc/pci: Default to dma_direct_ops for pci dma_ops
This will allow us to remove the ppc32 specific checks in get_dma_ops()
that defaults to dma_direct_ops if the archdata is NULL.  We really
should always have archdata set to something going forward.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:30 +11:00
Rusty Russell
9a3719341a powerpc: Make sysfs code use smp_call_function_single
Impact: performance improvement

This fixes 'powerpc: avoid cpumask games in arch/powerpc/kernel/sysfs.c'
which talked about using smp_call_function_single, but actually used
work_on_cpu (an older version of the patch).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:47:27 +11:00
Benjamin Herrenschmidt
151a9f4aef powerpc: Fix prom_init on 32-bit OF machines
Commit e7943fbbfd broke ppc32 using
Open Firmware client interface due to using the wrong relocation
macro when accessing the variable "linux_banner".

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-24 13:43:35 +11:00
Benjamin Herrenschmidt
9e41d9597e Merge commit 'origin/master' into next 2009-03-24 13:38:30 +11:00
Kumar Gala
345953cf9a powerpc/mm: Fix Respect _PAGE_COHERENT on classic ppc32 SW TLB load machines
Grant picked up the wrong version of "Respect _PAGE_COHERENT on classic
ppc32 SW" (commit a4bd6a93c3)

It was missing the code to actually deal with the fixup of
_PAGE_COHERENT based on the CPU feature.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-03-23 08:38:26 -05:00
Ingo Molnar
b3e3b302cf Merge branches 'irq/sparseirq' and 'linus' into irq/core 2009-03-23 10:07:49 +01:00
Matthew Wilcox
1c8d7b0a56 PCI MSI: Add support for multiple MSI
Add the new API pci_enable_msi_block() to allow drivers to
request multiple MSI and reimplement pci_enable_msi in terms of
pci_enable_msi_block.  Ensure that the architecture back ends don't
have to know about multiple MSI.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-20 10:48:14 -07:00
Kumar Gala
a4bd6a93c3 powerpc/mm: Respect _PAGE_COHERENT on classic ppc32 SW
Since we now set _PAGE_COHERENT in the Linux PTE we shouldn't be clearing
it out before we setup the SW TLB.  Today all the SW TLB machines
(603/e300) that we support are non-SMP, however there are some errata on
some devices that cause us to set _PAGE_COHERENT via CPU_FTR_NEED_COHERENT.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2009-03-17 09:17:50 -06:00
Ingo Molnar
edb35028e4 Merge branches 'irq/genirq' and 'linus' into irq/core 2009-03-16 09:20:13 +01:00
Benjamin Herrenschmidt
28794d34ec powerpc/kconfig: Kill PPC_MULTIPLATFORM
CONFIG_PPC_MULTIPLATFORM is a remain of the pre-powerpc days and isn't
really meaningful anymore. It was basically equivalent to PPC64 || 6xx.

This removes it along with the following changes:

 - 32-bit platforms that relied on PPC32 && PPC_MULTIPLATFORM now rely
   on 6xx which is what they want anyway.

 - A new symbol, PPC_BOOK3S, is defined that represent compliance with
   the "Server" variant of the architecture. This is set when either 6xx
   or PPC64 is set and open the door for future BOOK3E 64-bit.

 - 64-bit platforms that relied on PPC64 && PPC_MULTIPLATFORM now use
   PPC64 && PPC_BOOK3S

 - A separate and selectable CONFIG_PPC_OF_BOOT_TRAMPOLINE option is now
   used to control the use of prom_init.c

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-11 17:11:35 +11:00
Thomas Gleixner
97f7d6bcc1 powerpc/irq: Convert obsolete irq_desc_t to struct irq_desc
Impact: cleanup

Convert the last remaining users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-11 17:11:34 +11:00
Andrew Klossner
af9c724907 powerpc/udbg: Fix lost byte during console handover; change LFCR to CRLF
When the console is on a serial port to be driven by serial8250, a
character can be lost from the end of the first line in the two-line
sequence

	serial8250.0: ttyS0 at MMIO 0xe0004500 (irq = 42) is a 16550A
	console handover: boot [udbg0] -> real [ttyS0]

This happens because udbg_puts or udbg_write stuff the last byte of
the line into the Tx FIFO and return, whereupon the serial8250
initialization code immediately empties that FIFO.  The fix: udbg_puts
and udbg_write now wait for the Tx FIFO to clear before returning.
This delays the system by one additional serial frame time for each
line written by udbg, but the effect is not noticeable, a cumulative
17 milliseconds for 200 lines of early printk output at 115200 baud.

Also, the routines in udbg_16550.c now emit CRLF instead of LFCR.
Linux makes a point of emitting CRLF because, when serial output is
captured to a file, LFCR sequences can confuse text editors.  See
http://lkml.org/lkml/2006/2/4/50 for some history.

Signed-off-by: Andrew Klossner <andrew@cesa.opbu.xerox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-11 17:11:34 +11:00
Wolfram Sang
a77acda0b7 powerpc/pci: Fix typo: s/resouces/resources/ in a pr_debug
Fix typo: s/resouces/resources/ in a pr_debug

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-11 17:11:34 +11:00
Michael Ellerman
e7943fbbfd powerpc: Print linux_banner in prom_init
So at least you can see what kernel you're booting if you die
before the kernel prints it mid-way through start_kernel().

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-11 17:11:33 +11:00
Octavian Purdila
7c9583a4db powerpc/oprofile: Enable support for ppc750 processors
This patch enables oprofile for all 3 FX variants and GX variant of the
750 processor.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-11 17:11:32 +11:00
Michael Ellerman
9e1e3723be powerpc: Remove unused asm-offsets entries for cpu_spec
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-11 17:10:15 +11:00
Michael Ellerman
2657dd4e30 powerpc: Make sure we copy all cpu_spec features except PMC related ones
When identify_cpu() is called a second time with a logical PVR, it
only copies a subset of the cpu_spec fields so as to avoid overwriting
the performance monitor fields that were initialized based on the
real PVR.

However some of the other, non performance monitor related fields are
also not copied:
 * pvr_mask
 * pvr_value
 * mmu_features
 * machine_check

The fact that pvr_mask is not copied can result in show_cpuinfo()
showing the cpu as "unknown", if we override an unknown PVR with a
logical one - as reported by Shaggy.

So change the logic to copy all fields, and then put back the PMC
related ones in the case that we're overwriting a real PVR with a
logical one.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-11 17:10:14 +11:00
Michael Ellerman
666435bbf3 powerpc: Deindentify identify_cpu()
The for-loop body of identify_cpu() has gotten a little big, so move the
loop body logic into a separate function. No other changes.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-11 17:10:14 +11:00
Tejun Heo
19390c4d03 linker script: define __per_cpu_load on all SMP capable archs
Impact: __per_cpu_load available on all SMP capable archs

Percpu now requires three symbols to be defined - __per_cpu_load,
__per_cpu_start and __per_cpu_end.  There were three archs which
didn't have it.  Update them as follows.

* powerpc: can use generic PERCPU() macro.  Compile tested for
  powerpc32, compile/boot tested for powerpc64.

* ia64: can use generic PERCPU_VADDR() macro.  __phys_per_cpu_start is
  identical to __per_cpu_load.  Compile tested and symbol table looks
  identical after the change except for the additional __per_cpu_load.

* arm: added explicit __per_cpu_load definition.  Currently uses
  unified .init output section so can't use the generic macro.  Dunno
  whether the unified .init ouput section is required by arch
  peculiarity so I left it alone.  Please break it up and use PERCPU()
  if possible.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Pat Gefre <pfg@sgi.com>
Cc: Russell King <rmk@arm.linux.org.uk>
2009-03-10 16:27:48 +09:00
Kumar Gala
c3071951d0 powerpc/fsl-booke: Add support for tlbilx instructions
The e500mc core supports the new tlbilx instructions that do core
local invalidates and also provide us the ability to take down
all TLB entries matching a given PID.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-03-09 09:25:38 -05:00
Paul Mackerras
880860e392 perfcounters/powerpc: add support for POWER4 processors
Impact: more hardware support

This adds the back-end for the PMU on the POWER4 and POWER4+ processors
(GP and GQ).  This is quite similar to the PPC970, with 8 PMCs, but has
fewer events than the PPC970.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-03-06 16:30:57 +11:00
Paul Mackerras
aabbaa6036 perfcounters/powerpc: add support for POWER5+ processors
Impact: more hardware support

This adds the back-end for the PMU on the POWER5+ processors (i.e. GS,
including GS DD3 aka POWER5++).  This doesn't use the fixed-function
PMC5 and PMC6 since they don't respect the freeze conditions and don't
generate interrupts, as on POWER6.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-03-06 16:28:37 +11:00
Paul Mackerras
86028598de perfcounters/powerpc: fix oops with multiple counters in a group
Impact: fix oops-causing bug

This fixes a bug in the powerpc hw_perf_counter_init where the code
didn't initialize ctrs[n] before passing the ctrs array to check_excludes,
leading to possible oopses and other incorrect behaviour.  This fixes it
by initializing ctrs[n] correctly.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-03-06 08:07:13 +11:00
Ingo Molnar
8163d88c79 Merge commit 'v2.6.29-rc7' into perfcounters/core
Conflicts:
	arch/x86/mm/iomap_32.c
2009-03-04 11:42:31 +01:00
Benjamin Herrenschmidt
652e8f8d57 Merge commit 'jwb/next' into next 2009-03-03 13:30:03 +11:00
Ingo Molnar
55f2b78995 Merge branch 'x86/urgent' into x86/pat 2009-03-01 12:47:58 +01:00
Ingo Molnar
8e818179eb Merge branch 'x86/core' into perfcounters/core
Conflicts:
	arch/x86/kernel/apic/apic.c
	arch/x86/kernel/irqinit_32.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 13:02:23 +01:00
Paul Mackerras
742bd95ba9 perfcounters/powerpc: Add support for POWER5 processors
This adds the back-end for the PMU on the POWER5 processor.  This knows
how to use the fixed-function PMC5 and PMC6 (instructions completed and
run cycles).  Unlike POWER6, PMC5/6 obey the freeze conditions and can
generate interrupts, so their use doesn't impose any extra restrictions.

POWER5+ is different and is not supported by this patch.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-02-26 15:36:48 +11:00
Michael Neuling
49f297f8df powerpc: Fix load/store float double alignment handler
When we introduced VSX, we changed the way FPRs are stored in the
thread_struct.  Unfortunately we missed the load/store float double
alignment handler code when updating how we access FPRs in the
thread_struct.

Below fixes this and merges the little/big endian case.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-26 14:02:53 +11:00
Paul Mackerras
d095cd46da perfcounters/powerpc: Make exclude_kernel bit work on Apple G5 processors
Currently, setting hw_event.exclude_kernel does nothing on the PPC970
variants used in Apple G5 machines, because they have the HV (hypervisor)
bit in the MSR forced to 1, so as far as the PMU is concerned, the
kernel runs in hypervisor mode.  Thus we have to use the MMCR0_FCHV
(freeze counters in hypervisor mode) bit rather than the MMCR0_FCS
(freeze counters in supervisor mode) bit.

This checks the MSR.HV bit at startup, and if it is set, we set the
freeze_counters_kernel variable to MMCR0_FCHV (it was initialized to
MMCR0_FCS).  We then use that whenever we need to exclude kernel events.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-02-23 23:01:28 +11:00
Anton Blanchard
501cb16d3c powerpc: Randomise PIEs
Randomise ELF_ET_DYN_BASE, which is used when loading position independent
executables.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:21 +11:00
Anton Blanchard
912f9ee21c powerpc: Randomise the brk region
Randomize the heap.

before:
tundro2:~ # sleep 1 & cat /proc/${!}/maps | grep heap
10017000-10118000 rw-p 10017000 00:00 0                                  [heap]
10017000-10118000 rw-p 10017000 00:00 0                                  [heap]
10017000-10118000 rw-p 10017000 00:00 0                                  [heap]
10017000-10118000 rw-p 10017000 00:00 0                                  [heap]
10017000-10118000 rw-p 10017000 00:00 0                                  [heap]

after
tundro2:~ # sleep 1 & cat /proc/${!}/maps | grep heap
19419000-1951a000 rw-p 19419000 00:00 0                                  [heap]
325ff000-32700000 rw-p 325ff000 00:00 0                                  [heap]
1a97c000-1aa7d000 rw-p 1a97c000 00:00 0                                  [heap]
1cc60000-1cd61000 rw-p 1cc60000 00:00 0                                  [heap]
1afa9000-1b0aa000 rw-p 1afa9000 00:00 0                                  [heap]

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:20 +11:00
Anton Blanchard
d839088cae powerpc: Randomise lower bits of stack address
Randomise the lower bits of the stack address. More randomisation is good for
security but the scatter can also help with SMT threads that share an L1. A
quick test case shows this working:

int main()
{
	int sp;
	printf("%x\n", (unsigned long)&sp & 4095);
}

before:
80
80
80
80
80

after:
610
490
300
6b0
d80

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:20 +11:00
Anton Blanchard
a465f9b694 powerpc: Move is_32bit_task
Move is_32bit_task into asm/thread_info.h, that allows us to test for
32/64bit tasks without an ugly CONFIG_PPC64 ifdef.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:06 +11:00
Michael Neuling
553631e25f powerpc: Fix load/store float double alignment handler
When we introduced VSX, we changed the way FPRs are stored in the
thread_struct.  Unfortunately we missed the load/store float double
alignment handler code when updating how we access FPRs in the
thread_struct.

Below fixes this and merges the little/big endian case.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:05 +11:00
Michael Neuling
545bba1824 powerpc: Add alignment handler for new lfiwzx instruction
lfiwzx is a new floating point load instruction in 2.06 that needs an
alignment handler for Linux.

Turns out to be the worlds easiest handler to add.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:04 +11:00
Brian King
f52862f407 powerpc/pseries: Fix partition migration hang under load
While testing partition migration with heavy CPU load using
shared processors, it was observed that sometimes the migration
would never complete and would appear to hang. Currently, the
migration code assumes that if H_SUCCESS is returned from the H_JOIN
then the migration is complete and the processor is waking up on
the target system. If there was an outstanding PROD to the processor
when the H_JOIN is called, however, it will return H_SUCCESS on the source
system, causing the migration to hang, or in some scenarios cause
the kernel to crash on the complete call waking the caller
of rtas_percpu_suspend_me. Fix this by calling H_JOIN multiple times
if necessary during the migration.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:04 +11:00
Kumar Gala
620165f971 powerpc: Add support for using doorbells for SMP IPI
The e500mc supports the new msgsnd/doorbell mechanisms that were added in
the Power ISA 2.05 architecture.  We use the normal level doorbell for
doing SMP IPIs at this point.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 15:53:03 +11:00
Tom Arbuckle
f81786913a powerpc/pci: Fix PCI<->OF matching of old style multifunc devices
Old OF variants used to create a 'dummy' parent node "multifunc-device"
for devices with more than one PCI function. Our code that matches OF
nodes to PCI devices dealt with that in one place but not in another,
this fixes it.

This has the practical effect of fixing interrupt routing of multifunction
PCI cards on some older PowerMac machines.

Signed-off-by: Tom Arbuckle <tom.d.arbuckle@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:57 +11:00
Kumar Gala
16c57b3620 powerpc: Unify opcode definitions and support
Create a new header that becomes a single location for defining PowerPC
opcodes used by code that is either generationg instructions
at runtime (fixups, debug, etc.), emulating instructions, or just
compiling instructions old assemblers don't know about.

We currently don't handle the floating point emulation or alignment decode
as both are better handled by the specific decode support they already
have.

Added support for the new dcbzl, dcbal, msgsnd, tlbilx, & wait instructions
since older assemblers don't know about them.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:56 +11:00
Steven Rostedt
bb9b903527 powerpc, ftrace: use create_branch lib function
Impact: clean up, remove duplicate code

When ftrace was first ported to PowerPC, there existed a
create_function_call that would create the instruction to make a call
to a given address. Unfortunately, this call expected to write to
the address it was given, and since it used the address to calculate
the offset, it could not be faked.

ftrace needed a way to create the instruction without actually writing
that instruction to the text section. So ftrace had to implement its
own code.

Now we have create_branch in the code patching library, which does
exactly what ftrace needs. This patch replaces ftrace's implementation
with the library function.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:56 +11:00
Steven Rostedt
b54dcfe108 powerpc, ftrace: use unsigned int for instruction manipulation
The original port of ftrace to PowerPC kept a lot of the code used
by x86. Some of this code was to handle x86's 5 byte instruction.
This was handled by using character arrays to manipulate the
code.

PowerPC has a consistent 4 byte instruction. Using unsigned ints
makes the code more efficient as well as more readable.
By converting to use unsigned ints to represent instructions,
I was able to remove the side effects that were needed for
manipulating character strings.

  i.e. memcpy and memcmp

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:55 +11:00
Steven Rostedt
60ce8f7260 powerpc32, ftrace: dynamic function graph tracer
This patch gets function graph tracing working with dynamic function
tracer on PowerPC32.

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:55 +11:00
Steven Rostedt
fad4f47cc8 powerpc32, ftrace: port function graph tracer to ppc32, static only
This patch ports the function graph tracer for PowerPC, but only
for static function tracing.

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:55 +11:00
Steven Rostedt
bf528a3a9b powerpc32, ftrace: save and restore mcount regs with macro
Impact: clean up

Use a macro to save and restore the registers for PowerPC32,
since that code is duplicated.

This is similar to the work done by Cyrill Gorcunov for the
mcount code in x86_64.

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:54 +11:00
Steven Rostedt
bb7253403f powerpc64, ftrace: save toc only on modules for function graph
The TOCS used by modules are different than the one used by
the core kernel code. The function graph tracer must save and
restore the TOC whenever it traces a module call. But this
is an added overhead to burden the majority of core kernel
code being traced.

Benjamin Herrenschmidt suggested in testing the entry of
the call to tell if it is a core kernel function or a module.
He recommended using the REGION_ID() macro to perform this test.

This patch implements Benjamin's idea, and uses a different
return_to_handler routine dependent on if the entry is a core
kernel function or not. The module version saves the TOC, where as
the core kernel version does not.

Geoff Lavand tested on PS3.

Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:54 +11:00
Steven Rostedt
4654288847 powerpc64, tracing: add function graph tracer with dynamic tracing
This is the port of the function graph tracer to PowerPC with
dynamic tracing.

Geoff Lavand tested on PS3.

Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:54 +11:00
Steven Rostedt
6794c78243 powerpc64: port of the function graph tracer
This is a port of the function graph tracer that was written by
Frederic Weisbecker for the x86.

This only works for PPC64 at the moment and only for static tracing.
PPC32 and dynamic function graph tracing support will come later.

The trace produces a visual calling of functions:

 # tracer: function_graph
 #
 # CPU  DURATION                  FUNCTION CALLS
 # |     |   |                     |   |   |   |
  0)   2.224 us    |                        }
  0) ! 271.024 us  |                      }
  0) ! 320.080 us  |                    }
  0) ! 324.656 us  |                  }
  0) ! 329.136 us  |                }
  0)               |                .put_prev_task_fair() {
  0)               |                  .update_curr() {
  0)   2.240 us    |                    .update_min_vruntime();
  0)   6.512 us    |                  }
  0)   2.528 us    |                  .__enqueue_entity();
  0) + 15.536 us   |                }
  0)               |                .pick_next_task_fair() {
  0)   2.032 us    |                  .__pick_next_entity();
  0)   2.064 us    |                  .__clear_buddies();
  0)               |                  .set_next_entity() {
  0)   2.672 us    |                    .__dequeue_entity();
  0)   6.864 us    |                  }

Geoff Lavand tested on PS3.

Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:53 +11:00
Steven Rostedt
17be5b3ddf powerpc, ftrace: fix compile error when modules not configured
Michael Neuling reported a compile bug when dynamic ftrace was
configured in and modules were not. This was due to the ftrace
code referencing module specific structures.

Reported-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:53 +11:00
Steven Rostedt
44e1d064b9 ftrace, powerpc: replace debug macro with proper pr_deug
Impact: cleanup

The PowerPC ftrace code uses a hacked up DEBUGP macro for prints.
This patch converts it to the standard pr_debug.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23 10:48:52 +11:00
Ingo Molnar
fc6fc7f1b1 Merge branch 'linus' into x86/apic
Conflicts:
	arch/x86/mach-default/setup.c

Semantic conflict resolution:
	arch/x86/kernel/setup.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 20:05:19 +01:00
Benjamin Herrenschmidt
3b7faeb49e Merge commit 'kumar/next' into next 2009-02-18 13:23:30 +11:00
Benjamin Herrenschmidt
82a0a1cc8f Merge commit 'origin/master' into next
Manual merge of:
	arch/powerpc/include/asm/pgtable-ppc32.h
2009-02-18 13:19:25 +11:00
Madhulika Madishetty
6c71209023 AMCC PPC 460SX redwood SoC platform initial framework
This patch contains initial framework for the AMCC Redwood board.

Signed-off-by: Madhulika Madishetty <mmadishetty@amcc.com>
Signed-off-by: Tirumala Marri <tmarri@amcc.com>
Signed-off-by: Feng Kan <fkan@amcc.com>
Signed-off-by: Vidhyananth Venkatasamy <vvenkatasamy@amcc.com>
Signed-off-by: Preetesh Parekh <pparekh@amcc.com>
Acked-by: Loc Ho <lho@amcc.com>
Acked-by: Feng Kan <fkan@amcc.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2009-02-14 14:41:29 -05:00
Yuri Tikhonov
e12401222f powerpc/44x: Support for 256KB PAGE_SIZE
This patch adds support for 256KB pages on ppc44x-based boards.

For simplification of implementation with 256KB pages we still assume
2-level paging. As a side effect this leads to wasting extra memory space
reserved for PTE tables: only 1/4 of pages allocated for PTEs are
actually used. But this may be an acceptable trade-off to achieve the
high performance we have with big PAGE_SIZEs in some applications (e.g.
RAID).

Also with 256KB PAGE_SIZE we increase THREAD_SIZE up to 32KB to minimize
the risk of stack overflows in the cases of on-stack arrays, which size
depends on the page size (e.g. multipage BIOs, NTFS, etc.).

With 256KB PAGE_SIZE we need to decrease the PKMAP_ORDER at least down
to 9, otherwise all high memory (2 ^ 10 * PAGE_SIZE == 256MB) we'll be
occupied by PKMAP addresses leaving no place for vmalloc. We do not
separate PKMAP_ORDER for 256K from 16K/64K PAGE_SIZE here; actually that
value of 10 in support for 16K/64K had been selected rather intuitively.
Thus now for all cases of PAGE_SIZE on ppc44x (including the default, 4KB,
one) we have 512 pages for PKMAP.

Because ELF standard supports only page sizes up to 64K, then you should
use binutils later than 2.17.50.0.3 with '-zmax-page-size' set to 256K
for building applications, which are to be run with the 256KB-page sized
kernel. If using the older binutils, then you should patch them like follows:

	--- binutils/bfd/elf32-ppc.c.orig
	+++ binutils/bfd/elf32-ppc.c

	-#define ELF_MAXPAGESIZE                0x10000
	+#define ELF_MAXPAGESIZE                0x40000

One more restriction we currently have with 256KB page sizes is inability
to use shmem safely, so, for now, the 256KB is available only if you turn
the CONFIG_SHMEM option off (another variant is to use BROKEN).
Though, if you need shmem with 256KB pages, you can always remove the !SHMEM
dependency in 'config PPC_256K_PAGES', and use the workaround available here:
 http://lkml.org/lkml/2008/12/19/20

Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2009-02-14 14:40:04 -05:00
Ingo Molnar
8f8573ae9f Merge branches 'irq/genirq', 'irq/sparseirq' and 'irq/urgent' into irq/core 2009-02-13 11:57:18 +01:00
Ingo Molnar
f8a6b2b9ce Merge branch 'linus' into x86/apic
Conflicts:
	arch/x86/kernel/acpi/boot.c
	arch/x86/mm/fault.c
2009-02-13 09:44:22 +01:00
Ingo Molnar
e9c4ffb11f Merge branch 'linus' into perfcounters/core
Conflicts:
	arch/x86/kernel/acpi/boot.c
2009-02-13 09:34:07 +01:00
Michael Neuling
26456dcfb8 powerpc/vsx: Fix VSX alignment handler for regs 32-63
Fix the VSX alignment handler for VSX registers > 32.  32-63 are stored
in the VMX part of the thread_struct not the FPR part.

Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: stable@kernel.org (2.6.27 & .28 please)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-13 16:37:45 +11:00
Kumar Gala
70fe3af840 powerpc/book-3e: Introduce concept of Book-3e MMU
The Power ISA 2.06 spec introduces a standard MMU programming model that
is based on the Freescale Book-E MMU programing model.  The Freescale
version is pretty backwards compatiable with the ISA 2.06 definition so
we are starting to refactor some of the Freescale code so it can be
easily shared.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-02-12 16:51:33 -06:00
Kumar Gala
d66c82ea45 powerpc/fsl-booke: Add new ISA 2.06 page sizes and MAS defines
The Power ISA 2.06 added power of two page sizes to the embedded MMU
architecture.  Its done it such a way to be code compatiable with the
existing HW.  Made the minor code changes to support both power of two
and power of four page sizes.  Also added some new MAS bits and macros
that are defined as part of the 2.06 ISA.  Renamed some things to use
the 'Book-3e' concept to convey the new MMU that is based on the
Freescale Book-E MMU programming model.

Note, its still invalid to try and use a page size that isn't supported
by cpu.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-02-12 16:37:11 -06:00
Ingo Molnar
ffc0467293 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/perfcounters into perfcounters/core 2009-02-11 09:22:14 +01:00
Ingo Molnar
95fd4845ed Merge commit 'v2.6.29-rc4' into perfcounters/core
Conflicts:
	arch/x86/kernel/setup_percpu.c
	arch/x86/mm/fault.c
	drivers/acpi/processor_idle.c
	kernel/irq/handle.c
2009-02-11 09:22:04 +01:00
Milton Miller
c3bd517de6 powerpc/pci: Move hose_list and pci_address_to_pio to pci-common
move the definition of hose_list next to its hotplug spinlock.

create pcibios_io_size to encapsulate ifdef in existing pci-common
function pcibios_vaddr_is_ioport

move pci_address_to_pio to pci-common, using new pcibios_io_size, and
protect this GPL exported function against concurrent hotplug removal

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 16:00:07 +11:00
Paul Mackerras
0475f9ea8e perf_counters: allow users to count user, kernel and/or hypervisor events
Impact: new perf_counter feature

This extends the perf_counter_hw_event struct with bits that specify
that events in user, kernel and/or hypervisor mode should not be
counted (i.e. should be excluded), and adds code to program the PMU
mode selection bits accordingly on x86 and powerpc.

For software counters, we don't currently have the infrastructure to
distinguish which mode an event occurs in, so we currently fail the
counter initialization if the setting of the hw_event.exclude_* bits
would require us to distinguish.  Context switches and CPU migrations
are currently considered to occur in kernel mode.

On x86, this changes the previous policy that only root can count
kernel events.  Now non-root users can count kernel events or exclude
them.  Non-root users still can't use NMI events, though.  On x86 we
don't appear to have any way to control whether hypervisor events are
counted or not, so hw_event.exclude_hv is ignored.

On powerpc, the selection of whether to count events in user, kernel
and/or hypervisor mode is PMU-wide, not per-counter, so this adds a
check that the hw_event.exclude_* settings are the same as other events
on the PMU.  Counters being added to a group have to have the same
settings as the other hardware counters in the group.  Counters and
groups can only be enabled in hw_perf_group_sched_in or power_perf_enable
if they have the same settings as any other counters already on the
PMU.  If we are not running on a hypervisor, the exclude_hv setting
is ignored (by forcing it to 0) since we can't ever get any
hypervisor events.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-02-11 15:06:59 +11:00
Michael Ellerman
059f134f84 powerpc: Allow debugging of LMBs with lmb=debug
The lmb debugging can be turned on at boottime with lmb=debug on the
command line. However on powerpc that doesn't work, because we don't
necessarily call lmb_dump_all().

So always call lmb_dump_all() after lmb_analyze(), no output is
generated unless lmb=debug is found on the command line.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 13:38:00 +11:00
Michael Ellerman
33642d31d1 powerpc: Remove unused ppc64_terminate_msg()
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11 13:38:00 +11:00
Benjamin Herrenschmidt
edbc29d76d Merge commit 'kumar/next' into next 2009-02-11 13:37:44 +11:00
Benjamin Herrenschmidt
5b11abfdb5 powerpc/pci: mmap anonymous memory when legacy_mem doesn't exist
The new legacy_mem file in sysfs is causing problems with X on machines
that don't support legacy memory access. The way I initially implemented
it, we would fail with -ENXIO when trying to mmap it, thus exposing to
X that we do support the API but there is no legacy memory.

Unfortunately, X poor error handling is causing it to fail to start when
it gets this error.

This implements a workaround hack that instead maps anonymous memory
instead (using shmem if VM_SHARED is set, just like /dev/zero does).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-10 14:39:08 +11:00
Steven Rostedt
f25f9074c2 powerpc/ftrace: Fix math to calculate offset in TOC
Impact: fix dynamic ftrace with large modules in PPC64

The math to calculate the offset into the TOC that is taken from reading
the trampoline is incorrect. The bottom half of the offset is a signed
extended short. The current code was using an OR to create the offset
when it should have been using an addition.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-10 14:39:08 +11:00
Ingo Molnar
9d45cf9e36 Merge branch 'x86/urgent' into x86/apic
Conflicts:
	arch/x86/mach-default/setup.c

Semantic merge:
	arch/x86/kernel/irqinit_32.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-05 22:30:01 +01:00
Benjamin Herrenschmidt
59b608c2c3 powerpc: Fix oops on some machines due to incorrect pr_debug()
Recently, a patch left DEBUG enabled in the powerpc common PCI code,
resulting in an old bug in a pr_debug() statement to show up and cause
a NULL dereference on some machines.

This fixes the pr_debug() statement and reverts to DEBUG not being
force-enabled in that file.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-02 17:08:25 +11:00
Kumar Gala
105c31df6f powerpc/fsl-booke: Cleanup init/exception setup to be runtime
We currently have a few variants of fsl-booke processors (e500v1, e500v2,
e500mc, and e200).  They all have minor differences that we had previously
been handling via ifdefs.

To move towards having this support the following changes have been made:

* PID1, PID2 only exist on e500v1 & e500v2 and should not be accessed on
  e500mc or e200.  We use MMUCFG[NPIDS] to determine which case we are
  since we only touch PID1/2 in extremely early init code.

* Not all IVORs exist on all the processors so introduce cpu_setup
  functions for each variant to setup the proper IVORs that are either
  unique or exist but have some variations between the processors

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-01-28 18:16:50 -06:00
Ingo Molnar
6a385db5ce Merge branch 'core/percpu' into x86/core
Conflicts:
	kernel/irq/handle.c
2009-01-28 23:12:55 +01:00
Robert Jennings
69b052e828 powerpc/pseries: Correct VIO bus accounting problem in CMO env.
In the VIO bus code the wrappers for dma alloc_coherent and free_coherent
calls are rounding to IOMMU_PAGE_SIZE.  Taking a look at the underlying
calls, the actual mapping is promoted to PAGE_SIZE.  Changing the
rounding in these two functions fixes under-reporting the entitlement
used by the system.  Without this change, the system could run out of
entitlement before it believes it has and incur mapping failures at the
firmware level.

Also in the VIO bus code, the wrapper for dma map_sg is not exiting in
an error path where it should.  Rather than fall through to code for the
success case, this patch adds the return that is needed in the error path.

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-28 17:15:52 +11:00
Ingo Molnar
77835492ed Merge commit 'v2.6.29-rc2' into perfcounters/core
Conflicts:
	include/linux/syscalls.h
2009-01-21 16:37:27 +01:00
Ingo Molnar
198030782c Merge branch 'x86/mm' into core/percpu
Conflicts:
	arch/x86/mm/fault.c
2009-01-21 10:39:51 +01:00
Ingo Molnar
af37501c79 Merge branch 'core/percpu' into perfcounters/core
Conflicts:
	arch/x86/include/asm/pda.h

We merge tip/core/percpu into tip/perfcounters/core because of a
semantic and contextual conflict: the former eliminates the PDA,
while the latter extends it with apic_perf_irqs field.

Resolve the conflict by moving the new field to the irq_cpustat
structure on 64-bit too.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-18 18:15:49 +01:00
Tejun Heo
74e7904559 linker script: add missing .data.percpu.page_aligned
arm, arm/mach-integrator and powerpc were missing
.data.percpu.page_aligned in their percpu output section definitions.
Add it.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-17 15:26:32 +09:00
Michael Neuling
b60c31d85a powerpc: Get the number of SLBs from "slb-size" property
The PAPR says that the property for specifying the number of SLBs should
be called "slb-size".  We currently only look for "ibm,slb-size" because
this is what firmware actually presents.

This patch makes us look for the "slb-size" property as well and in
preference to the "ibm,slb-size".  This should future proof us if
firmware changes to match PAPR.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-16 16:15:16 +11:00
Paul Mackerras
3b6f9e5cb2 perf_counter: Add support for pinned and exclusive counter groups
Impact: New perf_counter features

A pinned counter group is one that the user wants to have on the CPU
whenever possible, i.e. whenever the associated task is running, for
a per-task group, or always for a per-cpu group.  If the system
cannot satisfy that, it puts the group into an error state where
it is not scheduled any more and reads from it return EOF (i.e. 0
bytes read).  The group can be released from error state and made
readable again using prctl(PR_TASK_PERF_COUNTERS_ENABLE).  When we
have finer-grained enable/disable controls on counters we'll be able
to reset the error state on individual groups.

An exclusive group is one that the user wants to be the only group
using the CPU performance monitor hardware whenever it is on.  The
counter group scheduler will not schedule an exclusive group if there
are already other groups on the CPU and will not schedule other groups
onto the CPU if there is an exclusive group scheduled (that statement
does not apply to groups containing only software counters, which can
always go on and which do not prevent an exclusive group from going on).
With an exclusive group, we will be able to let users program PMU
registers at a low level without the concern that those settings will
perturb other measurements.

Along the way this reorganizes things a little:
- is_software_counter() is moved to perf_counter.h.
- cpuctx->active_oncpu now records the number of hardware counters on
  the CPU, i.e. it now excludes software counters.  Nothing was reading
  cpuctx->active_oncpu before, so this change is harmless.
- A new cpuctx->exclusive field records whether we currently have an
  exclusive group on the CPU.
- counter_sched_out moves higher up in perf_counter.c and gets called
  from __perf_counter_remove_from_context and __perf_counter_exit_task,
  where we used to have essentially the same code.
- __perf_counter_sched_in now goes through the counter list twice, doing
  the pinned counters in the first loop and the non-pinned counters in
  the second loop, in order to give the pinned counters the best chance
  to be scheduled in.

Note that only a group leader can be exclusive or pinned, and that
attribute applies to the whole group.  This avoids some awkwardness in
some corner cases (e.g. where a group leader is closed and the other
group members get added to the context list).  If we want to relax that
restriction later, we can, and it is easier to relax a restriction than
to apply a new one.

This doesn't yet handle the case where a pinned counter is inherited
and goes into error state in the child - the error state is not
propagated up to the parent when the child exits, and arguably it
should.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-01-14 21:00:30 +11:00
Paul Mackerras
01d0287f06 powerpc/perf_counter: Make sure PMU gets enabled properly
This makes sure that we call the platform-specific ppc_md.enable_pmcs
function on each CPU before we try to use the PMU on that CPU.  If the
CPU goes off-line and then on-line, we need to do the enable_pmcs call
again, so we use the hw_perf_counter_setup hook to ensure that.  It gets
called as each CPU comes online, but it isn't called on the CPU that is
coming up, so this adds the CPU number as an argument to it (there were
no non-empty instances of hw_perf_counter_setup before).

This also arranges to set the pmcregs_in_use field of the lppaca (data
structure shared with the hypervisor) on each CPU when we are using the
PMU and clear it when we are not.  This allows the hypervisor to optimize
partition switches by not saving/restoring the PMU registers when we
aren't using the PMU.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-01-14 13:44:19 +11:00
Kumar Gala
5597b25c30 powerpc/e500mc: Doorbells need to be taken w/exceptions disabled
We use Doorbell interrupts for IPIs and thus we need to make sure we aren't
interrupted in the process of processing the IPI.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Dave Liu <daveliu@freescale.com>
2009-01-13 17:46:24 -06:00
Benjamin Herrenschmidt
c478b58135 powerpc/powermac: Fix occasional SMP boot failure
The PowerMac kernel occasionally fails to bring up the secondary CPUs on
SMP, the trigger factor seem to be fairly random and related to location
of code and data.

This appears to be due to the initial loading of the TOC value by the
secondary processor which now happens before we clear HID4:RM_CI (Real
Mode Cache Invalidate). This bit should really be cleared before we do
any load or store other than fetching code.

This fix works based on the assumption that all SMP 64-bit PowerMacs use
variants of the 970, which fortunately is true, by explicitely clearing
that bit, adding an slbia for good measure as RM_CI mode is known to
create bogus ERAT entries.

I also removed some spurrious debug output that was left enabled by
mistake while at it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-13 14:48:03 +11:00
Nathan Lynch
fc7a9feb9c powerpc/cacheinfo: Rename cache_dir per-cpu variable
The per_cpu__ prefix on DECLARE_PER_CPU'd variables is going away;
rename cache_dir to cache_dir_pcpu.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-13 14:48:02 +11:00
Stephen Rothwell
9477e455b4 powerpc: Cleanup from l64 to ll64 change: arch code
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-13 14:47:59 +11:00
Ingo Molnar
fe333321e2 powerpc: Change u64/s64 to a long long integer type
Convert arch/powerpc/ over to long long based u64:

 -#ifdef __powerpc64__
 -# include <asm-generic/int-l64.h>
 -#else
 -# include <asm-generic/int-ll64.h>
 -#endif
 +#include <asm-generic/int-ll64.h>

This will avoid reoccuring spurious warnings in core kernel code that
comes when people test on their own hardware. (i.e. x86 in ~98% of the
cases) This is what x86 uses and it generally helps keep 64-bit code
32-bit clean too.

[Adjusted to not impact user mode (from paulus) - sfr]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-13 14:47:59 +11:00
Milton Miller
66c721e184 powerpc/kexec: Check crash_base for relocatable kernel
Enforce that the crash kernel region never overlaps the current kernel,
as it will be written directly on kexec load.

Also, default to the previous KDUMP_KERNELBASE if the start is 0.

Other architectures (x86, ia64) state that specifying the start address
0 (or omitting it) will result in the kernel allocating it.  Before the
relocatable patch in 2.6.28, powerpc would adjust any other start value
to the hardcoded KDUMP_KERNELBASE of 32M.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-13 14:47:59 +11:00
Milton Miller
e16459c6b7 powerpc: Make dummy section a valid note header
We are declaring the dummy section (used to work around a binutils
bug) as PT_NOTE, but we don't have enough bytes for it to be a valid
note header, and kexec userspace complains:

Warning: Elf Note name is not null terminated
Warning: append= option is not passed. Using the first kernel root partition
Warning: Elf Note name is not null terminated

Instead of using the arbitray value 0xf177 (aka "fill"), declare a
no-name no-description note of type 0.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-13 14:47:58 +11:00
Benjamin Herrenschmidt
30aae739a9 Merge commit 'kumar/kumar-next' into next 2009-01-13 13:59:03 +11:00
Mike Travis
e65e49d0f3 irq: update all arches for new irq_desc
Impact: cleanup, update to new cpumask API

Irq_desc.affinity and irq_desc.pending_mask are now cpumask_var_t's
so access to them should be using the new cpumask API.

Signed-off-by: Mike Travis <travis@sgi.com>
2009-01-12 15:27:13 -08:00
Yinghai Lu
dee4102a9a sparseirq: use kstat_irqs_cpu instead
Impact: build fix

Ingo Molnar wrote:

> tip/arch/blackfin/kernel/irqchip.c: In function 'show_interrupts':
> tip/arch/blackfin/kernel/irqchip.c:85: error: 'struct kernel_stat' has no member named 'irqs'
> make[2]: *** [arch/blackfin/kernel/irqchip.o] Error 1
> make[2]: *** Waiting for unfinished jobs....
>

So could move kstat_irqs array to irq_desc struct.

(s390, m68k, sparc) are not touched yet, because they don't support genirq

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-11 15:53:13 +01:00
Ingo Molnar
c0d362a832 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/perfcounters into perfcounters/core 2009-01-11 02:44:08 +01:00
Paul Mackerras
f78628374a powerpc/perf_counter: Add support for POWER6
This adds the back-end for the PMU on the POWER6 processor.
Fortunately, the event selection hardware is somewhat simpler on
POWER6 than on other POWER family processors, so the constraints
fit into only 32 bits.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-01-10 16:35:01 +11:00
Paul Mackerras
16b067993d powerpc/perf_counter: Add support for PPC970 family
This adds the back-end for the PMU on the PPC970 family.

The PPC970 allows events from the ISU to be selected in two different
ways.  Rather than use alternative event codes to express this, we
instead use a single encoding for ISU events and express the
resulting constraint (that you can't select events from all three
of FPU/IFU/VPU, ISU and IDU/STS at the same time, since they all come
in through only 2 multiplexers) using a NAND constraint field, and
work out which multiplexer is used for ISU events at compute_mmcr
time.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-01-10 16:34:07 +11:00
Paul Mackerras
4574910e50 powerpc/perf_counter: Add generic support for POWER-family PMU hardware
This provides the architecture-specific functions needed to access
PMU hardware on the 64-bit PowerPC processors.  It has been designed
for the IBM POWER family (POWER 4/4+/5/5+/6 and PPC970) but will
hopefully also suit other 64-bit PowerPC machines (although probably
not Cell given how different it is in this area).  This doesn't
include back-ends for any specific processors.

This implements a system which allows back-ends to express the
constraints that their hardware has on what events can be counted
simultaneously.  The constraints are expressed as a 64-bit mask +
64-bit value for each event, and the encoding is capable of
expressing the constraints arising from having a set of multiplexers
feeding an event bus, with some events being available through
multiple multiplexer settings, such as we get on POWER4 and PPC970.
Furthermore, the back-end can supply alternative event codes for
each event, and the constraint checking code will try all possible
combinations of alternative event codes to try to find a combination
that will fit.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-01-10 16:32:05 +11:00