Commit graph

71 commits

Author SHA1 Message Date
Linus Torvalds
1c39865151 Merge branch 'pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
* 'pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  pstore: make pstore write function return normal success/fail value
  pstore: change mutex locking to spin_locks
  pstore: defer inserting OOPS entries into pstore
2011-11-01 10:52:29 -07:00
Linus Torvalds
8a4a8918ed Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
  llist: Add back llist_add_batch() and llist_del_first() prototypes
  sched: Don't use tasklist_lock for debug prints
  sched: Warn on rt throttling
  sched: Unify the ->cpus_allowed mask copy
  sched: Wrap scheduler p->cpus_allowed access
  sched: Request for idle balance during nohz idle load balance
  sched: Use resched IPI to kick off the nohz idle balance
  sched: Fix idle_cpu()
  llist: Remove cpu_relax() usage in cmpxchg loops
  sched: Convert to struct llist
  llist: Add llist_next()
  irq_work: Use llist in the struct irq_work logic
  llist: Return whether list is empty before adding in llist_add()
  llist: Move cpu_relax() to after the cmpxchg()
  llist: Remove the platform-dependent NMI checks
  llist: Make some llist functions inline
  sched, tracing: Show PREEMPT_ACTIVE state in trace_sched_switch
  sched: Remove redundant test in check_preempt_tick()
  sched: Add documentation for bandwidth control
  sched: Return unused runtime on group dequeue
  ...
2011-10-26 17:08:43 +02:00
Chen Gong
b238b8fa93 pstore: make pstore write function return normal success/fail value
Currently pstore write interface employs record id as return
value, but it is not enough because it can't tell caller if
the write operation is successful. Pass the record id back via
an argument pointer and return zero for success, non-zero for
failure.

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-10-12 09:17:24 -07:00
Don Zickus
9c48f1c629 x86, nmi: Wire up NMI handlers to new routines
Just convert all the files that have an nmi handler to the new routines.
Most of it is straight forward conversion.  A couple of places needed some
tweaking like kgdb which separates the debug notifier from the nmi handler
and mce removes a call to notify_die.

[Thanks to Ying for finding out the history behind that mce call

https://lkml.org/lkml/2010/5/27/114

And Boris responding that he would like to remove that call because of it

https://lkml.org/lkml/2011/9/21/163]

The things that get converted are the registeration/unregistration routines
and the nmi handler itself has its args changed along with code removal
to check which list it is on (most are on one NMI list except for kgdb
which has both an NMI routine and an NMI Unknown routine).

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Corey Minyard <minyard@acm.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: Jack Steiner <steiner@sgi.com>
Link: http://lkml.kernel.org/r/1317409584-23662-4-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-10 06:56:57 +02:00
Huang Ying
1230db8e15 llist: Make some llist functions inline
Because llist code will be used in performance critical scheduler
code path, make llist_add() and llist_del_all() inline to avoid
function calling overhead and related 'glue' overhead.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1315461646-1379-2-git-send-email-ying.huang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-04 11:30:53 +02:00
Don Zickus
abd4d5587b pstore: change mutex locking to spin_locks
pstore was using mutex locking to protect read/write access to the
backend plug-ins.  This causes problems when pstore is executed in
an NMI context through panic() -> kmsg_dump().

This patch changes the mutex to a spin_lock_irqsave then also checks to
see if we are in an NMI context.  If we are in an NMI and can't get the
lock, just print a message stating that and blow by the locking.

All this is probably a hack around the bigger locking problem but it
solves my current situation of trying to sleep in an NMI context.

Tested by loading the lkdtm module and executing a HARDLOCKUP which
will cause the machine to panic inside the nmi handler.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-08-16 11:55:58 -07:00
Chen Gong
03ba176a29 ACPI APEI: Add Kconfig option IRQ_WORK for GHES
IRQ_WORK is used by GHES, but it is selected by PERF_EVENT.
For now PERF_EVENT is selected by x86 by default, but
in concept, IRQ_WORK should be selected by GHES, not by others.

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-11 15:42:09 -04:00
Matthew Garrett
b3b46d76d0 APEI: Fix WHEA _OSC call
Bit 0 of the support parameter to the OSC call should be set in order to
indicate that the OS supports the WHEA mechanism. Stuart Hayes tracked
an APEI issue on some Dell platforms down to this.

Reported-by: Stuart Hayes <Stuart_Hayes@Dell.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-11 12:18:38 -04:00
Len Brown
d0e323b470 Merge branch 'apei' into apei-release
Some trivial conflicts due to other various merges
adding to the end of common lists sooner than this one.

	arch/ia64/Kconfig
	arch/powerpc/Kconfig
	arch/x86/Kconfig
	lib/Kconfig
	lib/Makefile

Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-03 11:30:42 -04:00
Huang Ying
c3e6088e10 ACPI, APEI, EINJ Param support is disabled by default
EINJ parameter support is only usable for some specific BIOS.
Originally, it is expected to have no harm for BIOS does not support
it.  But now, we found it will cause issue (memory overwriting) for
some BIOS.  So param support is disabled by default and only enabled
when newly added module parameter named "param_extension" is
explicitly specified.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Matthew Garrett <mjg@redhat.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-03 11:15:59 -04:00
Len Brown
70cb6e1da0 APEI GHES: 32-bit buildfix
drivers/acpi/apei/ghes.c:542: warning: integer overflow in expression
drivers/acpi/apei/ghes.c:619: warning: integer overflow in expression

ghes.c:(.text+0x46289): undefined reference to `__udivdi3'
  in function ghes_estatus_cache_add().

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-03 11:15:59 -04:00
Huang Ying
ba61ca4aab ACPI, APEI, GHES: Add hardware memory error recovery support
memory_failure_queue() is called when recoverable memory errors are
notified by firmware to do the recovery work.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-03 11:15:58 -04:00
Huang Ying
152cef40a8 ACPI, APEI, GHES, Error records content based throttle
printk is used by GHES to report hardware errors.  Ratelimit is
enforced on the printk to avoid too many hardware error reports in
kernel log.  Because there may be thousands or even millions of
corrected hardware errors during system running.

Currently, a simple scheme is used.  That is, the total number of
hardware error reporting is ratelimited.  This may cause some issues
in practice.

For example, there are two kinds of hardware errors occurred in
system.  One is corrected memory error, because the fault memory
address is accessed frequently, there may be hundreds error report
per-second.  The other is corrected PCIe AER error, it will be
reported once per-second.  Because they share one ratelimit control
structure, it is highly possible that only memory error is reported.

To avoid the above issue, an error record content based throttle
algorithm is implemented in the patch.  Where after the first
successful reporting, all error records that are same are throttled for
some time, to let other kinds of error records have the opportunity to
be reported.

In above example, the memory errors will be throttled for some time,
after being printked.  Then the PCIe AER error will be printked
successfully.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-03 11:15:57 -04:00
Huang Ying
67eb2e9907 ACPI, APEI, GHES, printk support for recoverable error via NMI
Some APEI GHES recoverable errors are reported via NMI, but printk is
not safe in NMI context.

To solve the issue, a lock-less memory allocator is used to allocate
memory in NMI handler, save the error record into the allocated
memory, put the error record into a lock-less list.  On the other
hand, an irq_work is used to delay the operation from NMI context to
IRQ context.  The irq_work IRQ handler will remove nodes from
lock-less list, printk the error record and do some further processing
include recovery operation, then free the memory.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-03 11:15:57 -04:00
Matthew Garrett
b94fdd077e pstore: Make "part" unsigned
We'll never have a negative part, so just make this an unsigned int.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-07-22 16:14:29 -07:00
Matthew Garrett
56280682ce pstore: Add extra context for writes and erases
EFI only provides small amounts of individual storage, and conventionally
puts metadata in the storage variable name. Rather than add a metadata
header to the (already limited) variable storage, it's easier for us to
modify pstore to pass all the information we need to construct a unique
variable name to the appropriate functions.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-07-22 16:14:20 -07:00
Matthew Garrett
638c1fd303 pstore: Extend API for more flexibility in new backends
Some pstore implementations may not have a static context, so extend the
API to pass the pstore_info struct to all calls and allow for a context
pointer.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-07-22 16:14:06 -07:00
Huang Ying
9fb0bfe140 ACPI, APEI, Add WHEA _OSC support
APEI firmware first mode must be turned on explicitly on some
machines, otherwise there may be no GHES hardware error record for
hardware error notification.  APEI bit in generic _OSC call can be
used to do that, but on some machine, a special WHEA _OSC call must be
used.  This patch adds the support to that WHEA _OSC call.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-07-13 23:38:49 -04:00
Huang Ying
b6a9501658 ACPI, APEI, GHES, Support disable GHES at boot time
Some machine may have broken firmware so that GHES and firmware first
mode should be disabled.  This patch adds support to that.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-07-13 23:36:34 -04:00
Huang Ying
86cd47334b ACPI, APEI, GHES, Prevent GHES to be built as module
GHES (Generic Hardware Error Source) is used to process hardware error
notification in firmware first mode.  But because firmware first mode
can be turned on but can not be turned off, it is unreasonable to
unload the GHES module with firmware first mode turned on.  To avoid
confusion, this patch makes GHES can be enabled/disabled in
configuration time, but not built as module and unloaded at run time.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-07-13 23:35:57 -04:00
Huang Ying
392913de7c ACPI, APEI, Use apei_exec_run_optional in APEI EINJ and ERST
This patch changes APEI EINJ and ERST to use apei_exec_run for
mandatory actions, and apei_exec_run_optional for optional actions.

Cc: Thomas Renninger <trenn@novell.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-07-13 23:35:14 -04:00
Huang Ying
eecf2f7124 ACPI, APEI, Add apei_exec_run_optional
Some actions in APEI ERST and EINJ tables are optional, for example,
ACPI_EINJ_BEGIN_OPERATION action is used to do some preparation for
error injection, and firmware may choose to do nothing here.  While
some other actions are mandatory, for example, firmware must provide
ACPI_EINJ_GET_ERROR_TYPE implementation.

Original implementation treats all actions as optional (that is, can
have no instructions), that may cause issue if firmware does not
provide some mandatory actions.  To fix this, this patch adds
apei_exec_run_optional, which should be used for optional actions.
The original apei_exec_run should be used for mandatory actions.

Cc: Thomas Renninger <trenn@novell.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-07-13 23:34:49 -04:00
Huang Ying
5588340d46 ACPI, APEI, GHES, Do not ratelimit fatal error printk before panic
printk is used by GHES to report hardware errors.  Normally, the
printk will be ratelimited to avoid too many hardware error reports in
kernel log.  Because there may be thousands or even millions of
corrected hardware errors during system running.

That is different for fatal hardware error, because system will go
panic as soon as possible, there will be no more than several error
records.  And these error records are valuable for system fault
diagnosis, so they should not be ratelimited.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-07-13 23:33:57 -04:00
Chen Gong
d37afc50e6 ACPI, APEI, ERST, Fix erst-dbg long record reading issue
When we debug ERST table with erst-dbg, if the error record in ERST
table is too long(>4K), it can't be read out.  So this patch increases
the buffer size to 16K to ensure such error records can be read from
ERST table.

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-07-13 23:31:51 -04:00
Huang Ying
ca7cc5110a ACPI, APEI, ERST, Prevent erst_dbg from loading if ERST is disabled
erst_dbg module can not work when ERST is disabled.  So disable module
loading to provide clearer information to user.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-07-13 23:29:52 -04:00
Huang Ying
4d2b2956ef ACPI, APEI, HEST, Detect duplicated hardware error source ID
The firmware on some machine will report duplicated hardware error
source ID in HEST.  This is considered a firmware bug.  To provide
better warning message, this patch adds duplicated hardware error
source ID detecting and corresponding printk.

This patch fixes #37412 on kernel bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=37412

Reported-by: marconifabio@ubuntu-it.org
Signed-off-by: Huang Ying <ying.huang@intel.com>
Tested-by: Mathias <janedo.spam@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-07-13 23:27:56 -04:00
Roland Dreier
dbee8a0aff x86: remove 32-bit versions of readq()/writeq()
The presense of a writeq() implementation on 32-bit x86 that splits the
64-bit write into two 32-bit writes turns out to break the mpt2sas driver
(and in general is risky for drivers as was discussed in
<http://lkml.kernel.org/r/adaab6c1h7c.fsf@cisco.com>).  To fix this,
revert 2c5643b1c5 ("x86: provide readq()/writeq() on 32-bit too") and
follow-on cleanups.

This unfortunately leads to pushing non-atomic definitions of readq() and
write() to various x86-only drivers that in the meantime started using the
definitions in the x86 version of <asm/io.h>.  However as discussed
exhaustively, this is actually the right thing to do, because the right
way to split a 64-bit transaction is hardware dependent and therefore
belongs in the hardware driver (eg mpt2sas needs a spinlock to make sure
no other accesses occur in between the two halves of the access).

Build tested on 32- and 64-bit x86 allmodconfig.

Link: http://lkml.kernel.org/r/x86-32-writeq-is-broken@mdm.bga.com
Acked-by: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Kashyap Desai <Kashyap.Desai@lsi.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Ravi Anand <ravi.anand@qlogic.com>
Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Acked-by: James Bottomley <James.Bottomley@parallels.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:44 -07:00
Luck, Tony
5d2a8342f6 pstore: Fix Kconfig dependencies for apei->pstore
Geert Uytterhoeven ran a dependency checker which kicked out this warning:

+ warning: (ACPI_APEI) selects PSTORE which has unmet direct dependencies (MISC_FILESYSTEMS):  => N/A

Randy confirmed that the fix was to "select MISC_FILESYSTEMS" too.

Tested-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-05-20 10:34:35 -07:00
Chen Gong
f5ec25deb2 pstore: fix potential logic issue in pstore read interface
1) in the calling of erst_read, the parameter of buffer size
maybe overflows and cause crash

2) the return value of erst_read should be checked more strictly

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-05-16 11:05:08 -07:00
Chen Gong
06cf91b4b4 pstore: fix pstore filesystem mount/remount issue
Currently after mount/remount operation on pstore filesystem,
the content on pstore will be lost. It is because current ERST
implementation doesn't support multi-user usage, which moves
internal pointer to the end after accessing it. Adding
multi-user support for pstore usage.

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-05-16 11:05:00 -07:00
Chen Gong
8d38d74b64 pstore: fix one type of return value in pstore
the return type of function _read_ in pstore is size_t,
but in the callback function of _read_, the logic doesn't
consider it too much, which means if negative value (assuming
error here) is returned, it will be converted to positive because
of type casting. ssize_t is enough for this function.

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-05-16 11:04:51 -07:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Len Brown
02e2407858 Merge branch 'linus' into release
Conflicts:
	arch/x86/kernel/acpi/sleep.c

Signed-off-by: Len Brown <len.brown@intel.com>
2011-03-23 02:34:54 -04:00
Huang Ying
c413d76820 ACPI, APEI, Add PCIe AER error information printing support
The AER error information printing support is implemented in
drivers/pci/pcie/aer/aer_print.c.  So some string constants, functions
and macros definitions can be re-used without being exported.

The original PCIe AER error information printing function is not
re-used directly because the overall format is quite different.  And
changing the original printing format may make some original users'
scripts broken.

Signed-off-by: Huang Ying <ying.huang@intel.com>
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
CC: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-03-21 22:59:08 -04:00
Huang Ying
885b976fad ACPI, APEI, Add ERST record ID cache
APEI ERST firmware interface and implementation has no multiple users
in mind.  For example, if there is four records in storage with ID: 1,
2, 3 and 4, if two ERST readers enumerate the records via
GET_NEXT_RECORD_ID as follow,

reader 1		reader 2
1
			2
3
			4
-1
			-1

where -1 signals there is no more record ID.

Reader 1 has no chance to check record 2 and 4, while reader 2 has no
chance to check record 1 and 3.  And any other GET_NEXT_RECORD_ID will
return -1, that is, other readers will has no chance to check any
record even they are not cleared by anyone.

This makes raw GET_NEXT_RECORD_ID not suitable for used by multiple
users.

To solve the issue, an in-memory ERST record ID cache is designed and
implemented.  When enumerating record ID, the ID returned by
GET_NEXT_RECORD_ID is added into cache in addition to be returned to
caller.  So other readers can check the cache to get all record ID
available.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-03-21 22:59:06 -04:00
Tony Luck
afe997a183 Pull pstorev4 into release branch 2011-03-16 09:58:31 -07:00
Rafael J. Wysocki
d3072e6a7e ACPI: Fix boot problem related to APEI with acpi_disabled set
Commit 415e12b237 ("PCI/ACPI: Request _OSC control once for each root
bridge (v3)") put the acpi_hest_init() call in acpi_pci_root_init() into
a wrong place, presumably because the author confused acpi_pci_disabled
with acpi_disabled.  Bring the code ordering in acpi_pci_root_init()
back to sanity.

Additionally, make sure that hest_disable is set when acpi_disabled is
set, which is going to prevent acpi_hest_parse(), that still may be
executed for acpi_disabled=1 through aer_acpi_firmware_first(), from
crashing because of uninitialized hest_tab.

Reported-and-tested-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-16 11:56:26 -08:00
Linus Torvalds
d73b388459 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI/PM: Report wakeup events before resuming devices
  PCI/PM: Use pm_wakeup_event() directly for reporting wakeup events
  PCI: sysfs: Update ROM to include default owner write access
  x86/PCI: make Broadcom CNB20LE driver EMBEDDED and EXPERIMENTAL
  x86/PCI: don't use native Broadcom CNB20LE driver when ACPI is available
  PCI/ACPI: Request _OSC control once for each root bridge (v3)
  PCI: enable pci=bfsort by default on future Dell systems
  PCI/PCIe: Clear Root PME Status bits early during system resume
  PCI: pci-stub: ignore zero-length id parameters
  x86/PCI: irq and pci_ids patch for Intel Patsburg
  PCI: Skip id checking if no id is passed
  PCI: fix __pci_device_probe kernel-doc warning
  PCI: make pci_restore_state return void
  PCI: Disable ASPM if BIOS asks us to
  PCI: Add mask bit definition for MSI-X table
  PCI: MSI: Move MSI-X entry definition to pci_regs.h

Fix up trivial conflicts in drivers/net/{skge.c,sky2.c} that had in the
meantime been converted to not use legacy PCI power management, and thus
no longer use pci_restore_state() at all (and that caused trivial
conflicts with the "make pci_restore_state return void" patch)
2011-01-14 09:29:05 -08:00
Rafael J. Wysocki
415e12b237 PCI/ACPI: Request _OSC control once for each root bridge (v3)
Move the evaluation of acpi_pci_osc_control_set() (to request control of
PCI Express native features) into acpi_pci_root_add() to avoid calling
it many times for the same root complex with the same arguments.
Additionally, check if all of the requisite _OSC support bits are set
before calling acpi_pci_osc_control_set() for a given root complex.

References: https://bugzilla.kernel.org/show_bug.cgi?id=20232
Reported-by: Ozan Caglayan <ozan@pardus.org.tr>
Tested-by: Ozan Caglayan <ozan@pardus.org.tr>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2011-01-14 08:55:41 -08:00
Linus Torvalds
52cfd503ad Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (59 commits)
  ACPI / PM: Fix build problems for !CONFIG_ACPI related to NVS rework
  ACPI: fix resource check message
  ACPI / Battery: Update information on info notification and resume
  ACPI: Drop device flag wake_capable
  ACPI: Always check if _PRW is present before trying to evaluate it
  ACPI / PM: Check status of power resources under mutexes
  ACPI / PM: Rename acpi_power_off_device()
  ACPI / PM: Drop acpi_power_nocheck
  ACPI / PM: Drop acpi_bus_get_power()
  Platform / x86: Make fujitsu_laptop use acpi_bus_update_power()
  ACPI / Fan: Rework the handling of power resources
  ACPI / PM: Register power resource devices as soon as they are needed
  ACPI / PM: Register acpi_power_driver early
  ACPI / PM: Add function for updating device power state consistently
  ACPI / PM: Add function for device power state initialization
  ACPI / PM: Introduce __acpi_bus_get_power()
  ACPI / PM: Introduce function for refcounting device power resources
  ACPI / PM: Add functions for manipulating lists of power resources
  ACPI / PM: Prevent acpi_power_get_inferred_state() from making changes
  ACPICA: Update version to 20101209
  ...
2011-01-13 20:15:35 -08:00
Len Brown
03b6e6e58d Merge branch 'apei' into release 2011-01-12 05:02:22 -05:00
Huang Ying
81e88fdc43 ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support
Generic Hardware Error Source provides a way to report platform
hardware errors (such as that from chipset). It works in so called
"Firmware First" mode, that is, hardware errors are reported to
firmware firstly, then reported to Linux by firmware. This way, some
non-standard hardware error registers or non-standard hardware link
can be checked by firmware to produce more valuable hardware error
information for Linux.

This patch adds POLL/IRQ/NMI notification types support.

Because the memory area used to transfer hardware error information
from BIOS to Linux can be determined only in NMI, IRQ or timer
handler, but general ioremap can not be used in atomic context, so a
special version of atomic ioremap is implemented for that.

Known issue:

- Error information can not be printed for recoverable errors notified
  via NMI, because printk is not NMI-safe. Will fix this via delay
  printing to IRQ context via irq_work or make printk NMI-safe.

v2:

- adjust printk format per comments.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-12 03:06:19 -05:00
Tony Luck
0bb77c465f pstore: X86 platform interface using ACPI/APEI/ERST
The 'error record serialization table' in ACPI provides a suitable
amount of persistent storage for use by the pstore filesystem.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-01-03 14:22:11 -08:00
Stefan Weil
e8a8b252fb Fix spelling mistakes in comments
milisecond -> millisecond
 meassge -> message

Cc: Kalle Valo <kvalo@adurom.com>
Cc: Jiri Kosina <trivial@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-01-03 13:51:58 +01:00
Huang Ying
32c361f574 ACPI, APEI, Report GHES error information via printk
printk is one of the methods to report hardware errors to user space.
This patch implements hardware error reporting for GHES via printk.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-12-13 23:42:39 -05:00
Huang Ying
f59c55d04b ACPI, APEI, Add APEI generic error status printing support
In APEI, Hardware error information reported by firmware to Linux
kernel is in the data structure of APEI generic error status (struct
acpi_hes_generic_status).  While now printk is used by Linux kernel to
report hardware error information to user space.

So, this patch adds printing support for the data structure, so that
the corresponding hardware error information can be reported to user
space via printk.

PCIe AER information printing is not implemented yet.  Will refactor the
original PCIe AER information printing code to avoid code duplicating.

The output format is as follow:

<error record> :=
APEI generic hardware error status
severity: <integer>, <severity string>
section: <integer>, severity: <integer>, <severity string>
flags: <integer>
<section flags strings>
fru_id: <uuid string>
fru_text: <string>
section_type: <section type string>
<section data>

<severity string>* := recoverable | fatal | corrected | info

<section flags strings># :=
[primary][, containment warning][, reset][, threshold exceeded]\
[, resource not accessible][, latent error]

<section type string> := generic processor error | memory error | \
PCIe error | unknown, <uuid string>

<section data> :=
<generic processor section data> | <memory section data> | \
<pcie section data> | <null>

<generic processor section data> :=
[processor_type: <integer>, <proc type string>]
[processor_isa: <integer>, <proc isa string>]
[error_type: <integer>
<proc error type strings>]
[operation: <integer>, <proc operation string>]
[flags: <integer>
<proc flags strings>]
[level: <integer>]
[version_info: <integer>]
[processor_id: <integer>]
[target_address: <integer>]
[requestor_id: <integer>]
[responder_id: <integer>]
[IP: <integer>]

<proc type string>* := IA32/X64 | IA64

<proc isa string>* := IA32 | IA64 | X64

<processor error type strings># :=
[cache error][, TLB error][, bus error][, micro-architectural error]

<proc operation string>* := unknown or generic | data read | data write | \
instruction execution

<proc flags strings># :=
[restartable][, precise IP][, overflow][, corrected]

<memory section data> :=
[error_status: <integer>]
[physical_address: <integer>]
[physical_address_mask: <integer>]
[node: <integer>]
[card: <integer>]
[module: <integer>]
[bank: <integer>]
[device: <integer>]
[row: <integer>]
[column: <integer>]
[bit_position: <integer>]
[requestor_id: <integer>]
[responder_id: <integer>]
[target_id: <integer>]
[error_type: <integer>, <mem error type string>]

<mem error type string>* :=
unknown | no error | single-bit ECC | multi-bit ECC | \
single-symbol chipkill ECC | multi-symbol chipkill ECC | master abort | \
target abort | parity error | watchdog timeout | invalid address | \
mirror Broken | memory sparing | scrub corrected error | \
scrub uncorrected error

<pcie section data> :=
[port_type: <integer>, <pcie port type string>]
[version: <integer>.<integer>]
[command: <integer>, status: <integer>]
[device_id: <integer>:<integer>:<integer>.<integer>
slot: <integer>
secondary_bus: <integer>
vendor_id: <integer>, device_id: <integer>
class_code: <integer>]
[serial number: <integer>, <integer>]
[bridge: secondary_status: <integer>, control: <integer>]

<pcie port type string>* := PCIe end point | legacy PCI end point | \
unknown | unknown | root port | upstream switch port | \
downstream switch port | PCIe to PCI/PCI-X bridge | \
PCI/PCI-X to PCIe bridge | root complex integrated endpoint device | \
root complex event collector

Where, [] designate corresponding content is optional

All <field string> description with * has the following format:

field: <integer>, <field string>

Where value of <integer> should be the position of "string" in <field
string> description. Otherwise, <field string> will be "unknown".

All <field strings> description with # has the following format:

field: <integer>
<field strings>

Where each string in <fields strings> corresponding to one set bit of
<integer>. The bit position is the position of "string" in <field
strings> description.

For more detailed explanation of every field, please refer to UEFI
specification version 2.3 or later, section Appendix N: Common
Platform Error Record.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-12-13 23:42:12 -05:00
Jan Beulich
bec4f22a2d ACPI/HEST: adjust section selection
Properly const-, __init-, and __read_mostly-annotate this code.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-12-11 02:01:48 -05:00
Huang Ying
3b38bb5f7f ACPI, APEI, use raw spinlock in ERST
ERST writing may be used in NMI or Machine Check Exception handler. So
it need to use raw spinlock instead of normal spinlock.  This patch
fixes it.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-12-11 02:01:46 -05:00
Linus Torvalds
092e0e7e52 Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl
* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
  vfs: make no_llseek the default
  vfs: don't use BKL in default_llseek
  llseek: automatically add .llseek fop
  libfs: use generic_file_llseek for simple_attr
  mac80211: disallow seeks in minstrel debug code
  lirc: make chardev nonseekable
  viotape: use noop_llseek
  raw: use explicit llseek file operations
  ibmasmfs: use generic_file_llseek
  spufs: use llseek in all file operations
  arm/omap: use generic_file_llseek in iommu_debug
  lkdtm: use generic_file_llseek in debugfs
  net/wireless: use generic_file_llseek in debugfs
  drm: use noop_llseek
2010-10-22 10:52:56 -07:00
Arnd Bergmann
6038f373a3 llseek: automatically add .llseek fop
All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.

The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.

New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time.  Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.

The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.

Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.

Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.

===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
//   but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}

@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}

@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
   *off = E
|
   *off += E
|
   func(..., off, ...)
|
   E = *off
)
...+>
}

@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}

@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
  *off = E
|
  *off += E
|
  func(..., off, ...)
|
  E = *off
)
...+>
}

@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}

@ fops0 @
identifier fops;
@@
struct file_operations fops = {
 ...
};

@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
 .llseek = llseek_f,
...
};

@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
 .read = read_f,
...
};

@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
 .write = write_f,
...
};

@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
 .open = open_f,
...
};

// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
...  .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};

@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
...  .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};

// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
...  .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};

// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};

// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};

@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+	.llseek = default_llseek, /* write accesses f_pos */
};

// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////

@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
 .write = write_f,
 .read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};

@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};

@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};

@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>
2010-10-15 15:53:27 +02:00