Commit graph

149 commits

Author SHA1 Message Date
Borislav Petkov
c1ae68309b amd64_edac: Erratum #637 workaround
F15h CPUs may report a non-DRAM address when reporting an error address
belonging to a CC6 state save area. Add a workaround to detect this
condition and compute the actual DRAM address of the error as documented
in the Revision Guide for AMD Family 15h Models 00h-0Fh Processors.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-04-26 16:18:56 +02:00
Borislav Petkov
f08e457cec amd64_edac: Factor in CC6 save area
F15h and later use a portion of DRAM as a CC6 storage area. BIOS
programs D18F1x[17C:140,7C:40] DRAM Base/Limit accordingly by
subtracting the storage area from the DRAM limit setting. However, in
order for edac to consider that part of DRAM too, we need to include it
into the per-node range.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-04-26 16:18:44 +02:00
Borislav Petkov
f030ddfb37 amd64_edac: Remove node interleave warning
This warning was wrongfully added for a normal condition - intlvsel
actually selects the destination node when node interleaving is enabled
and it is not a mismatch. For a detailed example, see section 2.8.10.2
"Node Interleaving" in F10h BKDG.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-04-26 16:18:12 +02:00
Markus Trippelsdorf
4949603a6f EDAC: Remove debugging output in scrub rate handling
This patch removes superfluous debugging output in the sysfs scrub rate
handler. It also consolidates the error handling in the scrub rate
accessors.

Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-04-21 12:44:58 +02:00
Borislav Petkov
a9f0fbe2bb amd64_edac: Fix potential memleak
We check the pointers together but at least one of them could be invalid
due to failed allocation. Since we cannot continue if either of the two
allocations has failed, exit early by freeing them both.

Cc: <stable@kernel.org> # 38.x
Reported-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-29 18:19:06 +02:00
Borislav Petkov
d34a6ecd45 amd64_edac: Fix decode_syndrome types
Those should all be unsigned.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:40 +01:00
Borislav Petkov
8c6717510f amd64_edac: Fix DCT argument type
Fix amd64_debug_display_dimm_sizes() arguments order per convention (pvt
is always first). Also, the now second arg denotes the DCT so adjust its
type.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:34 +01:00
Borislav Petkov
e761359a25 amd64_edac: Fix ranges signedness
The dram ranges make sense only as an unsigned type.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:33 +01:00
Borislav Petkov
972ea17ab9 amd64_edac: Drop local variable
Use the macro directly instead

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:32 +01:00
Borislav Petkov
71d2a32e8e amd64_edac: Fix PCI config addressing types
Adjust argument types to the PCI config API's types.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:31 +01:00
Borislav Petkov
151fa71c58 amd64_edac: Fix DRAM base macros
Return unsigned u8 values only.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:30 +01:00
Borislav Petkov
b487c33e55 amd64_edac: Fix node id signedness
A node id can never be negative since we use it as an index into
the DRAM ranges array. This also makes one of the BUG_ON conditions
redundant.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:28 +01:00
Borislav Petkov
df71a05324 amd64_edac: Enable driver on F15h
Add the PCI device ids required for driver registration. Remove
pvt->ctl_name and use the family descriptor directly, instead. Then,
bump driver version and fixup its format. Finally, enable DRAM ECC
decoding on F15h.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:26 +01:00
Borislav Petkov
a3b7db09a6 amd64_edac: Adjust ECC symbol size to F15h
F15h has the same ECC symbol size options as F10h revD and later so
adjust checks to that. Simplify code a bit.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:26 +01:00
Borislav Petkov
87b3e0e6e4 amd64_edac: Simplify scrubrate setting
Drop per-instance variable and compute min scrubrate dynamically.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:25 +01:00
Borislav Petkov
41d8bfaba7 amd64_edac: Improve DRAM address mapping
Drop static tables which map the bits in F2x80 to a chip select size in
favor of functions doing the mapping with some bit fiddling. Also, add
F15 support.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:24 +01:00
Borislav Petkov
5a5d237169 amd64_edac: Sanitize ->read_dram_ctl_register
This function is relevant for F10h and higher, and it has only one
callsite so drop its function pointer from the low_ops struct.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:23 +01:00
Borislav Petkov
b15f0fcab1 amd64_edac: Adjust sys_addr to chip select conversion routine to F15h
F15h sys_addr to chip select mapping is almost identical to F10h's so
reuse that. Rename functions on that path accordingly.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:22 +01:00
Borislav Petkov
355fba6005 amd64_edac: Beef up early exit reporting
Add paranoid checks for the sys address before going off and decoding
it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:22 +01:00
Borislav Petkov
614ec9d853 amd64_edac: Revamp online spare handling
Replace per-DCT macros with smarter ones, drop hack and look for the
spare rank on all chip selects on a channel.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:21 +01:00
Borislav Petkov
5d4b58e84a amd64_edac: Fix channel interleave removal
Remove the channel interleave select bit properly. See
F2x110[DctSelIntLvAddr] for details.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:21 +01:00
Borislav Petkov
e2f79dbdfb amd64_edac: Correct node interleaving removal
When node interleaving is enabled, a subset of the addr[14:12] bits has
to be removed in order to get the normalized DCT address of the DRAM
channel. The actual number of bits to remove is determined by F1x[1,
0][7C:40][IntlvEn]. Do this correctly.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:20 +01:00
Borislav Petkov
95b0ef55cd amd64_edac: Add support for interleaved region swapping
On revC3 and revE Fam10h machines and later, non-interleaved graphics
framebuffer memory under the 16G mark can be swapped with a region
located at the bottom of memory so that the GPU can use the interleaved
region and thus two channels. Add support for that.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:20 +01:00
Borislav Petkov
700466249f amd64_edac: Unify get_error_address
The address bits from MC4_STATUS differ only between K8 and the rest so
no need for a per-family method.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:19 +01:00
Borislav Petkov
f192c7b16c amd64_edac: Simplify decoding path
Use the struct mce directly instead of copying from it into a custom
struct err_regs.

No functionality change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:19 +01:00
Borislav Petkov
7d20d14da1 amd64_edac: Adjust channel counting to F15h
The only difference is that F10h used to sport ganged DCTs and F15h
doesn't so adjust the F10h routine and reuse it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:19 +01:00
Borislav Petkov
5980bb9cd8 amd64_edac: Cleanup old defines cruft
Remove unused defines, drop family names from define names.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:18 +01:00
Borislav Petkov
bcd781f46a amd64_edac: Cleanup NBSH cruft
Remove reporting of errors with UC bit set - this is done by the MCE
decoding code anyway and this driver deals with DRAM ECC errors only. UC
(NB uncorrectable error) doesn't necessarily mean it is a DRAM error.
Remove unused macros while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:17 +01:00
Borislav Petkov
a97fa68ec4 amd64_edac: Cleanup NBCFG handling
The fact whether we are chipkill capable or not does not have any
bearing when computing the channel index on a ganged DCT configuration
so remove that. Also, simplify debug statements. Finally, remove old
error injection leftovers, while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:16 +01:00
Borislav Petkov
c9f4f26eae amd64_edac: Cleanup NBCTL code
Remove family names from macro names, drop single bit defines and
comment their meaning instead.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:15 +01:00
Borislav Petkov
78da121e15 amd64_edac: Cleanup DCT Select Low/High code
Shorten macro names, remove family name from macros, fix macro
arguments, shorten debug strings.

No functionality change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:15 +01:00
Borislav Petkov
cb32850744 amd64_edac: Cleanup Dram Configuration registers handling
* Restrict DCT ganged mode check since only Fam10h supports it
* Adjust DRAM type detection for BD since it only supports DDR3
* Remove second and thus unneeded DCLR read in k8_early_channel_count() - we do
  that in read_mc_regs()
* Cleanup comments and remove family names from register macros
* Remove unused defines

There should be no functional change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:14 +01:00
Borislav Petkov
525a1b20a6 amd64_edac: Cleanup DBAM handling
Do not read DBAM regs twice and simplify code around them.

There should be no functional change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:14 +01:00
Borislav Petkov
f678b8ccce amd64_edac: Replace huge bitmasks with a macro
Replace hard to read hex constants with a continuous masks macro.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:14 +01:00
Borislav Petkov
c8e518d567 amd64_edac: Sanitize f10_get_base_addr_offset
This function maps the system address to the normalized DCT address.
Document what the code does for more clarity and wrap insane bitmasks in
a more understandable macro which generates them. Also, reduce number of
arguments passed to the function. Finally, rename this function to what
it actually does.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:13 +01:00
Borislav Petkov
229a7a11ac amd64_edac: Sanitize channel extraction
Cleanup and simplify f10_determine_channel(); make it more readable.
Also drop f10_map_intlv_en_to_shift() in favor of simply counting the
bits in F1x124[DramIntlvEn] which is equivalent.

There should be no functionality change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:13 +01:00
Borislav Petkov
11c75eadaf amd64_edac: Cleanup chipselect handling
Add a struct representing the DRAM chip select base/limit register
pairs. Concentrate all CS handling in a single function. Also, add CS
looping macros for cleaner, more readable code. While at it, adjust code
to F15h. Finally, do smaller macro names cleanups (remove family names
from register macros) and debug messages clarification.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:12 +01:00
Borislav Petkov
bc21fa5787 amd64_edac: Cleanup DHAR handling
Adjust to F15h, simplify code, fixup macros.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:12 +01:00
Borislav Petkov
7f19bf755c amd64_edac: Remove DRAM base/limit subfields caching
Add a struct representing the DRAM base/limit range pairs and remove all
cached subfields. Replace them with accessor functions, which actually
saves us some space:

   text    data     bss     dec     hex filename
  14712    1577     336   16625    40f1 drivers/edac/amd64_edac_mod.o.after
  14831    1609     336   16776    4188 drivers/edac/amd64_edac_mod.o.before

Also, it simplifies the code a lot allowing to merge the K8 and F10h
routines.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:11 +01:00
Borislav Petkov
b2b0c60543 amd64_edac: Add support for F15h DCT PCI config accesses
F15h "multiplexes" between the configuration space of the two DRAM
controllers by toggling D18F1x10C[DctCfgSel] while F10h has a different
set of registers for DCT0, and DCT1 in extended PCI config space.

Add DCT configuration space accessors per family thus wrapping all the
different access prerequisites. Clean up code while at it, shorten
names.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:11 +01:00
Borislav Petkov
4d7963648f amd64_edac: Fix DIMMs per DCTs output
amd64_debug_display_dimm_sizes() reports the distribution of the DIMMs
on each DRAM controller and its chip select sizes. Thus, the last don't
have anything to do with whether we're running in ganged DCT mode or not
- their sizes don't change all of a sudden. Fix that by removing the
ganged-check and dump DCT0's config for DCT1 when in ganged mode since
they're identical.

Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-02-10 14:41:49 +01:00
Linus Torvalds
128283a47e Merge branch 'mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
* 'mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, MCE: Fix NB error formatting
  EDAC, MCE: Use BIT_64() to eliminate warnings on 32-bit
  EDAC, MCE: Enable MCE decoding on F15h
  EDAC, MCE: Allow F15h bank 6 MCE injection
  EDAC, MCE: Shorten error report formatting
  EDAC, MCE: Overhaul error fields extraction macros
  EDAC, MCE: Add F15h FP MCE decoder
  EDAC, MCE: Add F15 EX MCE decoder
  EDAC, MCE: Add an F15h NB MCE decoder
  EDAC, MCE: No F15h LS MCE decoder
  EDAC, MCE: Add F15h CU MCE decoder
  EDAC, MCE: Add F15h IC MCE decoder
  EDAC, MCE: Add F15h DC MCE decoder
  EDAC, MCE: Select extended error code mask
2011-01-07 14:54:03 -08:00
Borislav Petkov
6245288232 EDAC, MCE: Overhaul error fields extraction macros
Make macro names shorter thus making code shorter and more clear.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:21 +01:00
Borislav Petkov
a135cef79a amd64_edac: Disable DRAM ECC injection on K8
K8 does not allow for an atomic RMW to a cacheline as F10h does so
disable the error injection interface for it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:38:46 +01:00
Borislav Petkov
390944439f EDAC: Fixup scrubrate manipulation
Make the ->{get|set}_sdram_scrub_rate return the actual scrub rate
bandwidth it succeeded setting and remove superfluous arg pointer used
for that. A negative value returned still means that an error occurred
while setting the scrubrate. Document this for future reference.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:38:31 +01:00
Borislav Petkov
360b7f3c60 amd64_edac: Remove two-stage initialization
Now that all prerequisites are in place, drop the two-stage driver
instances initialization in favor of the following simple init sequence:

1. Probe PCI device: we only test ECC capabilities here and if none exit
early.

2. If the hw supports ECC and it is/can be enabled, we init the per-node
instance.

Remove "amd64_" prefix from static functions touched, while at it.

There actually should be no visible functional change resulting from
this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:34:03 +01:00
Borislav Petkov
2299ef7114 amd64_edac: Check ECC capabilities initially
Rework the code to check the hardware ECC capabilities at PCI probing
time. We do all further initialization only if we actually can/have ECC
enabled.

While at it:
0. Fix function naming.
1. Simplify/clarify debug output.
2. Remove amd64_ prefix from the static functions
3. Reorganize code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:34:02 +01:00
Borislav Petkov
ae7bb7c679 amd64_edac: Carve out ECC-related hw settings
This is in preparation for the init path reorganization where we want
only to

1) test whether a particular node supports ECC
2) can it be enabled

and only then do the necessary allocation/initialization. For that,
we need to decouple the ECC settings of the node from the instance's
descriptor.

The should be no functional change introduced by this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:34:00 +01:00
Borislav Petkov
f1db274e1b amd64_edac: Remove PCI ECS enabling functions
PCI ECS is being enabled by default since 2.6.26 on AMD so this code is
just superfluous now, remove it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:59 +01:00
Borislav Petkov
cc4d8860fc amd64_edac: Allocate driver instances dynamically
Remove static allocation in favor of dynamically allocating space for as
many driver instances as northbridges present on the system.

There should be no functional change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:57 +01:00