Commit graph

37 commits

Author SHA1 Message Date
David Rientjes
3484d79813 x86_64: fake pxm-to-node mapping for fake numa
For NUMA emulation, our SLIT should represent the true NUMA topology of the
system but our proximity domain to node ID mapping needs to reflect the
emulated state.

When NUMA emulation has successfully setup fake nodes on the system, a new
function, acpi_fake_nodes() is called.  This function determines the proximity
domain (_PXM) for each true node found on the system.  It then finds which
emulated nodes have been allocated on this true node as determined by its
starting address.  The node ID to PXM mapping is changed so that each fake
node ID points to the PXM of the true node that it is located on.

If the machine failed to register a SLIT, then we assume there is no special
requirement for emulated node affinity so we use the default LOCAL_DISTANCE,
which is newly exported to this code, as our measurement if the emulated nodes
appear in the same PXM.  Otherwise, we use REMOTE_DISTANCE.

PXM_INVAL and NID_INVAL are also exported to the ACPI header file so that we
can compare node_to_pxm() results in generic code (in this case, the SRAT
code).

Cc: Len Brown <lenb@kernel.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:10 -07:00
David Rientjes
ae2c6dcf90 x86_64: various cleanups in NUMA scan node
In acpi_scan_nodes(), we immediately return -1 if acpi_numa <= 0, meaning
we haven't detected any underlying ACPI topology or we have explicitly
disabled its use from the command-line with numa=noacpi.

acpi_table_print_srat_entry() and acpi_table_parse_srat() are only
referenced within drivers/acpi/numa.c, so we can mark them as static and
remove their prototypes from the header file.

Likewise, pxm_to_node_map[] and node_to_pxm_map[] are only used within
drivers/acpi/numa.c, so we mark them as static and remove their externs
from the header file.

The automatic 'result' variable is unused in acpi_numa_init(), so it's
removed.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:08 -07:00
David Rientjes
a2e212dae5 x86_64: Use LOCAL_DISTANCE and REMOTE_DISTANCE in x86_64 ACPI code
Use LOCAL_DISTANCE and  REMOTE_DISTANCE in x86_64 ACPI code

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:07 -07:00
Suresh Siddha
e3f1caeef9 [PATCH] x86-64: set node_possible_map at runtime - try 2
Set the node_possible_map at runtime on x86_64.  On a non NUMA system,
num_possible_nodes() will now say '1'.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
2007-05-02 19:27:20 +02:00
Alexey Starikovskiy
15a58ed121 ACPICA: Remove duplicate table definitions (non-conflicting), cont
Signed-off-by: Len Brown <len.brown@intel.com>
2007-02-02 21:14:29 -05:00
keith mannthey
926fafebc4 [PATCH] x86-64: x86_64 hot-add memory srat.c fix
This patch corrects the logic used in srat.c to figure out what
parsing what action to take when registering hot-add areas.  Hot-add
areas should only be added to the node information for the
MEMORY_HOTPLUG_RESERVE case.  When booting MEMORY_HOTPLUG_SPARSE hot-add
areas on everything but the last node are getting include in the node
data and during kernel boot the pages are setup then the kernel dies
when the pages are used. This patch fixes this issue.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:01 +02:00
Keith Mannthey
8c2676a587 [PATCH] hot-add-mem x86_64: memory_add_physaddr_to_nid node fixup
In cases where the acpi memory-add event does not containe the pxm (node)
infomation allow the driver to look up node info based on the address.  The
acpi_get_node call returns -1 if it can't decode the pxm info, this causes
add_memory to panic.  acpi_get_node would have to decode the resource from the
handle (a lenghty proposition).  This seems to be the cleanist point to
interject the hook.

[kamezawa.hiroyu@jp.fujitsu.com: build fixes]
[y-goto@jp.fujitsu.com: build fixes]
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:18 -07:00
Keith Mannthey
4942e998b4 [PATCH] hot-add-mem x86_64: memory_add_physaddr_to_nid enable
The api for hot-add memory already has a construct for finding nodes based on
an address, memory_add_physaddr_to_nid.  This patch allows the fucntion to do
something besides return 0.  It uses the nodes_add infomation to lookup to
node info for a hot add event.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:18 -07:00
Keith Mannthey
71efa8fdc5 [PATCH] hot-add-mem x86_64: Enable SPARSEMEM in srat.c
Enable x86_64 srat.c to share code between both reserve and sparsemem based
add memory paths.  Both paths need the hot-add area node locality infomration
(nodes_add).  This code refactors the code path to allow this.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:18 -07:00
Mel Gorman
fb01439c5b [PATCH] Allow an arch to expand node boundaries
Arch-independent zone-sizing determines the size of a node
(pgdat->node_spanned_pages) based on the physical memory that was
registered by the architecture.  However, when
CONFIG_MEMORY_HOTPLUG_RESERVE is set, the architecture expects that the
spanned_pages will be much larger and that mem_map will be allocated that
is used lated on memory hot-add.

This patch allows an architecture that sets CONFIG_MEMORY_HOTPLUG_RESERVE
to call push_node_boundaries() which will set the node beginning and end to
at *least* the requested boundary.

Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:12 -07:00
Mel Gorman
9c7cd6877c [PATCH] Account for holes that are outside the range of physical memory
absent_pages_in_range() made the assumption that users of the API would not
care about holes beyound the end of physical memory.  This was not the
case.  This patch will account for ranges outside of physical memory as
holes correctly.

Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:11 -07:00
Mel Gorman
5cb248abf5 [PATCH] Have x86_64 use add_active_range() and free_area_init_nodes
Size zones and holes in an architecture independent manner for x86_64.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:11 -07:00
Andi Kleen
c31fbb1ad8 [PATCH] Clean up acpi_numa variable
Move it into srat.c No need to clutter up setup.c for it

And remove use in setup.c completely - it only guarded a printk
which can be done unconditionally.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Yasunori Goto
762834e8bf [PATCH] Unify pxm_to_node() and node_to_pxm()
Consolidate the various arch-specific implementations of pxm_to_node() and
node_to_pxm() into a single generic version.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:48 -07:00
Daniel Yeisley
0d01532451 [PATCH] x86_64: Handle empty node zero
From: Daniel Yeisley <dan.yeisley@unisys.com>

It is possible to boot a Unisys ES7000 with CPUs from multiple cells, and not
also include the memory from those cells.  This can create a scenario where
node 0 has cpus, but no associated memory.  The system will boot fine in a
configuration where node 0 has memory, but nodes 2 and 3 do not.

[AK: I rechecked the code and generic code seems to indeed handle that already.
Dan's original patch had a change for mm/slab.c that seems to be already in now.]

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-30 20:31:06 -07:00
Andi Kleen
fad7906d16 [PATCH] x86_64: Fix memory hotadd heuristics
This fixes some boot failures on Dell and Unisys systems with memory
hotadd added.

 - Set hotadd_percent to 0 by default.  This means anybody using hotadd
   memory needs to specify the value on the command line.  That's
   because there are lots of Intel boxes which have a bogus hotplug area
   in their SRAT and they would waste a lot of memory before.
 - Fix calculation of how much memory to use when the hotplug area
   exceeds hotadd_percent
 - Fix fallback when the
 - Fix fallback if memory hotadd is not compiled in.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-16 07:59:31 -07:00
Andi Kleen
a8062231d8 [PATCH] x86_64: Handle empty PXMs that only contain hotplug memory
The node setup code would try to allocate the node metadata in the node
itself, but that fails if there is no memory in there.

This can happen with memory hotplug when the hotplug area defines an so
far empty node.

Now use bootmem to try to allocate the mem_map in other nodes.

And if it fails don't panic, but just ignore the node.

To make this work I added a new __alloc_bootmem_nopanic function that
does what its name implies.

TBD should try to use nearby nodes here.  Currently we just use any.
It's hard to do it better because bootmem doesn't have proper fallback
lists yet.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-09 11:53:16 -07:00
Andi Kleen
68a3a7feb0 [PATCH] x86_64: Reserve SRAT hotadd memory on x86-64
From: Keith Mannthey, Andi Kleen

Implement memory hotadd without sparsemem. The memory in the SRAT
hotadd area is just preserved instead and can be activated later.

There are a few restrictions:
- Only one continuous hotadd area allowed per node

The main problem is dealing with the many buggy SRAT tables
that are out there. The strategy here is to reject anything
suspicious.

Originally from Keith Mannthey, with several hacks and changes by AK
and also contributions from Andrew Morton

[ TBD: Problems pointed out by KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>:

 1) Goto's rebuild_zonelist patch will not work if CONFIG_MEMORY_HOTPLUG=n.

    Rebuilding zonelist is necessary when the system has just memory <
    4G at boot, and hot add memory > 4G.  because x86_64 has DMA32,
    ZONE_NORAML is not included into zonelist at boot time if system
    doesn't have memory >4G at boot.

    [AK: should just force the higher zones at boot time when SRAT tells us]

 2) zone and node's spanned_pages and present_pages are not incremented.
    They should be.

    For example, our server (ia64/Fujitsu PrimeQuest) can equip memory
    from 4G to 1T(maybe 2T in future), and SRAT will *always* say we have
    possible 1T +memory.  (Microsoft requires "write all possible memory
    in SRAT") When we reserve memmap for possible 1T memory, Linux will
    not work well in +minimum 4G configuraion ;)

    [AK: needs limiting to 5-10% of max memory]
 ]

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-09 11:53:16 -07:00
Andi Kleen
abe059e759 [PATCH] x86_64: Rename struct node in x86-64 NUMA code to struct bootnode
It conflicts with the struct node in node.h
Actually the x86-64 version was there first, but ..

Suggested by Jan Beulich

Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:10:52 -08:00
Andi Kleen
2aed711a39 [PATCH] x86_64: Always pass full number of nodes to NUMA hash computation
Previously the numa hash code would be confused by holes in the node space
and stop early. This is the first part of the fix for the non boot issue
with empty nodes on Opterons.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-17 08:00:41 -08:00
Andi Kleen
fdb9df9424 [PATCH] x86_64: Relax SRAT covers all memory check a bit
Code was refusing good SRATs because about 12K got lost somewhere.
Allow less than 1MB of difference before rejecting it.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-17 08:00:41 -08:00
Andi Kleen
d22fe80844 [PATCH] x86_64: Do more checking in the SRAT header code
- Check if the processor/memory affinity entries are long enough
   according to the ACPI 3.0 spec.
 - Ignore memory affinity entries that define a zero length region.

All based on BIOS issues found in the field @)

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04 16:43:14 -08:00
Andi Kleen
9391a3f9c7 [PATCH] x86_64: Clear more state when ignoring empty node in SRAT parsing
Might fix boot failures on systems with empty PXMs in SRAT

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04 16:43:14 -08:00
Andi Kleen
8a6fdd3e91 [PATCH] x86_64: Reject SRAT tables that don't cover all memory
Broken BIOS on Iwill 8way systems reports these and it causes the bootmem
allocator to crash. Add a sanity check if all the PXMs in the
SRAT table cover all memory as reported by e820. If the sanity
check fails the SRAT is rejected and the code will fall back
to discover the NUMA topology using the K8 northbridge registers
when applicable.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11 19:04:55 -08:00
Andi Kleen
e4e94072d9 [PATCH] x86_64: Return -1 for unknown PCI bus affinity
When we don't know the node a PCI bus is connected to return -1.
This matches the generic code.

Noticed by Ravikiran G Thirumalai <kiran@scalex86.org>

Cc: Ravikiran G Thirumalai <kiran@scalex86.org>

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11 19:04:52 -08:00
Andi Kleen
1584b89c92 [PATCH] x86_64: Validate SLIT table
A lot of Opteron BIOS just pass 10 in all SLIT entries (10 is the
normalized unit). This is actually worse than the default heuristic
because it leads to pci_distance not knowing the difference between
local and remote nodes anymore. This messes up some NUMA
heuristics in generic code.

In this case it's better to fall back to the default heuristic
which just does nodea == nodeb ? 10 : 20.

This patch does some basic sanity checking on the SLIT and only accepts
the SLIT when it passes.

Invariants enforced are:
- Node to itself shall be 10
- Any other distance shouldn't be 10
- Distances smaller than 10 are illegal

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11 19:04:51 -08:00
Magnus Damm
ffd10a2b77 [PATCH] x86_64: Make node boundaries consistent
The current x86_64 NUMA memory code is inconsequent when it comes to node
memory ranges. The exact behaviour varies depending on which config option
that is used.

setup_node_bootmem() has start and end as arguments and these are used to
calculate the size of the node like this: (end - start). This is all fine
if end is pointing to the first non-available byte. The problem is that the
current x86_64 code sometimes treats it as the last present byte and sometimes
as the first non-available byte. The result is that some configurations might
lose a page at the end of the range.

This patch tries to fix CONFIG_ACPI_NUMA, CONFIG_K8_NUMA and CONFIG_NUMA_EMU
so they all treat the end variable as the first non-available byte. This is
the same way as the single node code.

The patch is boot tested on dual x86_64 hardware with the above configurations,
but maybe the removed code is needed as some workaround?

Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:17 -08:00
Andi Kleen
69d81fcde7 [PATCH] x86_64: Speed up numa_node_id by putting it directly into the PDA
Not go from the CPU number to an mapping array.
Mode number is often used now in fast paths.

This also adds a generic numa_node_id to all the topology includes

Suggested by Eric Dumazet

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14 19:55:14 -08:00
Andi Kleen
4b6a455c74 [PATCH] x86-64: Use correct mask to compute conflicting nodes in SRAT
The nodes are not set online yet at this point.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:50:58 -07:00
Andi Kleen
2bce2b54ae [PATCH] x86-64: reset apicid<->node tables when SRAT cannot be parsed
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:50:58 -07:00
Andi Kleen
e58e0d0312 [PATCH] x86-64: Clean up the SRAT node list before computing the hash function
Also use for_each_node_mask instead of hand crafted loops.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:50:58 -07:00
Andi Kleen
05d1fa4bf6 [PATCH] x86-64: Improve error handling for overlapping PXMs in SRAT.
- Report PXMs instead of nodes
- Report the correct PXM, not always the one of node 1.
- Only warn for the case of a PXM overlapping by itself

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:50:56 -07:00
Andi Kleen
69e1a33f62 [PATCH] x86-64: Use ACPI PXM to parse PCI<->node assignments
Since this is shared code I had to implement it for i386 too

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:49:57 -07:00
Andi Kleen
0b07e984fc [PATCH] x86-64: Don't assign CPU numbers in SRAT parsing
Do that later when the CPU boots. SRAT just stores the APIC<->Node
mapping node. This fixes problems on systems where the order
of SRAT entries does not match the MADT.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:49:55 -07:00
Ravikiran G Thirumalai
8c56ac3f3b [PATCH] x86_64: fix cpu_to_node setup for sparse apic_ids
While booting with SMT disabled in bios, when using acpi srat to setup
cpu_to_node[], sparse apic_ids create problems.

Without this patch, intel x86_64 boxes with hyperthreading disabled in the
bios (and which rely on srat for numa setup) endup having incorrect values in
cpu_to_node[] arrays, causing sched domains to be built incorrectly etc.

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-28 21:46:02 -07:00
Andi Kleen
69cb62eb6d [PATCH] x86_64: Print a boot message for hotplug memory zones
From: Keith Manning

Print a boot message for hotplug memory zones

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-28 21:46:00 -07:00
Linus Torvalds
1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00