linux

q3k/linux

Author	SHA1	Message	Date
Xiao Guangrong	060c2abe6c	KVM: MMU: support apf for nonpaing guest Let's support apf for nonpaing guest Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:29:13 +02:00
Xiao Guangrong	e5f3f02796	KVM: MMU: clear apfs if page state is changed If CR0.PG is changed, the page fault cann't be avoid when the prefault address is accessed later And it also fix a bug: it can retry a page enabled #PF in page disabled context if mmu is shadow page This idear is from Gleb Natapov Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:29:12 +02:00
Xiao Guangrong	5054c0de66	KVM: MMU: fix missing post sync audit Add AUDIT_POST_SYNC audit for long mode shadow page Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:29:11 +02:00
Jan Kiszka	d89f5eff70	KVM: Clean up vm creation and release IA64 support forces us to abstract the allocation of the kvm structure. But instead of mixing this up with arch-specific initialization and doing the same on destruction, split both steps. This allows to move generic destruction calls into generic code. It also fixes error clean-up on failures of kvm_create_vm for IA64. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:29:09 +02:00
Tracey Dent	9d893c6bc1	KVM: x86: Makefile clean up Changed makefile to use the ccflags-y option instead of EXTRA_CFLAGS. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:29:08 +02:00
Xiao Guangrong	2a126faafb	KVM: remove unused function declaration Remove the declaration of kvm_mmu_set_base_ptes() Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:29:07 +02:00
Avi Kivity	30bd0c4c6c	KVM: VMX: Disallow NMI while blocked by STI While not mandated by the spec, Linux relies on NMI being blocked by an IF-enabling STI. VMX also refuses to enter a guest in this state, at least on some implementations. Disallow NMI while blocked by STI by checking for the condition, and requesting an interrupt window exit if it occurs. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:29:04 +02:00
Xiao Guangrong	e6d53e3b0d	KVM: avoid unnecessary wait for a async pf In current code, it checks async pf completion out of the wait context, like this: if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE && !vcpu->arch.apf.halted) r = vcpu_enter_guest(vcpu); else { ...... kvm_vcpu_block(vcpu) ^- waiting until 'async_pf.done' is not empty } kvm_check_async_pf_completion(vcpu) ^- delete list from async_pf.done So, if we check aysnc pf completion first, it can be blocked at kvm_vcpu_block Fixed by mark the vcpu is unhalted in kvm_check_async_pf_completion() path Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:29:00 +02:00
Xiao Guangrong	c7d28c2404	KVM: fix searching async gfn in kvm_async_pf_gfn_slot Don't search later slots if the slot is empty Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:28:59 +02:00
Xiao Guangrong	c9b263d2be	KVM: fix tracing kvm_try_async_get_page Tracing 'async' and pfn is useless, since 'async' is always true, and 'pfn' is always "fault_pfn' We can trace 'gva' and 'gfn' instead, it can help us to see the life-cycle of an async_pf Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:28:56 +02:00
Takuya Yoshikawa	2653503769	KVM: replace vmalloc and memset with vzalloc Let's use newly introduced vzalloc(). Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:28:55 +02:00
Gleb Natapov	ec25d5e66e	KVM: handle exit due to INVD in VMX Currently the exit is unhandled, so guest halts with error if it tries to execute INVD instruction. Call into emulator when INVD instruction is executed by a guest instead. This instruction is not needed by ordinary guests, but firmware (like OpenBIOS) use it and fail. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:28:53 +02:00
Jan Kiszka	2eec734374	KVM: x86: Avoid issuing wbinvd twice Micro optimization to avoid calling wbinvd twice on the CPU that has to emulate it. As we might be preempted between smp_call_function_many and the local wbinvd, the cache might be filled again so that real work could be done uselessly. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:28:52 +02:00
Takuya Yoshikawa	515a01279a	KVM: pre-allocate one more dirty bitmap to avoid vmalloc() Currently x86's kvm_vm_ioctl_get_dirty_log() needs to allocate a bitmap by vmalloc() which will be used in the next logging and this has been causing bad effect to VGA and live-migration: vmalloc() consumes extra systime, triggers tlb flush, etc. This patch resolves this issue by pre-allocating one more bitmap and switching between two bitmaps during dirty logging. Performance improvement: I measured performance for the case of VGA update by trace-cmd. The result was 1.5 times faster than the original one. In the case of live migration, the improvement ratio depends on the workload and the guest memory size. In general, the larger the memory size is the more benefits we get. Note: This does not change other architectures's logic but the allocation size becomes twice. This will increase the actual memory consumption only when the new size changes the number of pages allocated by vmalloc(). Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:28:46 +02:00
Marcelo Tosatti	612819c3c6	KVM: propagate fault r/w information to gup(), allow read-only memory As suggested by Andrea, pass r/w error code to gup(), upgrading read fault to writable if host pte allows it. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:28:40 +02:00
Marcelo Tosatti	7905d9a5ad	KVM: MMU: flush TLBs on writable -> read-only spte overwrite This can happen in the following scenario: vcpu0 vcpu1 read fault gup(.write=0) gup(.write=1) reuse swap cache, no COW set writable spte use writable spte set read-only spte Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:23:39 +02:00
Marcelo Tosatti	982c25658c	KVM: MMU: remove kvm_mmu_set_base_ptes Unused. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:23:38 +02:00
Marcelo Tosatti	ff1fcb9ebd	KVM: VMX: remove setting of shadow_base_ptes for EPT The EPT present/writable bits use the same position as normal pagetable bits. Since direct_map passes ACC_ALL to mmu_set_spte, thus always setting the writable bit on sptes, use the generic PT_PRESENT shadow_base_pte. Also pass present/writable error code information from EPT violation to generic pagefault handler. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:23:37 +02:00
Avi Kivity	83bcacb1a5	KVM: Avoid double interrupt injection with vapic After an interrupt injection, the PPR changes, and we have to reflect that into the vapic. This causes a KVM_REQ_EVENT to be set, which causes the whole interrupt injection routine to be run again (harmlessly). Optimize by only setting KVM_REQ_EVENT if the ppr was lowered; otherwise there is no chance that a new injection is needed. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:23:36 +02:00
Avi Kivity	82ca2d108b	KVM: SVM: Fold save_host_msrs() and load_host_msrs() into their callers This abstraction only serves to obfuscate. Remove. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:34 +02:00
Avi Kivity	dacccfdd6b	KVM: SVM: Move fs/gs/ldt save/restore to heavyweight exit path ldt is never used in the kernel context; same goes for fs (x86_64) and gs (i386). So save/restore them in the heavyweight exit path instead of the lightweight path. By itself, this doesn't buy us much, but it paves the way for moving vmload and vmsave to the heavyweight exit path, since they modify the same registers. [jan: fix copy/pase mistake on i386] Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:33 +02:00
Avi Kivity	afe9e66f82	KVM: SVM: Move svm->host_gs_base into a separate structure More members will join it soon. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:31 +02:00
Avi Kivity	13c34e073b	KVM: SVM: Move guest register save out of interrupts disabled section Saving guest registers is just a memory copy, and does not need to be in the critical section. Move outside the critical section to improve latency a bit. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:29 +02:00
Jan Kiszka	d4c90b0043	KVM: x86: Add missing inline tag to kvm_read_and_reset_pf_reason May otherwise generates build warnings about unused kvm_read_and_reset_pf_reason if included without CONFIG_KVM_GUEST enabled. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:27 +02:00
Andi Kleen	f56f536956	KVM: Move KVM context switch into own function gcc 4.5 with some special options is able to duplicate the VMX context switch asm in vmx_vcpu_run(). This results in a compile error because the inline asm sequence uses an on local label. The non local label is needed because other code wants to set up the return address. This patch moves the asm code into an own function and marks that explicitely noinline to avoid this problem. Better would be probably to just move it into an .S file. The diff looks worse than the change really is, it's all just code movement and no logic change. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:26 +02:00
Jan Kiszka	7e1fbeac6f	KVM: x86: Mark kvm_arch_setup_async_pf static It has no user outside mmu.c and also no prototype. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:25 +02:00
Gleb Natapov	fc5f06fac6	KVM: Send async PF when guest is not in userspace too. If guest indicates that it can handle async pf in kernel mode too send it, but only if interrupts are enabled. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:22 +02:00
Gleb Natapov	6adba52742	KVM: Let host know whether the guest can handle async PF in non-userspace context. If guest can detect that it runs in non-preemptable context it can handle async PFs at any time, so let host know that it can send async PF even if guest cpu is not in userspace. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:21 +02:00
Gleb Natapov	6c047cd982	KVM paravirt: Handle async PF in non preemptable context If async page fault is received by idle task or when preemp_count is not zero guest cannot reschedule, so do sti; hlt and wait for page to be ready. vcpu can still process interrupts while it waits for the page to be ready. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:19 +02:00
Gleb Natapov	7c90705bf2	KVM: Inject asynchronous page fault into a PV guest if page is swapped out. Send async page fault to a PV guest if it accesses swapped out memory. Guest will choose another task to run upon receiving the fault. Allow async page fault injection only when guest is in user mode since otherwise guest may be in non-sleepable context and will not be able to reschedule. Vcpu will be halted if guest will fault on the same page again or if vcpu executes kernel code. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:17 +02:00
Gleb Natapov	631bc48782	KVM: Handle async PF in a guest. When async PF capability is detected hook up special page fault handler that will handle async page fault events and bypass other page faults to regular page fault handler. Also add async PF handling to nested SVM emulation. Async PF always generates exit to L1 where vcpu thread will be scheduled out until page is available. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:16 +02:00
Gleb Natapov	fd10cde929	KVM paravirt: Add async PF initialization to PV guest. Enable async PF in a guest if async PF capability is discovered. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:14 +02:00
Gleb Natapov	344d9588a9	KVM: Add PV MSR to enable asynchronous page faults delivery. Guest enables async PF vcpu functionality using this MSR. Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:12 +02:00
Gleb Natapov	ca3f10172e	KVM paravirt: Move kvm_smp_prepare_boot_cpu() from kvmclock.c to kvm.c. Async PF also needs to hook into smp_prepare_boot_cpu so move the hook into generic code. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:10 +02:00
Gleb Natapov	49c7754ce5	KVM: Add memory slot versioning and use it to provide fast guest write interface Keep track of memslots changes by keeping generation number in memslots structure. Provide kvm_write_guest_cached() function that skips gfn_to_hva() translation if memslots was not changed since previous invocation. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:08 +02:00
Gleb Natapov	56028d0861	KVM: Retry fault before vmentry When page is swapped in it is mapped into guest memory only after guest tries to access it again and generate another fault. To save this fault we can map it immediately since we know that guest is going to access the page. Do it only when tdp is enabled for now. Shadow paging case is more complicated. CR[034] and EFER registers should be switched before doing mapping and then switched back. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:06 +02:00
Gleb Natapov	af585b921e	KVM: Halt vcpu if page it tries to access is swapped out If a guest accesses swapped out memory do not swap it in from vcpu thread context. Schedule work to do swapping and put vcpu into halted state instead. Interrupts will still be delivered to the guest and if interrupt will cause reschedule guest will continue to run another task. [avi: remove call to get_user_pages_noio(), nacked by Linus; this makes everything synchrnous again] Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:21:39 +02:00
Avi Kivity	010c520e20	KVM: Don't reset mmu context unnecessarily when updating EFER The only bit of EFER that affects the mmu is NX, and this is already accounted for (LME only takes effect when changing cr0). Based on a patch by Hillf Danton. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-02 12:05:15 +02:00
Avi Kivity	d0dfc6b74a	KVM: i8259: initialize isr_ack isr_ack is never initialized. So, until the first PIC reset, interrupts may fail to be injected. This can cause Windows XP to fail to boot, as reported in the fallout from the fix to https://bugzilla.kernel.org/show_bug.cgi?id=21962. Reported-and-tested-by: Nicolas Prochazka <prochazka.nicolas@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-02 11:52:48 +02:00
Avi Kivity	649497d1a3	KVM: MMU: Fix incorrect direct gfn for unpaged mode shadow We use the physical address instead of the base gfn for the four PAE page directories we use in unpaged mode. When the guest accesses an address above 1GB that is backed by a large host page, a BUG_ON() in kvm_mmu_set_gfn() triggers. Resolves: https://bugzilla.kernel.org/show_bug.cgi?id=21962 Reported-and-tested-by: Nicolas Prochazka <prochazka.nicolas@gmail.com> KVM-Stable-Tag. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-12-29 12:35:29 +02:00
Linus Torvalds	0a59228168	Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: arch/tile: handle rt_sigreturn() more cleanly arch/tile: handle CLONE_SETTLS in copy_thread(), not user space	2010-12-18 10:28:54 -08:00
Linus Torvalds	2ba16c4f45	Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus * 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus: MIPS: Fix build errors in sc-mips.c	2010-12-18 10:23:29 -08:00
Linus Torvalds	46bdfe6a50	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: x86: avoid high BIOS area when allocating address space x86: avoid E820 regions when allocating address space x86: avoid low BIOS area when allocating address space resources: add arch hook for preventing allocation in reserved areas Revert "resources: support allocating space within a region from the top down" Revert "PCI: allocate bus resources from the top down" Revert "x86/PCI: allocate space from the end of a region, not the beginning" Revert "x86: allocate space within a region top-down" Revert "PCI: fix pci_bus_alloc_resource() hang, prefer positive decode" PCI: Update MCP55 quirk to not affect non HyperTransport variants	2010-12-18 10:13:24 -08:00
Chris Metcalf	81711cee93	arch/tile: handle rt_sigreturn() more cleanly The current tile rt_sigreturn() syscall pattern uses the common idiom of loading up pt_regs with all the saved registers from the time of the signal, then anticipating the fact that we will clobber the ABI "return value" register (r0) as we return from the syscall by setting the rt_sigreturn return value to whatever random value was in the pt_regs for r0. However, this breaks in our 64-bit kernel when running "compat" tasks, since we always sign-extend the "return value" register to properly handle returned pointers that are in the upper 2GB of the 32-bit compat address space. Doing this to the sigreturn path then causes occasional random corruption of the 64-bit r0 register. Instead, we stop doing the crazy "load the return-value register" hack in sigreturn. We already have some sigreturn-specific assembly code that we use to pass the pt_regs pointer to C code. We extend that code to also set the link register to point to a spot a few instructions after the usual syscall return address so we don't clobber the saved r0. Now it no longer matters what the rt_sigreturn syscall returns, and the pt_regs structure can be cleanly and completely reloaded. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>	2010-12-17 16:59:29 -05:00
Chris Metcalf	bc4cf2bb27	arch/tile: handle CLONE_SETTLS in copy_thread(), not user space Previously we were just setting up the "tp" register in the new task as started by clone() in libc. However, this is not quite right, since in principle a signal might be delivered to the new task before it had its TLS set up. (Of course, this race window still exists for resetting the libc getpid() cached value in the new task, in principle. But in any case, we are now doing this exactly the way all other architectures do it.) This change is important for 2.6.37 since the tile glibc we will be submitting upstream will not set TLS in user space any more, so it will only work on a kernel that has this fix. It should also be taken for 2.6.36.x in the stable tree if possible. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Cc: stable <stable@kernel.org>	2010-12-17 16:56:50 -05:00
Kevin Cernekee	081d835fa4	MIPS: Fix build errors in sc-mips.c Seen with malta_defconfig on Linus' tree: CC arch/mips/mm/sc-mips.o arch/mips/mm/sc-mips.c: In function 'mips_sc_is_activated': arch/mips/mm/sc-mips.c:77: error: 'config2' undeclared (first use in this function) arch/mips/mm/sc-mips.c:77: error: (Each undeclared identifier is reported only once arch/mips/mm/sc-mips.c:77: error: for each function it appears in.) arch/mips/mm/sc-mips.c:81: error: 'tmp' undeclared (first use in this function) make[2]: * [arch/mips/mm/sc-mips.o] Error 1 make[1]: * [arch/mips/mm] Error 2 make: *** [arch/mips] Error 2 [Ralf: Cosmetic changes to minimize the number of arguments passed to mips_sc_is_activated] Signed-off-by: Kevin Cernekee <cernekee@gmail.com> Patchwork: https://patchwork.linux-mips.org/patch/1752/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>	2010-12-17 19:44:35 +00:00
Bjorn Helgaas	a2c606d53a	x86: avoid high BIOS area when allocating address space This prevents allocation of the last 2MB before 4GB. The experiment described here shows Windows 7 ignoring the last 1MB: https://bugzilla.kernel.org/show_bug.cgi?id=23542#c27 This patch ignores the top 2MB instead of just 1MB because H. Peter Anvin says "There will be ROM at the top of the 32-bit address space; it's a fact of the architecture, and on at least older systems it was common to have a shadow 1 MiB below." Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-12-17 10:01:30 -08:00
Bjorn Helgaas	4dc2287c18	x86: avoid E820 regions when allocating address space When we allocate address space, e.g., to assign it to a PCI device, don't allocate anything mentioned in the BIOS E820 memory map. On recent machines (2008 and newer), we assign PCI resources from the windows described by the ACPI PCI host bridge _CRS. On many Dell machines, these windows overlap some E820 reserved areas, e.g., BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved) pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff] If we put devices at 0xbff00000, they don't work, probably because that's really RAM, not I/O memory. This patch prevents that by removing the 0xbfe4dc00-0xbfffffff area from the "available" resource. I'm not very happy with this solution because Windows solves the problem differently (it seems to ignore E820 reserved areas and it allocates top-down instead of bottom-up; details at comment 45 of the bugzilla below). That means we're vulnerable to BIOS defects that Windows would not trip over. For example, if BIOS described a device in ACPI but didn't mention it in E820, Windows would work fine but Linux would fail. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228 Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-12-17 10:01:24 -08:00
Bjorn Helgaas	30919b0bf3	x86: avoid low BIOS area when allocating address space This implements arch_remove_reservations() so allocate_resource() can avoid any arch-specific reserved areas. This currently just avoids the BIOS area (the first 1MB), but could be used for E820 reserved areas if that turns out to be necessary. We previously avoided this area in pcibios_align_resource(). This patch moves the test from that PCI-specific path to a generic path, so all resource allocations will avoid this area. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-12-17 10:01:17 -08:00
Bjorn Helgaas	d14125ecfe	Revert "x86/PCI: allocate space from the end of a region, not the beginning" This reverts commit `dc9887dc02`. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-12-17 10:00:49 -08:00

1 2 3 4 5 ...

52284 commits