cmps and scas instructions accept repeat prefixes F3 and F2. So in
order to emulate those prefixed instructions we need to be able to know
if prefixes are REP/REPE/REPZ or REPNE/REPNZ. Currently kvm doesn't make
this distinction. This patch introduces this distinction.
Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Non-x86 archs don't need this mechanism. Move it to arch, and
keep its interface in common.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The state of SECONDARY_VM_EXEC_CONTROL shouldn't depend on in-kernel IRQ chip,
this patch fix this.
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The current cpuid management suffers from several problems, which inhibit
passing through the host feature set to the guest:
- No way to tell which features the host supports
While some features can be supported with no changes to kvm, others
need explicit support. That means kvm needs to vet the feature set
before it is passed to the guest.
- No support for indexed or stateful cpuid entries
Some cpuid entries depend on ecx as well as on eax, or on internal
state in the processor (running cpuid multiple times with the same
input returns different output). The current cpuid machinery only
supports keying on eax.
- No support for save/restore/migrate
The internal state above needs to be exposed to userspace so it can
be saved or migrated.
This patch adds extended cpuid support by means of three new ioctls:
- KVM_GET_SUPPORTED_CPUID: get all cpuid entries the host (and kvm)
supports
- KVM_SET_CPUID2: sets the vcpu's cpuid table
- KVM_GET_CPUID2: gets the vcpu's cpuid table, including hidden state
[avi: fix original KVM_SET_CPUID not removing nx on non-nx hosts as it did
before]
Signed-off-by: Dan Kenigsberg <danken@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
These are traditionally named 'page', but even more traditionally, that name
is reserved for variables that point to a 'struct page'. Rename them to 'sp'
(for "shadow page").
Signed-off-by: Avi Kivity <avi@qumranet.com>
Converting a frame number to an address is tricky since the data type changes
size. Introduce a function to do it. This fixes an actual bug when
accessing guest ptes.
Signed-off-by: Avi Kivity <avi@qumranet.com>
If all we're doing is increasing permissions on a pte (typical for demand
paging), then there's not need to flush remote tlbs. Worst case they'll
get a spurious page fault.
Signed-off-by: Avi Kivity <avi@qumranet.com>
I spent an hour worrying why I see so many guest page faults on FC6 i386.
Turns out bypass wasn't implemented for nonpae. Implement it so it doesn't
happen again.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Split kvm_arch_vcpu_create() into kvm_arch_vcpu_create() and
kvm_arch_vcpu_setup(), enabling preemption notification between the two.
This mean that we can now do vcpu_load() within kvm_arch_vcpu_setup().
Signed-off-by: Avi Kivity <avi@qumranet.com>
Moving !user_alloc case to kvm_arch to avoid unnecessary
code logic in non-x86 platform.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Instead of incrementally changing the mmu cache size for every memory slot
operation, recalculate it from scratch. This is simpler and safer.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Instead of fetching one byte at a time, prefetch 15 bytes (or until the next
page boundary) to avoid guest page table walks.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Theoretically used to acccess memory known to be ordinary RAM, it was
never implemented. It is questionable whether it is possible to implement
it correctly.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Improve dirty bit setting for pages that kvm release, until now every page
that we released we marked dirty, from now only pages that have potential
to get dirty we mark dirty.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
When we map a page, we check whether some other vcpu mapped it for us and if
so, bail out. But we should decrease the refcount on the page as we do so.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Use kvm_write_guest_page() with empty_zero_page, instead of doing
kmap and memset.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Ensure that segment.base == segment.selector << 4 when entering the real
mode on Intel so that the CPU will not bark at us. This fixes some old
protected mode demo from http://www.x86.org/articles/pmbasics/tspec_a1_doc.htm.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Move out kvm_mmu init and exit functionality from kvm_main.c
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Meanwhile keep the interface in common, and leave as more logic in common
as possible.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Instead of having each architecture do it individually, we
do this in the arch-independent code (just x86 as of now).
[avi: add svm to the mix, which was added to mainline during the
2.6.24-rc process]
Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This is a little more accurate (since it counts actual reloads, not potential
reloads), and reverses the sense of the statistic to measure a bad event like
most of the other stats (e.g. we want to minimize all counters).
Signed-off-by: Avi Kivity <avi@qumranet.com>
Add two arch hooks to handle kvm_create_vm and kvm destroy_vm. Now, just
put io_bus init and destory in common.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Move kvm_vcpu_ioctl_translate to arch, since mmu would be put under arch.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
We pass vcpu, vmx->fail, and vmx->launched to assembly code, but all three
are fields within vmx. Consolidate by only passing in vmx and offsets for
the rest.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Make KVM_CHECK_EXTENSION code into a function, all archs can define its
capability independently.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The current 'lods' and 'stos' is depending on incoming CR2 rather than decode
memory address from registers.
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Will be called once arch module registers itself.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Move some includes to x86.c from kvm_main.c, since the related functions
have been moved to x86.c
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This changes kvm_write_guest_page/kvm_read_guest_page to use
copy_to_user/read_from_user, as a result we get better speed
and better dirty bit tracking.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Convert a guest frame number to the corresponding host virtual address.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Check for the "error hva", an address outside the user address space that
signals a bad gfn.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Add wbinvd VM Exit support to prepare for pass-through
device cache emulation and also enhance real time
responsiveness.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
If vmx fails to inject a real-mode interrupt while fetching the interrupt
redirection table, it fails to record this in the vectoring information
field. So we detect this condition and do it ourselves.
Signed-off-by: Avi Kivity <avi@qumranet.com>
We'll want to write to it in order to fix real-mode irq injection problems,
but it is a read-only field. Storing it in a variable solves that issue.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Instead of injecting real-mode interrupts by writing the interrupt frame into
guest memory, abuse vmx by injecting a software interrupt. We need to
pretend the software interrupt instruction had a length > 0, so we have to
adjust rip backward.
This lets us not to mess with writing guest memory, which is complex and also
sleeps.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Every write access to guest pages should be tracked.
Signed-off-by: Dor Laor <dor.laor@qumranet.com>
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Instructions like 'inc reg' that have the register operand encoded
in the opcode are currently specially decoded. Extend
decode_register_operand() to handle that case, indicated by having
DstReg or SrcReg without ModRM.
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves implementation of the following functions from
kvm_main.c to x86.c:
free_pio_guest_pages, vcpu_find_pio_dev, pio_copy_data, complete_pio,
kernel_pio, pio_string_write, kvm_emulate_pio, kvm_emulate_pio_string
The function inject_gp, which was duplicated by yesterday's patch
series, is removed from kvm_main.c now because it is not needed anymore.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves the following functions to from kvm_main.c to x86.c:
emulator_read/write_std, vcpu_find_pervcpu_dev, vcpu_find_mmio_dev,
emulator_read/write_emulated, emulator_write_phys,
emulator_write_emulated_onepage, emulator_cmpxchg_emulated,
get_setment_base, emulate_invlpg, emulate_clts, emulator_get/set_dr,
kvm_report_emulation_failure, emulate_instruction
The following data type is moved to x86.c:
struct x86_emulate_ops emulate_ops
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves the implementation of the functions of kvm_get/set_msr,
kvm_get/set_msr_common, and set_efer from kvm_main.c to x86.c. The
definition of EFER_RESERVED_BITS is moved too.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
KVM's nopage handler calls gfn_to_page() which acquires the mmap_sem when
calling out to get_user_pages(). nopage handlers are already invoked with the
mmap_sem held though. Introduce a __gfn_to_page() for use by the nopage
handler which requires the lock to already be held.
This was noticed by tglx.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch based on CR8/TPR patch, and enable the TPR shadow (FlexPriority)
for 32bit Windows. Since TPR is accessed very frequently by 32bit
Windows, especially SMP guest, with FlexPriority enabled, we saw significant
performance gain.
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves the definitions of CR0_RESERVED_BITS,
CR4_RESERVED_BITS, and CR8_RESERVED_BITS along with the following
functions from kvm_main.c to x86.c:
set_cr0(), set_cr3(), set_cr4(), set_cr8(), get_cr8(), lmsw(),
load_pdptrs()
The static function wrapper inject_gp is duplicated in kvm_main.c and
x86.c for now, the version in kvm_main.c should disappear once the last
user of it is gone too.
The function load_pdptrs is no longer static, and now defined in x86.h
for the time being, until the last user of it is gone from kvm_main.c.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves the implementation of get_apic_base and set_apic_base
from kvm_main.c to x86.c
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves the definition of segment_descriptor_64 for AMD64 and
EM64T from kvm_main.c to segment_descriptor.h. It also adds a proper
#ifndef...#define...#endif around that header file.
The implementation of segment_base is moved from kvm_main.c to x86.c.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch splits kvm_vm_ioctl into archtecture independent parts, and
x86 specific parts which go to kvm_arch_vcpu_ioctl in x86.c.
The patch is unchanged since last submission.
Common ioctls for all architectures are:
KVM_CREATE_VCPU, KVM_GET_DIRTY_LOG, KVM_SET_USER_MEMORY_REGION
x86 specific ioctls are:
KVM_SET_MEMORY_REGION,
KVM_GET/SET_NR_MMU_PAGES, KVM_SET_MEMORY_ALIAS, KVM_CREATE_IRQCHIP,
KVM_CREATE_IRQ_LINE, KVM_GET/SET_IRQCHIP
KVM_SET_TSS_ADDR
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Emulation may cause a shadow pte to be instantiated, which requires
memory resources. Make sure the caches are filled to avoid an oops.
Signed-off-by: Avi Kivity <avi@qumranet.com>
The code that dispatches the page fault and emulates if we failed to map
is duplicated across vmx and svm. Merge it to simplify further bugfixing.
Signed-off-by: Avi Kivity <avi@qumranet.com>
The 'mov abs' instruction family (opcodes 0xa0 - 0xa3) still depends on cr2
provided by the page fault handler. This is wrong for several reasons:
- if an instruction accessed misaligned data that crosses a page boundary,
and if the fault happened on the second page, cr2 will point at the
second page, not the data itself.
- if we're emulating in real mode, or due to a FlexPriority exit, there
is no cr2 generated.
So, this change adds decoding for this instruction form and drops reliance
on cr2.
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch lets GCC to determine which registers to save when we
switch to/from a VCPU in the case of AMD i386
* Original code saves following registers:
ebx, ecx, edx, esi, edi, ebp
* Patched code:
- informs GCC that we modify following registers
using the clobber description:
ebx, ecx, edx, esi, edi
- rbp is saved (pop/push) because GCC seems to ignore its use in the clobber
description.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch lets GCC to determine which registers to save when we
switch to/from a VCPU in the case of AMD x86_64.
* Original code saves following registers:
rbx, rcx, rdx, rsi, rdi, rbp,
r8, r9, r10, r11, r12, r13, r14, r15
* Patched code:
- informs GCC that we modify following registers
using the clobber description:
rbx, rcx, rdx, rsi, rdi
r8, r9, r10, r11, r12, r13, r14, r15
- rbp is saved (pop/push) because GCC seems to ignore its use in the clobber
description.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch lets GCC to determine which registers to save when we
switch to/from a VCPU in the case of intel i386.
* Original code saves following registers:
eax, ebx, ecx, edx, edi, esi, ebp (using popa)
* Patched code:
- informs GCC that we modify following registers
using the clobber description:
ebx, edi, rsi
- doesn't save eax because it is an output operand (vmx->fail)
- cannot put ecx in clobber description because it is an input operand,
but as we modify it and we want to keep its value (vcpu), we must
save it (pop/push)
- ebp is saved (pop/push) because GCC seems to ignore its use the clobber
description.
- edx is saved (pop/push) because it is reserved by GCC (REGPARM) and
cannot be put in the clobber description.
- line "mov (%%esp), %3 \n\t" has been removed because %3
is ecx and ecx is restored just after.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch lets GCC to determine which registers to save when we
switch to/from a VCPU in the case of intel x86_64.
* Original code saves following registers:
rax, rbx, rcx, rdx, rsi, rdi, rbp,
r8, r9, r10, r11, r12, r13, r14, r15
* Patched code:
- informs GCC that we modify following registers
using the clobber description:
rbx, rdi, rsi,
r8, r9, r10, r11, r12, r13, r14, r15
- doesn't save rax because it is an output operand (vmx->fail)
- cannot put rcx in clobber description because it is an input operand,
but as we modify it and we want to keep its value (vcpu), we must
save it (pop/push)
- rbp is saved (pop/push) because GCC seems to ignore its use in the clobber
description.
- rdx is saved (pop/push) because it is reserved by GCC (REGPARM) and
cannot be put in the clobber description.
- line "mov (%%rsp), %3 \n\t" has been removed because %3
is rcx and rcx is restored just after.
- line ASM_VMX_VMWRITE_RSP_RDX() is moved out of the ifdef/else/endif
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Currently kvm has a wart in that it requires three extra pages for use
as a tss when emulating real mode on Intel. This patch moves the allocation
internally, only requiring userspace to tell us where in the physical address
space we can place the tss.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Reserve a few memory slots for kernel internal use. This is good for case
you have to register memory region and you want to be sure it was not
registered from userspace, and for case you want to register a memory region
that won't be seen from userspace.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Remove kvm memory slot allocation mechanism from the ioctl
and put it to exported function.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
kvm_vm_ioctl_set_memory_region() is able to remove memory in addition to
adding it. Therefore when using kernel swapping support for old userspaces,
we need to munmap the memory if the user request to remove it
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Split guest reset code out of vmx_vcpu_setup(). Besides being cleaner, this
moves the realmode tss setup (which can sleep) outside vmx_vcpu_setup()
(which is executed with preemption enabled).
[izik: remove unused variable]
Signed-off-by: Avi Kivity <avi@qumranet.com>
First step to split kvm_vcpu. Currently, we just use an macro to define
the common fields in kvm_vcpu for all archs, and all archs need to define
its own kvm_vcpu struct.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Allocate a userspace buffer for older userspaces. Also eliminate phys_mem
buffer. The memset() in kvmctl really kills initial memory usage but swapping
works even with old userspaces.
A side effect is that maximum guest side is reduced for older userspace on
i386.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
ppc and s390 offer the possibility to track process times precisely
by looking at cpu timer on every context switch, irq, softirq etc.
We can use that infrastructure as well for guest time accounting.
We need to account the used time before we change the state.
This patch adds a call to account_system_vtime to kvm_guest_enter
and kvm_guest exit. If CONFIG_VIRT_CPU_ACCOUNTING is not set,
account_system_vtime is defined in hardirq.h as an empty function,
which means this patch does not change the behaviour on other
platforms.
I compile tested this patch on x86 and function tested the patch on
s390.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This allows guest memory to be swapped. Pages which are currently mapped
via shadow page tables are pinned into memory, but all other pages can
be freely swapped.
The patch makes gfn_to_page() elevate the page's reference count, and
introduces kvm_release_page() that pairs with it.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
In case the page is not present in the guest memory map, return a dummy
page the guest can scribble on.
This simplifies error checking in its users.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The current kvm mmu only reverse maps writable translation. This is used
to write-protect a page in case it becomes a pagetable.
But with swapping support, we need a reverse mapping of read-only pages as
well: when we evict a page, we need to remove any mapping to it, whether
writable or not.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Instruction: cmc, clc, cli, sti
opcodes: 0xf5, 0xf8, 0xfa, 0xfb respectively.
[avi: fix reference to EFLG_IF which is not defined anywhere]
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Simplify the walker level loop not to carry so much information from one
loop to the next. In addition to being complex, this made kmap_atomic()
critical sections difficult to manage.
As a result of this change, kmap_atomic() sections are limited to actually
touching the guest pte, which allows the other functions called from the
walker to do sleepy operations. This will happen when we enable swapping.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Beside the obvious goodness of making code more common, this prevents
a livelock with the next patch which moves interrupt injection out of the
critical section.
Signed-off-by: Avi Kivity <avi@qumranet.com>
If no apic is enabled in the bitmap of an interrupt delivery with delivery
mode of lowest priority, a warning should be reported rather than select
a fallback vcpu
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Eddie (Yaozu) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch splits kvm_vcpu_ioctl into archtecture independent parts, and
x86 specific parts which go to kvm_arch_vcpu_ioctl in x86.c.
Common ioctls for all architectures are:
KVM_RUN, KVM_GET/SET_(S-)REGS, KVM_TRANSLATE, KVM_INTERRUPT,
KVM_DEBUG_GUEST, KVM_SET_SIGNAL_MASK, KVM_GET/SET_FPU
Note that some PPC chips don't have an FPU, so we might need an #ifdef
around KVM_GET/SET_FPU one day.
x86 specific ioctls are:
KVM_GET/SET_LAPIC, KVM_SET_CPUID, KVM_GET/SET_MSRS
An interresting aspect is vcpu_load/vcpu_put. We now have a common
vcpu_load/put which does the preemption stuff, and an architecture
specific kvm_arch_vcpu_load/put. In the x86 case, this one calls the
vmx/svm function defined in kvm_x86_ops.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Since the mmu uses different shadow pages for dirty large pages and clean
large pages, this allows the mmu to drop ptes that are now invalid.
Signed-off-by: Avi Kivity <avi@qumranet.com>
By forcing clean huge pages to be read-only, we have separate roles
for the shadow of a clean large page and the shadow of a dirty large
page. This is necessary because different ptes will be instantiated
for the two cases, even for read faults.
Signed-off-by: Avi Kivity <avi@qumranet.com>
This is more consistent with the accessed bit management, and makes the dirty
bit available earlier for other purposes.
Signed-off-by: Avi Kivity <avi@qumranet.com>
This time, the biggest change is gpa_to_hpa. The translation of GPA to HPA does
not depend on the VCPU state unlike GVA to GPA so there's no need to pass in
the kvm_vcpu.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Some of the MMU functions take a struct kvm_vcpu even though they affect all
VCPUs. This patch cleans up some of them to instead take a struct kvm. This
makes things a bit more clear.
The main thing that was confusing me was whether certain functions need to be
called on all VCPUs.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Instead of having the kernel allocate memory to the guest, let userspace
allocate it and pass the address to the kernel.
This is required for s390 support, but also enables features like memory
sharing and using hugetlbfs backed memory.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Since vcpu->apic is of the correct type, there's not need to cast.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Move kvm_create_lapic() into kvm_vcpu_init(), rather than having svm
and vmx do it. And make it return the error rather than a fairly
random -ENOMEM.
This also solves the problem that neither svm.c nor vmx.c actually
handles the error path properly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Instead of the asymetry of kvm_free_apic, implement kvm_free_lapic().
And guess what? I found a minor bug: we don't need to hrtimer_cancel()
from kvm_main.c, because we do that in kvm_free_apic().
Also:
1) kvm_vcpu_uninit should be the reverse order from kvm_vcpu_init.
2) Don't set apic->regs_page to zero before freeing apic.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The user is now able to set how many mmu pages will be allocated to the guest.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
When kvm uses user-allocated pages in the future for the guest, we won't
be able to use page->private for rmap, since page->rmap is reserved for
the filesystem. So we move the rmap base pointers to the memory slot.
A side effect of this is that we need to store the gfn of each gpte in
the shadow pages, since the memory slot is addressed by gfn, instead of
hfn like struct page.
Signed-off-by: Izik Eidus <izik@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Now that smp_call_function_single() knows how to call a function on the
current cpu, there's no need to check explicitly.
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch modifies the management of REX prefix according behavior
I saw in Xen 3.1. In Xen, this modification has been introduced by
Jan Beulich.
http://lists.xensource.com/archives/html/xen-changelog/2007-01/msg00081.html
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The only valid case is on protected page access, other cases are errors.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Remove no_wb, use dst.type = OP_NONE instead, idea stollen from xen-3.1
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Remove _eflags and use directly ctxt->eflags. Caching eflags is not needed as
it is restored to vcpu by kvm_main.c:emulate_instruction() from ctxt->eflags
only if emulation doesn't fail.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
To improve readability, move push, writeback, and grp 1a/2/3/4/5/9 emulation
parts into functions.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch removes the fault injected when the guest attempts to set reserved
bits in cr3. X86 hardware doesn't generate a fault when setting reserved bits.
The result of this patch is that vmware-server, running within a kvm guest,
boots and runs memtest from an iso.
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
When we allow guest page faults to reach the guests directly, we lose
the fault tracking which allows us to detect demand paging. So we provide
an alternate mechnism by clearing the accessed bit when we set a pte, and
checking it later to see if the guest actually used it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
KVM avoids reloading the efer msr when the difference between the guest
and host values consist of the long mode bits (which are switched by
hardware) and the NX bit (which is emulated by the KVM MMU).
This patch also allows KVM to ignore SCE (syscall enable) when the guest
is running in 32-bit mode. This is because the syscall instruction is
not available in 32-bit mode on Intel processors, so the SCE bit is
effectively meaningless.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Move emulate_ctxt to kvm_vcpu to keep emulate context when we exit from kvm
module. Call x86_decode_insn() only when needed. Modify x86_emulate_insn() to
not modify the context if it must be re-entered.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
emulate_instruction() calls now x86_decode_insn() and x86_emulate_insn().
x86_emulate_insn() is x86_emulate_memop() without the decoding part.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Split the decoding process into a new function x86_decode_insn().
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Move all x86_emulate_memop() common variables between decode and execute to a
structure decode_cache. This will help in later separating decode and
emulate.
struct decode_cache {
u8 twobyte;
u8 b;
u8 lock_prefix;
u8 rep_prefix;
u8 op_bytes;
u8 ad_bytes;
struct operand src;
struct operand dst;
unsigned long *override_base;
unsigned int d;
unsigned long regs[NR_VCPU_REGS];
unsigned long eip;
/* modrm */
u8 modrm;
u8 modrm_mod;
u8 modrm_reg;
u8 modrm_rm;
u8 use_modrm_ea;
unsigned long modrm_ea;
unsigned long modrm_val;
};
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch refactors the current hypercall infrastructure to better
support live migration and SMP. It eliminates the hypercall page by
trapping the UD exception that would occur if you used the wrong hypercall
instruction for the underlying architecture and replacing it with the right
one lazily.
A fall-out of this patch is that the unhandled hypercalls no longer trap to
userspace. There is very little reason though to use a hypercall to
communicate with userspace as PIO or MMIO can be used. There is no code
in tree that uses userspace hypercalls.
[avi: fix #ud injection on vmx]
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Add vmmcall/vmcall to x86_emulate. Future patch will implement functionality
for these instructions.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86: (890 commits)
x86: fix nodemap_size according to nodeid bits
x86: fix overlap between pagetable with bss section
x86: add PCI IDs to k8topology_64.c
x86: fix early_ioremap pagetable ops
x86: use the same pgd_list for PAE and 64-bit
x86: defer cr3 reload when doing pud_clear()
x86: early boot debugging via FireWire (ohci1394_dma=early)
x86: don't special-case pmd allocations as much
x86: shrink some ifdefs in fault.c
x86: ignore spurious faults
x86: remove nx_enabled from fault.c
x86: unify fault_32|64.c
x86: unify fault_32|64.c with ifdefs
x86: unify fault_32|64.c by ifdef'd function bodies
x86: arch/x86/mm/init_32.c printk fixes
x86: arch/x86/mm/init_32.c cleanup
x86: arch/x86/mm/init_64.c printk fixes
x86: unify ioremap
x86: fixes some bugs about EFI memory map handling
x86: use reboot_type on EFI 32
...
Both the old e1000 driver and the new e1000e driver can drive some
PCI-Express e1000 cards, and we should avoid ambiguity about which
driver will pick up the support for those cards when both drivers are
enabled.
This solves the problem by having the old driver support those cards if
the new driver isn't configured, but otherwise ceding support for PCI
Express versions of the e1000 chipset to the newer driver. Thus
allowing both legacy configurations where only the old driver is active
(and handles all chips it knows about) and the new configuration with
the new driver handling the more modern PCIE variants.
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds a new configuration option, which adds support for a new
early_param which gets checked in arch/x86/kernel/setup_{32,64}.c:setup_arch()
to decide wether OHCI-1394 FireWire controllers should be initialized and
enabled for physical DMA access to allow remote debugging of early problems
like issues ACPI or other subsystems which are executed very early.
If the config option is not enabled, no code is changed, and if the boot
paramenter is not given, no new code is executed, and independent of that,
all new code is freed after boot, so the config option can be even enabled
in standard, non-debug kernels.
With specialized tools, it is then possible to get debugging information
from machines which have no serial ports (notebooks) such as the printk
buffer contents, or any data which can be referenced from global pointers,
if it is stored below the 4GB limit and even memory dumps of of the physical
RAM region below the 4GB limit can be taken without any cooperation from the
CPU of the host, so the machine can be crashed early, it does not matter.
In the extreme, even kernel debuggers can be accessed in this way. I wrote
a small kgdb module and an accompanying gdb stub for FireWire which allows
to gdb to talk to kgdb using remote remory reads and writes over FireWire.
An version of the gdb stub fore FireWire is able to read all global data
from a system which is running a a normal kernel without any kernel debugger,
without any interruption or support of the system's CPU. That way, e.g. the
task struct and so on can be read and even manipulated when the physical DMA
access is granted.
A HOWTO is included in this patch, in Documentation/debugging-via-ohci1394.txt
and I've put a copy online at
ftp://ftp.suse.de/private/bk/firewire/docs/debugging-via-ohci1394.txt
It also has links to all the tools which are available to make use of it
another copy of it is online at:
ftp://ftp.suse.de/private/bk/firewire/kernel/ohci1394_dma_early-v2.diff
Signed-Off-By: Bernhard Kaindl <bk@suse.de>
Tested-By: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The set_memory_* and set_pages_* family of API's currently requires the
callers to do a global tlb flush after the function call; forgetting this is
a very nasty deathtrap. This patch moves the global tlb flush into
each of the callers
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch converts various users of change_page_attr() to the new,
more intent driven set_page_*/set_memory_* API set.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
the previous patch in the old RTC driver. It also removes the direct
rtc_interrupt() call from arch/x86/kernel/hpetc.c so that there's finally no
(code) dependency to CONFIG_RTC in arch/x86/kernel/hpet.c.
Because of this, it's possible to compile the drivers/char/rtc.ko driver as
module and still use the HPET emulation functionality. This is also expressed
in Kconfig.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Robert Picco <Robert.Picco@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The ACPI code currently disables TSC use in any C2 and C3
states. But the AMD Fam10h BKDG documents that the TSC
will never stop in any C states when the CONSTANT_TSC bit is
set. Make this disabling conditional on CONSTANT_TSC
not set on AMD.
I actually think this is true on Intel too for C2 states
on CPUs with p-state invariant TSC, but this needs
further discussions with Len to really confirm :-)
So far it is only enabled on AMD.
Cc: lenb@kernel.org
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Aviod TLB flush IPIs during C3 states by voluntary leave_mm()
before entering C3.
The performance impact of TLB flush on C3 should not be significant with
respect to C3 wakeup latency. Also, CPUs tend to flush TLB in hardware while in
C3 anyways.
On a 8 logical CPU system, running make -j2, the number of tlbflush IPIs goes
down from 40 per second to ~ 0. Total number of interrupts during the run
of this workload was ~1200 per second, which makes it ~3% savings in wakeups.
There was no measurable performance or power impact however.
[ akpm@linux-foundation.org: symbol export fixes. ]
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
People with HP Desktops (including me) encounter couple of DMI errors
during boot - dmi_save_oem_strings_devices: out of memory and
dmi_string: out of memory.
On some HP desktops the DMI data include OEM strings (type 11) out of
which only few are meaningful and most other are empty. DMI code
religiously creates copies of these 27 strings (65 bytes each in my
case) and goes OOM in dmi_string().
If DMI_MAX_DATA is bumped up a little then it goes and fails in
dmi_save_oem_strings while allocating dmi_devices of sizeof(struct
dmi_device) corresponding to these strings.
On x86_64 since we cannot use alloc_bootmem this early, the code uses a
static array of 2048 bytes (DMI_MAX_DATA) for allocating the memory DMI
needs. It does not survive the creation of empty strings and devices.
Fix this by detecting and not newly allocating empty strings and instead
using a one statically defined dmi_empty_string.
Also do not create a new struct dmi_device for each empty string - use
one statically define dmi_device with .name=dmi_empty_string and add
that to the dmi_devices list.
On x64 this should stop the OOM with same current size of DMI_MAX_DATA
and on x86 this should save a good amount of (27*65 bytes +
27*sizeof(struct dmi_device) bootmem.
Compile and boot tested on both 32-bit and 64-bit x86.
Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There's no need for the *_MASK flags (TF_MASK, IF_MASK, etc), found in
processor.h (both _32 and _64). They have a one-to-one mapping with the
EFLAGS value. This patch removes the definitions, and use the already
existent X86_EFLAGS_ version when applicable.
[ roland@redhat.com: KVM build fixes. ]
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
replace outb_p() with udelay(2). This is a real ISA device so it likely
needs this particular delay.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch unifies struct desc_ptr between i386 and x86_64.
They can be expressed in the exact same way in C code, only
having to change the name of one of them. As Xgt_desc_struct
is ugly and big, this is the one that goes away.
There's also a padding field in i386, but it is not really
needed in the C structure definition.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
tons of style cleanup in drivers/char/rtc.c - no code changed:
text data bss dec hex filename
6400 384 32 6816 1aa0 rtc.o.before
6400 384 32 6816 1aa0 rtc.o.after
since we seem to have a number of open breakages in this code we might
as well start with making the code more readable and maintainable.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This changes size-specific register names (eip/rip, esp/rsp, etc.) to
generic names in the thread and tss structures.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Looks like IRQ 31 is assigned to timer 3, even without the patch!
I wonder who wrote the number 31. But the manual says that it is
zero by default.
I think we should check whether the timer has been allocated an IRQ before
proceeding to assign one to it. Here is a patch that does this.
Signed-off-by: Balaji Rao <balajirrao@gmail.com>
Tested-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The userspace API for the HPET (see Documentation/hpet.txt) did not work. The
HPET_IE_ON ioctl was failing as there was no IRQ assigned to the timer
device. This patch fixes it by allocating IRQs to timer blocks in the HPET.
arch/x86/kernel/hpet.c | 13 +++++--------
drivers/char/hpet.c | 45 ++++++++++++++++++++++++++++++++++++++-------
include/linux/hpet.h | 2 +-
3 files changed, 44 insertions(+), 16 deletions(-)
Signed-off-by: Balaji Rao <balajirrao@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
x86_64 don't expose the intermediate representation with one underline,
_PAGE_KERNEL, just the double-underlined one.
Use it, to get a common ground between 32 and 64-bit
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
explicitly use ktime.h include
explicitly use hrtimer.h include
explicitly use sched.h include
This patch adds headers explicitly to lguest sources file,
to avoid depending on them being included somewhere else.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can save some lines of code by getting rid of
*lg = cpu... lines of code spread everywhere by now.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gpte_addr() does not depend on any guest information. So we wipe out
the lg parameter from it completely.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
events represented in the 'changed' bitmap are per-cpu, not per-guest.
move it to the lg_cpu structure
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
in our new model, pages are assigned to a virtual cpu, not to a guest.
We move it to the lg_cpu structure.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
in our model, a guest does not run in a cpu anymore: a virtual cpu
does. So we change last_guest to last_cpu
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
spte_addr does not depend on any guest information, so we
wipe out the lg parameter completely.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
this patch makes the pgdir management per-vcpu. The pgdirs pool
is still guest-wide (although it'll probably need to grow when we
are really executing more vcpus), but the pgdidx index is gone,
since it makes no sense anymore. Instead, we use a per-vcpu
index.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
this patch makes the pending_notify field, used to control
pending notifications, per-vcpu, instead of per-guest
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
lguest struct have room for some fields, namely, cr2, ts, esp1
and ss1, that are not really guest-wide, but rather, vcpu-wide.
This patch puts it in the vcpu struct
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
lguest uses tasks to control its running behaviour (like sending
breaks, controlling halted state, etc). In a per-vcpu environment,
each vcpu will have its own underlying task. So this patch
makes the infrastructure for that possible
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The fields found in lguest_arch are not really per-guest,
but per-cpu (gdt, idt, etc). So this patch turns lguest_arch
into lg_cpu_arch.
It makes sense to have a per-guest per-arch struct, but this
can be addressed later, when the need arrives.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the most obvious per-vcpu field: registers.
So this patch moves it from struct lguest to struct vcpu,
and patch the places in which they are used, accordingly
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
emulate_insn() needs to know about current eip, which will be,
in the future, a per-vcpu thing. So in this patch, the function
prototype is modified to receive a vcpu struct
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The switcher needs to be mapped per-vcpu, because different vcpus
will potentially have different page tables (they don't have to,
because threads will share the same).
So our first step is the make the function receive a vcpu struct
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch adapts interrupt processing for using the vcpu struct.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Here, I introduce per-vcpu timers. With this, we can have
local expiries, needed for accounting time in smp guests
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
this patch changes do_hcall() and do_async_hcall() interfaces (and obviously their
callers) to get a vcpu struct. Again, a vcpu services the hypercall, not the whole
guest
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch makes the write() file operation smp aware. Which means, receiving
the vcpu_id value through the offset parameter, and being well aware to which
vcpu we're talking to.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch makes the run_guest() routine use the lg_cpu struct.
This is required since in a smp guest environment, there's no
more the notion of "running the guest", but rather, it is "running the vcpu"
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
this patch initializes the first vcpu in the initialize() routing,
which is responsible for starting the process of putting the guest up.
right now, as much of the fields are still not per-vcpu, it does not
do much.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
this patch introduces a vcpu struct for lguest. In upcoming patches,
more and more fields will be moved from the lguest struct to the vcpu
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently, lguest module can't be compiled without the PARAVIRT flag being
on. This is a fake dependency, since the module itself shouldn't need any
paravirt override. Reason for that is the reference to pv_info structure
in initial loading tests.
This patch removes it in favour of a more generic error message.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Parts depend on CONFIG_LGUEST, not just CONFIG_LGUEST_GUEST
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The new e1000e driver is apparently not yet suitable for general use, so
mark it experimental, and re-instate all the PCI-Express device IDs in
the old and stable e1000 driver so that people (namely me) can continue
to use a driver that actually works.
Auke & co have been appraised of the situation.
Cc: Auke Kok <auke-jan.h.kok@intel.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In ace_fsm_dostate(), the variable 'i' was used only for passing
sector size of the request to end_that_request_first().
So I removed it and changed the code to pass the size in bytes
directly to __blk_end_request()
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Use SGI_HAS_INDYDOG for INDYDOG depends.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25: (1470 commits)
[IPV6] ADDRLABEL: Fix double free on label deletion.
[PPP]: Sparse warning fixes.
[IPV4] fib_trie: remove unneeded NULL check
[IPV4] fib_trie: More whitespace cleanup.
[NET_SCHED]: Use nla_policy for attribute validation in ematches
[NET_SCHED]: Use nla_policy for attribute validation in actions
[NET_SCHED]: Use nla_policy for attribute validation in classifiers
[NET_SCHED]: Use nla_policy for attribute validation in packet schedulers
[NET_SCHED]: sch_api: introduce constant for rate table size
[NET_SCHED]: Use typeful attribute parsing helpers
[NET_SCHED]: Use typeful attribute construction helpers
[NET_SCHED]: Use NLA_PUT_STRING for string dumping
[NET_SCHED]: Use nla_nest_start/nla_nest_end
[NET_SCHED]: Propagate nla_parse return value
[NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get
[NET_SCHED]: act_api: use nlmsg_parse
[NET_SCHED]: act_api: fix netlink API conversion bug
[NET_SCHED]: sch_netem: use nla_parse_nested_compat
[NET_SCHED]: sch_atm: fix format string warning
[NETNS]: Add namespace for ICMP replying code.
...
Fix a bunch of warnings in PPP and related drivers. Mostly because
sparse doesn't like it when the the function is only marked private in
the forward declaration.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Needed to propagate it down to the ip_route_output_flow.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Needed to propagate it down to the __ip_route_output_key.
Signed_off_by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
in_dev_find() need a namespace to pass it to fib_get_table(), so add
an argument.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
always extend the rx timestamp with the local TSF, since this information is
also needed for proper IBSS merging. this is done in the tasklet for now, maybe
has to be moved to the interrupt handler like in madwifi.
drivers/net/wireless/ath5k/base.c: Changes-licensed-under: 3-Clause-BSD
Signed-off-by: Bruno Randolf <bruno@thinktube.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
in "11.1.2.2 Beacon generation in an IBSS" the IEEE802.11 standard says, each
STA should... "b) Calculate a random delay uniformly distributed in the range
between zero and twice aCWmin × aSlotTime,".
configure cwmin and cwmax of the beacon queue in IBSS mode according to this.
unfortunately beacon backoff does not work reliably yet, so i suspect we have a
problem somewhere else, since the same settings (and similar beacon timer
configuration) work for madwifi.
drivers/net/wireless/ath5k/base.c: Changes-licensed-under: 3-Clause-BSD
Signed-off-by: Bruno Randolf <bruno@thinktube.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
use SWBA (software beacon alert) interrupts to keep track of the next beacon
time und check if a HW merge (automatic TSF update) has happened on every
received beacon with the same BSSID.
this is necessary because the atheros hardware will silently update the local
TSF in IBSS mode, but not its beacon timers. if the TSF is ahead of the beacon
timers no beacons are sent until the timers wrap around (typically after about
1 minute).
this solution is not very nice, since we have to look into every beacon, but
there is apparently no other way to detect HW merges.
drivers/net/wireless/ath5k/base.c: Changes-licensed-under: 3-Clause-BSD
drivers/net/wireless/ath5k/base.h: Changes-licensed-under: 3-Clause-BSD
Signed-off-by: Bruno Randolf <bruno@thinktube.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
update ath5k_beacon_update_timers() for better beacon timer calculation in a
variety of situations. most important is the possibility to call it with the
timestamp of a received beacon, when we detected that a HW merge has happened
and we need to reconfigure the beacon timers based on that.
we call this from the mac80211 callback reset_tsf now instead of beacon_update,
and there will be more use of it in the next patch.
drivers/net/wireless/ath5k/base.c: Changes-licensed-under: 3-Clause-BSD
Signed-off-by: Bruno Randolf <bruno@thinktube.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
the beacon interval is passed by mac80211 in TU already, so we can directly use
it without conversion. also update the comments about TU (1 TU is defined by
802.11 as 1024usec).
drivers/net/wireless/ath5k/ath5k.h: Changes-licensed-under: ISC
drivers/net/wireless/ath5k/base.c: Changes-licensed-under: 3-Clause-BSD
drivers/net/wireless/ath5k/base.h: Changes-licensed-under: 3-Clause-BSD
Signed-off-by: Bruno Randolf <bruno@thinktube.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
reviewed beacon timer initialization with register traces from madwifi: what we
are doing is correct :). one minor fix: use 3 instead of 0x00000003 - it's more
readable.
drivers/net/wireless/ath5k/hw.c: Changes-licensed-under: ISC
Signed-off-by: Bruno Randolf <bruno@thinktube.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This zeros out all microcode related memory before loading
the microcode.
This also fixes initialization of the MAC control register.
The _only_ place where we overwrite the contents of the MAC control
register is at the beginning of b43_chip_init().
All other places must do read() -> mask/set -> write() to not
overwrite existing bits.
This also adds a longer delay for waiting for the microcode
to initialize itself. It seems that the current timeout is sufficient
on all available devices, but there's no real reason why we shouldn't
wait for up to one second. Slow embedded devices might exist.
Better safe than sorry.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Otherwise it may be impossible to connected to an open network after a
resume.
This is a modified version of an original patch by
Alex Eskin <alexeskin@yahoo.com>:
https://bugzilla.redhat.com/show_bug.cgi?id=425950#c8
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We must also store the ID string (filename) for the cached firmware blobs
and verify that we really have the right firmware cached before using it.
If we don't have the right fw cached, we must free it and request the
correct blobs.
This fixes bandswitch on A/B/G multi-PHY devices.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This also adds lots of TODOs. Oh well. Lots of work. :)
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fixes a sparse warning about weird locking.
The spinlock is not needed, so simply remove it.
This also adds some sanity checks to the PHY and radio locking
to protect against recursive locking.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We are pleased to announce a new Gigabit Ethernet product and its
driver to the linux community. This product is the Intel(R) 82575
Gigabit Ethernet adapter family. Physical adapters will be available
to the public soon. These adapters come in 2- and 4-port versions
(copper PHY) currently. Other variants will be available later.
The 82575 chipset supports significantly different features that
warrant a new driver. The descriptor format is (just like the
ixgbe driver) different. The device can use multiple MSI-X vectors
and multiple queues for both send and receive. This allows us to
optimize some of the driver code specifically as well compared to
the e1000-supported devices.
This version of the igb driver no lnger uses fake netdevices and
incorporates napi_struct members for each ring to do the multi-
queue polling. multi-queue is enabled by default and the driver
supports NAPI mode only.
All the namespace collisions should be gone in this version too. The
register macro's have been condensed to improve readability.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prefix "bp->phy_flags" names with BNX2_PHY_FLAG_* for consistency.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some blade systems using the 5706 serdes, the hardware sometimes
does not properly generate link down interrupts. We add a workaround
in the driver's timer to force a link-down when some PHY registers
report loss of SYNC.
The parallel detect logic is cleaned up slightly to better integrate
the workaround.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is more correct to get the status block from the bnx2_napi struct
instead of the bnx2 struct. It happens that they are the same in this
case because we are using the first MSIX vector.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The chip has problem running in this mode and needs to be disabled.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/wireless/iwlwifi/iwl-3945.c: In function 'iwl3945_add_radiotap':
drivers/net/wireless/iwlwifi/iwl-3945.c:269: warning: format '%d' expects type 'int', but argument 3 has type 'long unsigned int'
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Fix rfkill code which caused a use-after-free bug. Thanks to David
Woodhouse for spotting this out.
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
libertas: re-pepper debug statementThe recent fluff of updates
didn't put proper lbs_deb_enter/leave calls into the source code.
Add them where appropriate.
Also contains some whitespace changes.
Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The CF card only has a very old firmware (5.0.16p0). This firmware doesn't
know anything about mesh config. However, current code blindly calls
mesh_config when the card is inserted. So check the firmware version before
issuing this command.
Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Always shows the firmware release.
Also converts the firmware release into something that is easily comparable.
Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
boot2_version is purely USB specific, so move it to struct if_usb_card.
Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
netif_rx should be called only from interrupt context. if_cs and if_sdio receive
packets from other contexts, and thus should call netif_rx_ni.
Signed-off-by: Marc Pignat <marc.pignat@hevs.ch>
Acked-by: Holger Schurig <hs4233@mail.mn-solutions.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The << and >> operators need space on each side.
Cc: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The external iwlwifi driver comes with a README file that is
referenced by the Kconfig. This README is not present in the
driver included in the kernel. Remove references to this
documentation.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch renames iwl3945_rate_scale_priv to iwl3945_rs_sta as it
better represents the purpose of this variable.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1, This patch renames iwl4965_rate_scale_priv to iwl4965_lq_sta.
This type represents a station's link quality.
2. The names of the variables of this type were rs_priv, lq_data, lq, crl
across the file. All are now unified under the name lq_sta.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds iwl_free_fw_desc ucode helper function.
It also moves ucode helper functions to iwl-helpers.h.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
After we delay device initialization until interface up, there are more
conditions for the hardware rf_kill switch states during suspend and
resume. For example, before suspend we can have interface up or down,
rf_kill enable or disable; before resume we can have rf_kill enable or
disable. So there are totally 2^3 = 8 conditions to handle. This patch
addressed this problem and makes sure every condition works correctly.
This patch also merges the device suspend and resume handlers with the
mac_start and mac_stop code since they are basically doing the same
thing.
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the firmware loading (read firmware from disk and load
it into the device SRAM) from pci_probe time to the first network
interface open time. There are two reasons for doing this:
1. To support kernel buildin iwlwifi drivers. Because kernel initializes
network devices subsystem before hard disk and SATA subsystem, it is
impossible to get the firmware image from hard disk in the PCI probe
handler. Thus delaying the firmware loading into the network
interface open time is the way to go. Note, we only read the firmware
image from hard disk the first time the interface is open. After this
is succeeded, we cache the firmware image into the host memory. This
is a performance gain when user open and close the interface multiple
times and is necessary for device suspend and resume.
2. For better power saving. When the iwlwifi modules are loaded (or
buildin the kernel) but the wireless network interface is not being
used, it is a good practice the wireless device consumes as less
power as possible. Unloading the firmware from the wireless device
and unregister the driver's interrupt handler in the network
interface close handler provides users a way to achieve this. User
space network configuration tools (i.e NetworkManager) can also
contribute here when it detects a wired cable is connected and
close the wireless interface automatically.
This patch also includes the pci_save/restore_state() fixed by Ian Schram
upon the first version.
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: Ian Schram <ischram@telenet.be>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Document scan command.
Signed-off-by: Ben Cahill <ben.m.cahill@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch removes iwl4965_tx_cmd function and splits its content to
iwl4965_hw_build_tx_cmd_rate, iwl4965_build_tx_cmd_basic,
and iwl4965_tl_get_stats function. The latest one will be deprecated
when traffic load will move to rate scale module.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch moves iwl4965_get_dma_hi_address function to iwl-headers.h
as iwl_get_dma_hi_address. This function will be used in more chipsets
than only 4965.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This function removes redundant code in iwl4965_tx_cmd
function, leftovers of previous design.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This adds code and table data for channel switching on NPHYs.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This adds some code to init the 2055 radio.
This patch adds two files "tables_nphy.h" and "tables_nphy.c"
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add the register definitions for the Broadcom 2055 N-radio.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Convert optional struct size checks to non-optional compile-time checks.
Furthermore BUILD_BUG_ON() which will be optimized away by the compiler.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds support for new firmware.
Old firmware is still supported until July 2008.
To get new firmware, go to
ftp://ftp.linksys.com/opensourcecode/wrt150nv11/1.51.3/
and download the tarball. We don't have a smaller tarball, yet.
That will be fixed later.
You can extract firmware out of the "wl_ap.o" file contained
in this tarball using latest fwcutter. You must pass the option
--unsupported to fwcutter.
Fwcutter-010 with official support for a new firmware image will
be released soon.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes reading of the high 16 bits of the radio ID
on new devices. 2055 radios want lo16 to be read first.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
For TX rings the queue_idx should start at
IEEE80211_TX_QUEUE_DATA0 and for each followup
ring this index needs to be increased.
For the RX ring the queue_idx should be set
to 0. We don't need to initialize the tx_params.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
rt2500usb and rt73usb data and desc pointer initialization
was incorrect because it was using uninitialized variables
to determine the length.
In addition rt2500usb used skb_pull and removed the ieee80211
from each received frame instead of using skb_trim to remove
the device descriptor from the frame.
Finally this also fixes the descriptor override when 4 byte
aligning occured. We still need a completely valid descriptor
when using the TX/RX dumping capabilities in debugfs.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This adds the initval filenames for the N-PHY firmware.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Use the length of the variable section of the beacon instead of the
whole beacon length for bounds checking.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This queues frames flagged as "send after DTIM" by mac80211
on the special multicast queue. The firmware will take care
to send the packet after the DTIM.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fixes uploading of the beacon data and writing of the
TIM and DTIM offsets.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch (based on Ron Rindjunsky's) creates a framework for
a unified way to pass BSS configuration to drivers that require
the information, e.g. for implementing power save mode.
This patch introduces new ieee80211_bss_conf structure that is
passed to the driver via the new bss_info_changed() callback
when the BSS configuration changes.
This new BSS configuration infrastructure adds the following
new features:
* drivers are notified of their association AID
* drivers are notified of association status
and replaces the erp_ie_changed() callback. The patch also does
the relevant driver updates for the latter change.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Mattias Nissler's "clean up rate selection" patch incorrectly changes
the behavior of txrate setting in sta_info. This patch backs out parts
of the rate selection consolidation in order to fix this issue for now.
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch enables the A-MPDU Rx flow. it contains several
adjustments to new mac80211 A-MPDU Rx flow.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch fixes an oops which was introduced as a regression by
commit fd640775bd16e1df50c867cc547af0, on the patch titled,
"mac80211: dont use interface indices in drivers".
ieee80211_generic_frame_duration() now relies on sdata->flags which
itself gets set upon bringing the interface up. We check for the
virtual interface now before setting the rate duration registers.
After the mode changes are introduced onto mac80211 we should revisit
these changes.
This patch was tested on the following cards:
1) BG card:
Atheros AR5213A chip found (MAC: 0x59, PHY: 0x43)
RF2112A 2GHz radio found (0x46)
2) ABG card:
Atheros AR5213A chip found (MAC: 0x59,PHY: 0x43)
RF5112A multiband radio found (0x36)
Signed-off-by: Luis R. Rodriguez <mcgrof@winlab.rutgers.edu>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch gets rid of the if_id stuff where possible in favour of
a new per-virtual-interface structure "struct ieee80211_vif". This
structure is located at the end of the per-interface structure and
contains a variable length driver-use data area.
This has two advantages:
* removes the need to look up interfaces by if_id, this is better
for working with network namespaces and performance
* allows drivers to store and retrieve per-interface data without
having to allocate own lists/hash tables
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds a mac80211 based wireless driver for the rtl8180 and
rtl8185 PCI wireless cards. Also included are some rtl8187 changes
required due to the relationship between that driver and this one.
Michael Wu is primarily responsible for the initial driver and rtl8185
support. Andreas Merello provided the additional rtl8180 support.
Thanks to Jukka Ruohonen for the donating a rtl8185 card! It was very
helpful for the rtl8225z2 code.
The Signed-off-by information below is collected from the individual
patches submitted to wireless-2.6 before merging this driver upstream.
Signed-off-by: Andrea Merello <andreamrl@tiscali.it>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
add ath5k wireless driver
Portions of this driver are covered by one or both of the ISC and
3-clause BSD licenses. Specific license information is cited at the top
of each file.
Acked-by and Signed-off-by information is collected from individual
patches as collected in the wireless-2.6 tree prior to upstream
submission.
Acked-by: Matthew W. S. Bell <mentor@madwifi.org>
Acked-by: Michael Taylor <mike.taylor@apprion.com>
Acked-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bradley M. Kuhn <bkuhn@softwarefreedom.org>
Signed-off-by: Bruno Randolf <bruno@thinktube.com>
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Francesco Gringoli <francesco.gringoli@ing.unibs.it>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Karen Sandler <karen@softwarefreedom.org>
Signed-off-by: Krzysztof Halasa <khc@pm.waw.pl>
Signed-off-by: Luis R. Rodriguez <mcgrof@gmail.com>
Signed-off-by: Matt Norwood <norwood@softwarefreedom.org>
Signed-off-by: Nick Kossifidis <mickflemm@gmail.com>
Signed-off-by: Richard Fontana <fontana@softwarefreedom.org>
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Ulrich Meis <meis@nets.rwth-aachen.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
If the third PCMCIA ID string specifies the MAC chip, the fourth ID
string doesn't need to be matched. Even if it's different, it will be
compatible with the driver.
This ensures that other different revisions of the card will be
supported.
Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fixes a sparse warning about weird locking.
The spinlock is not needed, so simply remove it.
This also adds some sanity checks to the PHY and radio locking
to protect against recursive locking.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fixes the PHY routing bit handling.
This is needed for N-PHY.
No functional change to A-PHY and G-PHY code.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds all register definitions for the N-PHY.
This adds two new files: nphy.h and nphy.c
No functional changes to existing code.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch fixes RX packet alignment issues in the zd1211rw driver.
This is based on a patch by Johannes Berg.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch fixes a compilation warning in 'iwl-4965.c'.
"warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘long unsigned int’"
Signed-off-by: Miguel Botón <mboton@gmail.com
Signed-off-by: John W. Linville <linville@tuxdriver.com>
rt2500usb and rt73usb store the descriptor in different
places. This means we should move the initialization of
the 2 pointers to the driver callback function fill_rxdone().
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Prior to enabling the radio rt2x00lib should go through all
rings and for each entry should call the callback function
init_txentry() and init_rxentry().
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Use the MY_BSS descriptor field to determine if the
received frame belongs to the same BSS as the interface.
This can be used by rxdone to determine if the frame
should be updated or not.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Send the skb structure with write_tx_desc() and use
the skbdesc structure to read all information about
the frame. This saves several arguments in the function
definition and it is easier to send more information
later as well.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The packet filter flags don't belong in the interface structure
because they are device based instead of interface based.
So move the filter fields out of struct interface and into rt2x00_dev.
Additionally we shouldn't change the filter based on the working
mode, if such a thing is needed than mac80211 should have done that.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
suspend & resume was broken since it called rt2x00mac_start()
and rt2x00mac_stop() which would fail to execute because the
DEVICE_PRESENT flag was not set.
Move the start and stop handlers into rt2x00lib.c which are called
from rt2x00mac_start() and rt2x00mac_stop() after they have checked
the DEVICE_PRESENT flag, while suspend and resume handlers can
directly call those functions.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Store the queue idx inside structure data_ring
Store the entry idx inside structure data_entry
This saves us a few calls to ARRAY_INDEX() which is now unused.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
These flags used to be fixed to one in rt2500pci_config_type, which
caused the beacon timer interrupt to fire. This would lead to
rt2x00lib_beacondone adding work which called
rt2x00lib_beacondone_scheduled which called ieee80211_beacon_get which
printed an error about not having any beacon data.
With this patch, these interrupts are only generated when the interface
is configured to send beacons.
Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Apparently it was possible that ieee80211_stop_queue() was not full while
NETDEV_TX_BUSY was being reported back. I think that is what causing the WARN_ON().
This moves all calls to ieee80211_stop_queue() in rt2x00mac.c where it is easier
to determine if the queue should be halted.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Initialize blob->data before moving the data pointer
Initialize blob->size based on blob->data size
This fixes the empty chipset file in debugfs.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This adds a new Kconfig option for enabling probing of N-PHYs.
This option will be removed again once the stuff works.
For now it is to help in development. This way real users won't
execute the broken N-PHY codepaths, but the developers can easily
enable N-PHY stuff.
To enable N-PHY probing simply remove the BROKEN dependency
and enable the option in the kernel config.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This is just this patch (http://lkml.org/lkml/2007/7/1/51) but adapted
to the 'b44' ssb driver.
Signed-off-by: Miguel Botón <mboton@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This adds some definitions for the MAC Control register
and uses them.
This basically is no functional change.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Remove b43 PIO support.
DMA works well on all supported devices. There's no reason to use PIO.
Additionally, new devices don't support PIO in hardware anymore.
b43 PIO support is dead and unused code.
After applying this patch please do
git rm drivers/net/wireless/b43/pio.h
git rm drivers/net/wireless/b43/pio.c
to remove the main PIO support code.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fixes lowlevel register access for PCMCIA based devices.
The patch also adds a temporary workaround for the device mac address.
It simply adds generation of a random address. The real SPROM extraction
will follow in another patch.
The temporary workaround will be removed then, but for now it's OK.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fixes chip access validation for newer devices
(4318 and up, I think)
This patch fixes probing of a PCMCIA based 4318 device.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fixes antenna selection in b43. It adds a sanity check
for the antenna numbers we get from mac80211.
This patch depends on
ssb: Fix extraction of values from SPROM
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fixes extraction of some values from the SPROM.
It mainly fixes extraction of antenna related values, which
is needed for another b43 fix sent later.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Based on a patch by Miguel.
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Cc: Miguel Boton <mboton.lkml@gmail.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Reviewing the semaphore usage I noticed these down_interruptible calls. Most
of these aren't returning anything, so a caller can't tell if the operation
completed or not. prism54_wpa_bss_ie_get() returns zero, but it's treated as
the function failing which doesn't seem correct.
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Cc: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This reverts commit e4128a54d790658ab265c915e5da9153ff74af97.
On Sunday 02 December 2007 17:17:51 Michael Wu wrote:
> CCK and OFDM power levels are stored in adjacent bytes, not nibbles.
>
This turns out to be true only for rtl8180. On rtl8187, power levels are
indeed stored in nibbles, so this patch is wrong. Please revert this patch.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
keep it little-endian, update places that use its members
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
->ring_control_dma is dma_addr_t, needs conversion to little-endian
before __raw_writel()...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Just leave hfa384x_info_frame as-is, don't convert in place.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Don't byteswap any fields, annotate. That has caught a bug,
BTW - will be handled in the next patch.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
don't byteswap, update users to match that, annotate.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Stop byteswap-in-place in readBSSListRid(), annotate the sucker.
BTW, that had immediately found a bug - another codepath fetching
the same struct from card did _not_ byteswap, but used ->dBm the
same as everything else - host-endian. Fix in the next patch...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* store SSID_rid without conversions
* sanitize proc_SSID_on_close() (and avoid access past the end of
buffer, while we are at it)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We'd just set tfd->u.data.chunk_len[i] to cpu_to_le16(remaining_bytes);
passing it to pci_map_single() is a bad idea - it expects host-endian.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
A couple of places forgot cpu_to_le16() in assignments to
that field, even though right next to those in other branches
of if-else we do it correctly.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
bugs galore:
* 0xf380 instead of htons(ETH_P_AARP), etc. Works only on l-e.
* back in 2.3.20 driver got readb() and friends instead of
direct dereferencing of iomem. Somebody got too enthusiatic and replaced
ntohs(p->mrx_overflow)
with
ntohs(read(&p->mrx_overflow)
without noticing that (a) the sucker is 16bit and (b) that expression can't possibly
be portable anyway (hell, on l-e it's always less than 256, on b-e it's always a
multiple of 256). Proper fix is
swab16(readw(&p->mrx_overflow)
taking into account the conversion done by readw() itself. That crap happened
in several places; the same fix applies.
* untranslate() assumes little-endian almost everywhere, except for
the code checking for IPX/AARP packets; there we forgot ntohs(), so that part
only works on big-endian.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* ->exp_id in bootrec_exp_if is __le16; missing conversion in its use
* !(x & y) misspelled as !x & y
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
in writerids() we do _not_ byteswap, so we want to access
->opmode as little-endian.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
never had been byteswapped, used as host-endian...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
On big-endian we end up with swapped first two bytes in packet,
due to earlier conversion to host-endian and forgotten conversion
back.
The code we calculated that host-endian for had been duplicated
several time - it finds the 802.11 MAC header length by the first
two bytes of packet; taken into a new helper (header_len(__le16 ctl)).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
airo_translate_scan() reads BSSListRid directly, does _not_ byteswap
and uses ->dBm (__le16) as host-endian.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
a) gaplen would better be stored little-endian
b) for control packets (shorter than 24-byte header) we ended up with
bap_write(ai, hdrlen == 30 ?
(const u16*)&gap.gaplen : (const u16*)&gap, 38 - hdrlen, BAP1);
passing to card the data past the end of gap (i.e. random stuff from stack)
and did _not_ feed the gaplen at the right offset.
c) sending the contents of uninitialized fields of struct is Not Nice(tm) either
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Make it match the on-the-wire endianness, eliminate byteswapping.
The only driver that used this sucker (ipw2200) updated.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When the lower device's carrier is off, the macvlan devices's
carrier state should be checked to decide whether it needs to
be turned off. Currently the lower device's state is checked
a second time.
This still works, but unnecessarily tries to turn off the
carrier when its already off.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes IrPORT and the old dongle drivers (all off them
have replacement drivers).
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create a specific helper for netlink kernel socket disposal. This just
let the code look better and provides a ground for proper disposal
inside a namespace.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, all non TAH equipped 4xx PPC's call emac_start_xmit() upon
xmit. This routine doesn't check if the frame length exceeds the max.
MAL buffer size.
This patch now changes the driver to call emac_start_xmit_sg() on all
GigE platforms and not only the TAH equipped ones (440GX). This enables
an MTU of 9000 instead 4080.
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>