linux

Commit Graph

Author	SHA1	Message	Date
Eric W. Biederman	5234f5eb04	[PATCH] kexec: x86_64 kexec implementation This is the x86_64 implementation of machine kexec. 32bit compatibility support has been implemented, and machine_kexec has been enhanced to not care about the changing internal kernel paget table structures. From: Alexander Nyberg <alexn@dsv.su.se> build fix Signed-off-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:50 -07:00
Eric W. Biederman	5033cba087	[PATCH] kexec: x86 kexec core This is the i386 implementation of kexec. Signed-off-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:49 -07:00
Eric W. Biederman	dc009d9243	[PATCH] kexec: add kexec syscalls This patch introduces the architecture independent implementation the sys_kexec_load, the compat_sys_kexec_load system calls. Kexec on panic support has been integrated into the core patch and is relatively clean. In addition the hopefully architecture independent option crashkernel=size@location has been docuemented. It's purpose is to reserve space for the panic kernel to live, and where no DMA transfer will ever be setup to access. Signed-off-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Alexander Nyberg <alexn@telia.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:48 -07:00
Eric W. Biederman	d0537508a9	[PATCH] kexec: x86_64: add CONFIG_PHYSICAL_START For one kernel to report a crash another kernel has created we need to have 2 kernels loaded simultaneously in memory. To accomplish this the two kernels need to built to run at different physical addresses. This patch adds the CONFIG_PHYSICAL_START option to the x86_64 kernel so we can do just that. You need to know what you are doing and the ramifications are before changing this value, and most users won't care so I have made it depend on CONFIG_EMBEDDED bzImage kernels will work and run at a different address when compiled with this option but they will still load at 1MB. If you need a kernel loaded at a different address as well you need to boot a vmlinux. Signed-off-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:48 -07:00
Eric W. Biederman	3d345e3fc9	[PATCH] kexec: x86: add CONFIG_PYSICAL_START For one kernel to report a crash another kernel has created we need to have 2 kernels loaded simultaneously in memory. To accomplish this the two kernels need to built to run at different physical addresses. This patch adds the CONFIG_PHYSICAL_START option to the x86 kernel so we can do just that. You need to know what you are doing and the ramifications are before changing this value, and most users won't care so I have made it depend on CONFIG_EMBEDDED bzImage kernels will work and run at a different address when compiled with this option but they will still load at 1MB. If you need a kernel loaded at a different address as well you need to boot a vmlinux. Signed-off-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:48 -07:00
Eric W. Biederman	60bad7fadf	[PATCH] kexec: vmlinux: fix physical addresses In vmlinux.lds.h the code is carefull to define every section so vmlinux properly reports the correct physical load address of code, as well as it's virtual address. The new SECURITY_INIT definition fails to follow that convention and and causes incorrect physical address to appear in the vmlinux if there are any security initcalls. This patch updates the SECURITY_INIT to follow the convention in the rest of the file. Signed-off-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:47 -07:00
Eric W. Biederman	208fb93162	[PATCH] kexec: x86_64: restore apic virtual wire mode on shutdown When coming out of apic mode attempt to set the appropriate apic back into virtual wire mode. This improves on previous versions of this patch by by never setting bot the local apic and the ioapic into veritual wire mode. This code looks at data from the mptable to see if an ioapic has an ExtInt input to make this decision. A future improvement is to figure out which apic or ioapic was in virtual wire mode at boot time and to remember it. That is potentially a more accurate method, of selecting which apic to place in virutal wire mode. Signed-off-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:47 -07:00
Eric W. Biederman	650927ef8a	[PATCH] kexec: x86: resture apic virtual wire mode on shutdown When coming out of apic mode attempt to set the appropriate apic back into virtual wire mode. This improves on previous versions of this patch by by never setting bot the local apic and the ioapic into veritual wire mode. This code looks at data from the mptable to see if an ioapic has an ExtInt input to make this decision. A future improvement is to figure out which apic or ioapic was in virtual wire mode at boot time and to remember it. That is potentially a more accurate method, of selecting which apic to place in virutal wire mode. Signed-off-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:47 -07:00
Eric W. Biederman	9635b47d91	[PATCH] kexec: x86: local apic fix From: "Maciej W. Rozycki" <macro@linux-mips.org> Fix a kexec problem whcih causes local APIC detection failure. The problem is detect_init_APIC() is called early, before the command line have been processed. Therefore "lapic" (and "nolapic") have not been seen, yet. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:46 -07:00
Eric W. Biederman	8f43d03fe2	[PATCH] kexec: x86: rename APIC_MODE_EXINT From: "Maciej W. Rozycki" <macro@linux-mips.org> Rename APIC_MODE_EXINT to APIC_MODE_EXTINT - I think it should be named after what the mode is called in documentation. From: "Eric W. Biederman" <ebiederm@lnxi.com> I have reduced this patch to just the name change in the header. And integrated the changes into the patches that add those lines. Otherwise I ran into some ugly dependencies. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org Signed-off-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:46 -07:00
Ingo Molnar	f8cbd99bd3	[PATCH] sched: voluntary kernel preemption This patch adds a new preemption model: 'Voluntary Kernel Preemption'. The 3 models can be selected from a new menu: (X) No Forced Preemption (Server) ( ) Voluntary Kernel Preemption (Desktop) ( ) Preemptible Kernel (Low-Latency Desktop) we still default to the stock (Server) preemption model. Voluntary preemption works by adding a cond_resched() (reschedule-if-needed) call to every might_sleep() check. It is lighter than CONFIG_PREEMPT - at the cost of not having as tight latencies. It represents a different latency/complexity/overhead tradeoff. It has no runtime impact at all if disabled. Here are size stats that show how the various preemption models impact the kernel's size: text data bss dec hex filename 3618774 547184 179896 4345854 424ffe vmlinux.stock 3626406 547184 179896 4353486 426dce vmlinux.voluntary +0.2% 3748414 548640 179896 4476950 445016 vmlinux.preempt +3.5% voluntary-preempt is +0.2% of .text, preempt is +3.5%. This feature has been tested for many months by lots of people (and it's also included in the RHEL4 distribution and earlier variants were in Fedora as well), and it's intended for users and distributions who dont want to use full-blown CONFIG_PREEMPT for one reason or another. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:45 -07:00
Dinakar Guniguntala	1a20ff27ef	[PATCH] Dynamic sched domains: sched changes The following patches add dynamic sched domains functionality that was extensively discussed on lkml and lse-tech. I would like to see this added to -mm o The main advantage with this feature is that it ensures that the scheduler load balacing code only balances against the cpus that are in the sched domain as defined by an exclusive cpuset and not all of the cpus in the system. This removes any overhead due to load balancing code trying to pull tasks outside of the cpu exclusive cpuset only to be prevented by the tasks' cpus_allowed mask. o cpu exclusive cpusets are useful for servers running orthogonal workloads such as RT applications requiring low latency and HPC applications that are throughput sensitive o It provides a new API partition_sched_domains in sched.c that makes dynamic sched domains possible. o cpu_exclusive cpusets sets are now associated with a sched domain. Which means that the users can dynamically modify the sched domains through the cpuset file system interface o ia64 sched domain code has been updated to support this feature as well o Currently, this does not support hotplug. (However some of my tests indicate hotplug+preempt is currently broken) o I have tested it extensively on x86. o This should have very minimal impact on performance as none of the fast paths are affected Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Acked-by: Paul Jackson <pj@sgi.com> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Acked-by: Matthew Dobson <colpatch@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:45 -07:00
Nick Piggin	476d139c21	[PATCH] sched: consolidate sbe sbf Consolidate balance-on-exec with balance-on-fork. This is made easy by the sched-domains RCU patches. As well as the general goodness of code reduction, this allows the runqueues to be unlocked during balance-on-fork. schedstats is a problem. Maybe just have balance-on-event instead of distinguishing fork and exec? Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:44 -07:00
Nick Piggin	4866cde064	[PATCH] sched: cleanup context switch locking Instead of requiring architecture code to interact with the scheduler's locking implementation, provide a couple of defines that can be used by the architecture to request runqueue unlocked context switches, and ask for interrupts to be enabled over the context switch. Also replaces the "switch_lock" used by these architectures with an oncpu flag (note, not a potentially slow bitflag). This eliminates one bus locked memory operation when context switching, and simplifies the task_running function. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:43 -07:00
Nick Piggin	687f1661d3	[PATCH] sched: sched tuning Do some basic initial tuning. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:42 -07:00
Nick Piggin	68767a0ae4	[PATCH] sched: schedstats update for balance on fork Add SCHEDSTAT statistics for sched-balance-fork. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:42 -07:00
Nick Piggin	147cbb4bbe	[PATCH] sched: balance on fork Reimplement the balance on exec balancing to be sched-domains aware. Use this to also do balance on fork balancing. Make x86_64 do balance on fork over the NUMA domain. The problem that the non sched domains aware blancing became apparent on dual core, multi socket opterons. What we want is for the new tasks to be sent to a different socket, but more often than not, we would first load up our sibling core, or fill two cores of a single remote socket before selecting a new one. This gives large improvements to STREAM on such systems. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:42 -07:00
Nick Piggin	cafb20c1f9	[PATCH] sched: no aggressive idle balancing Remove the very aggressive idle stuff that has recently gone into 2.6 - it is going against the direction we are trying to go. Hopefully we can regain performance through other methods. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:42 -07:00
Nick Piggin	7897986bad	[PATCH] sched: balance timers Do CPU load averaging over a number of different intervals. Allow each interval to be chosen by sending a parameter to source_load and target_load. 0 is instantaneous, idx > 0 returns a decaying average with the most recent sample weighted at 2^(idx-1). To a maximum of 3 (could be easily increased). So generally a higher number will result in more conservative balancing. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:41 -07:00
Paul E. McKenney	b2b1866006	[PATCH] RCU: clean up a few remaining synchronize_kernel() calls 2.6.12-rc6-mm1 has a few remaining synchronize_kernel()s, some (but not all) in comments. This patch changes these synchronize_kernel() calls (and comments) to synchronize_rcu() or synchronize_sched() as follows: - arch/x86_64/kernel/mce.c mce_read(): change to synchronize_sched() to handle races with machine-check exceptions (synchronize_rcu() would not cut it given RCU implementations intended for hardcore realtime use. - drivers/input/serio/i8042.c i8042_stop(): change to synchronize_sched() to handle races with i8042_interrupt() interrupt handler. Again, synchronize_rcu() would not cut it given RCU implementations intended for hardcore realtime use. - include/*/kdebug.h comments: change to synchronize_sched() to handle races with NMIs. As before, synchronize_rcu() would not cut it... - include/linux/list.h comment: change to synchronize_rcu(), since this comment is for list_del_rcu(). - security/keys/key.c unregister_key_type(): change to synchronize_rcu(), since this is interacting with RCU read side. - security/keys/process_keys.c install_session_keyring(): change to synchronize_rcu(), since this is interacting with RCU read side. Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:38 -07:00
Michael Holzheu	66a464dbc8	[PATCH] s390: debug feature changes This patch changes the memory allocation method for the s390 debug feature. Trace buffers had been allocated using the get_free_pages() function before. Therefore it was not possible to get big memory areas in a running system due to memory fragmentation. Now the trace buffers are subdivided into several subbuffers with pagesize. Therefore it is now possible to allocate more memory for the trace buffers and more trace records can be written. In addition to that, dynamic specification of the size of the trace buffers is implemented. It is now possible to change the size of a trace buffer using a new debugfs file instance. When writing a number into this file, the trace buffer size is changed to 'number * pagesize'. In the past all the traces could be obtained from userspace by accessing files in the "proc" filesystem. Now with debugfs we have a new filesystem which should be used for debugging purposes. This patch moves the debug feature from procfs to debugfs. Since the interface of debug_register() changed, all device drivers, which use the debug feature had to be adjusted. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:37 -07:00
Christian Borntraeger	6b979de395	[PATCH] s390: add vmcp interface Add interface to issue VM control program commands. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:37 -07:00
Heiko Carstens	77fa22450d	[PATCH] s390: improved machine check handling Improved machine check handling. Kernel is now able to receive machine checks while in kernel mode (system call, interrupt and program check handling). Also register validation is now performed. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:37 -07:00
Paolo 'Blaisorblade' Giarrusso	84dd8d7e9c	[PATCH] uml: add profile_pc for i386 Cope with a conditional i386 definition, which is wrong for UML. Before we just used that one, but it wasn't defined for CONFIG_SMP, so in that case we got link errors. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:36 -07:00
Pavel Machek	8d783b3e02	[PATCH] swsusp: clean assembly parts This patch fixes register saving so that each register is only saved once, and adds missing saving of %cr8 on x86-64. Some reordering so that save/restore is more logical/safer (segment registers should be restored after gdt). Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:33 -07:00
Pavel Machek	620b032764	[PATCH] properly stop devices before poweroff Without this patch, Linux provokes emergency disk shutdowns and similar nastiness. It was in SuSE kernels for some time, IIRC. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:33 -07:00
Li Shaohua	5a72e04df5	[PATCH] suspend/resume SMP support Using CPU hotplug to support suspend/resume SMP. Both S3 and S4 use disable/enable_nonboot_cpus API. The S4 part is based on Pavel's original S4 SMP patch. Signed-off-by: Li Shaohua<shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:32 -07:00
Ashok Raj	884d9e40b4	[PATCH] x86_64: Dont use broadcast shortcut to make it cpu hotplug safe. Broadcast IPI's provide un-expected behaviour for cpu hotplug. CPU's in offline state also end up receiving the IPI. Once the cpus become online they receive these stale IPI's which are bad and introduce unexpected behaviour. This is easily avoided by not sending a broadcast and addressing just the CPU's in online map. Doing prelim cycle counts it appears there is no big overhead and numbers seem around 0x3000-0x3900 on an average on x86 and x86_64 systems with CPUS running 3G, both for broadcast and mask version of the API's. The shortcuts are useful only for flat mode (where the perf shows no degradation), and in cluster mode, its unicast anyway. Its simpler to just not use broadcast anymore. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Acked-by: Andi Kleen <ak@muc.de> Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:31 -07:00
Ashok Raj	76e4f660d9	[PATCH] x86_64: CPU hotplug support Experimental CPU hotplug patch for x86_64 ----------------------------------------- This supports logical CPU online and offline. - Test with maxcpus=1, and then kick other cpu's off to test if init code is all cleaned up. CONFIG_SCHED_SMT works as well. - idle threads are forked on demand from keventd threads for clean startup TBD: 1. Not tested on a real NUMA machine (tested with numa=fake=2) 2. Handle ACPI pieces for physical hotplug support. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Acked-by: Andi Kleen <ak@muc.de> Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk> Signed-off-by: Shaohua.li<shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:30 -07:00
Ashok Raj	e6982c671c	[PATCH] x86_64: Change init sections for CPU hotplug support This patch adds __cpuinit and __cpuinitdata sections that need to exist past boot to support cpu hotplug. Caveat: This is done only for EM64T CPU Hotplug support, on request from Andi Kleen. Much of the generic hotplug code in kernel, and none of the other archs that support CPU hotplug today, i386, ia64, ppc64, s390 and parisc dont mark sections with __cpuinit, but only mark them as __devinit, and __devinitdata. If someone is motivated to change generic code, we need to make sure all existing hotplug code does not break, on other arch's that dont use __cpuinit, and __cpudevinit. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Acked-by: Andi Kleen <ak@muc.de> Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:30 -07:00
Ashok Raj	52a119feaa	[PATCH] make smp_prepare_cpu to a weak function I really wish smp_prepare_cpu() would disappear eventually. In the interim this is ideally a weak function, so we dont end up changing several places to define this dummy in headers. Today since the dummy declaration is done only in drivers/base/cpu.c but the function is called in kernel/power/smp.c i get undefined reference in my cpu hotplug code for x86_64 under development. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:30 -07:00
Li Shaohua	e1367daf3e	[PATCH] cpu state clean after hot remove Clean CPU states in order to reuse smp boot code for CPU hotplug. Signed-off-by: Li Shaohua<shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:30 -07:00
Li Shaohua	6fe940d6c3	[PATCH] sep initializing rework Make SEP init per-cpu, so it is hotplug safe. Signed-off-by: Li Shaohua<shaohua.li@intel.com> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:29 -07:00
Ashok Raj	67664c8f7e	[PATCH] i386: Dont use IPI broadcast when using cpu hotplug. This patch introduces a startup parameter no_broadcast. When we enable CONFIG_HOTPLUG_CPU, we dont want to use broadcast shortcut as it has ill effects on a offline cpu. If we issue broadcast, the IPI is also delivered to offline cpus, or partially up cpu causing stale IPI's to be handled, which is a problem and can cause undesirable effects. Introduces a new startup cmdline option no_ipi_broadcast, that can be switched at cmdline if necessary. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Acked-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:29 -07:00
Zwane Mwaikambo	f370513640	[PATCH] i386 CPU hotplug (The i386 CPU hotplug patch provides infrastructure for some work which Pavel is doing as well as for ACPI S3 (suspend-to-RAM) work which Li Shaohua <shaohua.li@intel.com> is doing) The following provides i386 architecture support for safely unregistering and registering processors during runtime, updated for the current -mm tree. In order to avoid dumping cpu hotplug code into kernel/irq/* i dropped the cpu_online check in do_IRQ() by modifying fixup_irqs(). The difference being that on cpu offline, fixup_irqs() is called before we clear the cpu from cpu_online_map and a long delay in order to ensure that we never have any queued external interrupts on the APICs. There are additional changes to s390 and ppc64 to account for this change. 1) Add CONFIG_HOTPLUG_CPU 2) disable local APIC timer on dead cpus. 3) Disable preempt around irq balancing to prevent CPUs going down. 4) Print irq stats for all possible cpus. 5) Debugging check for interrupts on offline cpus. 6) Hacky fixup_irqs() to redirect irqs when cpus go off/online. 7) play_dead() for offline cpus to spin inside. 8) Handle offline cpus set in flush_tlb_others(). 9) Grab lock earlier in smp_call_function() to prevent CPUs going down. 10) Implement __cpu_disable() and __cpu_die(). 11) Enable local interrupts in cpu_enable() after fixup_irqs() 12) Don't fiddle with NMI on dead cpu, but leave intact on other cpus. 13) Program IRQ affinity whilst cpu is still in cpu_online_map on offline. Signed-off-by: Zwane Mwaikambo <zwane@linuxpower.ca> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:29 -07:00
Kumar Gala	3d9077afea	[PATCH] ppc32: Remove FSL OCP support Support for the OCP device model on Freescale (FSL) PPC's is no longer used. All FSL PPC's that were using OCP have be converted to using the platform device model. Signed-off-by: Kumar Gala <kumar.gala@freescale.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:27 -07:00
Kumar Gala	33d9e9b56d	[PATCH] ppc32: Add support for Freescale e200 (Book-E) core The e200 core is a Book-E core (similar to e500) that has a unified L1 cache and is not cache coherent on the bus. The e200 core also adds a separate exception level for debug exceptions. Part of this patch helps to cleanup a few cases that are true for all Freescale Book-E parts, not just e500. Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Kumar Gala <kumar.gala@freescale.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:26 -07:00
Yoichi Yuasa	b4819b5937	[PATCH] mips: add MIPS-specific support for flatmem/discontigmem 2.6.12-git6 doesn't boot on some MIPS machines. They need the support of flat memory and discontig memory. Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:25 -07:00
Dmitry Torokhov	e70c9d5e61	[PATCH] I8K: use standard DMI interface I8K: Change to use stock dmi infrastructure instead of homegrown parsing code. The driver now requires box's DMI data to match list of supported models so driver can be safely compiled-in by default without fear of it poking into random SMM BIOS code. DMI checks can be ignored with i8k.ignore_dmi option. Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 16:24:24 -07:00
Linus Torvalds	4208ff04a2	Merge master.kernel.org:/home/rmk/linux-2.6-arm	2005-06-25 16:03:08 -07:00
Russell King	8749af6821	[PATCH] ARM: Generic Dynamic Tick Timer support for ARM, take 4 This patch adds support for Dynamic Tick Timer for ARM. Dynamic Tick is also known as VST (Variable Scheduling Timeouts). Dynamic Tick has been in use in the OMAP tree since last October. The patch is not intrusive, and does not do anything unless CONFIG_NO_IDLE_HZ is defined. This patch has the following fixed based on comments from RMK: - Time is updated before calling interrupt handlers. - Added new interrupt flag SA_TIMER to avoid duplicate timer interrupts - Moved struct dyn_tick_timer to time.h until we at some point probably have an arch independent dyn-tick.h - Cleaned up testing for DYN_TICK_ENABLED in irq.c I've cleaned up this patch to fix some remaining issues: - Call the timer tick handler with irqs disabled, as it would be from a normal interrupt - if we have a dyn_tick, we better implement all methods. - generic timer_dyn_reprogram() call, to be called before sleeping - added command line option - "dyntick=" to allow boot-time control of this feature -- rmk Signed-off-by: Tony Lindgren Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2005-06-25 19:39:45 +01:00
Lennert Buytenhek	321ab6a5fa	[PATCH] ARM: 2752/1: disable ixp2000 PCI I/O software workaround on chips that don't need it Patch from Lennert Buytenhek The later ixp2000 models don't need the PCI I/O workaround that we currently perform. Add a config option to disable the workaround, and panic on boot if a kernel without the workaround is booted on a buggy chip. As only pre-production ixp2000s need the workaround, the default is for it not to be configured in. Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Deepak Saxena Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2005-06-25 19:30:04 +01:00
David S. Miller	e55c57e0b5	[SPARC64]: Report any user access faults in termios accessors. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-24 20:11:03 -07:00
Linus Torvalds	f647a27417	Merge master.kernel.org:/home/rmk/linux-2.6-arm	2005-06-24 15:32:01 -07:00
Lennert Buytenhek	2966207c7e	[PATCH] ARM: 2748/1: ixp2000 implementation of the iomap api Patch from Lennert Buytenhek A number of ixp2000 models have a bug where the byte lanes for PCI I/O transactions are swapped. We already work around this in our versions of {in,out}{b,w,l}, but we also need to perform these workarounds in a custom implementation of the new iomap API, provided in this patch. Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Deepak Saxena Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2005-06-24 23:11:31 +01:00
Lennert Buytenhek	7533fca8e8	[PATCH] ARM: 2747/1: allow platforms to provide their own iomap implementation Patch from Lennert Buytenhek This patch conditionalises the io{read,write}{8,16,32} defines and the prototypes for ioport_map/ioport_unmap in asm-arm/io.h on ioread8 not already having been defined. This is done so that platforms can provide their own implementation of the iomap API, ixp2000 for example needs this. Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Deepak Saxena Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2005-06-24 23:11:31 +01:00
Alexey Dobriyan	75043cb5b3	[PATCH] fs/qnx4/*: fix sparse warnings This patch fixes sparse warnings in the qnx4fs (and might even make qnx4fs work on big-endian boxes) Signed-off-by: Alexey Dobriyan <adobriyan@mail.ru> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: Anders Larsen <al@alarsen.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 14:14:24 -07:00
Deepak Saxena	5932ae3f5d	[PATCH] ARM: 2745/1: Fix IXP4xx debug macros Patch from Deepak Saxena Current IXP4xx debug macros do not work in the small window between the MMU being enabled and the call to map_io() b/c the standard peripheral mapping is not properly setup for use with the low-level debug code. This patch creates a new section-aligned mapping for the UART specifically for use with the debug macros. Signed-off-by: Deepak Saxena Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2005-06-24 20:54:35 +01:00
Lennert Buytenhek	c4982887ca	[PATCH] ARM: 2744/1: ixp2000 gpio irq support Patch from Lennert Buytenhek This patch cleans up the ixp2000 gpio irq code and implements the set_irq_type method for gpio irqs so that users can select for which events (falling edge/rising edge/level low/level high) on the gpio pin they want the corresponding gpio irq to be triggered. Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Deepak Saxena Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2005-06-24 20:54:35 +01:00
Lennert Buytenhek	c6b56949de	[PATCH] ARM: 2740/1: ixp2000 align{b,w} need to parenthesize their arguments Patch from Lennert Buytenhek Two macros that are used on the ixp2000 to fixup byte lane enables for I/O space accesses, align{b,w}, use their arguments without parenthesizing them. Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2005-06-24 20:54:34 +01:00
Linus Torvalds	793ae77469	Add "memory" clobbers to the x86 inline asm of strncmp and friends They don't actually clobber memory, but gcc doesn't even know they _read_ memory, so can apparently re-order memory accesses around them. Which obviously does the wrong thing if the memory access happens to change the memory that the compare function is accessing.. Verified to fix a strange boot problem by Jens Axboe.	2005-06-24 10:39:17 -07:00
Linus Torvalds	59a49e3871	Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	2005-06-24 00:31:46 -07:00
Adrian Bunk	52c1da3953	[PATCH] make various thing static Another rollup of patches which give various symbols static scope Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:43 -07:00
Carsten Otte	eb6fe0c388	[PATCH] xip: reduce code duplication This patch reworks filemap_xip.c with the goal to reduce code duplication from mm/filemap.c. It applies agains 2.6.12-rc6-mm1. Instead of implementing the aio functions, this one implements the synchronous read/write functions only. For readv and writev, the generic fallback is used. For aio, we rely on the application doing the fallback. Since our "synchronous" function does memcpy immediately anyway, there is no performance difference between using the fallbacks or implementing each operation. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:41 -07:00
Carsten Otte	6d79125bba	[PATCH] xip: ext2: execute in place These are the ext2 related parts. Ext2 now uses the xip_* file operations along with the get_xip_page aop when mounted with -o xip. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:41 -07:00
Carsten Otte	ceffc07852	[PATCH] xip: fs/mm: execute in place - generic_file* file operations do no longer have a xip/non-xip split - filemap_xip.c implements a new set of fops that require get_xip_page aop to work proper. all new fops are exported GPL-only (don't like to see whatever code use those except GPL modules) - __xip_unmap now uses page_check_address, which is no longer static in rmap.c, and defined in linux/rmap.h - mm/filemap.h is now much more clean, plainly having just Linus' inline funcs moved here from filemap.c - fix includes in filemap_xip to make it build cleanly on i386 Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:41 -07:00
Carsten Otte	420edbcc09	[PATCH] xip: bdev: execute in place This is the block device related part. The block device operation direct_access now has a struct block_device as first parameter. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:41 -07:00
Matt Domsch	c988d2b284	[PATCH] modules: add version and srcversion to sysfs This patch adds version and srcversion files to /sys/module/${modulename} containing the version and srcversion fields of the module's modinfo section (if present). /sys/module/e1000 \|-- srcversion `-- version This patch differs slightly from the version posted in January, as it now uses the new kstrdup() call in -mm. Why put this in sysfs? a) Tools like DKMS, which deal with changing out individual kernel modules without replacing the whole kernel, can behave smarter if they can tell the version of a given module. The autoinstaller feature, for example, which determines if your system has a "good" version of a driver (i.e. if the one provided by DKMS has a newer verson than that provided by the kernel package installed), and to automatically compile and install a newer version if DKMS has it but your kernel doesn't yet have that version. b) Because sysadmins manually, or with tools like DKMS, can switch out modules on the file system, you can't count on 'modinfo foo.ko', which looks at /lib/modules/${kernelver}/... actually matching what is loaded into the kernel already. Hence asking sysfs for this. c) as the unbind-driver-from-device work takes shape, it will be possible to rebind a driver that's built-in (no .ko to modinfo for the version) to a newly loaded module. sysfs will have the currently-built-in version info, for comparison. d) tech support scripts can then easily grab the version info for what's running presently - a question I get often. There has been renewed interest in this patch on linux-scsi by driver authors. As the idea originated from GregKH, I leave his Signed-off-by: intact, though the implementation is nearly completely new. Compiled and run on x86 and x86_64. From: Matthew Dobson <colpatch@us.ibm.com> build fix From: Thierry Vignaud <tvignaud@mandriva.com> build fix From: Matthew Dobson <colpatch@us.ibm.com> warning fix Signed-off-by: Greg Kroah-Hartman <greg@kroah.com> Signed-off-by: Matt Domsch <Matt_Domsch@dell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:40 -07:00
Mauro Carvalho Chehab	ac19ecc6fa	[PATCH] v4l: update for SAA7134 cards This patch adds support for various SAA7134 cards and brings some fixes. Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br> Signed-off-by: Fabrice Aeschbacher <fabrice.aeschbacher@laposte.net> Signed-off-by: Hermann Pitton <hermann.pitton@onlinehome.de>. Signed-off-by: Nickolay V Shmyrev <nshmyrev@yandex.ru> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:39 -07:00
Mauro Carvalho Chehab	56fc08ca37	[PATCH] v4l: update for tuner cards and some V4L chips Tuner improvements and additions. TEA5767 FM tuner added. Several small fixes. Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br> Signed-off-by: Nickolay V Shmyrev <nshmyrev@yandex.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:39 -07:00
Michael Krufky	3c1d0185db	[PATCH] v4l: support tuner for Thomson DDT 7611 (ATSC/NTSC) Add support for tuner#60: Thomson DDT 7611 (ATSC/NTSC) Change tuner in card#28 (DViCO FusionHDTV3 Gold-T) from tuner=52 (Tuner Thomson DDT 7610) to tuner=60 (Tuner Thomson DDT 7611) Signed-off-by: Michael Krufky <mkrufky@m1k.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:38 -07:00
Manuel Capinha	239df2e2b0	[PATCH] v4l: add support for PixelView Ultra Pro The following patch adds support for the PixelView Ultra Pro video capture card in v4l. - It removes the remote control key definitions from ir-kbd-gpio.c and moves them to ir-common.c so that they can be shared between bt878 and cx88 based cards. - The patch also moves the FUSIONHDTV_3_GOLD_Q card from number 27 to 28 to regain compatibility with the V4L cvs. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:38 -07:00
NeilBrown	0964a3d3f1	[PATCH] knfsd: nfsd4 reboot dirname fix Set the recovery directory via /proc/fs/nfsd/nfs4recoverydir. It may be changed any time, but is used only on startup. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:36 -07:00
NeilBrown	c7b9a45927	[PATCH] knfsd: nfsd4: reboot recovery This patch adds the code to create and remove client subdirectories from the recovery directory, as described in the previous patch comment. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:36 -07:00
NeilBrown	190e4fbf96	[PATCH] knfsd: nfsd4: initialize recovery directory NFSv4 clients are required to know what state they have on the server so that they can reclaim it on server reboot. However, it is possible for pathalogical combinations of server reboots and network partitions to leave a client in a state where it cannot know whether it has lost its state on the server. For this reason, rfc3530 requires that we store some information about clients to stable storage. So we maintain a directory /var/lib/nfs/v4recovery with a subdirectory for each client with active state. We leave open the possibility of including files underneath each such subdirectory with information about the client, but for now the subdirectories are empty. We create a client subdirectory whenever a client makes its first non-reclaim open_confirm. We remove a client subdirectory whenever either a) its lease expires, or b) the grace period ends without it reclaiming anything. When handling reclaims, we allow the reclaim if and only if the client doing the reclaim has a subdirectory. This patch adds just the code to scan the recovery directory on nfsd startup. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:35 -07:00
NeilBrown	cb36d63457	[PATCH] knfsd: nfsd4: remove cb_parsed The cb_parsed field is only used by probe_callback, to determine whether the callback information has been filled in by setclientid. But there is no way that probe_callback() can be called without that having already happened, so that check is superfluous, as is cb_parsed. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:35 -07:00
NeilBrown	ea1da636e9	[PATCH] knfsd: nfsd4: rename state list fields Trivial renaming patch: I can never remember, while looking at various lists relating the nfsd4 state structures, which are the "heads" and which are items on other lists, or which structures are actually on the various lists. The following convention helps me: given structures foo and bar, with foo containing the head of a list of bars, use "bars" for the name of the head of the list contained in the struct foo, and use "per_foo" for the entries in the struct bars. Already done for struct nfs4_file; go ahead and do it for the other nfsd4 state structures. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:35 -07:00
NeilBrown	fd39ca9a80	[PATCH] knfsd: nfsd4: make needlessly global code static This patch contains the following possible cleanups: - make needlessly global code static Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:33 -07:00
NeilBrown	a55370a3c0	[PATCH] knfsd: nfsd4: reboot hash For the purposes of reboot recovery we keep a directory with subdirectories each having a name that is the ascii hex representation of the md5 sum of a client identifier for an active client. This adds the code to calculate that name. We also use it for the purposes of comparing clients, so if someone ever manages to find two client names that are md5 collisions, then we'll return clid_inuse to the second. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:33 -07:00
NeilBrown	bd0b1e954e	[PATCH] knfsd: nfsd4: idmap initialization Adopt standard kernel style by defining a no-op function instead of putting ifdef's in the code where the function is called. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:32 -07:00
NeilBrown	ac4d8ff2a5	[PATCH] knfsd: nfsd4: clean up state initialization Separate out stuff that needs initialization on startup from stuff that only needs initialization on module init from static data. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:32 -07:00
NeilBrown	76a3550ec5	[PATCH] knfsd: nfsd4: rename nfs4_state_init Somewhat gratuitous rename to simplify following patch. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:31 -07:00
NeilBrown	7b190fecfa	[PATCH] knfsd: nfsd4: delegation recovery Allow recovery of delegations after reboot. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:31 -07:00
NeilBrown	13cd21845d	[PATCH] nfsd4: reference count struct nfs4_file Add a struct kref to each nfs4_file and take a reference to it from each stateid and delegation that refers to it. The atomicity guarantees are overkill given that all this stuff is done under the single nfsd4 state lock, but a) we'd like finer-grained locking some day, and b) this simplifies the cleanup of the structures a bit, something that has previously been a bit complicated and bug-prone. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:30 -07:00
NeilBrown	8beefa2493	[PATCH] nfsd4: rename nfs4_file fields Trivial renaming patch: I can never remember, while looking at various lists relating the nfsd4 state structures, which are the "heads" and which are items on other lists, or which structures are actually on the various lists. The following convention helps me: given structures foo and bar, with foo containing the head of a list of bars, use "bars" for the name of the head of the list contained in the struct foo, and use "per_foo" for the entries in the struct bars. Go ahead and do this for struct nfs4_file. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:29 -07:00
NeilBrown	496400014f	[PATCH] nfsd4: fix fh_expire_type We're returning NFS4_FH_NOEXPIRE_WITH_OPEN \| NFS4_FH_VOL_RENAME for the fh_expire_type attribute. This is incorrect: 1. The spec actually only allows NOEXPIRE_WITH_OPEN when VOLATILE_ANY is also set. 2. Filehandles for open files can expire, if the file is removed and there is a reboot. 3. Filehandles are only volatile on rename in the nosubtree check case. Unfortunately, there's no way to indicate that we only expire on remove. So our only choice is FH4_VOLATILE_ANY. Although it's redundant, we also set FH4_VOL_RENAME in the subtree check case, since subtreecheck does actually cause problems in practice and it seems possibly useful to give clients some way to distinguish that case. Fix a mispelled #define while we're at it. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:28 -07:00
Mauro Carvalho Chehab	391cd727ea	[PATCH] tuner-core.c improvments and Ymec Tvision TVF8533MF support tuner-core.c, tuner.h: - tuner-core changed to support multiple I2C devices used on some adapters; - Kconfig now has an option (CONFIG_TUNER_MULTI_I2C) to enable this new behavor; - By default, even enabling CONFIG_TUNER_MULTI_I2C, tuner-core emulates the old behavor, using first I2C device for both FM and TV; - There is a new i2c command (TUNER_SET_ADDR) to allow tuner clients to select I2C address for FM or TV tuner; - Tuner I2C dettach now generates a warning on syslog if failed. tuner-simple.c: - TVision TVF-8531MF and TVF-5533 MF tuner included. It uses, by default, I2C on 0xC2 address for TV and on 0xC0 for Radio. Both TV and FM Radio mode are working. Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:31 -07:00
Markus Lidel	f33213ecf4	[PATCH] I2O: Lindent run and replacement of printk through osm printing functions Lindent run and replaced printk() through the corresponding osm_*() function Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:29 -07:00
Markus Lidel	9e87545f06	[PATCH] I2O: second code cleanup of sparse warnings and unneeded syncronization Changes: - Added header "core.h" for i2o_core.ko internal definitions - More sparse fixes - Changed display of TID's in sysfs attributes from XXX to 0xXXX - Use the right functions for accessing I/O and normal memory - Removed error handling of SCSI device errors and let the SCSI layer take care of it - Added new device / removed device handling to SCSI-OSM - Make status access volatile - Cleaned up activation of I2O controller - Removed unnecessary wmb() and rmb() calls - Use own struct i2o_io for I/O memory instead of struct i2o_dma Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:29 -07:00
Markus Lidel	b2aaee33fb	[PATCH] I2O: Adaptec specific SG_IO access, firmware access through sysfs and 2400A workaround Changes: - Provide SG_IO access to BLOCK and EXECUTIVE class on Adaptec controllers - Use PRIVATE messages in SCSI-OSM because on some controllers normal SCSI class commands like READ or READ CAPACITY cause errors - Use new DMA and SG list creation function - Added workaround to limit sectors per request for Adaptec 2400A controllers Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:28 -07:00
Markus Lidel	f10378fff6	[PATCH] I2O: new sysfs attributes and Adaptec specific block device access and 64-bit DMA support Changes: - Added Bus-OSM which could be used by user space programs to reset a channel on the controller - Make ioctl's in Config-OSM obsolete in prefer for sysfs attributes and move those to its own file - Added sysfs attribute for firmware read and write access for I2O controllers - Added special handling of firmware read and write access for Adaptec controllers - Added vendor id and product id as sysfs-attribute to Executive classes - Added automatic notification of LCT change handling to Exec-OSM - Added flushing function to Block-OSM for later barrier implementation - Use PRIVATE messages for Block access on Adaptec controllers, which are faster then BLOCK class access - Cleaned up support for Promise controller - New messages are now detected using the IRQ status register as suggested by the I2O spec - Added i2o_dma_high() and i2o_dma_low() functions - Added facility for SG tablesize calculation when using 32-bit and 64-bit DMA addresses - Added i2o_dma_map_single() and i2o_dma_map_sg() which could build the SG list for 32-bit as well as 64-bit DMA addresses Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:28 -07:00
Markus Lidel	f88e119c4b	[PATCH] I2O: first code cleanup of spare warnings and unused functions Changes: - Removed unnecessary checking of NULL before calling kfree() - Make some functions static - Changed pr_debug() into osm_debug() - Use i2o_msg_in_to_virt() for getting a pointer to the message frame - Cleaned up some comments - Changed some le32_to_cpu() into readl() where necessary - Make error messages of OSM's look the same - Cleaned up error handling in i2o_block_end_request() - Removed unused error handling of failed messages in Block-OSM, which are not allowed by the I2O spec - Corrected the blocksize detection in i2o_block - Added hrt and lct sysfs-attribute to controller - Call done() function in SCSI-OSM after freeing DMA buffers - Removed unneeded variable for message size calculation in i2o_scsi_queuecommand() - Make some changes to remove sparse warnings - Reordered some functions - Cleaned up controller initialization - Replaced some magic numbers by defines - Removed unnecessary dma_sync_single_for_cpu() call on coherent DMA - Removed some unused fields in i2o_controller and removed some unused functions Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:28 -07:00
Markus Lidel	61fbfa8129	[PATCH] I2O: bugfixes and compability enhancements Changes: - Fixed sysfs bug where user and parent links where added to the I2O device itself - Fixed bug when calculating TID for the event handler and cleaned up the workflow of i2o_driver_dispatch() - Fixed oops when no I2O device could be found for an event delivered to Exec-OSM - Fixed initialization of spinlock in Exec-OSM - Fixed memory leak in i2o_cfg_passthru() and i2o_cfg_passthru() - Removed MTRR support - Added PCI ID of Promise SX6000 with firmware >= 1.20.x.x - Turn of caching for ioremapped memory of in_queue - Added initialization sequence for Promise controllers - Moved definition of u8 / u16 / u32 for raidutils before first use Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:28 -07:00
Kylene Hall	a6df7da8f7	[PATCH] tpm: TPMs on additional LPC bus Add support for TPMs on additional LPC buses. Signed-off-by: Kylene Hall <kjhall@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:27 -07:00
Corey Minyard	3b6259432d	[PATCH] ipmi: add power cycle capability This patch to adds "power cycle" functionality to the IPMI power off module ipmi_poweroff. It also contains changes to support procfs control of the feature. The power cycle action is considered an optional chassis control in the IPMI specification. However, it is definitely useful when the hardware supports it. A power cycle is usually required in order to reset a firmware in a bad state. This action is critical to allow remote management of servers. The implementation adds power cycle as optional to the ipmi_poweroff module. It can be modified dynamically through the proc entry mentioned above. During a power down and enabled, the power cycle command is sent to the BMC firmware. If it fails either due to non-support or some error, it will retry to send the command as power off. Signed-off-by: Christopher A. Poblete <Chris_Poblete@dell.com> Signed-off-by: Corey Minyard <minyard@acm.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:23 -07:00
Chris Zankel	7282bee787	[PATCH] xtensa: Architecture support for Tensilica Xtensa Part 8 The attached patches provides part 8 of an architecture implementation for the Tensilica Xtensa CPU series. Signed-off-by: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:22 -07:00
Chris Zankel	e344b63eee	[PATCH] xtensa: Architecture support for Tensilica Xtensa Part 7 The attached patches provides part 7 of an architecture implementation for the Tensilica Xtensa CPU series. Signed-off-by: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:22 -07:00
Chris Zankel	9a8fd55899	[PATCH] xtensa: Architecture support for Tensilica Xtensa Part 6 The attached patches provides part 6 of an architecture implementation for the Tensilica Xtensa CPU series. Signed-off-by: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:22 -07:00
Jan Kara	556a2a45bc	[PATCH] quota: reiserfs: improve quota credit estimates Use improved credits estimates for quota operations. Also reserve space for a quota operation in a transaction only if filesystem was mounted with some quota option. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:20 -07:00
Jan Kara	1f54587bea	[PATCH] quota: ext3: Improve quota credit estimates Use improved credits estimates for quota operations. Also reserve a space for a quota operation in a transaction only if filesystem was mounted with some quota options. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:20 -07:00
Jan Kara	4e5117ba0a	[PATCH] quota: improve credits estimates Improve estimates on the number of needed credits for quota transaction. Now we distinguish blocks that might need to be allocated and blocks that only need to be rewritten. Also we distinguish deleting of a quota structure and creating of a new one. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:19 -07:00
Christoph Hellwig	92198f7eaa	[PATCH] pass iocb to dio_iodone_t XFS will have to look at iocb->private to fix aio+dio. No other filesystem is using the blockdev_direct_IO* end_io callback. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:19 -07:00
David Howells	3e30148c3d	[PATCH] Keys: Make request-key create an authorisation key The attached patch makes the following changes: (1) There's a new special key type called ".request_key_auth". This is an authorisation key for when one process requests a key and another process is started to construct it. This type of key cannot be created by the user; nor can it be requested by kernel services. Authorisation keys hold two references: (a) Each refers to a key being constructed. When the key being constructed is instantiated the authorisation key is revoked, rendering it of no further use. (b) The "authorising process". This is either: (i) the process that called request_key(), or: (ii) if the process that called request_key() itself had an authorisation key in its session keyring, then the authorising process referred to by that authorisation key will also be referred to by the new authorisation key. This means that the process that initiated a chain of key requests will authorise the lot of them, and will, by default, wind up with the keys obtained from them in its keyrings. (2) request_key() creates an authorisation key which is then passed to /sbin/request-key in as part of a new session keyring. (3) When request_key() is searching for a key to hand back to the caller, if it comes across an authorisation key in the session keyring of the calling process, it will also search the keyrings of the process specified therein and it will use the specified process's credentials (fsuid, fsgid, groups) to do that rather than the calling process's credentials. This allows a process started by /sbin/request-key to find keys belonging to the authorising process. (4) A key can be read, even if the process executing KEYCTL_READ doesn't have direct read or search permission if that key is contained within the keyrings of a process specified by an authorisation key found within the calling process's session keyring, and is searchable using the credentials of the authorising process. This allows a process started by /sbin/request-key to read keys belonging to the authorising process. (5) The magic KEY_SPEC__KEYRING key IDs when passed to KEYCTL_INSTANTIATE or KEYCTL_NEGATE will specify a keyring of the authorising process, rather than the process doing the instantiation. (6) One of the process keyrings can be nominated as the default to which request_key() should attach new keys if not otherwise specified. This is done with KEYCTL_SET_REQKEY_KEYRING and one of the KEY_REQKEY_DEFL_ constants. The current setting can also be read using this call. (7) request_key() is partially interruptible. If it is waiting for another process to finish constructing a key, it can be interrupted. This permits a request-key cycle to be broken without recourse to rebooting. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-Off-By: Benoit Boissinot <benoit.boissinot@ens-lyon.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:19 -07:00
David Howells	7888e7ff4e	[PATCH] Keys: Pass session keyring to call_usermodehelper() The attached patch makes it possible to pass a session keyring through to the process spawned by call_usermodehelper(). This allows patch 3/3 to pass an authorisation key through to /sbin/request-key, thus permitting better access controls when doing just-in-time key creation. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:18 -07:00
David Howells	76d8aeabfe	[PATCH] keys: Discard key spinlock and use RCU for key payload The attached patch changes the key implementation in a number of ways: (1) It removes the spinlock from the key structure. (2) The key flags are now accessed using atomic bitops instead of write-locking the key spinlock and using C bitwise operators. The three instantiation flags are dealt with with the construction semaphore held during the request_key/instantiate/negate sequence, thus rendering the spinlock superfluous. The key flags are also now bit numbers not bit masks. (3) The key payload is now accessed using RCU. This permits the recursive keyring search algorithm to be simplified greatly since no locks need be taken other than the usual RCU preemption disablement. Searching now does not require any locks or semaphores to be held; merely that the starting keyring be pinned. (4) The keyring payload now includes an RCU head so that it can be disposed of by call_rcu(). This requires that the payload be copied on unlink to prevent introducing races in copy-down vs search-up. (5) The user key payload is now a structure with the data following it. It includes an RCU head like the keyring payload and for the same reason. It also contains a data length because the data length in the key may be changed on another CPU whilst an RCU protected read is in progress on the payload. This would then see the supposed RCU payload and the on-key data length getting out of sync. I'm tempted to drop the key's datalen entirely, except that it's used in conjunction with quota management and so is a little tricky to get rid of. (6) Update the keys documentation. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:05:18 -07:00
David S. Miller	a8acfbac75	[TCP]: Need to declare 'tcp_reno' in net/tcp.h Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 23:45:02 -07:00
Thomas Graf	d675c989ed	[PKT_SCHED]: Packet classification based on textsearch (ematch) Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 21:00:58 -07:00
Thomas Graf	3fc7e8a6d8	[NET]: skb_find_text() - Find a text pattern in skb data Finds a pattern in the skb data according to the specified textsearch configuration. Use textsearch_next() to retrieve subsequent occurrences of the pattern. Returns the offset to the first occurrence or UINT_MAX if no match was found. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 21:00:17 -07:00
Thomas Graf	677e90eda3	[NET]: Zerocopy sequential reading of skb data Implements sequential reading for both linear and non-linear skb data at zerocopy cost. The data is returned in chunks of arbitary length, therefore random access is not possible. Usage: from := 0 to := 128 state := undef data := undef len := undef consumed := 0 skb_prepare_seq_read(skb, from, to, &state) while (len = skb_seq_read(consumed, &data, &state)) != 0 do /* do something with 'data' of length 'len' / if abort then / abort read if we don't wait for * skb_seq_read() to return 0 / skb_abort_seq_read(&state) return endif / not necessary to consume all of 'len' */ consumed += len done Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:59:51 -07:00
Thomas Graf	6408f79cce	[LIB]: Naive finite state machine based textsearch A finite state machine consists of n states (struct ts_fsm_token) representing the pattern as a finite automation. The data is read sequentially on a octet basis. Every state token specifies the number of recurrences and the type of value accepted which can be either a specific character or ctype based set of characters. The available type of recurrences include 1, (0\|1), [0 n], and [1 n]. The algorithm differs between strict/non-strict mode specyfing whether the pattern has to start at the first octect. Strict mode is enabled by default and can be disabled by inserting TS_FSM_HEAD_IGNORE as the first token in the chain. The runtime performance of the algorithm should be around O(n), however while in strict mode the average runtime can be better. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:59:16 -07:00
Thomas Graf	2de4ff7bd6	[LIB]: Textsearch infrastructure. The textsearch infrastructure provides text searching facitilies for both linear and non-linear data. Individual search algorithms are implemented in modules and chosen by the user. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:49:30 -07:00
Stephen Hemminger	5f8ef48d24	[TCP]: Allow choosing TCP congestion control via sockopt. Allow using setsockopt to set TCP congestion control to use on a per socket basis. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:37:36 -07:00
Stephen Hemminger	51b0bdedb8	[NET]: Separate two usages of netdev_max_backlog. Separate out the two uses of netdev_max_backlog. One controls the upper bound on packets processed per softirq, the new name for this is netdev_budget; the other controls the limit on packets queued via netif_rx. Increase the max_backlog default to account for faster processors. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:14:40 -07:00
Stephen Hemminger	31aa02c53c	[NET]: Eliminate netif_rx massive packet drops. Eliminate the throttling behaviour when the netif receive queue fills because it behaves badly when using high speed networks under load. The throttling cause multiple packet drops that cause TCP to go into slow start mode. The same effective patch has been part of BIC TCP and H-TCP as well as part of Web100. The existing code drops 100's of packets when the queue fills; this changes it to individual packet drop-tail. Signed-off-by: Stephen Hemmminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:12:48 -07:00
Stephen Hemminger	34008d8c63	[NET]: Remove obsolete netif_rx congestion sensing mechanism. Remove the congestion sensing mechanism from netif_rx, and always return either full or empty. Almost no driver checks the return value from netif_rx, and those that do only use it for debug messages. The original design of netif_rx was to do flow control based on the receive queue, but NAPI has supplanted this and no driver uses the feedback. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:10:00 -07:00
Stephen Hemminger	c1ebcdb8c4	[NET]: Remove obsolete fastroute stats. Remove last vestiages of fastroute code that is no longer used. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:08:59 -07:00
Linus Torvalds	16822e6205	Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6	2005-06-23 17:27:24 -07:00
David Mosberger-Tang	e608a8072b	[IA64] Fix pfn_to_nid() so the kernel compiles again for !CONFIG_DISCONTIGMEM. Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-06-23 14:52:51 -07:00
Stephen Hemminger	056ede6cfa	[TCP]: Report congestion control algorithm in tcp_diag. Enhancement to the tcp_diag interface used by the iproute2 ss command to report the tcp congestion control being used by a socket. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:21:28 -07:00
Stephen Hemminger	317a76f9a4	[TCP]: Add pluggable congestion control algorithm infrastructure. Allow TCP to have multiple pluggable congestion control algorithms. Algorithms are defined by a set of operations and can be built in or modules. The legacy "new RENO" algorithm is used as a starting point and fallback. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:19:55 -07:00
Adrian Bunk	4749f32da9	[PATCH] better USB_MON dependencies This makes the USB_MON less confusing. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 10:04:15 -07:00
Linus Torvalds	24665cd00d	Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/paulus/ppc64-2.6	2005-06-23 09:49:55 -07:00
Alexey Dobriyan	bfb07599da	[PATCH] Introduce tty_unregister_ldisc() It's a bit strange to see tty_register_ldisc call in modules' exit functions. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:35 -07:00
Benjamin LaHaise	c43dc2fd88	[PATCH] aio: make wait_queue ->task ->private In the upcoming aio_down patch, it is useful to store a private data pointer in the kiocb's wait_queue. Since we provide our own wake up function and do not require the task_struct pointer, it makes sense to convert the task pointer into a generic private pointer. Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:34 -07:00
Christoph Hellwig	9a59f452ab	[PATCH] remove <linux/xattr_acl.h> This file duplicates <linux/posix_acl_xattr.h>, using slightly different names. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:33 -07:00
Christoph Hellwig	f9fd27a253	[PATCH] acl endianess annotations Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:33 -07:00
Christoph Lameter	45778ca819	[PATCH] Remove f_error field from struct file The following patch removes the f_error field and all checks of f_error. Trond said: f_error was introduced for NFS, and made sense when we were guaranteed always to have a file pointer around when write errors occurred. Since then, we have (for various reasons) had to introduce the nfs_open_context in order to track the file read/write state, and it made sense to move our f_error tracking there too. Signed-off-by: Christoph Lameter <christoph@lameter.com> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:33 -07:00
Arnd Bergmann	bb93e3a52f	[PATCH] block: add unlocked_ioctl support for block devices This patch allows block device drivers to convert their ioctl functions to unlocked_ioctl() like character devices and other subsystems. All functions that were called with the BKL held before are still used that way, but I would not be surprised if it could be removed from the ioctl functions in drivers/block/ioctl.c themselves. As a side note, I found that compat_blkdev_ioctl() acquires the BKL as well, which looks like a bug. I have checked that every user of disk->fops->compat_ioctl() in the current git tree gets the BKL itself, so it could easily be removed from compat_blkdev_ioctl(). Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:32 -07:00
Stephen Rothwell	0d77e5a2c2	[PATCH] compat: introduce compat_time_t This patch is based on work by Carlos O'Donell and Matthew Wilcox. It introduces/updates the compat_time_t type and uses it for compat siginfo structures. I have built this on ppc64 and x86_64. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:32 -07:00
Daniel Ritz	fa912bcb06	[PATCH] yenta TI: turn off interrupts during card power-on #2 - make boot-up card recognition more reliable (ie. redo interrogation always if there is no valid 'card inserted' state) (and yes, i saw it happening on an o2micro controller that both CB_CBARD and CB_16BITCARD bits were set at the same time) - also redo interrogation before probing the ISA interrupts. it's safer to do the probing with the socket in a clean state. - make card insert detect more reliable. yenta_get_status() now returns SS_PENDING as long as the card is not completley inserted and one of the voltage bits is set. also !CB_CBARD doesn't mean CB_16BITCARD. there is CB_NOTACARD as well, so make an explicit check for CB_16BITCARD. - for TI bridges: disable IRQs during power-on. in all-serial and tied interrupt mode the interrupts are always disabled for single-slot controllers. for two-slot contollers the disabling is only done when the other slot is empty. to force disabling there is a new module parameter now: pwr_irqs_off=Y (which is a regression for working setups. that's why it's an option, only use when required) - modparm to disable ISA interrupt probing (isa_probe, defaults to on) - remove unneeded code/cleanups (ie. merge yenta_events() into yenta_interrupts()) Signed-off-by: Daniel Ritz <daniel.ritz@gmx.ch> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:31 -07:00
Peter Osterlund	46c271bedd	[PATCH] Improve CD/DVD packet driver write performance This patch improves write performance for the CD/DVD packet writing driver. The logic for switching between reading and writing has been changed so that streaming writes are no longer interrupted by read requests. Signed-off-by: Peter Osterlund <petero2@telia.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:30 -07:00
Jan Beulich	11c80c8367	[PATCH] adjust per_cpu definition in non-SMP case Fix (in the architectures I'm actually building for) the UP definition of per_cpu so that the cpu specified may be any expression, not just an identifier or a suffix expression. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:28 -07:00
Yoav Zach	ef3daeda7b	[PATCH] Don't force O_LARGEFILE for 32 bit processes on ia64 In ia64 kernel, the O_LARGEFILE flag is forced when opening a file. This is problematic for execution of 32 bit processes, which are not largefile aware, either by SW emulation or by HW execution. For such processes, the problem is two-fold: 1) When trying to open a file that is larger than 4G the operation should fail, but it's not 2) Writing to offset larger than 4G should fail, but it's not The proposed patch takes advantage of the way 32 bit processes are identified in ia64 systems. Such processes have PER_LINUX32 for their personality. With the patch, the ia64 kernel will not enforce the O_LARGEFILE flag if the current process has PER_LINUX32 set. The behavior for all other architectures remains unchanged. Signed-off-by: Yoav Zach <yoav.zach@intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:28 -07:00
Alan Cox	d6e7114481	[PATCH] setuid core dump Add a new `suid_dumpable' sysctl: This value can be used to query and set the core dump mode for setuid or otherwise protected/tainted binaries. The modes are 0 - (default) - traditional behaviour. Any process which has changed privilege levels or is execute only will not be dumped 1 - (debug) - all processes dump core when possible. The core dump is owned by the current user and no security is applied. This is intended for system debugging situations only. Ptrace is unchecked. 2 - (suidsafe) - any binary which normally would not be dumped is dumped readable by root only. This allows the end user to remove such a dump but not access it directly. For security reasons core dumps in this mode will not overwrite one another or other files. This mode is appropriate when adminstrators are attempting to debug problems in a normal environment. (akpm: > > +EXPORT_SYMBOL(suid_dumpable); > > EXPORT_SYMBOL_GPL? No problem to me. > > if (current->euid == current->uid && current->egid == current->gid) > > current->mm->dumpable = 1; > > Should this be SUID_DUMP_USER? Actually the feedback I had from last time was that the SUID_ defines should go because its clearer to follow the numbers. They can go everywhere (and there are lots of places where dumpable is tested/used as a bool in untouched code) > Maybe this should be renamed to `dump_policy' or something. Doing that > would help us catch any code which isn't using the #defines, too. Fair comment. The patch was designed to be easy to maintain for Red Hat rather than for merging. Changing that field would create a gigantic diff because it is used all over the place. ) Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:26 -07:00
Prasanna S Panchamukhi	ea32c65cc2	[PATCH] kprobes: Temporary disarming of reentrant probe In situations where a kprobes handler calls a routine which has a probe on it, then kprobes_handler() disarms the new probe forever. This patch removes the above limitation by temporarily disarming the new probe. When the another probe hits while handling the old probe, the kprobes_handler() saves previous kprobes state and handles the new probe without calling the new kprobes registered handlers. kprobe_post_handler() restores back the previous kprobes state and the normal execution continues. However on x86_64 architecture, re-rentrancy is provided only through pre_handler(). If a routine having probe is referenced through post_handler(), then the probes on that routine are disarmed forever, since the exception stack is gets changed after the processor single steps the instruction of the new probe. This patch includes generic changes to support temporary disarming on reentrancy of probes. Signed-of-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:24 -07:00
Anil S Keshavamurthy	1674eafcbd	[PATCH] Kprobes IA64: cmp ctype unc support The current Kprobes when patching the original instruction with the break instruction tries to retain the original qualifying predicate(qp), however for cmp.crel.ctype where ctype == unc, which is a special instruction always needs to be executed irrespective of qp. Hence, if the instruction we are patching is of this type, then we should not copy the original qp to the break instruction, this is because we always want the break fault to happen so that we can emulate the instruction. This patch is based on the feedback given by David Mosberger Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:23 -07:00
Rusty Lynch	8bc76772ad	[PATCH] Kprobes ia64 cleanup A cleanup of the ia64 kprobes implementation such that all of the bundle manipulation logic is concentrated in arch_prepare_kprobe(). With the current design for kprobes, the arch specific code only has a chance to return failure inside the arch_prepare_kprobe() function. This patch moves all of the work that was happening in arch_copy_kprobe() and most of the work that was happening in arch_arm_kprobe() into arch_prepare_kprobe(). By doing this we can add further robustness checks in arch_arm_kprobe() and refuse to insert kprobes that will cause problems. Signed-off-by: Rusty Lynch <Rusty.lynch@intel.com> Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:23 -07:00
Anil S Keshavamurthy	cd2675bf65	[PATCH] Kprobes/IA64: support kprobe on branch/call instructions This patch is required to support kprobe on branch/call instructions. Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:23 -07:00
Anil S Keshavamurthy	b2761dc262	[PATCH] Kprobes/IA64: architecture specific JProbes support This patch adds IA64 architecture specific JProbes support on top of Kprobes Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Rusty Lynch <Rusty.lynch@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:22 -07:00
Anil S Keshavamurthy	fd7b231ff9	[PATCH] Kprobes/IA64: arch specific handling This is an IA64 arch specific handling of Kprobes Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Rusty Lynch <Rusty.lynch@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:22 -07:00
Anil S Keshavamurthy	7213b25218	[PATCH] Kprobes/IA64: kdebug die notification mechanism As many of you know that kprobes exist in the main line kernel for various architecture including i386, x86_64, ppc64 and sparc64. Attached patches following this mail are a port of Kprobes and Jprobes for IA64. I have tesed this patches for kprobes and Jprobes and this seems to work fine. I have tested this patch by inserting kprobes on various slots and various templates including various types of branch instructions. I have also tested this patch using the tool http://marc.theaimsgroup.com/?l=linux-kernel&m=111657358022586&w=2 and the kprobes for IA64 works great. Here is list of TODO things and pathes for the same will appear soon. 1) Support kprobes on "mov r1=ip" type of instruction 2) Support Kprobes and Jprobes to exist on the same address 3) Support Return probes 3) Architecture independent cleanup of kprobes This patch adds the kdebug die notification mechanism needed by Kprobes. For break instruction on Branch type slot, imm21 is ignored and value zero is placed in IIM register, hence we need to handle kprobes for switch case zero. Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Rusty Lynch <Rusty.lynch@intel.com> From: Rusty Lynch <rusty.lynch@intel.com> At the point in traps.c where we recieve a break with a zero value, we can not say if the break was a result of a kprobe or some other debug facility. This simple patch changes the informational string to a more correct "break 0" value, and applies to the 2.6.12-rc2-mm2 tree with all the kprobes patches that were just recently included for the next mm cut. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:22 -07:00
Hien Nguyen	0aa55e4d7d	[PATCH] kprobes: moves lock-unlock to non-arch kprobe_flush_task This patch moves the lock/unlock of the arch specific kprobe_flush_task() to the non-arch specific kprobe_flusk_task(). Signed-off-by: Hien Nguyen <hien@us.ibm.com> Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:21 -07:00
Rusty Lynch	7e1048b11c	[PATCH] Move kprobe [dis]arming into arch specific code The architecture independent code of the current kprobes implementation is arming and disarming kprobes at registration time. The problem is that the code is assuming that arming and disarming is a just done by a simple write of some magic value to an address. This is problematic for ia64 where our instructions look more like structures, and we can not insert break points by just doing something like: p->addr = BREAKPOINT_INSTRUCTION; The following patch to 2.6.12-rc4-mm2 adds two new architecture dependent functions: void arch_arm_kprobe(struct kprobe p) void arch_disarm_kprobe(struct kprobe *p) and then adds the new functions for each of the architectures that already implement kprobes (spar64/ppc64/i386/x86_64). I thought arch_[dis]arm_kprobe was the most descriptive of what was really happening, but each of the architectures already had a disarm_kprobe() function that was really a "disarm and do some other clean-up items as needed when you stumble across a recursive kprobe." So... I took the liberty of changing the code that was calling disarm_kprobe() to call arch_disarm_kprobe(), and then do the cleanup in the block of code dealing with the recursive kprobe case. So far this patch as been tested on i386, x86_64, and ppc64, but still needs to be tested in sparc64. Signed-off-by: Rusty Lynch <rusty.lynch@intel.com> Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:21 -07:00
Rusty Lynch	73649dab0f	[PATCH] x86_64 specific function return probes The following patch adds the x86_64 architecture specific implementation for function return probes. Function return probes is a mechanism built on top of kprobes that allows a caller to register a handler to be called when a given function exits. For example, to instrument the return path of sys_mkdir: static int sys_mkdir_exit(struct kretprobe_instance i, struct pt_regs regs) { printk("sys_mkdir exited\n"); return 0; } static struct kretprobe return_probe = { .handler = sys_mkdir_exit, }; <inside setup function> return_probe.kp.addr = (kprobe_opcode_t ) kallsyms_lookup_name("sys_mkdir"); if (register_kretprobe(&return_probe)) { printk(KERN_DEBUG "Unable to register return probe!\n"); / do error path / } <inside cleanup function> unregister_kretprobe(&return_probe); The way this works is that: At system initialization time, kernel/kprobes.c installs a kprobe on a function called kretprobe_trampoline() that is implemented in the arch/x86_64/kernel/kprobes.c (More on this later) * When a return probe is registered using register_kretprobe(), kernel/kprobes.c will install a kprobe on the first instruction of the targeted function with the pre handler set to arch_prepare_kretprobe() which is implemented in arch/x86_64/kernel/kprobes.c. * arch_prepare_kretprobe() will prepare a kretprobe instance that stores: - nodes for hanging this instance in an empty or free list - a pointer to the return probe - the original return address - a pointer to the stack address With all this stowed away, arch_prepare_kretprobe() then sets the return address for the targeted function to a special trampoline function called kretprobe_trampoline() implemented in arch/x86_64/kernel/kprobes.c * The kprobe completes as normal, with control passing back to the target function that executes as normal, and eventually returns to our trampoline function. * Since a kprobe was installed on kretprobe_trampoline() during system initialization, control passes back to kprobes via the architecture specific function trampoline_probe_handler() which will lookup the instance in an hlist maintained by kernel/kprobes.c, and then call the handler function. * When trampoline_probe_handler() is done, the kprobes infrastructure single steps the original instruction (in this case just a top), and then calls trampoline_post_handler(). trampoline_post_handler() then looks up the instance again, puts the instance back on the free list, and then makes a long jump back to the original return instruction. So to recap, to instrument the exit path of a function this implementation will cause four interruptions: - A breakpoint at the very beginning of the function allowing us to switch out the return address - A single step interruption to execute the original instruction that we replaced with the break instruction (normal kprobe flow) - A breakpoint in the trampoline function where our instrumented function returned to - A single step interruption to execute the original instruction that we replaced with the break instruction (normal kprobe flow) Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:21 -07:00
Hien Nguyen	b94cce926b	[PATCH] kprobes: function-return probes This patch adds function-return probes to kprobes for the i386 architecture. This enables you to establish a handler to be run when a function returns. 1. API Two new functions are added to kprobes: int register_kretprobe(struct kretprobe rp); void unregister_kretprobe(struct kretprobe rp); 2. Registration and unregistration 2.1 Register To register a function-return probe, the user populates the following fields in a kretprobe object and calls register_kretprobe() with the kretprobe address as an argument: kp.addr - the function's address handler - this function is run after the ret instruction executes, but before control returns to the return address in the caller. maxactive - The maximum number of instances of the probed function that can be active concurrently. For example, if the function is non- recursive and is called with a spinlock or mutex held, maxactive = 1 should be enough. If the function is non-recursive and can never relinquish the CPU (e.g., via a semaphore or preemption), NR_CPUS should be enough. maxactive is used to determine how many kretprobe_instance objects to allocate for this particular probed function. If maxactive <= 0, it is set to a default value (if CONFIG_PREEMPT maxactive=max(10, 2 * NR_CPUS) else maxactive=NR_CPUS) For example: struct kretprobe rp; rp.kp.addr = /* entrypoint address / rp.handler = /return probe handler / rp.maxactive = / e.g., 1 or NR_CPUS or 0, see the above explanation */ register_kretprobe(&rp); The following field may also be of interest: nmissed - Initialized to zero when the function-return probe is registered, and incremented every time the probed function is entered but there is no kretprobe_instance object available for establishing the function-return probe (i.e., because maxactive was set too low). 2.2 Unregister To unregiter a function-return probe, the user calls unregister_kretprobe() with the same kretprobe object as registered previously. If a probed function is running when the return probe is unregistered, the function will return as expected, but the handler won't be run. 3. Limitations 3.1 This patch supports only the i386 architecture, but patches for x86_64 and ppc64 are anticipated soon. 3.2 Return probes operates by replacing the return address in the stack (or in a known register, such as the lr register for ppc). This may cause __builtin_return_address(0), when invoked from the return-probed function, to return the address of the return-probes trampoline. 3.3 This implementation uses the "Multiprobes at an address" feature in 2.6.12-rc3-mm3. 3.4 Due to a limitation in multi-probes, you cannot currently establish a return probe and a jprobe on the same function. A patch to remove this limitation is being tested. This feature is required by SystemTap (http://sourceware.org/systemtap), and reflects ideas contributed by several SystemTap developers, including Will Cohen and Ananth Mavinakayanahalli. Signed-off-by: Hien Nguyen <hien@us.ibm.com> Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Frederik Deweerdt <frederik.deweerdt@laposte.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:21 -07:00
Christoph Hellwig	84de856ed3	[PATCH] quota: consolidate code surrounding vfs_quota_on_mount Move some code duplicated in both callers into vfs_quota_on_mount Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:20 -07:00
Neil Horman	ac20427ef6	[PATCH] add check to /proc/devices read routines Patch to add check to get_chrdev_list and get_blkdev_list to prevent reads of /proc/devices from spilling over the provided page if more than 4096 bytes of string data are generated from all the registered character and block devices in a system Signed-off-by: Neil Horman <nhorman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:19 -07:00
Jesper Juhl	dcd497f99a	[PATCH] streamline preempt_count type across archs The preempt_count member of struct thread_info is currently either defined as int, unsigned int or __s32 depending on arch. This patch makes the type of preempt_count an int on all archs. Having preempt_count be an unsigned type prevents the catching of preempt_count < 0 bugs, and using int on some archs and __s32 on others is not exactely "neat" - much nicer when it's just int all over. A previous version of this patch was already ACK'ed by Robert Love, and the only change in this version of the patch compared to the one he ACK'ed is that this one also makes sure the preempt_count member is consistently commented. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:19 -07:00
Nick Piggin	35a82d1a53	[PATCH] optimise loop driver a bit Looks like locking can be optimised quite a lot. Increase lock widths slightly so lo_lock is taken fewer times per request. Also it was quite trivial to cover lo_pending with that lock, and remove the atomic requirement. This also makes memory ordering explicitly correct, which is nice (not that I particularly saw any mem ordering bugs). Test was reading 4 250MB files in parallel on ext2-on-tmpfs filesystem (1K block size, 4K page size). System is 2 socket Xeon with HT (4 thread). intel:/home/npiggin# umount /dev/loop0 ; mount /dev/loop0 /mnt/loop ; /usr/bin/time ./mtloop.sh Before: 0.24user 5.51system 0:02.84elapsed 202%CPU (0avgtext+0avgdata 0maxresident)k 0.19user 5.52system 0:02.88elapsed 198%CPU (0avgtext+0avgdata 0maxresident)k 0.19user 5.57system 0:02.89elapsed 198%CPU (0avgtext+0avgdata 0maxresident)k 0.22user 5.51system 0:02.90elapsed 197%CPU (0avgtext+0avgdata 0maxresident)k 0.19user 5.44system 0:02.91elapsed 193%CPU (0avgtext+0avgdata 0maxresident)k After: 0.07user 2.34system 0:01.68elapsed 143%CPU (0avgtext+0avgdata 0maxresident)k 0.06user 2.37system 0:01.68elapsed 144%CPU (0avgtext+0avgdata 0maxresident)k 0.06user 2.39system 0:01.68elapsed 145%CPU (0avgtext+0avgdata 0maxresident)k 0.06user 2.36system 0:01.68elapsed 144%CPU (0avgtext+0avgdata 0maxresident)k 0.06user 2.42system 0:01.68elapsed 147%CPU (0avgtext+0avgdata 0maxresident)k Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:18 -07:00
Paulo Marques	543537bd92	[PATCH] create a kstrdup library function This patch creates a new kstrdup library function and changes the "local" implementations in several places to use this function. Most of the changes come from the sound and net subsystems. The sound part had already been acknowledged by Takashi Iwai and the net part by David S. Miller. I left UML alone for now because I would need more time to read the code carefully before making changes there. Signed-off-by: Paulo Marques <pmarques@grupopie.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:18 -07:00
Alexander Viro	991114c6fa	[PATCH] fix for prune_icache()/forced final iput() races Based on analysis and a patch from Russ Weight <rweight@us.ibm.com> There is a race condition that can occur if an inode is allocated and then released (using iput) during the ->fill_super functions. The race condition is between kswapd and mount. For most filesystems this can only happen in an error path when kswapd is running concurrently. For isofs, however, the error can occur in a more common code path (which is how the bug was found). The logic here is "we want final iput() to free inode now instead of letting it sit in cache if fs is going down or had not quite come up". The problem is with kswapd seeing such inodes in the middle of being killed and happily taking over. The clean solution would be to tell kswapd to leave those inodes alone and let our final iput deal with them. I.e. add a new flag (I_FORCED_FREEING), set it before write_inode_now() there and make prune_icache() leave those alone. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:17 -07:00
Oleg Nesterov	fd450b7318	[PATCH] timers: introduce try_to_del_timer_sync() This patch splits del_timer_sync() into 2 functions. The new one, try_to_del_timer_sync(), returns -1 when it hits executing timer. It can be used in interrupt context, or when the caller hold locks which can prevent completion of the timer's handler. NOTE. Currently it can't be used in interrupt context in UP case, because ->running_timer is used only with CONFIG_SMP. Should the need arise, it is possible to kill #ifdef CONFIG_SMP in set_running_timer(), it is cheap. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:16 -07:00
Oleg Nesterov	55c888d6d0	[PATCH] timers fixes/improvements This patch tries to solve following problems: 1. del_timer_sync() is racy. The timer can be fired again after del_timer_sync have checked all cpus and before it will recheck timer_pending(). 2. It has scalability problems. All cpus are scanned to determine if the timer is running on that cpu. With this patch del_timer_sync is O(1) and no slower than plain del_timer(pending_timer), unless it has to actually wait for completion of the currently running timer. The only restriction is that the recurring timer should not use add_timer_on(). 3. The timers are not serialized wrt to itself. If CPU_0 does mod_timer(jiffies+1) while the timer is currently running on CPU 1, it is quite possible that local interrupt on CPU_0 will start that timer before it finished on CPU_1. 4. The timers locking is suboptimal. __mod_timer() takes 3 locks at once and still requires wmb() in del_timer/run_timers. The new implementation takes 2 locks sequentially and does not need memory barriers. Currently ->base != NULL means that the timer is pending. In that case ->base.lock is used to lock the timer. __mod_timer also takes timer->lock because ->base can be == NULL. This patch uses timer->entry.next != NULL as indication that the timer is pending. So it does __list_del(), entry->next = NULL instead of list_del() when the timer is deleted. The ->base field is used for hashed locking only, it is initialized in init_timer() which sets ->base = per_cpu(tvec_bases). When the tvec_bases.lock is locked, it means that all timers which are tied to this base via timer->base are locked, and the base itself is locked too. So __run_timers/migrate_timers can safely modify all timers which could be found on ->tvX lists (pending timers). When the timer's base is locked, and the timer removed from ->entry list (which means that _run_timers/migrate_timers can't see this timer), it is possible to set timer->base = NULL and drop the lock: the timer remains locked. This patch adds lock_timer_base() helper, which waits for ->base != NULL, locks the ->base, and checks it is still the same. __mod_timer() schedules the timer on the local CPU and changes it's base. However, it does not lock both old and new bases at once. It locks the timer via lock_timer_base(), deletes the timer, sets ->base = NULL, and unlocks old base. Then __mod_timer() locks new_base, sets ->base = new_base, and adds this timer. This simplifies the code, because AB-BA deadlock is not possible. __mod_timer() also ensures that the timer's base is not changed while the timer's handler is running on the old base. __run_timers(), del_timer() do not change ->base anymore, they only clear pending flag. So del_timer_sync() can test timer->base->running_timer == timer to detect whether it is running or not. We don't need timer_list->lock anymore, this patch kills it. We also don't need barriers. del_timer() and __run_timers() used smp_wmb() before clearing timer's pending flag. It was needed because __mod_timer() did not lock old_base if the timer is not pending, so __mod_timer()->list_add() could race with del_timer()->list_del(). With this patch these functions are serialized through base->lock. One problem. TIMER_INITIALIZER can't use per_cpu(tvec_bases). So this patch adds global struct timer_base_s { spinlock_t lock; struct timer_list *running_timer; } __init_timer_base; which is used by TIMER_INITIALIZER. The corresponding fields in tvec_t_base_s struct are replaced by struct timer_base_s t_base. It is indeed ugly. But this can't have scalability problems. The global __init_timer_base.lock is used only when __mod_timer() is called for the first time AND the timer was compile time initialized. After that the timer migrates to the local CPU. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Renaud Lienhart <renaud.lienhart@free.fr> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:16 -07:00
Tejun Heo	f7d37d028d	[PATCH] blk: remove BLK_TAGS_{PER_LONG\|MASK} Replace BLK_TAGS_PER_LONG with BITS_PER_LONG and remove unused BLK_TAGS_MASK. Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:15 -07:00
Tejun Heo	fa72b903f7	[PATCH] blk: remove blk_queue_tag->real_max_depth optimization blk_queue_tag->real_max_depth was used to optimize out unnecessary allocations/frees on tag resize. However, the whole thing was very broken - tag_map was never allocated to real_max_depth resulting in access beyond the end of the map, bits in [max_depth..real_max_depth] were set when initializing a map and copied when resizing resulting in pre-occupied tags. As the gain of the optimization is very small, well, almost nill, remove the whole thing. Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:15 -07:00
Vincent Hanquez	e9129e56e9	[PATCH] xen: x86_64: Add macro for debugreg Add 2 macros to set and get debugreg on x86_64. This is useful for Xen because it will need only to redefine each macro to a hypervisor call. Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk> Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:14 -07:00
Vincent Hanquez	fa1e1bdf78	[PATCH] xen: x86: Rename usermode macro Rename user_mode to user_mode_vm and add a user_mode macro similar to the x86-64 one. This is useful for Xen because the linux xen kernel does not runs on the same priviledge that a vanilla linux kernel, and with this we just need to redefine user_mode(). Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk> Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:14 -07:00
Vincent Hanquez	f5012310e3	[PATCH] xen: x86: add macro for debugreg Add 2 macros to set and get debugreg on x86. This is useful for Xen because it will need only to redefine each macro to a hypervisor call. Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk> Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:13 -07:00
Jan Beulich	32ecd42b6f	[PATCH] eliminate duplicate rdpmc definition Eliminate duplicate definition of rdpmc in x86-64's mtrr.h. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:13 -07:00
Andrew Morton	a3a255e744	[PATCH] x86: cpu_khz type fix x86_64's cpu_khz is unsigned int and there is no reason why x86 needs to use unsigned long. So make cpu_khz unsigned int on x86 as well. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:11 -07:00
Alexey Dobriyan	a9ed881796	[PATCH] x86: #include asm/uaccess.h in asm/checksum.h csum_and_copy_to_user is static inline and uses VERIFY_WRITE. Patch allows to remove asm/uaccess.h from i386_ksyms.c without dependency surprises. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:11 -07:00
Christoph Lameter	b5d23e5b8c	[PATCH] ia64: Selectable Timer Interrupt Frequency It allows a selectable timer interrupt frequency of 100, 250 and 1000 HZ. Reducing the timer frequency may have important performance benefits on large systems. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:10 -07:00
Christoph Lameter	5912100372	[PATCH] i386: Selectable Frequency of the Timer Interrupt Make the timer frequency selectable. The timer interrupt may cause bus and memory contention in large NUMA systems since the interrupt occurs on each processor HZ times per second. Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:10 -07:00
Natalie Protasevich	ca05fea6db	[PATCH] Do not enforce unique IO_APIC_ID check for xAPIC systems (i386) This patch is per Andi's request to remove NO_IOAPIC_CHECK from genapic and use heuristics to prevent unique I/O APIC ID check for systems that don't need it. The patch disables unique I/O APIC ID check for Xeon-based and other platforms that don't use serial APIC bus for interrupt delivery. Andi stated that AMD systems don't need unique IO_APIC_IDs either. Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:09 -07:00
Christoph Lameter	1946089a10	[PATCH] NUMA aware block device control structure allocation Patch to allocate the control structures for for ide devices on the node of the device itself (for NUMA systems). The patch depends on the Slab API change patch by Manfred and me (in mm) and the pcidev_to_node patch that I posted today. Does some realignment too. Signed-off-by: Justin M. Forbes <jmforbes@linuxtx.org> Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Pravin Shelar <pravin@calsoftinc.com> Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:09 -07:00
Christoph Lameter	8c5a09082f	[PATCH] x86/x86_64: pcibus_to_node Define pcibus_to_node to be able to figure out which NUMA node contains a given PCI device. This defines pcibus_to_node(bus) in include/linux/topology.h and adjusts the macros for i386 and x86_64 that already provided a way to determine the cpumask of a pci device. x86_64 was changed to not build an array of cpumasks anymore. Instead an array of nodes is build which can be used to generate the cpumask via node_to_cpumask. Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:08 -07:00
Christoph Lameter	e164f5573b	[PATCH] ppc64: pcibus_to_node fix asm-generic/topology.h must also be included if CONFIG_NUMA is set in order to provide the fall back pcibus_to_node function. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:08 -07:00
Hirokazu Takata	4a35293667	[PATCH] m32r: build fix for asm-m32r/topology.h Use asm-generic/topology.h to fix yet another pcibus_to_node() build error. Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:08 -07:00
Venkatesh Pallipadi	8a9e1b0f56	[PATCH] Platform SMIs and their interferance with tsc based delay calibration Issue: Current tsc based delay_calibration can result in significant errors in loops_per_jiffy count when the platform events like SMIs (System Management Interrupts that are non-maskable) are present. This could lead to potential kernel panic(). This issue is becoming more visible with 2.6 kernel (as default HZ is 1000) and on platforms with higher SMI handling latencies. During the boot time, SMIs are mostly used by BIOS (for things like legacy keyboard emulation). Description: The psuedocode for current delay calibration with tsc based delay looks like (0) Estimate a value for loops_per_jiffy (1) While (loops_per_jiffy estimate is accurate enough) (2) wait for jiffy transition (jiffy1) (3) Note down current tsc (tsc1) (4) loop until tsc becomes tsc1 + loops_per_jiffy (5) check whether jiffy changed since jiffy1 or not and refine loops_per_jiffy estimate Consider the following cases Case 1: If SMIs happen between (2) and (3) above, we can end up with a loops_per_jiffy value that is too low. This results in shorted delays and kernel can panic () during boot (Mostly at IOAPIC timer initialization timer_irq_works() as we don't have enough timer interrupts in a specified interval). Case 2: If SMIs happen between (3) and (4) above, then we can end up with a loops_per_jiffy value that is too high. And with current i386 code, too high lpj value (greater than 17M) can result in a overflow in delay.c:__const_udelay() again resulting in shorter delay and panic(). Solution: The patch below makes the calibration routine aware of asynchronous events like SMIs. We increase the delay calibration time and also identify any significant errors (greater than 12.5%) in the calibration and notify it to user. Patch below changes both i386 and x86-64 architectures to use this new and improved calibrate_delay_direct() routine. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:08 -07:00
Matt Tolentino	bbfceef47f	[PATCH] add x86-64 specific support for sparsemem This patch adds in the necessary support for sparsemem such that x86-64 kernels may use sparsemem as an alternative to discontigmem for NUMA kernels. Note that this does no preclude one from continuing to build NUMA kernels using discontigmem, but merely allows the option to build NUMA kernels with sparsemem. Interestingly, the use of sparsemem in lieu of discontigmem in NUMA kernels results in reduced text size for otherwise equivalent kernels as shown in the example builds below: text data bss dec hex filename 2371036 765884 1237108 4374028 42be0c vmlinux.discontig 2366549 776484 1302772 4445805 43d66d vmlinux.sparse Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:07 -07:00
Matt Tolentino	2b97690f4c	[PATCH] reorganize x86-64 NUMA and DISCONTIGMEM config options In order to use the alternative sparsemem implmentation for NUMA kernels, we need to reorganize the config options. This patch effectively abstracts out the CONFIG_DISCONTIGMEM options to CONFIG_NUMA in most cases. Thus, the discontigmem implementation may be employed as always, but the sparsemem implementation may be used alternatively. Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:06 -07:00
Andy Whitcroft	145e664231	[PATCH] ppc64: sparsemem memory model Provide the architecture specific implementation for SPARSEMEM for PPC64 systems. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Mike Kravetz <kravetz@us.ibm.com> (in part) Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:06 -07:00
Andy Whitcroft	510f8fa7ba	[PATCH] ppc64: add early_pfn_to_nid Provide an implementation of early_pfn_to_nid for PPC64. This is used by memory models to determine the node from which to take allocations before the memory allocators are fully initialised. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:05 -07:00
Andy Whitcroft	29751f6991	[PATCH] sparsemem hotplug base Make sparse's initalization be accessible at runtime. This allows sparse mappings to be created after boot in a hotplug situation. This patch is separated from the previous one just to give an indication how much of the sparse infrastructure is just for hotplug memory. The section_mem_map doesn't really store a pointer. It stores something that is convenient to do some math against to get a pointer. It isn't valid to just do section_mem_map, so I don't think it should be stored as a pointer. There are a couple of things I'd like to store about a section. First of all, the fact that it is !NULL does not mean that it is present. There could be such a combination where section_mem_map is* NULL, but the math gets you properly to a real mem_map. So, I don't think that check is safe. Since we're storing 32-bit-aligned structures, we have a few bits in the bottom of the pointer to play with. Use one bit to encode whether there's really a mem_map there, and the other one to tell whether there's a valid section there. We need to distinguish between the two because sometimes there's a gap between when a section is discovered to be present and when we can get the mem_map for it. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:05 -07:00
Andy Whitcroft	641c767389	[PATCH] sparsemem swiss cheese numa layouts The part of the sparsemem patch which modifies memmap_init_zone() has recently become a problem. It changes behavior so that there is a call to pfn_to_page() for each individual page inside of a node's range: node_start_pfn through node_end_pfn. It used to simply do this once, at the beginning of the node, but having sparsemem's non-contiguous mem_map[]s inside of a node made it necessary to change. Mike Kravetz recently wrote a patch which made the NUMA code accept some new kinds of layouts. The system's memory was laid out like this, with node 0's memory in two pieces: one before and one after node 1's memory: Node 0: +++++ +++++ Node 1: +++++ Previous behavior before Mike's patch was to assign nodes like this: Node 0: 00000 XXXXX Node 1: 11111 Where the 'X' areas were simply thrown away. The new behavior was to make the pg_data_t span node 0 across all of its areas, including areas that are really node 1's: Node 0: 000000000000000 Node 1: 11111 This wastes a little bit of mem_map space, but ends up being OK, and more fully utilizes the system's memory. memmap_init_zone() initializes all of the "struct page"s for node 0, even for the "hole", but those never get used, because there is no pfn_to_page() that resolves to those pages. However, only calling pfn_to_page() once, memmap_init_zone() always uses the pages that were allocated for node0->node_mem_map because: struct page *start = pfn_to_page(start_pfn); // effectively start = &node->node_mem_map[0] for (page = start; page < (start + size); page++) { init_page_here();... page++; } Slow, and wasteful, but generally harmless. But, modify that to call pfn_to_page() for each loop iteration (like sparsemem does): for (pfn = start_pfn; pfn < < (start_pfn + size); pfn++++) { page = pfn_to_page(pfn); } And you end up trying to initialize node 1's pages too early, along with bogus data from node 0. This patch checks for those weird layouts and declines to touch the pages, making the more frequent pfn_to_page() calls OK to do. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:05 -07:00
Andy Whitcroft	05b79bdcb4	[PATCH] sparsemem memory model for i386 Provide the architecture specific implementation for SPARSEMEM for i386 SMP and NUMA systems. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:05 -07:00
Andy Whitcroft	d41dee369b	[PATCH] sparsemem memory model Sparsemem abstracts the use of discontiguous mem_maps[]. This kind of mem_map[] is needed by discontiguous memory machines (like in the old CONFIG_DISCONTIGMEM case) as well as memory hotplug systems. Sparsemem replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually become a complete replacement. A significant advantage over DISCONTIGMEM is that it's completely separated from CONFIG_NUMA. When producing this patch, it became apparent in that NUMA and DISCONTIG are often confused. Another advantage is that sparse doesn't require each NUMA node's ranges to be contiguous. It can handle overlapping ranges between nodes with no problems, where DISCONTIGMEM currently throws away that memory. Sparsemem uses an array to provide different pfn_to_page() translations for each SECTION_SIZE area of physical memory. This is what allows the mem_map[] to be chopped up. In order to do quick pfn_to_page() operations, the section number of the page is encoded in page->flags. Part of the sparsemem infrastructure enables sharing of these bits more dynamically (at compile-time) between the page_zone() and sparsemem operations. However, on 32-bit architectures, the number of bits is quite limited, and may require growing the size of the page->flags type in certain conditions. Several things might force this to occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of memory), an increase in the physical address space, or an increase in the number of used page->flags. One thing to note is that, once sparsemem is present, the NUMA node information no longer needs to be stored in the page->flags. It might provide speed increases on certain platforms and will be stored there if there is room. But, if out of room, an alternate (theoretically slower) mechanism is used. This patch introduces CONFIG_FLATMEM. It is used in almost all cases where there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM often have to compile out the same areas of code. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:04 -07:00
Andy Whitcroft	b159d43fbf	[PATCH] generify early_pfn_to_nid Provide a default implementation for early_pfn_to_nid returning node 0. Allow architectures to override this with their own implementation out of asm/mmzone.h. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:04 -07:00
Dave Hansen	93b7504e3e	[PATCH] Introduce new Kconfig option for NUMA or DISCONTIG There is some confusion that arose when working on SPARSEMEM patch between what is needed for DISCONTIG vs. NUMA. Multiple pg_data_t's are needed for DISCONTIGMEM or NUMA, independently. All of the current NUMA implementations require an implementation of DISCONTIG. Because of this, quite a lot of code which is really needed for NUMA is actually under DISCONTIG #ifdefs. For SPARSEMEM, we changed some of these #ifdefs to CONFIG_NUMA, but that broke the DISCONTIG=y and NUMA=n case. Introducing this new NEED_MULTIPLE_NODES config option allows code that is needed for both NUMA or DISCONTIG to be separated out from code that is specific to DISCONTIG. One great advantage of this approach is that it doesn't require every architecture to be converted over. All of the current implementations should "just work", only the ones implementing SPARSEMEM will have to be fixed up. The change to free_area_init() makes it work inside, or out of the new config option. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:03 -07:00
Dave Hansen	5b505b90b2	[PATCH] sparsemem base: teach discontig about sparse ranges discontig.c has some assumptions that mem_map[]s inside of a node are contiguous. Teach it to make sure that each region that it's bringing online is actually made up of valid ranges of ram. Written-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:01 -07:00
Dave Hansen	348f8b6c48	[PATCH] sparsemem base: reorganize page->flags bit operations Generify the value fields in the page_flags. The aim is to allow the location and size of these fields to be varied. Additionally we want to move away from fixed allocations per field whilst still enforcing the overall bit utilisation limits. We rely on the compiler to spot and optimise the accessor functions. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:01 -07:00
Dave Hansen	6f167ec721	[PATCH] sparsemem base: simple NUMA remap space allocator Introduce a simple allocator for the NUMA remap space. This space is very scarce, used for structures which are best allocated node local. This mechanism is also used on non-NUMA ia64 systems with a vmem_map to keep the pgdat->node_mem_map initialized in a consistent place for all architectures. Issues: o alloc_remap takes a node_id where we might expect a pgdat which was intended to allow us to allocate the pgdat's using this mechanism; which we do not yet do. Could have alloc_remap_node() and alloc_remap_nid() for this purpose. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:01 -07:00
Dave Hansen	408fde81c1	[PATCH] remove non-DISCONTIG use of pgdat->node_mem_map This patch effectively eliminates direct use of pgdat->node_mem_map outside of the DISCONTIG code. On a flat memory system, these fields aren't currently used, neither are they on a sparsemem system. There was also a node_mem_map(nid) macro on many architectures. Its use along with the use of ->node_mem_map itself was not consistent. It has been removed in favor of two new, more explicit, arch-independent macros: pgdat_page_nr(pgdat, pagenr) nid_page_nr(nid, pagenr) I called them "pgdat" and "nid" because we overload the term "node" to mean "NUMA node", "DISCONTIG node" or "pg_data_t" in very confusing ways. I believe the newer names are much clearer. These macros can be overridden in the sparsemem case with a theoretically slower operation using node_start_pfn and pfn_to_page(), instead. We could make this the only behavior if people want, but I don't want to change too much at once. One thing at a time. This patch removes more code than it adds. Compile tested on alpha, alpha discontig, arm, arm-discontig, i386, i386 generic, NUMAQ, Summit, ppc64, ppc64 discontig, and x86_64. Full list here: http://sr71.net/patches/2.6.12/2.6.12-rc1-mhp2/configs/ Boot tested on NUMAQ, x86 SMP and ppc64 power4/5 LPARs. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin J. Bligh <mbligh@aracnet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:00 -07:00
John Rose	dad32bbf43	[PATCH] pSeries - read irqs dynamically For I/O DLPAR to work properly, the kernel needs to allow for dynamic assignment of the irq field of the pci_dev structure upon dynamic bus addition. This patch moves the assignment of that field from pSeries_final_fixup() to pcibios_fixup_bus(), which enables dynamic assignment for the children of a newly added bus. Currently, pci_devs receive their irq numbers in one of two ways. The irq line is either read at boot for all pci_devs, or read by the rpaphp module at slot enable time. The latter is no longer sufficient for DLPAR addition of slots that don't qualify as PCI-hotplug capable. This solution handles the cases of boot and dynamic add. Signed-off-by: John Rose <johnrose@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2005-06-23 17:09:54 +10:00
Linus Torvalds	060de20e82	Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	2005-06-22 23:11:50 -07:00
Shaun Pereira	ebc3f64b86	[X25]: Fast select with no restriction on response This patch is a follow up to patch 1 regarding "Selective Sub Address matching with call user data". It allows use of the Fast-Select-Acceptance optional user facility for X.25. This patch just implements fast select with no restriction on response (NRR). What this means (according to ITU-T Recomendation 10/96 section 6.16) is that if in an incoming call packet, the relevant facility bits are set for fast-select-NRR, then the called DTE can issue a direct response to the incoming packet using a call-accepted packet that contains call-user-data. This patch allows such a response. The called DTE can also respond with a clear-request packet that contains call-user-data. However, this feature is currently not implemented by the patch. How is Fast Select Acceptance used? By default, the system does not allow fast select acceptance (as before). To enable a response to fast select acceptance, After a listen socket in created and bound as follows socket(AF_X25, SOCK_SEQPACKET, 0); bind(call_soc, (struct sockaddr *)&locl_addr, sizeof(locl_addr)); but before a listen system call is made, the following ioctl should be used. ioctl(call_soc,SIOCX25CALLACCPTAPPRV); Now the listen system call can be made listen(call_soc, 4); After this, an incoming-call packet will be accepted, but no call-accepted packet will be sent back until the following system call is made on the socket that accepts the call ioctl(vc_soc,SIOCX25SENDCALLACCPT); The network (or cisco xot router used for testing here) will allow the application server's call-user-data in the call-accepted packet, provided the call-request was made with Fast-select NRR. Signed-off-by: Shaun Pereira <spereira@tusc.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:16:17 -07:00
Shaun Pereira	cb65d506c3	[X25]: Selective sub-address matching with call user data. From: Shaun Pereira <spereira@tusc.com.au> This is the first (independent of the second) patch of two that I am working on with x25 on linux (tested with xot on a cisco router). Details are as follows. Current state of module: A server using the current implementation (2.6.11.7) of the x25 module will accept a call request/ incoming call packet at the listening x.25 address, from all callers to that address, as long as NO call user data is present in the packet header. If the server needs to choose to accept a particular call request/ incoming call packet arriving at its listening x25 address, then the kernel has to allow a match of call user data present in the call request packet with its own. This is required when multiple servers listen at the same x25 address and device interface. The kernel currently matches ALL call user data, if present. Current Changes: This patch is a follow up to the patch submitted previously by Andrew Hendry, and allows the user to selectively control the number of octets of call user data in the call request packet, that the kernel will match. By default no call user data is matched, even if call user data is present. To allow call user data matching, a cudmatchlength > 0 has to be passed into the kernel after which the passed number of octets will be matched. Otherwise the kernel behavior is exactly as the original implementation. This patch also ensures that as is normally the case, no call user data will be present in the Call accepted / call connected packet sent back to the caller Future Changes on next patch: There are cases however when call user data may be present in the call accepted packet. According to the X.25 recommendation (ITU-T 10/96) section 5.2.3.2 call user data may be present in the call accepted packet provided the fast select facility is used. My next patch will include this fast select utility and the ability to send up to 128 octets call user data in the call accepted packet provided the fast select facility is used. I am currently testing this, again with xot on linux and cisco. Signed-off-by: Shaun Pereira <spereira@tusc.com.au> (With a fix from Alexey Dobriyan <adobriyan@gmail.com>) Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:15:01 -07:00
Jeff Moyer	fbeec2e155	[NETPOLL]: allow multiple netpoll_clients to register against one interface This patch provides support for registering multiple netpoll clients to the same network device. Only one of these clients may register an rx_hook, however. In practice, this restriction has not been problematic. It is worth mentioning, though, that the current design can be easily extended to allow for the registration of multiple rx_hooks. The basic idea of the patch is that the rx_np pointer in the netpoll_info structure points to the struct netpoll that has rx_hook filled in. Aside from this one case, there is no need for a pointer from the struct net_device to an individual struct netpoll. A lock is introduced to protect the setting and clearing of the np_rx pointer. The pointer will only be cleared upon netpoll client module removal, and the lock should be uncontested. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:05:59 -07:00
Jeff Moyer	115c1d6e61	[NETPOLL]: Introduce a netpoll_info struct This patch introduces a netpoll_info structure, which the struct net_device will now point to instead of pointing to a struct netpoll. The reason for this is two-fold: 1) fields such as the rx_flags, poll_owner, and poll_lock should be maintained per net_device, not per netpoll; and 2) this is a first step in providing support for multiple netpoll clients to register against the same net_device. The struct netpoll is now pointed to by the netpoll_info structure. As such, the previous behaviour of the code is preserved. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:05:31 -07:00
Jeff Moyer	6ca4f65e6b	[NETPOLL]: Set poll_owner to -1 before unlocking in netpoll_poll_unlock() This trivial patch moves the assignment of poll_owner to -1 inside of the lock. This fixes a potential SMP race in the code. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:04:55 -07:00
Arnd Bergmann	fef1c772fa	[PATCH] ppc64: add BPA platform type This adds the basic support for running on BPA machines. So far, this is only the IBM workstation, and it will not run on others without a little more generalization. It should be possible to configure a kernel for any combination of CONFIG_PPC_BPA with any of the other multiplatform targets. Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2005-06-23 09:43:37 +10:00
Utz Bacher	5f5b4e669a	[PATCH] ppc64: add a minimal nvram driver The firmware provides the location and size of the nvram in the device tree, so it does not really contain any hardware specific bits and could be used on other machines as well. Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2005-06-23 09:43:31 +10:00
Arnd Bergmann	6566c6f1f1	[PATCH] ppc64: pSeries_progress -> rtas_progress The pSeries_progress function is called from some places in the rtas code, which may also be used by non-pSeries platforms. Though pSeries is currently the only platform type that implements display-character, the code is actually generic enough to be part of the rtas subsystem. I hit a bug here because the generic rtas code tried calling ppc_md.progress, which points to an __init function on most platforms. We could also clear the ppc_md.progress pointer when freeing the init memory to make it more explicit that ppc_md.progress must not be called after bootup. Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2005-06-23 09:43:28 +10:00
Arnd Bergmann	773bf9c469	[PATCH] ppc64: rename pSeries rtc functions into rtas_* The rtc rtas functions are not pSeries specific but can also be used by BPA and other SLOF based platforms Signed-off-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2005-06-23 09:43:18 +10:00
Arnd Bergmann	10f7e7c15e	[PATCH] ppc64: consolidate calibrate_decr implementations pSeries and maple have almost the same code for calibrate_decr, and BPA would need yet another copy. Instead, I'm moving the code to arch/ppc64/kernel/time.c. Some of the related declarations were missing from header files, so I'm moving those as well. It makes sense to merge this with the pmac function of the same name, so we end up having just one implemetation for iSeries and one for Open Firmware based machines. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paul Mackerras <paulus@samba.org>	2005-06-23 09:43:07 +10:00
Linus Torvalds	a493604400	Merge master.kernel.org:/home/rmk/linux-2.6-arm	2005-06-22 14:51:06 -07:00
Linus Torvalds	9092131f7e	Merge rsync://client.linux-nfs.org/pub/linux/nfs-2.6	2005-06-22 14:32:15 -07:00
Trond Myklebust	8d0a8a9d0e	[PATCH] NFSv4: Clean up nfs4 lock state accounting Ensure that lock owner structures are not released prematurely. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:42 -04:00
Trond Myklebust	ecdbf769b2	[PATCH] NLM: fix a client-side race on blocking locks. If the lock blocks, the server may send us a GRANTED message that races with the reply to our LOCK request. Make sure that we catch the GRANTED by queueing up our request on the nlm_blocked list before we send off the first LOCK rpc call. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:42 -04:00
Trond Myklebust	3da28eb1c6	[PATCH] NFS: Replace nfs_page insertion sort with a radix sort Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:39 -04:00
Trond Myklebust	c6a556b88a	[PATCH] NFS: Make searching and waiting on busy writeback requests more efficient. Basically copies the VFS's method for tracking writebacks and applies it to the struct nfs_page. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:39 -04:00
Trond Myklebust	fe51beecc5	[PATCH] NFS: Ensure that fstat() always returns the correct mtime Even if the file is open for writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:37 -04:00
Trond Myklebust	7d52e86274	[PATCH] NFS: Cleanup of caching code, and slight optimization of writes. Unless we're doing O_APPEND writes, we really don't care about revalidating the file length. Just make sure that we catch any page cache invalidations. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:37 -04:00
Trond Myklebust	951a143b3f	[PATCH] NFS: Fix the file size revalidation Instead of looking at whether or not the file is open for writes before we accept to update the length using the server value, we should rather be looking at whether or not we are currently caching any writes. Failure to do so means in particular that we're not updating the file length correctly after obtaining a POSIX or BSD lock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:36 -04:00
Trond Myklebust	f0dd2136da	[PATCH] NFS: Clean up readdir changes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:34 -04:00
Olivier Galibert	00a9264227	[PATCH] NFS: Hide NFS server-generated readdir cookies from userland NFSv3 currently returns the unsigned 64-bit cookie directly to userspace. The following patch causes the kernel to generate loff_t offsets for the benefit of userland. The current server-generated READDIR cookie is cached in the nfs_open_context instead of in filp->f_pos, so we still end up work correctly under directory insertions/deletion. Signed-off-by: Olivier Galibert <galibert@pobox.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:33 -04:00
Andreas Gruenbacher	5c6a9f7d92	[PATCH] NFS: Cache the NFSv3 acls. Attach acls to inodes in the icache to avoid unnecessary GETACL RPC round-trips. As long as the client doesn't retrieve any acls itself, only the default acls of exiting directories and the default and access acls of new directories will end up in the cache, which preserves some memory compared to always caching the access and default acl of all files. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:25 -04:00
Andreas Gruenbacher	055ffbea05	[PATCH] NFS: Fix handling of the umask when an NFSv3 default acl is present. NFSv3 has no concept of a umask on the server side: The client applies the umask locally, and sends the effective permissions to the server. This behavior is wrong when files are created in a directory that has a default ACL. In this case, the umask is supposed to be ignored, and only the default ACL determines the file's effective permissions. Usually its the server's task to conditionally apply the umask. But since the server knows nothing about the umask, we have to do it on the client side. This patch tries to fetch the parent directory's default ACL before creating a new file, computes the appropriate create mode to send to the server, and finally sets the new file's access and default acl appropriately. Many thanks to Buck Huppmann <buchk@pobox.com> for sending the initial version of this patch, as well as for arguing why we need this change. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:24 -04:00
Andreas Gruenbacher	b7fa0554cf	[PATCH] NFS: Add support for NFSv3 ACLs This adds acl support fo nfs clients via the NFSACL protocol extension, by implementing the getxattr, listxattr, setxattr, and removexattr iops for the system.posix_acl_access and system.posix_acl_default attributes. This patch implements a dumb version that uses no caching (and thus adds some overhead). (Another patch in this patchset adds caching as well.) Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:24 -04:00
Andreas Gruenbacher	a257cdd0e2	[PATCH] NFSD: Add server support for NFSv3 ACLs. This adds functions for encoding and decoding POSIX ACLs for the NFSACL protocol extension, and the GETACL and SETACL RPCs. The implementation is compatible with NFSACL in Solaris. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:23 -04:00

... 2 3 4 5 6 ...

957 Commits (bc971dee6ece1fd0d431948924becd9c50e7b778)