linux

q3k/linux

Author	SHA1	Message	Date
Joe Perches	a2681a7542	MAINTAINERS: update SCORE architecture name style and add file pattern Signed-off-by: Joe Perches <joe@perches.com> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Lennox Wu <lennox.wu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:26 -07:00
Joe Perches	ee709b0c62	MAINTAINERS: update Kernel Janitors after mismerge Fix the mismerge of the W: URL and the S: status fields. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:26 -07:00
Joe Perches	7a241d6ecd	MAINTAINERS: use tab not spaces after field types Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:26 -07:00
Joe Perches	476604de5a	MAINTAINERS: change ATM mailing list to moderated Signed-off-by: Joe Perches <joe@perches.com> Cc: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:25 -07:00
Joe Perches	0e24bdd49d	MAINTAINERS: update OMAP Tony Lindgren email name Which had an embedded and duplicated email address Signed-off-by: Joe Perches <joe@perches.com> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:25 -07:00
Joe Perches	d6f005a108	MAINTAINERS: update TRACING section Move to alphabetic position Use single line F: entries Signed-off-by: Joe Perches <joe@perches.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:25 -07:00
Joe Perches	bda2562c34	MAINTAINERS: update GENERIC UIO FOR PCI DEVICES Quote a name with a period remove L: linux-kernel@vger.kernel.org Signed-off-by: Joe Perches <joe@perches.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:25 -07:00
Roger Quadros	8753298a11	omap_hsmmc: add missing probe handler hook The missing probe handler hook will never probe the driver. Add it back. Fixes broken MMC on OMAP. We use platform_driver_probe() API since omap_hsmmc is not a hot-pluggable device. Signed-off-by: Roger Quadros <ext-roger.quadros@nokia.com> Tested-by: Felipe Contreras <felipe.contreras@gmail.com> Tested-by: Tony Lindgren <tony@atomide.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Felipe Contreras <felipe.contreras@gmail.com> Cc: Denis Karpov <ext-denis.2.karpov@nokia.com> Cc: Madhusudhan Chikkature <madhu.cr@ti.com> Cc: Greg KH <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:25 -07:00
KOSAKI Motohiro	0a1b71b400	strstrip(): mark as as must_check strstrip() can return a modified value of its input argument, when removing elading whitesapce. So it is surely bug for this function's return value to be ignored. The caller is probably going to use the incorrect original pointer. So mark it __must_check to prevent this frm happening (as it has before). Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:25 -07:00
KOSAKI Motohiro	478988d3b2	cgroup: fix strstrip() misuse cgroup_write_X64() and cgroup_write_string() ignore the return value of strstrip(). it makes small inconsistent behavior. example: ========================= # cd /mnt/cgroup/hoge # cat memory.swappiness 60 # echo "59 " > memory.swappiness # cat memory.swappiness 59 # echo " 58" > memory.swappiness bash: echo: write error: Invalid argument This patch fixes it. Cc: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:25 -07:00
KOSAKI Motohiro	58355c7876	congestion_wait(): don't use WRITE commit `8aa7e847d` (Fix congestion_wait() sync/async vs read/write confusion) replace WRITE with BLK_RW_ASYNC. Unfortunately, concurrent mm development made the unchanged place accidentally. This patch fixes it too. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:25 -07:00
Christian Borntraeger	0d0df599f1	connector: fix regression introduced by sid connector Since commit `02b51df1b0` (proc connector: add event for process becoming session leader) we have the following warning: Badness at kernel/softirq.c:143 [...] Krnl PSW : 0404c00180000000 00000000001481d4 (local_bh_enable+0xb0/0xe0) [...] Call Trace: ([<000000013fe04100>] 0x13fe04100) [<000000000048a946>] sk_filter+0x9a/0xd0 [<000000000049d938>] netlink_broadcast+0x2c0/0x53c [<00000000003ba9ae>] cn_netlink_send+0x272/0x2b0 [<00000000003baef0>] proc_sid_connector+0xc4/0xd4 [<0000000000142604>] __set_special_pids+0x58/0x90 [<0000000000159938>] sys_setsid+0xb4/0xd8 [<00000000001187fe>] sysc_noemu+0x10/0x16 [<00000041616cb266>] 0x41616cb266 The warning is ---> WARN_ON_ONCE(in_irq() \|\| irqs_disabled()); The network code must not be called with disabled interrupts but sys_setsid holds the tasklist_lock with spinlock_irq while calling the connector. After a discussion we agreed that we can move proc_sid_connector from __set_special_pids to sys_setsid. We also agreed that it is sufficient to change the check from task_session(curr) != pid into err > 0, since if we don't change the session, this means we were already the leader and return -EPERM. One last thing: There is also daemonize(), and some people might want to get a notification in that case. Since daemonize() is only needed if a user space does kernel_thread this does not look important (and there seems to be no consensus if this connector should be called in daemonize). If we really want this, we can add proc_sid_connector to daemonize() in an additional patch (Scott?) Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Scott James Remnant <scott@ubuntu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: David S. Miller <davem@davemloft.net> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:25 -07:00
Hugh Dickins	370c28def6	hwpoison: fix/proc/meminfo alignment Given such a long name, the kB count in /proc/meminfo's HardwareCorrupted line is being shown too far right (it does align with x86_64's VmallocChunk above, but I hope nobody will ever have that much corrupted!). Align it. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:25 -07:00
Hugh Dickins	92f7ba70ee	hwpoison: fix oops on ksm pages Memory failure on a KSM page currently oopses on its NULL anon_vma in page_lock_anon_vma(): that may not be much worse than the consequence of ignoring it, but it is better to be consistent with how ZERO_PAGE and hugetlb pages and other awkward cases are treated. Just skip it. We could fix it for 2.6.32 at the KSM end, by putting a dummy anon_vma pointer in there; but that would get harder next time, when KSM will put a pointer to something else there (and I'm not currently planning to do any work to open that up to memory_failure). So I would prefer this simple PageKsm test, until the other exceptions are handled. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:24 -07:00
Randy Dunlap	2eca40a8cc	cpufreq: add cpufreq_get() stub for CONFIG_CPU_FREQ=n When CONFIG_CPU_FREQ is disabled, cpufreq_get() needs a stub. Used by kvm (although it looks like a bit of the kvm code could be omitted when CONFIG_CPU_FREQ is disabled). arch/x86/built-in.o: In function `kvm_arch_init': (.text+0x10de7): undefined reference to `cpufreq_get' (Needed in linux-next's KVM tree, but it's correct in 2.6.32). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Tested-by: Eric Paris <eparis@redhat.com> Cc: Jiri Slaby <jirislaby@gmail.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:24 -07:00
Heiko Carstens	b3dcf3de8e	[S390] smp: fix sigp sense handling sigp sense only returns the status of a cpu if it is non zero. If the status of the sensed cpu is all zeros condition code 0 (accpeted) is set and no status bits are returned. The current code however assumes that a status was returned and tests bits in it. This means uninitalized data is accessed with random results. Worst case is that the code that checks if cpu is offline on cpu hotplug assumes that the target cpu is offline while it is still running. This leads potentially to memory corruption since resources that are still needed by the target cpu will be freed and could be resused while still in use. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-10-29 15:05:13 +01:00
Heiko Carstens	f8501ba77d	[S390] smp: fix sigp stop handling According to the architecture a cpu must not necessarily enter stopped state after completion of a sigp instruction with "stop" order code. So remove the BUG() statement after self sending sigp stop to avoid that it ever gets reached. Also add a sigp busy check to make sure that the order gets delivered. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-10-29 15:05:13 +01:00
Martin Schwidefsky	70f5dc514c	[S390] cputime: fix overflow on 31 bit systems The cputime_to_msecs / cputime_to_clock_t and cputime64_to_clock_t cause fixpoint divide exceptions if the cputime is too large. On a machine that collected 49.7 days worth of idle time reading from /proc/stat will generate oopses like this: Kernel BUG at 001b0c92 [verbose debug info unavailable] fixpoint divide exception: 0009 [#13] SMP Modules linked in: ipv6 CPU: 1 Tainted: G D 2.6.27.10 #5 Process cat (pid: 21352, task: 1fb34138, ksp: 1d2a3d98) Krnl PSW : 070c2000 801b0c92 (show_stat+0x2ca/0x68c) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 Krnl GPRS: 00000001 00001388 00000bb8 0015d2a1 00000000 00000000 000003e8 0001fd91 00000000 00000000 0000129d eecd2ff0 1cc533b9 0036f780 801b0bce 1d2a3cc0 Krnl Code: 801b0c86: f18890abf198 mvo 171(9,%r9),408(9,%r15) 801b0c8c: 98abf170 lm %r10,%r11,368(%r15) 801b0c90: 1da1 dr %r10,%r1 >801b0c92: 90abf170 stm %r10,%r11,368(%r15) 801b0c96: 98abf190 lm %r10,%r11,400(%r15) 801b0c9a: 1da1 dr %r10,%r1 801b0c9c: 90abf190 stm %r10,%r11,400(%r15) 801b0ca0: 18a3 lr %r10,%r3 Call Trace: ([<00000000001b09f4>] show_stat+0x2c/0x68c) [<000000000018dcee>] seq_read+0xb2/0x364 [<00000000001a9980>] proc_reg_read+0x68/0x98 [<00000000001705ee>] vfs_read+0x6e/0xe8 [<0000000000170732>] sys_read+0x36/0x78 [<000000000010f750>] sysc_do_restart+0x12/0x16 [<0000000077f3ad6a>] 0x77f3ad6a <4>---[ end trace 1436ea9559d3de9e ]--- Reported-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-10-29 15:05:13 +01:00
Heiko Carstens	e8a79c9ec7	[S390] call home: fix string length handling After copying uts->nodename to the static nodename array the static version isn't necessarily zero termininated, since the size of the array is one byte too short. Afterwards doing strncat(data, nodename, strlen(nodename)); may copy an arbitrary large amount of bytes. Fix this by getting rid of the static array and using strncat with proper length limit. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-10-29 15:05:12 +01:00
Heiko Carstens	4a0fb4c445	[S390] call home: fix error handling in init function Fix missing unregister_sysctl_table in case the SCLP doesn't provide the requested feature. Also simplify the whole error handling while at it. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-10-29 15:05:12 +01:00
Heiko Carstens	4f8048ee73	[S390] smp: fix prefix handling of offlined cpus Offlined cpus still have valid prefix register contents. Dumpers will store the register contents of a cpu to the location where its prefix register points to. For offlined cpus the area (lowcore) has been freed and the dumper would write the uninteresting contents of the offline cpu to a memory location which might be in use by some other component and destroy valueable information. To fix this set the prefix register of offline cpus to absolute address zero again. This prevents the current dumpers to write to random memory locations. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-10-29 15:05:12 +01:00
Martin Schwidefsky	8ca45667f9	[S390] s/r: cmm resume fix If a suspended z/VM guest has been logged off before the resume the 'SET SMSG IUCV' CP command need to be repeated to reenable sending message via SMSG. This fixes the following error: HCPMFS057I H4214002 not receiving; SMSG off Error: non-zero CP response for command 'SMSG H4214002 CMM SHRINK 5010': #57 Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-10-29 15:05:12 +01:00
Sebastian Ott	3f0b3c33ee	[S390] call home: fix local buffer usage in proc handler Fix the size of the local buffer and use snprintf to prevent further miscalculations. Also fix the usage of bitwise vs logic operations. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-10-29 15:05:12 +01:00
Christoph Hellwig	ab0a9735e0	blkdev: flush disk cache on ->fsync Currently there is no barrier support in the block device code. That means we cannot guarantee any sort of data integerity when using the block device node with dis kwrite caches enabled. Using the raw block device node is a typical use case for virtualization (and I assume databases, too). This patch changes block_fsync to issue a cache flush and thus make fsync on block device nodes actually useful. Note that in mainline we would also need to add such code to the ->aio_write method for O_SYNC handling, but assuming that Jan's patch series for the O_SYNC rewrite goes in it will also call into ->fsync for 2.6.32. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-29 14:14:04 +01:00
Jens Axboe	b9d128f108	block: move bdi/address_space unplug functions to backing-dev.h There's nothing block related about them, the backing device is used by things like NFS etc as well. This gets rid of the need to protect such calls by CONFIG_BLOCK. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-29 13:59:26 +01:00
Jens Axboe	592b09a42f	backing-dev: ensure that a removed bdi no longer has super_block referencing it When the bdi is being removed, we have to ensure that no super_blocks currently have that cached in sb->s_bdi. Normally this is ensured by the sb having a longer life span than the bdi, but if the device is suddenly yanked, we have to kill this reference. sb->s_bdi is pointed to freed memory at that point. This fixes a problem with sync(1) hanging when a USB stick is pulled without cleanly umounting it first. Reported-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-29 11:46:12 +01:00
Gabor Gombas	b5dd884e68	net: Fix 'Re: PACKET_TX_RING: packet size is too long' Currently PACKET_TX_RING forces certain amount of every frame to remain unused. This probably originates from an early version of the PACKET_TX_RING patch that in fact used the extra space when the (since removed) CONFIG_PACKET_MMAP_ZERO_COPY option was enabled. The current code does not make any use of this extra space. This patch removes the extra space reservation and lets userspace make use of the full frame size. Signed-off-by: Gabor Gombas <gombasg@sztaki.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-29 03:19:11 -07:00
Mikulas Patocka	9bd7496f5d	ide: Serialize CMD643 and CMD646 to fix a hardware bug with SSD CMD646 corrupts data on concurrent transfers on both channels when IDE SSD is connected to one of the channels. Setup that demonstrates this hardware bug: Ultra 5, onboard CMD646, rev 3. /dev/hda is 8GB Seagate ST38410A in MWDMA2 /dev/hdd is 32GB SSD SiliconHardDisk in MWDMA2 - When reading /dev/hdd (for example with dd or fsck), reads from /dev/hda are corrupted, there are twiddled single bits 1->0 and some full 32-bit words corrupted, sometimes commands fail (which switches /dev/hda to PIO mode but the corruptions happen even in PIO). - Reads from /dev/hdd don't seem to be corrupted (i.e. fsck passes fine). - When I connected normal rotating harddisk to /dev/hdd, there was no corruption, so the corruption is something specific to SSD. - I tried the same setup on a PCI card with CMD649 and saw no corruption. This patch serializes the operation for CMD646 and 643 (I didn't test CMD643 but it may have the same hw bug too because it's earlier design). CMD649 is good. I don't know anything about CMD 648. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: Frans Pop <elendil@planet.nl> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-29 03:02:06 -07:00
Janusz Krzysztofik	06b71b657b	netdev: usb: dm9601.c can drive a device not supported yet, add support for it I found that the current version of drivers/net/usb/dm9601.c can be used to successfully drive a low-power, low-cost network adapter with USB ID 0a46:9000, based on a DM9000E chipset. As no device with this ID is yet present in the kernel, I have created a patch that adds support for the device to the dm9601 driver. Created and tested against linux-2.6.32-rc5. Signed-off-by: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl> Acked-by: Peter Korsgaard <jacmet@sunsite.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-29 01:19:48 -07:00
Ron Mercer	da03945140	qlge: Fix firmware mailbox command timeout. The mailbox command process would only process a maximum of 5 unrelated firmware events while waiting for it's command completion status. It should process an unlimited number of events while waiting for a maximum of 5 seconds. Signed-off-by: Ron Mercer <ron.mercer@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-29 01:17:15 -07:00
Ron Mercer	6d190c6edf	qlge: Fix EEH handling. Clean up driver resources without touch the hardware. Add pci save/restore state. Signed-off-by: Ron Mercer <ron.mercer@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-29 01:17:14 -07:00
Neil Horman	55888dfb6b	AF_RAW: Augment raw_send_hdrinc to expand skb to fit iphdr->ihl (v2) Augment raw_send_hdrinc to correct for incorrect ip header length values A series of oopses was reported to me recently. Apparently when using AF_RAW sockets to send data to peers that were reachable via ipsec encapsulation, people could panic or BUG halt their systems. I've tracked the problem down to user space sending an invalid ip header over an AF_RAW socket with IP_HDRINCL set to 1. Basically what happens is that userspace sends down an ip frame that includes only the header (no data), but sets the ip header ihl value to a large number, one that is larger than the total amount of data passed to the sendmsg call. In raw_send_hdrincl, we allocate an skb based on the size of the data in the msghdr that was passed in, but assume the data is all valid. Later during ipsec encapsulation, xfrm4_tranport_output moves the entire frame back in the skbuff to provide headroom for the ipsec headers. During this operation, the skb->transport_header is repointed to a spot computed by skb->network_header + the ip header length (ihl). Since so little data was passed in relative to the value of ihl provided by the raw socket, we point transport header to an unknown location, resulting in various crashes. This fix for this is pretty straightforward, simply validate the value of of iph->ihl when sending over a raw socket. If (iph->ihl*4U) > user data buffer size, drop the frame and return -EINVAL. I just confirmed this fixes the reported crashes. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-29 01:09:58 -07:00
David S. Miller	f552ce5fc2	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2009-10-29 01:05:38 -07:00
Jiri Bohac	d9d5283228	bonding: fix a race condition in calls to slave MII ioctls In mii monitor mode, bond_check_dev_link() calls the the ioctl handler of slave devices. It stores the ndo_do_ioctl function pointer to a static (!) ioctl variable and later uses it to call the handler with the IOCTL macro. If another thread executes bond_check_dev_link() at the same time (even with a different bond, which none of the locks prevent), a race condition occurs. If the two racing slaves have different drivers, this may result in one driver's ioctl handler being called with a pointer to a net_device controlled with a different driver, resulting in unpredictable breakage. Unless I am overlooking something, the "static" must be a copy'n'paste error (?). Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-28 22:23:54 -07:00
Rusty Russell	3c7d76e371	param: fix setting arrays of bool We create a dummy struct kernel_param on the stack for parsing each array element, but we didn't initialize the flags word. This matters for arrays of type "bool", where the flag indicates if it really is an array of bools or unsigned int (old-style). Reported-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org	2009-10-29 08:56:20 +10:30
Rusty Russell	d553ad864e	param: fix NULL comparison on oom kp->arg is always true: it's the contents of that pointer we care about. Reported-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org	2009-10-29 08:56:18 +10:30
Rusty Russell	65afac7d80	param: fix lots of bugs with writing charp params from sysfs, by leaking mem. `e180a6b775` "param: fix charp parameters set via sysfs" fixed the case where charp parameters written via sysfs were freed, leaving drivers accessing random memory. Unfortunately, storing a flag in the kparam struct was a bad idea: it's rodata so setting it causes an oops on some archs. But that's not all: 1) module_param_array() on charp doesn't work reliably, since we use an uninitialized temporary struct kernel_param. 2) there's a fundamental race if a module uses this parameter and then it's changed: they will still access the old, freed, memory. The simplest fix (ie. for 2.6.32) is to never free the memory. This prevents all these problems, at cost of a memory leak. In practice, there are only 18 places where a charp is writable via sysfs, and all are root-only writable. Reported-by: Takashi Iwai <tiwai@suse.de> Cc: Sitsofe Wheeler <sitsofe@yahoo.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org	2009-10-29 08:56:17 +10:30
Michael S. Tsirkin	2d61ba9503	virtio: order used ring after used index read On SMP guests, reads from the ring might bypass used index reads. This causes guest crashes because host writes to used index to signal ring data readiness. Fix this by inserting rmb before used ring reads. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org	2009-10-29 08:50:37 +10:30
Michael S. Tsirkin	0b22bd0ba0	virtio-pci: fix per-vq MSI-X request logic Commit `f68d24082e` in 2.6.32-rc1 broke requesting IRQs for per-VQ MSI-X vectors: - vector number was used instead of the vector itself - we try to request an IRQ for VQ which does not have a callback handler This is a regression that causes warnings in kernel log, potentially lower performance as we need to scan vq list, and might cause system failure if the interrupt requested is in fact needed by another system. This was not noticed earlier because in most cases we were falling back on shared interrupt for all vqs. The warnings often look like this: virtio-pci 0000:00:03.0: irq 26 for MSI/MSI-X virtio-pci 0000:00:03.0: irq 27 for MSI/MSI-X virtio-pci 0000:00:03.0: irq 28 for MSI/MSI-X IRQ handler type mismatch for IRQ 1 current handler: i8042 Pid: 2400, comm: modprobe Tainted: G W 2.6.32-rc3-11952-gf3ed8d8-dirty #1 Call Trace: [<ffffffff81072aed>] ? __setup_irq+0x299/0x304 [<ffffffff81072ff3>] ? request_threaded_irq+0x144/0x1c1 [<ffffffff813455af>] ? vring_interrupt+0x0/0x30 [<ffffffff81346598>] ? vp_try_to_find_vqs+0x583/0x5c7 [<ffffffffa0015188>] ? skb_recv_done+0x0/0x34 [virtio_net] [<ffffffff81346609>] ? vp_find_vqs+0x2d/0x83 [<ffffffff81345d00>] ? vp_get+0x3c/0x4e [<ffffffffa0016373>] ? virtnet_probe+0x2f1/0x428 [virtio_net] [<ffffffffa0015188>] ? skb_recv_done+0x0/0x34 [virtio_net] [<ffffffffa00150d8>] ? skb_xmit_done+0x0/0x39 [virtio_net] [<ffffffff8110ab92>] ? sysfs_do_create_link+0xcb/0x116 [<ffffffff81345cc2>] ? vp_get_status+0x14/0x16 [<ffffffff81345464>] ? virtio_dev_probe+0xa9/0xc8 [<ffffffff8122b11c>] ? driver_probe_device+0x8d/0x128 [<ffffffff8122b206>] ? __driver_attach+0x4f/0x6f [<ffffffff8122b1b7>] ? __driver_attach+0x0/0x6f [<ffffffff8122a9f9>] ? bus_for_each_dev+0x43/0x74 [<ffffffff8122a374>] ? bus_add_driver+0xea/0x22d [<ffffffff8122b4a3>] ? driver_register+0xa7/0x111 [<ffffffffa001a000>] ? init+0x0/0xc [virtio_net] [<ffffffff81009051>] ? do_one_initcall+0x50/0x148 [<ffffffff8106e117>] ? sys_init_module+0xc5/0x21a [<ffffffff8100af02>] ? system_call_fastpath+0x16/0x1b virtio-pci 0000:00:03.0: irq 26 for MSI/MSI-X virtio-pci 0000:00:03.0: irq 27 for MSI/MSI-X Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Reported-by: Shirley Ma <xma@us.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-10-29 08:50:36 +10:30
Jiri Kosina	4a6cc4bd32	sched: move rq_weight data array out of .percpu Commit `34d76c41` introduced percpu array update_shares_data, size of which being proportional to NR_CPUS. Unfortunately this blows up ia64 for large NR_CPUS configuration, as ia64 allows only 64k for .percpu section. Fix this by allocating this array dynamically and keep only pointer to it percpu. The per-cpu handling doesn't impose significant performance penalty on potentially contented path in tg_shares_up(). ... ffffffff8104337c: 65 48 8b 14 25 20 cd mov %gs:0xcd20,%rdx ffffffff81043383: 00 00 ffffffff81043385: 48 c7 c0 00 e1 00 00 mov $0xe100,%rax ffffffff8104338c: 48 c7 45 a0 00 00 00 movq $0x0,-0x60(%rbp) ffffffff81043393: 00 ffffffff81043394: 48 c7 45 a8 00 00 00 movq $0x0,-0x58(%rbp) ffffffff8104339b: 00 ffffffff8104339c: 48 01 d0 add %rdx,%rax ffffffff8104339f: 49 8d 94 24 08 01 00 lea 0x108(%r12),%rdx ffffffff810433a6: 00 ffffffff810433a7: b9 ff ff ff ff mov $0xffffffff,%ecx ffffffff810433ac: 48 89 45 b0 mov %rax,-0x50(%rbp) ffffffff810433b0: bb 00 04 00 00 mov $0x400,%ebx ffffffff810433b5: 48 89 55 c0 mov %rdx,-0x40(%rbp) ... After: ... ffffffff8104337c: 65 8b 04 25 28 cd 00 mov %gs:0xcd28,%eax ffffffff81043383: 00 ffffffff81043384: 48 98 cltq ffffffff81043386: 49 8d bc 24 08 01 00 lea 0x108(%r12),%rdi ffffffff8104338d: 00 ffffffff8104338e: 48 8b 15 d3 7f 76 00 mov 0x767fd3(%rip),%rdx # ffffffff817ab368 <update_shares_data> ffffffff81043395: 48 8b 34 c5 00 ee 6d mov -0x7e921200(,%rax,8),%rsi ffffffff8104339c: 81 ffffffff8104339d: 48 c7 45 a0 00 00 00 movq $0x0,-0x60(%rbp) ffffffff810433a4: 00 ffffffff810433a5: b9 ff ff ff ff mov $0xffffffff,%ecx ffffffff810433aa: 48 89 7d c0 mov %rdi,-0x40(%rbp) ffffffff810433ae: 48 c7 45 a8 00 00 00 movq $0x0,-0x58(%rbp) ffffffff810433b5: 00 ffffffff810433b6: bb 00 04 00 00 mov $0x400,%ebx ffffffff810433bb: 48 01 f2 add %rsi,%rdx ffffffff810433be: 48 89 55 b0 mov %rdx,-0x50(%rbp) ... Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Tejun Heo <tj@kernel.org>	2009-10-29 00:26:00 +09:00
Jiri Kosina	403a91b165	percpu: allow pcpu_alloc() to be called with IRQs off pcpu_alloc() and pcpu_extend_area_map() perform a series of spin_lock_irq()/spin_unlock_irq() calls, which make them unsafe with respect to being called from contexts which have IRQs off. This patch converts the code to perform save/restore of flags instead, making pcpu_alloc() (or __alloc_percpu() respectively) to be called from early kernel startup stage, where IRQs are off. This is needed for proper initialization of per-cpu rq_weight data from sched_init(). tj: added comment explaining why irqsave/restore is used in alloc path. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Tejun Heo <tj@kernel.org>	2009-10-29 00:25:59 +09:00
Michael S. Tsirkin	03f191bab7	virtio-net: fix data corruption with OOM virtio net used to unlink skbs from send queues on error, but ever since `48925e372f` we do not do this. This causes guest data corruption and crashes with vhost since net core can requeue the skb or free it without it being taken off the list. This patch fixes this by queueing the skb after successful transmit. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-28 04:03:38 -07:00
Ben Hutchings	345056af41	sfc: Set ip_summed correctly for page buffers passed to GRO Page buffers containing packets with an incorrect checksum or using a protocol not handled by hardware checksum offload were previously not passed to LRO. The conversion to GRO changed this, but did not set the ip_summed value accordingly. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-28 03:43:49 -07:00
Michael Chan	d0549382da	cnic: Fix L2CTX_STATUSB_NUM offset in context memory. The BNX2_L2CTX_STATUSB_NUM definition needs to be changed to match the recent firmware update: commit `078b073588` bnx2: Update firmware to 5.0.0.j3. Without the fix, bnx2 can crash intermittently in bnx2_rx_int() when iSCSI is enabled. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Benjamin Li <benli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-28 03:41:59 -07:00
Jens Axboe	a870a3a485	drbd: fix in_flight rw indexing Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-28 09:30:27 +01:00
Jeff Moyer	cfb1e33eed	aio: implement request batching Hi, Some workloads issue batches of small I/O, and the performance is poor due to the call to blk_run_address_space for every single iocb. Nathan Roberts pointed this out, and suggested that by deferring this call until all I/Os in the iocb array are submitted to the block layer, we can realize some impressive performance gains (up to 30% for sequential 4k reads in batches of 16). Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-28 09:29:25 +01:00
Jeff Moyer	1af60fbd75	block: get rid of the WRITE_ODIRECT flag Hi, The WRITE_ODIRECT flag is only used in one place, and that code path happens to also call blk_run_address_space. The introduction of this flag, then, could result in the device being unplugged twice for every I/O. Further, with the batching changes in the next patch, we don't want an O_DIRECT write to imply a queue unplug. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-28 09:29:25 +01:00
Jens Axboe	5869619cb5	cfq-iosched: fix style issue in cfq_get_avg_queues() Line breaks and bad brace placement. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-28 09:27:07 +01:00
Corrado Zoccolo	718eee0579	cfq-iosched: fairness for sync no-idle queues Currently no-idle queues in cfq are not serviced fairly: even if they can only dispatch a small number of requests at a time, they have to compete with idling queues to be serviced, experiencing large latencies. We should notice, instead, that no-idle queues are the ones that would benefit most from having low latency, in fact they are any of: * processes with large think times (e.g. interactive ones like file managers) * seeky (e.g. programs faulting in their code at startup) * or marked as no-idle from upper levels, to improve latencies of those requests. This patch improves the fairness and latency for those queues, by: * separating sync idle, sync no-idle and async queues in separate service_trees, for each priority * service all no-idle queues together * and idling when the last no-idle queue has been serviced, to anticipate for more no-idle work * the timeslices allotted for idle and no-idle service_trees are computed proportionally to the number of processes in each set. Servicing all no-idle queues together should have a performance boost for NCQ-capable drives, without compromising fairness. Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-28 09:23:26 +01:00
Corrado Zoccolo	a6d44e982d	cfq-iosched: enable idling for last queue on priority class cfq can disable idling for queues in various circumstances. When workloads of different priorities are competing, if the higher priority queue has idling disabled, lower priority queues may steal its disk share. For example, in a scenario with an RT process performing seeky reads vs a BE process performing sequential reads, on an NCQ enabled hardware, with low_latency unset, the RT process will dispatch only the few pending requests every full slice of service for the BE process. The patch solves this issue by always performing idle on the last queue at a given priority class > idle. If the same process, or one that can pre-empt it (so at the same priority or higher), submits a new request within the idle window, the lower priority queue won't dispatch, saving the disk bandwidth for higher priority ones. Note: this doesn't touch the non_rotational + NCQ case (no hardware to test if this is a benefit in that case). Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-10-28 09:23:26 +01:00

... 2 3 4 5 6 ...

168111 commits