Commit graph

742 commits

Author SHA1 Message Date
Paul Clements
48f15b93b2 NBD: make nbd default to deadline I/O scheduler
NBD doesn't work well with CFQ (or AS) schedulers, so let's default to
something else.

The two problems I have experienced with nbd and cfq are:

1) nbd hangs with cfq on RHEL 5 (2.6.18) -- this may well have been
   fixed

   There's a similar debian bug that has been filed as well:

   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=447638

   There have been posts to nbd-general mailing list about problems with
   cfq and nbd also.

2) nbd performs about 10% better (the last time I tested) with deadline
   vs.  cfq (the overhead of cfq doesn't provide much advantage to nbd [not
   being a real disk], and you end up going through the I/O scheduler on
   the nbd server anyway, so it makes sense that deadline is better with
   nbd)

Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-23 17:12:15 -08:00
Ian Campbell
597592d951 xen: Implement getgeo for Xen virtual block device.
The below implements the getgeo hook for Xen block devices. Extracted
from the xen-unstable tree where it has been used for ages.

It is useful to have because it allows things like grub2 (used by the
Debian installer images) to work in a guest domain without having to
sprinkle Xen specific hacks around the place.

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-21 16:19:13 -08:00
Tony Breeds
2ebda63b09 Fix compile of swim3 as module
The current pmac32_defconfig fails to build with the following error:

  Building modules, stage 2.
ERROR: "check_media_bay" [drivers/block/swim3.ko] undefined!
WARNING: modpost: Found 23 section mismatch(es).
To see full details build your kernel with:
'make CONFIG_DEBUG_SECTION_MISMATCH=y'
make[2]: *** [__modpost] Error 1

This patch fixes that.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 20:58:04 -08:00
Pete Zaitcev
541645be8b ub: fix up the conversion to sg_init_table()
Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Cc: "Oliver Pinter" <oliver.pntr@gmail.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-09 11:08:33 -08:00
Linus Torvalds
03054de1e0 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  Enhanced partition statistics: documentation update
  Enhanced partition statistics: remove old partition statistics
  Enhanced partition statistics: procfs
  Enhanced partition statistics: sysfs
  Enhanced partition statistics: aoe fix
  Enhanced partition statistics: update partition statitics
  Enhanced partition statistics: core statistics
  block: fixup rq_init() a bit

Manually fixed conflict in drivers/block/aoe/aoecmd.c due to statistics
support.
2008-02-08 09:42:46 -08:00
Paul Clements
20a8143eaa NBD: remove limit on max number of nbd devices
Remove the arbitrary 128 device limit for NBD.  nbds_max can now be set to
any number.  In certain scenarios where devices are used sparsely we have
run into the 128 device limit.

Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:41 -08:00
Andrew Morton
476aed3870 aoe: statically initialise devlist_lock
I guess aoedev_init() can go away now.

Cc: Greg KH <greg@kroah.com>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:32 -08:00
Ed L. Cashin
52e112b3ab aoe: update copyright date
Update the year in the copyright notices.

Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:32 -08:00
Ed L. Cashin
578c4aa0b4 aoe: make error messages more specific
Andrew Morton pointed out that the "too many targets" message in patch 2 could
be printed for failing GFP_ATOMIC allocations.  This patch makes the messages
more specific.

Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:32 -08:00
Ed L. Cashin
1d75981a80 aoe: the aoeminor doesn't need a long format
The aoedev aoeminor member doesn't need a long format.

Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:32 -08:00
Ed L. Cashin
7df620d852 aoe: add module parameter for users who need more outstanding I/O
An AoE target provides an estimate of the number of outstanding commands that
the AoE initiator can send before getting a response.  The aoe_maxout
parameter provides a way to set an even lower limit.  It will not allow a user
to use more outstanding commands than the target permits.  If a user discovers
a problem with a large setting, this parameter provides a way for us to work
with them to debug the problem.  We expect to improve the dynamic window
sizing algorithm and drop this parameter.  For the time being, it is a
debugging aid.

Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:32 -08:00
Ed L. Cashin
6b9699bbd2 aoe: only install new AoE device once
An aoe driver user who had about 70 AoE targets found that he was hitting a
BUG in sysfs_create_file because the aoe driver was trying to tell the kernel
about an AoE device more than once.  Each AoE device was reachable by several
local network interfaces, and multiple ATA device indentify responses were
returning from that single device.

This patch eliminates a race condition so that aoe always informs the block
layer of a new AoE device once in the presence of multiple incoming ATA device
identify responses.

Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:32 -08:00
Ed L. Cashin
9bb237b6a6 aoe: dynamically allocate a capped number of skbs when necessary
What this Patch Does

  Even before this recent series of 12 patches to 2.6.22-rc4, the aoe
  driver was reusing a small set of skbs that were allocated once and
  were only used for outbound AoE commands.

  The network layer cannot be allowed to put_page on the data that is
  still associated with a bio we haven't returned to the block layer,
  so the aoe driver (even before the patch under discussion) is still
  the owner of skbs that have been handed to the network layer for
  transmission.  We need to keep track of these skbs so that we can
  free them, but by tracking them, we can also easily re-use them.

  The new patch was a response to the behavior of certain network
  drivers.  We cannot reuse an skb that the network driver still has
  in its transmit ring.  Network drivers can defer transmit ring
  cleanup and then use the state in the skb to determine how many data
  segments to clean up in its transmit ring.  The tg3 driver is one
  driver that behaves in this way.

  When the network driver defers cleanup of its transmit ring, the aoe
  driver can find itself in a situation where it would like to send an
  AoE command, and the AoE target is ready for more work, but the
  network driver still has all of the pre-allocated skbs.  In that
  case, the new patch just calls alloc_skb, as you'd expect.

  We don't want to get carried away, though.  We try not to do
  excessive allocation in the write path, so we cap the number of skbs
  we dynamically allocate.

  Probably calling it a "dynamic pool" is misleading.  We were already
  trying to use a small fixed-size set of pre-allocated skbs before
  this patch, and this patch just provides a little headroom (with a
  ceiling, though) to accomodate network drivers that hang onto skbs,
  by allocating when needed.  The d->skbpool_hd list of allocated skbs
  is necessary so that we can free them later.

  We didn't notice the need for this headroom until AoE targets got
  fast enough.

Alternatives

  If the network layer never did a put_page on the pages in the bio's
  we get from the block layer, then it would be possible for us to
  hand skbs to the network layer and forget about them, allowing the
  network layer to free skbs itself (and thereby calling our own
  skb->destructor callback function if we needed that).  In that case
  we could get rid of the pre-allocated skbs and also the
  d->skbpool_hd, instead just calling alloc_skb every time we wanted
  to transmit a packet.  The slab allocator would effectively maintain
  the list of skbs.

  Besides a loss of CPU cache locality, the main concern with that
  approach the danger that it would increase the likelihood of
  deadlock when VM is trying to free pages by writing dirty data from
  the page cache through the aoe driver out to persistent storage on
  an AoE device.  Right now we have a situation where we have
  pre-allocation that corresponds to how much we use, which seems
  ideal.

  Of course, there's still the separate issue of receiving the packets
  that tell us that a write has successfully completed on the AoE
  target.  When memory is low and VM is using AoE to flush dirty data
  to free up pages, it would be perfect if there were a way for us to
  register a fast callback that could recognize write command
  completion responses.  But I don't think the current problems with
  the receive side of the situation are a justification for
  exacerbating the problem on the transmit side.

Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:32 -08:00
Ed L. Cashin
262bf54144 aoe: user can ask driver to forget previously detected devices
When an AoE device is detected, the kernel is informed, and a new block device
is created.  If the device is unused, the block device corresponding to remote
device that is no longer available may be removed from the system by telling
the aoe driver to "flush" its list of devices.

Without this patch, software like GPFS and LVM may attempt to read from AoE
devices that were discovered earlier but are no longer present, blocking until
the I/O attempt times out.

Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:31 -08:00
Ed L. Cashin
cf446f0dba aoe: eliminate goto and improve readability
Adam Richter suggested eliminating this goto.

Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:31 -08:00
Ed L. Cashin
1eb0da4cea aoe: mac_addr: avoid 64-bit arch compiler warnings
By returning unsigned long long, mac_addr does not generate compiler warnings
on 64-bit architectures.

Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:31 -08:00
Ed L. Cashin
68e0d42f39 aoe: handle multiple network paths to AoE device
A remote AoE device is something can process ATA commands and is identified by
an AoE shelf number and an AoE slot number.  Such a device might have more
than one network interface, and it might be reachable by more than one local
network interface.  This patch tracks the available network paths available to
each AoE device, allowing them to be used more efficiently.

Andrew Morton asked about the call to msleep_interruptible in the revalidate
function.  Yes, if a signal is pending, then msleep_interruptible will not
return 0.  That means we will not loop but will call aoenet_xmit with a NULL
skb, which is a noop.  If the system is too low on memory or the aoe driver is
too low on frames, then the user can hit control-C to interrupt the attempt to
do a revalidate.  I have added a comment to the code summarizing that.

Andrew Morton asked whether the allocation performed inside addtgt could use a
more relaxed allocation like GFP_KERNEL, but addtgt is called when the aoedev
lock has been locked with spin_lock_irqsave.  It would be nice to allocate the
memory under fewer restrictions, but targets are only added when the device is
being discovered, and if the target can't be added right now, we can try again
in a minute when then next AoE config query broadcast goes out.

Andrew Morton pointed out that the "too many targets" message could be printed
for failing GFP_ATOMIC allocations.  The last patch in this series makes the
messages more specific.

Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:31 -08:00
Ed L. Cashin
8911ef4dc9 aoe: bring driver version number to 47
Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:31 -08:00
Nick Piggin
75acb9cd2e rd: support XIP
Support direct_access XIP method with brd.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:30 -08:00
Nick Piggin
9db5579be4 rewrite rd
This is a rewrite of the ramdisk block device driver.

The old one is really difficult because it effectively implements a block
device which serves data out of its own buffer cache.  It relies on the dirty
bit being set, to pin its backing store in cache, however there are non
trivial paths which can clear the dirty bit (eg.  try_to_free_buffers()),
which had recently lead to data corruption.  And in general it is completely
wrong for a block device driver to do this.

The new one is more like a regular block device driver.  It has no idea about
vm/vfs stuff.  It's backing store is similar to the buffer cache (a simple
radix-tree of pages), but it doesn't know anything about page cache (the pages
in the radix tree are not pagecache pages).

There is one slight downside -- direct block device access and filesystem
metadata access goes through an extra copy and gets stored in RAM twice.
However, this downside is only slight, because the real buffercache of the
device is now reclaimable (because we're not playing crazy games with it), so
under memory intensive situations, footprint should effectively be the same --
maybe even a slight advantage to the new driver because it can also reclaim
buffer heads.

The fact that it now goes through all the regular vm/fs paths makes it
much more useful for testing, too.

   text    data     bss     dec     hex filename
   2837     849     384    4070     fe6 drivers/block/rd.o
   3528     371      12    3911     f47 drivers/block/brd.o

Text is larger, but data and bss are smaller, making total size smaller.

A few other nice things about it:
- Similar structure and layout to the new loop device handlinag.
- Dynamic ramdisk creation.
- Runtime flexible buffer head size (because it is no longer part of the
  ramdisk code).
- Boot / load time flexible ramdisk size, which could easily be extended
  to a per-ramdisk runtime changeable size (eg. with an ioctl).
- Can use highmem for the backing store.

[akpm@linux-foundation.org: fix build]
[byron.bbradley@gmail.com: make rd_size non-static]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Byron Bradley <byron.bbradley@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:30 -08:00
Jerome Marchand
a890d62b9e Enhanced partition statistics: aoe fix
Updates the enhanced partition statistics in ATA over Ethernet driver
(not tested).

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2008-02-08 12:41:57 +01:00
Linus Torvalds
3796958130 Merge branch 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (69 commits)
  [POWERPC] Add SPE registers to core dumps
  [POWERPC] Use regset code for compat PTRACE_*REGS* calls
  [POWERPC] Use generic compat_sys_ptrace
  [POWERPC] Use generic compat_ptrace_request
  [POWERPC] Use generic ptrace peekdata/pokedata
  [POWERPC] Use regset code for PTRACE_*REGS* requests
  [POWERPC] Switch to generic compat_binfmt_elf code
  [POWERPC] Switch to using user_regset-based core dumps
  [POWERPC] Add user_regset compat support
  [POWERPC] Add user_regset_view definitions
  [POWERPC] Use user_regset accessors for GPRs
  [POWERPC] ptrace accessors for special regs MSR and TRAP
  [POWERPC] Use user_regset accessors for SPE regs
  [POWERPC] Use user_regset accessors for altivec regs
  [POWERPC] Use user_regset accessors for FP regs
  [POWERPC] mpc52xx: fix compile error introduce when rebasing patch
  [POWERPC] 4xx: PCIe indirect DCR spinlock fix.
  [POWERPC] Add missing native dcr dcr_ind_lock spinlock
  [POWERPC] 4xx: Fix offset value on Warp board
  [POWERPC] 4xx: Add 440EPx Sequoia ehci dts entry
  ...
2008-02-07 09:02:26 -08:00
Josh Boyer
256ae6a720 Merge branch 'virtex-for-2.6.25' of git://git.secretlab.ca/git/linux-2.6-virtex into for-2.6.25 2008-02-06 21:06:45 -06:00
Geert Uytterhoeven
5ceadd2a2a Atari floppy: Rename disk_type to atari_disk_type
Commit edfaa7c365

    Driver core: convert block from raw kobjects to core devices

    This moves the block devices to /sys/class/block. It will create a
    flat list of all block devices, with the disks and partitions in one
    directory. For compatibility /sys/block is created and contains symlinks
    to the disks.

introduced a global disk_type variable in <linux/genhd.h>, causing the
following compile error on Atari:

    drivers/block/ataflop.c:93: error: conflicting types for 'disk_type'
    include/linux/genhd.h:21: error: previous declaration of 'disk_type' was here

Rename the local disk_type variable in drivers/block/ataflop.c to
atari_disk_type, to avoid the conflict.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:10 -08:00
Randy Dunlap
582539e5a0 cciss: use upper_32_bits() macro to eliminate warnings
Use upper_32_bits(x) macro to handle shifts that may be >= the width of
the data type.

drivers/block/cciss.c: In function 'do_cciss_request':
drivers/block/cciss.c:2655: warning: right shift count >= width of type
drivers/block/cciss.c:2656: warning: right shift count >= width of type
drivers/block/cciss.c:2657: warning: right shift count >= width of type
drivers/block/cciss.c:2658: warning: right shift count >= width of type

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:03 -08:00
Robert P. J. Day
67a3b2b6ce rd: use is_power_of_2() in drivers/block/rd.c.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:03 -08:00
David Woodhouse
96c5865559 Allow auto-destruction of loop devices
This allows a flag to be set on loop devices so that when they are
closed for the last time, they'll self-destruct.

In general, so that we can automatically allocate loop devices (as with
losetup -f) and have them disappear when we're done with them.

In particular, right now, so that we can stop relying on the hackish
special-case in umount(8) which kills off loop devices which were set up by
'mount -oloop'.  That means we can stop putting crap in /etc/mtab which
doesn't belong there, which means it can be a symlink to /proc/mounts, which
means yet another writable file on the root filesystem is eliminated and the
'stateless' folks get happier...  and OLPC trac #356 can be closed.

The mount(8) side of that is at
http://marc.info/?l=util-linux-ng&m=119362955431694&w=2

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Cc: Bernardo Innocenti <bernie@codewiz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:01 -08:00
Alexey Dobriyan
eaa0ff15c3 fix ! versus & precedence in various places
Fix various instances of

	if (!expr & mask)

which should probably have been

	if (!(expr & mask))

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Karsten Keil <kkeil@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:40:59 -08:00
Stephen Neuendorffer
0e349b0e2d [POWERPC] Xilinx: Update compatible to use values generated by BSP generator.
Mainly, this involves two changes:
1) xilinx->xlnx (recognized standard is to use the stock ticker)
2) In order to have the device tree focus on describing what the
hardware is as exactly as possible, the compatible strings contain the
full IP name and IP version.

Signed-off-by: Stephen Neuendorffer <stephen.neuendorffer@xilinx.com>
Acked-by: Peter Korsgaard <jacmet@sunsite.dk>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2008-02-06 10:23:21 -07:00
Grant Likely
911a317599 [POWERPC] Fix incorrectly tagged __devinitdata structures
Fix compile errors in the xilinxfb, xsysace and uartlite drivers used
by the Xilinx Virtex platform

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Peter Korsgaard <jacmet@sunsite.dk>
2008-02-06 10:23:12 -07:00
Linus Torvalds
93890b71a3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (25 commits)
  virtio: balloon driver
  virtio: Use PCI revision field to indicate virtio PCI ABI version
  virtio: PCI device
  virtio_blk: implement naming for vda-vdz,vdaa-vdzz,vdaaa-vdzzz
  virtio_blk: Dont waste major numbers
  virtio_blk: provide getgeo
  virtio_net: parametrize the napi_weight for virtio receive queue.
  virtio: free transmit skbs when notified, not on next xmit.
  virtio: flush buffers on open
  virtnet: remove double ether_setup
  virtio: Allow virtio to be modular and used by modules
  virtio: Use the sg_phys convenience function.
  virtio: Put the virtio under the virtualization menu
  virtio: handle interrupts after callbacks turned off
  virtio: reset function
  virtio: populate network rings in the probe routine, not open
  virtio: Tweak virtio_net defines
  virtio: Net header needs hdr_len
  virtio: remove unused id field from struct virtio_blk_outhdr
  virtio: clarify NO_NOTIFY flag usage
  ...
2008-02-04 08:00:54 -08:00
Christian Borntraeger
d50ed907dc virtio_blk: implement naming for vda-vdz,vdaa-vdzz,vdaaa-vdzzz
Am Freitag, 1. Februar 2008 schrieb Christian Borntraeger:
> Right. I will fix that with an additional patch.

This patch goes on top of the minor number patch. Please let me know if
you want a merged patch:

Currently virtio_blk creates the disk name combinging "vd"  with 'a'++.
This will give strange names after vdz. I have implemented names up to
vdzzz - inspired by the sd.c code. That should be sufficient for now.

There is one driver in the kernel (driver/s390/block/dasd_genhd.c) that
implements names from dasda-dasdzzzz allowing even more disks. Maybe
a janitor can come up with a common implementation usable for all kind
of block device drivers.

I have tested this patch with 100 disks - seems to work.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:11 +11:00
Christian Borntraeger
4f3bf19c6e virtio_blk: Dont waste major numbers
Rusty,

currently virtio_blk uses one major number per device. While this works
quite well on most systems it is wasteful and will exhaust major numbers
on larger installations.

This patch allocates a major number on init and will use 16 minor numbers
for each disk. That will allow ~64k virtio_blk disks.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:10 +11:00
Christian Borntraeger
135da0b037 virtio_blk: provide getgeo
Rusty,

I currently try to make my guest boot from an virtio root device
without having an external kernel. Some of the tools that I tried
expect HDIO_GETGEO to work. The most interesting value is likely
the geo.start value to get the offset of a partition. This value
is filled by block/ioctl.c if fops->getgeo is set. This patch also
fills in some standard values for heads, sectors and cylinders.

Makes sense?

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:09 +11:00
Anthony Liguori
0ad07ec1fd virtio: Put the virtio under the virtualization menu
This patch moves virtio under the virtualization menu and changes virtio
devices to not claim to only be for lguest.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:05 +11:00
Rusty Russell
6e5aa7efb2 virtio: reset function
A reset function solves three problems:

1) It allows us to renegotiate features, eg. if we want to upgrade a
   guest driver without rebooting the guest.

2) It gives us a clean way of shutting down virtqueues: after a reset,
   we know that the buffers won't be used by the host, and

3) It helps the guest recover from messed-up drivers.

So we remove the ->shutdown hook, and the only way we now remove
feature bits is via reset.

We leave it to the driver to do the reset before it deletes queues:
the balloon driver, for example, needs to chat to the host in its
remove function.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:03 +11:00
Rusty Russell
18445c4d50 virtio: explicit enable_cb/disable_cb rather than callback return.
It seems that virtio_net wants to disable callbacks (interrupts) before
calling netif_rx_schedule(), so we can't use the return value to do so.

Rename "restart" to "cb_enable" and introduce "cb_disable" hook: callback
now returns void, rather than a boolean.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:49:58 +11:00
Rusty Russell
a586d4f601 virtio: simplify config mechanism.
Previously we used a type/len pair within the config space, but this
seems overkill.  We now simply define a structure which represents the
layout in the config space: the config space can now only be extended
at the end.

The main driver-visible changes:
1) We indicate what fields are present with an explicit feature bit.
2) Virtqueues are explicitly numbered, and not in the config space.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:49:57 +11:00
Joe Perches
f66083c376 drivers/block/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:09:38 +02:00
Pete Zaitcev
eedffd12e0 USB: Remove unnecessary zeroing from ub
These zeroings were taken from usb-storage long time ago. I examined
the submission paths and usb_fill_bulk_urb and found them unnecessary.

Signed-off-by: Pete Zaitcev <zaitcev@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-01 14:34:47 -08:00
Adrian Bunk
e30f98fcac block/sunvdc.c:print_version() must be __devinit
This patch fixes the following section mismatches:

<--  snip  -->

...
WARNING: drivers/block/sunvdc.o(.text+0xf0): Section mismatch in reference from the function print_version() to the variable .devinit.data:version
WARNING: drivers/block/sunvdc.o(.text+0xf8): Section mismatch in reference from the function print_version() to the variable .devinit.data:version
...

<--  snip  -->

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-02-01 09:26:32 +01:00
Jens Axboe
e7d9dc9cfd cciss: fix bug in overriding ->data_len before completion
For BLOCK_PC requests, we need that length for completing the request.
Andrew Vasquez <andrew.vasquez@qlogic.com> reported the following
oops

Hitting a consistent BUG() with recent Linus' linux-2.6.git:

	[   12.941428] ------------[ cut here ]------------
	[   12.944874] kernel BUG at drivers/block/cciss.c:1260!
	[   12.944874] invalid opcode: 0000 [1] SMP
	[   12.944874] CPU 0
	[   12.944874] Modules linked in:
	[   12.944874] Pid: 0, comm: swapper Not tainted 2.6.24 #43
	[   12.944874] RIP: 0010:[<ffffffff8039e43d>]  [<ffffffff8039e43d>] cciss_softirq_done+0xbc/0x1bf
	[   12.944874] RSP: 0018:ffffffff8063aed0  EFLAGS: 00010202
	[   12.944874] RAX: 0000000000000001 RBX: ffff8100cf800010 RCX: ffff81042f1253b0
	[   12.944874] RDX: ffff81042de398f0 RSI: ffff81042de398f0 RDI: 0000000000000001
	[   12.944874] RBP: ffff81042daa0000 R08: ffff81042f1253b0 R09: 0000000000000001
	[   12.944874] R10: 00000000000000fe R11: 0000000000000000 R12: 0000000000000002
	[   12.944874] R13: 0000000000000001 R14: ffff8100cf800000 R15: ffff81042de398f0
	[   12.944874] FS:  0000000000000000(0000) GS:ffffffff805bb000(0000) knlGS:0000000000000000
	[   12.944874] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
	[   12.944874] CR2: 00002afed7eea340 CR3: 000000042dbba000 CR4: 00000000000006e0
	[   12.944874] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
	[   12.944874] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
	[   12.944874] Process swapper (pid: 0, threadinfo ffffffff805f4000, task ffffffff805624a0)
	[   12.944874] Stack:  0000000000000000 ffffffff8063af10 0000000000000001 ffffffff80632d60
	[   12.944874]  0000000000000000 000000000000000a ffffffff805bb900 ffffffff8032038f
	[   12.944874]  ffffffff8063af10 ffffffff8063af10 ffffffff805bb940 ffffffff802346b4
	[   12.944874] Call Trace:
	[   12.944874]  <IRQ>  [<ffffffff8032038f>] blk_done_softirq+0x69/0x78
	[   12.944874]  [<ffffffff802346b4>] __do_softirq+0x6f/0xd8
	[   12.944874]  [<ffffffff8020c45c>] call_softirq+0x1c/0x30
	[   12.944874]  [<ffffffff8020e347>] do_softirq+0x30/0x80
	[   12.944874]  [<ffffffff8020e409>] do_IRQ+0x72/0xd9
	[   12.944874]  [<ffffffff8020a50a>] mwait_idle+0x0/0x46
	[   12.944874]  [<ffffffff8020a3da>] default_idle+0x0/0x3d
	[   12.944874]  [<ffffffff8020b7e1>] ret_from_intr+0x0/0xa
	[   12.944874]  <EOI>  [<ffffffff8020a54c>] mwait_idle+0x42/0x46
	[   12.944874]  [<ffffffff8020a481>] cpu_idle+0x6a/0xae
	[   12.944874]
	[   12.944874]
	[   12.944874] Code: 0f 0b eb fe 48 8d 85 d8 c0 00 00 48 89 04 24 48 89 c7 e8 e5
	[   12.944874] RIP  [<ffffffff8039e43d>] cciss_softirq_done+0xbc/0x1bf
	[   12.944874]  RSP <ffffffff8063aed0>
	[   12.944903] ---[ end trace e9c631603f90d22f ]---

which is caused by blk_end_request() returning 'not done' for a request,
since it gets asked to complete zero bytes.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-29 21:55:18 +01:00
Jens Axboe
9bf722598f xsysace: end request handling fix
In ace_fsm_dostate(), the variable 'i' was used only for passing
sector size of the request to end_that_request_first().
So I removed it and changed the code to pass the size in bytes
directly to __blk_end_request()

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-29 21:54:53 +01:00
Linus Torvalds
e189f3495c Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (197 commits)
  sh: add spi header and r2d platform data V3
  sh: update r7780rp interrupt code
  sh: remove consistent alloc stuff from the machine vector
  sh: use declared coherent memory for dreamcast pci ethernet adapter
  sh: declared coherent memory support V2
  sh: Add support for SDK7780 board.
  sh: constify function pointer tables
  sh: Kill off -traditional for linker script.
  cdrom: Add support for Sega Dreamcast GD-ROM.
  sh: Kill off hs7751rvoip reference from arch/sh/Kconfig.
  sh: Drop r7780rp_defconfig, use r7780mp_defconfig as kbuild default.
  sh: Kill off dead HS771RVoIP board support.
  sh: r7785rp: Fix up DECLARE_INTC_DESC() arg mismatch.
  sh: r7785rp: Hook up the rest of the HL7785 FPGA IRQ vectors.
  sh: r2d - enable sm501 usb host function
  sh: remove voyagergx
  sh: r2d - add lcd planel timings to sm501 platform data
  sh: Add OHCI and UDC platform devices for SH7720.
  sh: intc - remove default interrupt priority tables
  sh: Correct pte size mismatch for X2 TLB.
  ...
2008-01-29 08:52:50 +11:00
Kiyoshi Ueda
a65b58663d blk_end_request: changing xsysace (take 4)
This patch converts xsysace to use blk_end_request interfaces.
Related 'uptodate' arguments are converted to 'error'.

xsysace is a little bit different from "normal" drivers.
xsysace driver has a state machine in it.
It calls end_that_request_first() and end_that_request_last()
from different states. (ACE_FSM_STATE_REQ_TRANSFER and
ACE_FSM_STATE_REQ_COMPLETE, respectively.)

However, those states are consecutive and without any interruption
inbetween.
So we can just follow the standard conversion rule (b) mentioned in
the patch subject "[PATCH 01/30] blk_end_request: add new request
completion interface".

Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:37:20 +01:00
Kiyoshi Ueda
7d699bafe2 blk_end_request: changing ub (take 4)
This patch converts ub to use blk_end_request interfaces.
Related 'uptodate' arguments are converted to 'error'.

Cc: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:37:17 +01:00
Kiyoshi Ueda
ea6f06f416 blk_end_request: changing cpqarray (take 4)
This patch converts cpqarray to use blk_end_request interfaces.
Related 'ok' arguments are converted to 'error'.

cpqarray is a little bit different from "normal" drivers.
cpqarray directly calls bio_endio() and disk_stat_add()
when completing request.  But those can be replaced with
__end_that_request_first().
After the replacement, request completion procedures of
those drivers become like the following:
    o end_that_request_first()
    o add_disk_randomness()
    o end_that_request_last()
This can be converted to __blk_end_request() by following
the rule (b) mentioned in the patch subject
"[PATCH 01/30] blk_end_request: add new request completion interface".

Cc: Mike Miller <mike.miller@hp.com>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:37:00 +01:00
Kiyoshi Ueda
3daeea29f9 blk_end_request: changing cciss (take 4)
This patch converts cciss to use blk_end_request interfaces.
Related 'uptodate' arguments are converted to 'error'.

cciss is a little bit different from "normal" drivers.
cciss directly calls bio_endio() and disk_stat_add()
when completing request.  But those can be replaced with
__end_that_request_first().
After the replacement, request completion procedures of
those drivers become like the following:
    o end_that_request_first()
    o add_disk_randomness()
    o end_that_request_last()
This can be converted to blk_end_request() by following
the rule (a) mentioned in the patch subject
"[PATCH 01/30] blk_end_request: add new request completion interface".

Cc: Mike Miller <mike.miller@hp.com>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:36:58 +01:00
Kiyoshi Ueda
f530f03637 blk_end_request: changing xen-blkfront (take 4)
This patch converts xen-blkfront to use blk_end_request interfaces.
Related 'uptodate' arguments are converted to 'error'.

Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:36:46 +01:00
Kiyoshi Ueda
b2aec24ea4 blk_end_request: changing viodasd (take 4)
This patch converts viodasd to use blk_end_request interfaces.
Related 'uptodate' arguments are converted to 'error'.

As a result, the interface of internal function, viodasd_end_request(),
is changed.

Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:36:44 +01:00