Commit Graph

2347 Commits (f935f3f8a567d3d2531886e901ed0db183092abe)

Author SHA1 Message Date
Lukas Czerner dfaf3c036c loop: Fix discard_alignment default setting
discard_alignment is not relevant to the loop driver since it is
supposed to be set as a workaround for the old sector 63 alignments.
So set it to zero rather than block size.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reported-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-12-02 14:47:03 +01:00
Stephen M. Cameron 59bd71a81b cciss: fix flush cache transfer length
We weren't filling in the transfer length of the
flush cache command (it transfers 4 bytes of zeroes).
Firmware didn't seem to be bothered by this, but it
should be fixed.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-11-28 20:12:05 +01:00
Stephen M. Cameron 6225da4815 cciss: Add IRQF_SHARED back in for the non-MSI(X) interrupt handler
IRQF_SHARED is required for older controllers that don't support MSI(X)
and which may end up sharing an interrupt.

Also remove deprecated IRQF_DISABLED.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-11-28 20:12:05 +01:00
Dave Young ae95757a90 loop: fix loop block driver discard and encryption comment
The loop driver does not support discard if encryption is enabled,
fix the comment.

Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-11-25 09:41:25 +01:00
Dan Carpenter 3e54a3d1b8 mtip32xx: uninitialized variable in mtip_quiesce_io()
We recently introduce new continue in the loop which make gcc complain.
In theory if MTIP_FLAG_SVC_THD_ACTIVE_BIT is set, we could hit continue
over and over until eventually we time out of the loop.  In that case
"active" should be set as true, but right now it's uninitialized.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-11-24 12:59:00 +01:00
Asai Thambi S P 60ec0eecfa mtip32xx: updates based on feedback
* queue ncq commands when a non-ncq is in progress or error handling is active
* merge variables 'internal_cmd_in_progress' and 'eh_active' into new variable 'flags'
* get rid of read/write semaphore 'internal_sem'
* new service thread to issue queued commands
* use macros from ata.h for command codes
* return ENOTTY for BLKFLSBUF ioctl
* style changes

Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-11-23 08:29:24 +01:00
Li Dongyang ae18be11b5 xen-blkback: convert hole punching to discard request on loop devices
As of dfaa2ef68e, loop devices support
discard request now. We could just issue a discard request, and
the loop driver will punch the hole for us, so we don't need to touch
the internals of loop device and punch the hole ourselves, Thanks.

V0->V1: rebased on devel/for-jens-3.3

Signed-off-by: Li Dongyang <lidongyang@novell.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-18 13:28:05 -05:00
Konrad Rzeszutek Wilk 421463526f xen/blkback: Move processing of BLKIF_OP_DISCARD from dispatch_rw_block_io
.. and move it to its own function that will deal with the
discard operation.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-18 13:28:03 -05:00
Konrad Rzeszutek Wilk 5ea4298669 xen/blk[front|back]: Enhance discard support with secure erasing support.
Part of the blkdev_issue_discard(xx) operation is that it can also
issue a secure discard operation that will permanantly remove the
sectors in question. We advertise that we can support that via the
'discard-secure' attribute and on the request, if the 'secure' bit
is set, we will attempt to pass in REQ_DISCARD | REQ_SECURE.

CC: Li Dongyang <lidongyang@novell.com>
[v1: Used 'flag' instead of 'secure:1' bit]
[v2: Use 'reserved' uint8_t instead of adding a new value]
[v3: Check for nseg when mapping instead of operation]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-18 13:28:01 -05:00
Konrad Rzeszutek Wilk 97e36834f5 xen/blk[front|back]: Squash blkif_request_rw and blkif_request_discard together
In a union type structure to deal with the overlapping
attributes in a easier manner.

Suggested-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-18 13:27:59 -05:00
Dan Carpenter a2c2a0e668 paride: fix potential information leak in pg_read()
Smatch has a new check for Rosenberg type information leaks where structs
are copied to the user with uninitialized stack data in them.  i In this
case, the pg_write_hdr struct has a hole in it.

struct pg_write_hdr {
        char                       magic;                /*     0     1 */
        char                       func;                 /*     1     1 */
        /* XXX 2 bytes hole, try to pack */
        int                        dlen;                 /*     4     4 */

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Tim Waugh <tim@cyberelk.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-11-16 09:21:50 +01:00
Stephen M. Cameron 0007a4c90a cciss: auto engage SCSI mid layer at driver load time
A long time ago, probably in 2002, one of the distros, or maybe more than
one, loaded block drivers prior to loading the SCSI mid layer.  This meant
that the cciss driver, being a block driver, could not engage the SCSI mid
layer at init time without panicking, and relied on being poked by a
userland program after the system was up (and the SCSI mid layer was
therefore present) to engage the SCSI mid layer.

This is no longer the case, and cciss can safely rely on the SCSI mid
layer being present at init time and engage the SCSI mid layer straight
away.  This means that users will see their tape drives and medium
changers at driver load time without need for a script in /etc/rc.d that
does this:

for x in /proc/driver/cciss/cciss*
do
	echo "engage scsi" > $x
done

However, if no tape drives or medium changers are detected, the SCSI mid
layer will not be engaged.  If a tape drive or medium change is later
hot-added to the system it will then be necessary to use the above script
or similar for the device(s) to be acceesible.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-11-16 09:21:49 +01:00
Dmitry Monakhov 7035b5df3c loop: cleanup set_status interface
1) Anyone who has read access to loopdev has permission to call set_status
   and may change important parameters such as lo_offset, lo_sizelimit and
   so on, which contradicts to read access pattern and definitely equals
   to write access pattern.
2) Add lo_offset over i_size check to prevent blkdev_size overflow.
   ##Testcase_bagin
   #dd if=/dev/zero of=./file bs=1k count=1
   #losetup /dev/loop0 ./file
   /* userspace_application */
   struct loop_info64 loinf;
   fd = open("/dev/loop0", O_RDONLY);
   ioctl(fd, LOOP_GET_STATUS64, &loinf);
   /* Set offset to any value which is bigger than i_size, and sizelimit
    * to nonzero value*/
   loinf.lo_offset = 4096*1024;
   loinf.lo_sizelimit = 1024;
   ioctl(fd, LOOP_SET_STATUS64, &loinf);
   /* After this loop device will have size similar to 0x7fffffffffxxxx */
   #blockdev --getsz /dev/loop0
   ##OUTPUT: 36028797018955968
   ##Testcase_end

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-11-16 09:21:49 +01:00
Dmitry Monakhov 3bb9068278 loop: prevent information leak after failed read
If read was not fully successful we have to fail whole bio to prevent
information leak of old pages

##Testcase_begin
dd if=/dev/zero of=./file bs=1M count=1
losetup /dev/loop0 ./file -o 4096
truncate -s 0 ./file
# OOps loop offset is now beyond i_size, so read will silently fail.
# So bio's pages would not be cleared, may which result in information leak.
hexdump -C /dev/loop0
##testcase_end

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-11-16 09:21:48 +01:00
Matthew Garrett 1937335856 The Windows driver .inf disables ASPM on all cciss devices. Do the same.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: iss_storagedev@hp.com
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-11-11 22:05:54 +01:00
Linus Torvalds 32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Linus Torvalds 06d381484f Merge branch 'stable/vmalloc-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/vmalloc-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  net: xen-netback: use API provided by xenbus module to map rings
  block: xen-blkback: use API provided by xenbus module to map rings
  xen: use generic functions instead of xen_{alloc, free}_vm_area()
2011-11-06 18:31:36 -08:00
Jens Axboe a71f483d79 mtip32xx: update to new ->make_request() API
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-11-05 08:36:21 +01:00
Jens Axboe 0e838c624e mtip32xx: add module.h include to avoid conflict with moduleh tree
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-11-05 08:35:10 +01:00
Jens Axboe 3ff147d3a8 mtip32xx: mark a few more items static
Missed two items: mtip_major, and mtip_pci_driver.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-11-05 08:35:10 +01:00
Jens Axboe 6316668fbc mtip32xx: ensure that all local functions are static
Kill the declarations in the header file and mark them as static.
Reshuffle a few functions to ensure that everything is properly
declared before being used.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-11-05 08:35:10 +01:00
Jens Axboe ef0f158734 mtip32xx: cleanup compat ioctl handling
Do the conversion/copy up front instead of passing in a compat flag
to the ioctl handler and subsequently to the exec_drive_taskfile()
function.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-11-05 08:35:10 +01:00
Jens Axboe 16d02c040b mtip32xx: fix warnings/errors on 32-bit compiles
We need to clean up the compat ioctl handling, but this makes it
work for now at least.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-11-05 08:35:10 +01:00
Sam Bradshaw 88523a6155 block: Add driver for Micron RealSSD pcie flash cards
This adds mtip32xx, a driver supporting Microns line of
pci-express flash storage cards.

Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-11-05 08:35:10 +01:00
Linus Torvalds 3d0a8d10cf Merge branch 'for-3.2/drivers' of git://git.kernel.dk/linux-block
* 'for-3.2/drivers' of git://git.kernel.dk/linux-block: (30 commits)
  virtio-blk: use ida to allocate disk index
  hpsa: add small delay when using PCI Power Management to reset for kump
  cciss: add small delay when using PCI Power Management to reset for kump
  xen/blkback: Fix two races in the handling of barrier requests.
  xen/blkback: Check for proper operation.
  xen/blkback: Fix the inhibition to map pages when discarding sector ranges.
  xen/blkback: Report VBD_WSECT (wr_sect) properly.
  xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
  xen-blkfront: plug device number leak in xlblk_init() error path
  xen-blkfront: If no barrier or flush is supported, use invalid operation.
  xen-blkback: use kzalloc() in favor of kmalloc()+memset()
  xen-blkback: fixed indentation and comments
  xen-blkfront: fix a deadlock while handling discard response
  xen-blkfront: Handle discard requests.
  xen-blkback: Implement discard requests ('feature-discard')
  xen-blkfront: add BLKIF_OP_DISCARD and discard request struct
  drivers/block/loop.c: remove unnecessary bdev argument from loop_clr_fd()
  drivers/block/loop.c: emit uevent on auto release
  drivers/block/cpqarray.c: use pci_dev->revision
  loop: always allow userspace partitions and optionally support automatic scanning
  ...

Fic up trivial header file includsion conflict in drivers/block/loop.c
2011-11-04 17:22:14 -07:00
Linus Torvalds b4fdcb02f1 Merge branch 'for-3.2/core' of git://git.kernel.dk/linux-block
* 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits)
  block: don't call blk_drain_queue() if elevator is not up
  blk-throttle: use queue_is_locked() instead of lockdep_is_held()
  blk-throttle: Take blkcg->lock while traversing blkcg->policy_list
  blk-throttle: Free up policy node associated with deleted rule
  block: warn if tag is greater than real_max_depth.
  block: make gendisk hold a reference to its queue
  blk-flush: move the queue kick into
  blk-flush: fix invalid BUG_ON in blk_insert_flush
  block: Remove the control of complete cpu from bio.
  block: fix a typo in the blk-cgroup.h file
  block: initialize the bounce pool if high memory may be added later
  block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown
  block: drop @tsk from attempt_plug_merge() and explain sync rules
  block: make get_request[_wait]() fail if queue is dead
  block: reorganize throtl_get_tg() and blk_throtl_bio()
  block: reorganize queue draining
  block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg()
  block: pass around REQ_* flags instead of broken down booleans during request alloc/free
  block: move blk_throtl prototypes to block/blk.h
  block: fix genhd refcounting in blkio_policy_parse_and_set()
  ...

Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion
and making the request functions be of type "void" instead of "int" in
 - drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c}
 - drivers/staging/zram/zram_drv.c
2011-11-04 17:06:58 -07:00
Matthew Wilcox f1938f6e1e NVMe: Implement doorbell stride capability
The doorbell stride allows devices to spread out their doorbells instead
of packing them tightly.  This feature was added as part of ECN 003.

This patch also enables support for more than 512 queues :-)

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:05 -04:00
Matthew Wilcox ce38c14957 NVMe: Version 0.7
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:05 -04:00
Matthew Wilcox 2b2c189687 NVMe: Don't probe namespace 0
ECN 001 documented that namespace 0 is not valid.  Sending an Identify
with CNS of 0 and Namespace of 0 is an undefined command.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:04 -04:00
Nisheeth Bhat 0d1bc91258 Fix calculation of number of pages in a PRP List
The existing calculation underestimated the number of pages required
as it did not take into account the pointer at the end of each page.
The replacement calculation may overestimate the number of pages required
if the last page in the PRP List is entirely full.  By using ->npages
as a counter as we fill in the pages, we ensure that we don't try to
free a page that was never allocated.

Signed-off-by: Nisheeth Bhat <nisheeth.bhat@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:04 -04:00
Matthew Wilcox bc5fc7e4b2 NVMe: Create nvme_identify and nvme_get_features functions
Instead of open-coding calls to nvme_submit_admin_cmd, these
small wrappers are simpler to use (the patch removes 14 lines from
nvme_dev_add() for example).

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:04 -04:00
Matthew Wilcox 684f5c2025 NVMe: Fix memory leak in nvme_dev_add()
The driver was allocating 8k of memory, then freeing 4k of it.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:04 -04:00
Nisheeth Bhat d1a490e026 NVMe: Fix calls to dma_unmap_sg
dma_unmap_sg() must be called with the same 'nents' passed to
dma_map_sg(), not the number returned from dma_map_sg().

Signed-off-by: Nisheeth Bhat <nisheeth.bhat@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:04 -04:00
Matthew Wilcox d0ba1e497b NVMe: Correct sg list setup in nvme_map_user_pages
Our SG list was constructed to always fill the entire first page, even
if that was more than the length of the I/O.  This is probably harmless,
but some IOMMUs might do something bad.

Correcting the first call to sg_set_page() made it look a lot closer to
the sg_set_page() in the loop, so fold the first call to sg_set_page()
into the loop.

Reported-by: Nisheeth Bhat <nisheeth.bhat@intel.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2011-11-04 15:53:04 -04:00
Matthew Wilcox 6413214c5d Fix bug in NVME_IOCTL_SUBMIT_IO
Missing 'break' in the switch statement meant that we'd fall through
to the 'return -EINVAL' case.
2011-11-04 15:53:04 -04:00
Matthew Wilcox 6bbf1acdde NVMe: Rework ioctls
Remove the special-purpose IDENTIFY, GET_RANGE_TYPE, DOWNLOAD_FIRMWARE
and ACTIVATE_FIRMWARE commands.  Replace them with a generic ADMIN_CMD
ioctl that can submit any admin command.

Add a new ID ioctl that returns the namespace ID of the queried device.
It corresponds to the SCSI Idlun ioctl.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:03 -04:00
Matthew Wilcox eac623ba7a NVMe: Add the nvme thread to the wait queue before waking it up
If the I/O was not completed by a single NVMe command, we add the
bio to the congestion list and wake up the kthread to resubmit it.
But the kthread calls remove_wait_queue() unconditionally, which
will oops if it's not on the wait queue.  So add the kthread to
the wait queue before waking it up.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:03 -04:00
Matthew Wilcox 6f0f54499f NVMe: Return real error from nvme_create_queue
nvme_setup_io_queues() was assuming that a NULL return from
nvme_create_queue() was an out-of-memory error.  That's not necessarily
true; the adapter might return -EIO, for example.  Change the calling
convention to return an ERR_PTR on failure instead of NULL.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:03 -04:00
Matthew Wilcox be5e094840 NVMe: Version 0.6
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:03 -04:00
Matthew Wilcox 184d2944cb NVMe: Add a few calling convention notes
For the benefit of reviewers, add comments to a few functions describing
their calling context

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:03 -04:00
Matthew Wilcox b77954cbdd NVMe: Handle failures from memory allocations in nvme_setup_prps
If any of the memory allocations in nvme_setup_prps fail, handle it by
modifying the passed-in data length to reflect the number of bytes we are
actually able to send.  Also allow the caller to specify the GFP flags
they need; for user-initiated commands, we can use GFP_KERNEL allocations.

The various callers are updated to handle this possibility; the main
I/O path is already prepared for this possibility (as it may happen
due to nvme_map_bio being unable to map all the segments of the I/O).
The other callers return -ENOMEM instead of doing partial I/Os.

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:03 -04:00
Matthew Wilcox 5aff9382dd NVMe: Use an IDA to allocate minor numbers
The current approach of using the namespace ID as the minor number
doesn't work when there are multiple adapters in the machine.  Rather
than statically partitioning the number of namespaces between adapters,
dynamically allocate minor numbers to namespaces as they are detected.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:03 -04:00
Matthew Wilcox fd63e9ceee NVMe: Add include of delay.h for msleep
Previously it was being implicitly included through some other header file

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:02 -04:00
Matthew Wilcox 8de055350f NVMe: Add support for timing out I/Os
In the kthread, walk the list of outstanding I/Os and check they've not
hit the timeout.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:02 -04:00
Matthew Wilcox 21075bdee0 NVMe: Rename cancel_cmdid_data to cancel_cmdid
The trailing '_data' on the end was annoying and inconsistent.  Also, make
it actually return the data since this is needed for timing out commands.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:02 -04:00
Matthew Wilcox 09a58f5364 NVMe: Fix bug in error handling
When an I/O completed with an error, we would call bio_endio twice
(once with -EIO and once with 0).  Found by inspection.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:02 -04:00
Matthew Wilcox 22605f9681 NVMe: Time out initialisation after a few seconds
THe device reports (in its capability register) how long it will take
to initialise.  If that time elapses before the ready bit becomes set,
conclude the device is broken and refuse to initialise it.  Log a nice
error message so the user knows why we did nothing.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:02 -04:00
Matthew Wilcox aba2080f3f NVMe: Fix warning in free_irq
We need to clear the affinity mask before calling free_irq()

Reported-by: Shane Michael Matthews <shane.matthews@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:02 -04:00
Matthew Wilcox 7f53f9d242 NVMe: Correct the Controller Configuration settings
The arbitration field was extended by one bit, shifting the shutdown
notification bits by one.  Also, the SQ/CQ entry size was made
configurable for future extensions.

Reported-by: Paul Luse <paul.e.luse@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:01 -04:00
Matthew Wilcox 8ef700678f NVMe: Version 0.5
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:01 -04:00
Matthew Wilcox 6c7d49455c NVMe: Change the definition of nvme_user_io
The read and write commands don't define a 'result', so there's no need
to copy it back to userspace.

Remove the ability of the ioctl to submit commands to a different
namespace; it's just asking for trouble, and the use case I have in mind
will be addressed througha  different ioctl in the future.  That removes
the need for both the block_shift and nsid arguments.

Check that the opcode is one of 'read' or 'write'.  Future opcodes may
be added in the future, but we will need a different structure definition
for them.

The nblocks field is redefined to be 0-based.  This allows the user to
request the full 65536 blocks.

Don't byteswap the reftag, apptag and appmask.  Martin Petersen tells
me these are calculated in big-endian and are transmitted to the device
in big-endian.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:01 -04:00
Matthew Wilcox 4948168280 NVMe: Add compat_ioctl
Make ioctls work for 32-bit applications on 64-bit kernels.  The structures
are defined to be the same for both 32- and 64-bit applications, so
we can use the same handler for both.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:01 -04:00
Matthew Wilcox 9ecdc94621 NVMe: Simplify queue lookup
Fill in all the num_possible_cpus() entries with duplicate pointers.
This reduces the complexity of the frequently-called get_nvmeq(), as
well as avoiding a bug in it when there are fewer queues than CPUs.

Reported-by: Shane Michael Matthews <shane.matthews@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:01 -04:00
Matthew Wilcox 3cb967c039 NVMe: Remove the kthread from the wait queue
Once there are no more bios on the congestion list, we can stop waking
up the nvme kthread every time a completion happens.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:00 -04:00
Matthew Wilcox 7523d834dd NVMe: Fix off-by-one when filling in PRP lists
If the last element in the PRP list fits on the end of the page, there's
no need to allocate an extra page to put that single element in.  It can
fit on the end of the page.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:00 -04:00
Matthew Wilcox ac88c36a38 NVMe: Fix interpretation of 'Number of Namespaces' field
The spec says this is a 0s based value.  We don't need to handle the
maximal value because it's reserved to mean "every namespace".

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:00 -04:00
Matthew Wilcox 19e899b2f9 NVMe: Remove outdated comments
The head can never overrun the tail since we won't allocate enough command
IDs to let that happen.  The status codes are in sync with the spec.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:00 -04:00
Matthew Wilcox fa92282149 NVMe: Fix comment formatting
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:00 -04:00
Matthew Wilcox 714a7a2288 NVMe: Convert comments to kernel-doc notation
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:53:00 -04:00
Matthew Wilcox b57ab0fada NVMe: Version 0.4
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:59 -04:00
Matthew Wilcox e6d15f79f9 NVMe: Reduce maximum queue depth by 1
The spec says we're not allowed to completely fill the submission queue.
Solve this by reducing the number of allocatable cmdids by 1.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:59 -04:00
Matthew Wilcox d8ee9d69f2 NVMe: Fix discontiguous accesses
When we submit subsequent portions of the I/O, we need to access the
updated block, not start reading again from the original position.
This was showing up as miscompares in the XFS randholes testcase.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:59 -04:00
Matthew Wilcox 1ad2f8932a NVMe: Handle bios that contain non-virtually contiguous addresses
NVMe scatterlists must be virtually contiguous, like almost all I/Os.
However, when the filesystem lays out files with a hole, it can be that
adjacent LBAs map to non-adjacent virtual addresses.  Handle this by
submitting one NVMe command at a time for each virtually discontiguous
range.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:59 -04:00
Matthew Wilcox 00df5cb4eb NVMe: Implement Flush
Linux implements Flush as a bit in the bio.  That means there may also be
data associated with the flush; if so the flush should be sent before the
data.  To avoid completing the bio twice, I add CMD_CTX_FLUSH to indicate
the completion routine should do nothing.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:59 -04:00
Matthew Wilcox c42705592b NVMe: Mark CMD_CTX_CANCELLED as being unlikely
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:59 -04:00
Matthew Wilcox 7547881d09 NVMe: Correct SQ doorbell semantics
The value written to the doorbell needs to be the first free index in
the queue, not the most recently used index in the queue.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:58 -04:00
Matthew Wilcox 740216fc59 NVMe: Let the kthread take care of devices earlier
If interrupts are misconfigured, the kthread will be needed to process
admin queue completions.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:58 -04:00
Matthew Wilcox b348b7d543 NVMe: Rename nr_queues to nr_io_queues
I got confused about whether this included the admin queue or not, and
had to resort to reading the spec.  It doesn't include the admin queue,
so make that clear in the name.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:58 -04:00
Matthew Wilcox ca1615424c NVMe: Remove setting of 'flags' in rw command
This was the data transfer bit until spec rev 0.92

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:58 -04:00
Matthew Wilcox ad8a5df97c NVMe: Release 0.3
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:58 -04:00
Matthew Wilcox 1fa6aeadf1 NVMe: Add a kthread to handle the congestion list
Instead of trying to resubmit I/Os in the I/O completion path (in
interrupt context), wake up a kthread which will resubmit I/O from
user context.  This allows mke2fs to run to completion.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:58 -04:00
Matthew Wilcox eeee322647 NVMe: Handle failures differently in nvme_submit_bio_queue()
Return -EBUSY if the queue is full or -ENOMEM if we failed to allocate
memory (or map a scatterlist).  Also use GFP_ATOMIC to allocate the
nvme_bio and move the locking to the callers of nvme_submit_bio_queue().

In nvme_make_request(), don't permit an I/O to jump the queue -- if the
congestion list already has an entry, just add to the tail, rather than
trying to submit.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:58 -04:00
Matthew Wilcox 768308400f NVMe: Handle physical merging of bvec entries
In order to not overrun the sg array, we have to merge physically
contiguous pages into a single sg entry.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:57 -04:00
Matthew Wilcox 1974b1ae88 NVMe: Check for DMA mapping failure
If dma_map_sg returns 0 (failure), we need to fail the I/O.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:57 -04:00
Matthew Wilcox d567760c40 NVMe: Pass the nvme_dev to nvme_free_prps and nvme_setup_prps
We were passing the nvme_queue to access the q_dmadev for the
dma_alloc_coherent calls, but since we moved to the dma pool API,
we really only need the nvme_dev.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:57 -04:00
Matthew Wilcox 99802a7aee NVMe: Optimise memory usage for I/Os between 4k and 128k
Add a second memory pool for smaller I/Os.  We can pack 16 of these on a
single page instead of using an entire page for each one.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:57 -04:00
Matthew Wilcox 091b609258 NVMe: Switch to use DMA Pool API
Calling dma_free_coherent from interrupt context causes warnings.
Using the DMA pools delays freeing until pool destruction, so avoids
the problem.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:57 -04:00
Matthew Wilcox d534df3c73 NVMe: Rename nvme_req_info to nvme_bio
There are too many things called 'info' in this driver.  This data
structure is auxiliary information for a struct bio, so call it nvme_bio,
or nbio when used as a variable.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:56 -04:00
Shane Michael Matthews e025344c56 NVMe: Initial PRP List support
Add a pointer to the nvme_req_info to hold a new data structure
(nvme_prps) which contains a list of the pages allocated to this
particular request for holding PRP list entries.  nvme_setup_prps()
now returns this pointer.

To allocate and free the memory used for PRP lists, we need a struct
device, so we need to pass the nvme_queue pointer to many functions
which didn't use to need it.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:56 -04:00
Matthew Wilcox 51882d00f0 NVMe: Advance the sg pointer when filling in an sg list
For multipage BIOs, we were always using sg[0] instead of advancing
through the list.  Oops :-)

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:56 -04:00
Matthew Wilcox d2d8703481 NVMe: Renumber the special context values
If POISON_POINTER_DELTA isn't defined, ensure they're in page 0 which
should never be mapped.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:56 -04:00
Matthew Wilcox 9294bbed78 NVMe: Handle the congestion list a little better
In the bio completion handler, check for bios on the congestion list
for this NVM queue.  Also, lock the congestion list in the make_request
function as the queue may end up being shared between multiple CPUs.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:56 -04:00
Matthew Wilcox e85248e516 NVMe: Record the timeout for each command
In addition to recording the completion data for each command, record
the anticipated completion time.  Choose a timeout of 5 seconds for
normal I/Os and 60 seconds for admin I/Os.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:56 -04:00
Matthew Wilcox ec6ce618d6 NVMe: Need to lock queue during interrupt handling
If we're sharing a queue between multiple CPUs and we cancel a sync I/O,
we must have the queue locked to avoid corrupting the stack of the thread
that submitted the I/O.  It turns out this is the same locking that's needed
for the threaded irq handler, so share that code.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:56 -04:00
Matthew Wilcox 48e3d39816 NVMe: Detect command IDs completing that are out of range
If the adapter completes a command ID that is outside the bounds of
the array, return CMD_CTX_INVALID instead of random data, and print a
message in the sync_completion handler (which is rapidly becoming the
misc completion handler :-)

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:55 -04:00
Matthew Wilcox b36235df01 NVMe: Detect commands that are completed twice
Set the context value to CMD_CTX_COMPLETED, and print a message in the
sync_completion handler if we see it.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:55 -04:00
Matthew Wilcox be7b62754e NVMe: Use a symbolic name to represent cancelled commands instead of 0
I have plans for other special values in sync_completion.  Plus, this
is more self-documenting, and lets us detect bogus usages.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:55 -04:00
Matthew Wilcox 58ffacb545 NVMe: Add a module parameter to use a threaded interrupt
We're currently calling bio_endio from hard interrupt context.  This is
not a good idea for preemptible kernels as it will cause longer latencies.
Using a threaded interrupt will run the entire queue processing mechanism
(including bio_endio) in a thread, which can be preempted.  Unfortuantely,
it also adds about 7us of latency to the single-I/O case, so make it a
module parameter for the moment.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:55 -04:00
Matthew Wilcox b1ad37efca NVMe: Call put_nvmeq() before calling nvme_submit_sync_cmd()
We can't have preemption disabled when we call schedule().  Accept the
possibility that we'll get preempted, and it'll cost us some cacheline
bounces.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:55 -04:00
Matthew Wilcox 3c0cf138d7 NVMe: Allow fatal signals to interrupt I/O
If the user sends a fatal signal, sleeping in the TASK_KILLABLE state
permits the task to be aborted.  The only wrinkle is making sure that
if/when the command completes later that it doesn't upset anything.
Handle this by setting the data pointer to 0, and checking the value
isn't NULL in the sync completion path.  Eventually, bios can be cancelled
through this path too.  Note that the cmdid isn't freed to prevent reuse.

We should also abort the command in the future, but this is a good start.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:55 -04:00
Matthew Wilcox db5d0c198d NVMe: Release 0.2
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:54 -04:00
Matthew Wilcox 6ee44cdced NVMe: Add download / activate firmware ioctls
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:54 -04:00
Matthew Wilcox 388f037f4e NVMe: Move sysfs entries to the right place
Because I wasn't setting driverfs_dev, the devices were showing up under
/sys/devices/virtual/block.  Now they appear underneath the PCI device
which they belong to.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:54 -04:00
Shane Michael Matthews 5911f20039 NVMe: Disable the device before we write the admin queues
In case the card has been left in a partially-configured state,
write 0 to the Enable bit.

Signed-off-by: Shane Michael Matthews <shane.matthews@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:54 -04:00
Matthew Wilcox 574e8b95bc NVMe: Request I/O regions
Calling pci_request_selected_regions() reserves these regions for our use.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:54 -04:00
Matthew Wilcox 2930353f9f NVMe: Allow queues to be allocated above 4GB
Need to call dma_set_coherent_mask() to allow queues to be allocated
above 4GB.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:53 -04:00
Matthew Wilcox f64d3365a3 NVMe: Enable device DMA
Need to call pci_set_master() to enable device DMA

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:53 -04:00
Shane Michael Matthews 0ee5a7d7cb NVMe: Enable and disable the PCI device
Call pci_enable_device_mem() at initialisation and pci_disable_device
at exit.

Signed-off-by: Shane Michael Matthews <shane.matthews@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:53 -04:00
Matthew Wilcox 3f85d50b60 NVMe: Check returns from nvme_alloc_queue()
It can return NULL, so handle that.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:53 -04:00
Matthew Wilcox 8e9f0e7115 NVMe: Remove 'node' from nvme_dev
We don't keep a list of nvme_dev any more

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:53 -04:00
Matthew Wilcox 51814232ec NVMe: Read the model, serial & firmware rev from the controller
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:53 -04:00
Matthew Wilcox a53295b699 NVMe: Add NVME_IOCTL_SUBMIT_IO
Allow userspace to submit synchronous I/O like the SCSI sg interface does.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:53 -04:00
Matthew Wilcox 7fc3cdabba NVMe: Create nvme_map_user_pages() and nvme_unmap_user_pages()
These are generalisations of the code that was in
nvme_submit_user_admin_command().

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:52 -04:00
Matthew Wilcox bd38c5557c NVMe: Change NVME_IOCTL_GET_RANGE_TYPE to return all the ranges
Factor out most of nvme_identify() into a new nvme_submit_user_admin_command()
function.  Change nvme_get_range_type() to call it and change nvme_ioctl to
realise that it's getting back all 64 ranges.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:52 -04:00
Matthew Wilcox b8deb62cf2 NVMe: Zero the command before we send it
Make sure there's no left-over bits set from previous commands that used
this slot.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:52 -04:00
Matthew Wilcox ff22b54fda NVMe: Add nvme_setup_prps()
Generalise the code from nvme_identify() that sets PRP1 & PRP2 so that
it's usable for commands sent by nvme_submit_bio_queue().

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:52 -04:00
Matthew Wilcox 36c14ed9ca NVMe: Use PRP2 for the nvme_identify ioctl
DMA the result straight to userspace instead of bounce-buffering in the
kernel.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:52 -04:00
Matthew Wilcox 53c9577e9c NVMe: Fix admin IRQ claim on real hardware
The admin IRQ is supposed to use the pin-based (or single message MSI)
interrupt.  Accomplish this by filling in entry[0]'s vector with the
INTx irq number.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:51 -04:00
Matthew Wilcox 821234603b NVMe: Rename 'cycle' to 'phase'
It's called the phase bit in the current draft

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:51 -04:00
Matthew Wilcox 1b23484bd0 NVMe: Implement per-CPU queues
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:51 -04:00
Matthew Wilcox b3b06812e1 NVMe: Reduce set_queue_count arguments by one
sq_count and cq_count are always the same, so just call it 'count'.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:51 -04:00
Matthew Wilcox 3001082cac NVMe: Factor out queue_request_irq()
Two callers with an almost identical long string of arguments, and
introducing a third soon.  Time to factor out the commonalities.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:51 -04:00
Matthew Wilcox b60503ba43 NVMe: New driver
This driver is for devices that follow the NVM Express standard

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:51 -04:00
Michael S. Tsirkin 5087a50e66 virtio-blk: use ida to allocate disk index
Based on a patch by Mark Wu <dwu@redhat.com>

Current index allocation in virtio-blk is based on a monotonically
increasing variable "index". This means we'll run out of numbers
after a while.  It also could cause confusion about the disk
name in the case of hot-plugging disks.
Change virtio-blk to use ida to allocate index, instead.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-11-02 11:41:02 +10:30
Paul Gortmaker 0c8d44f239 block: Fix files that are modules and hence need module.h
We want to remove the implicit everywhere presence of module.h
so fix up the people relying on that implicit presence in advance.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:13 -04:00
Paul Gortmaker d5decd3b95 block: add export.h to files using EXPORT_SYMBOL/THIS_MODULE macros
These files were getting <linux/module.h> via an implicit include
path, but we want to crush those out of existence since they cost
time during compiles of processing thousands of lines of headers
for no reason.  Give them the lightweight header that just contains
the EXPORT_SYMBOL infrastructure.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:12 -04:00
Michael S. Tsirkin a0eda62552 virtio-blk: use ida to allocate disk index
Based on a patch by Mark Wu <dwu@redhat.com>

Current index allocation in virtio-blk is based on a monotonically
increasing variable "index". This means we'll run out of numbers
after a while.  It also could cause confusion about the disk
name in the case of hot-plugging disks.
Change virtio-blk to use ida to allocate index, instead.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-31 08:05:36 +01:00
Linus Torvalds 97d2eb13a0 Merge branch 'for-linus' of git://ceph.newdream.net/git/ceph-client
* 'for-linus' of git://ceph.newdream.net/git/ceph-client:
  libceph: fix double-free of page vector
  ceph: fix 32-bit ino numbers
  libceph: force resend of osd requests if we skip an osdmap
  ceph: use kernel DNS resolver
  ceph: fix ceph_monc_init memory leak
  ceph: let the set_layout ioctl set single traits
  Revert "ceph: don't truncate dirty pages in invalidate work thread"
  ceph: replace leading spaces with tabs
  libceph: warn on msg allocation failures
  libceph: don't complain on msgpool alloc failures
  libceph: always preallocate mon connection
  libceph: create messenger with client
  ceph: document ioctls
  ceph: implement (optional) max read size
  ceph: rename rsize -> rasize
  ceph: make readpages fully async
2011-10-28 16:42:18 -07:00
David Vrabel 2d073846b8 block: xen-blkback: use API provided by xenbus module to map rings
The xenbus module provides xenbus_map_ring_valloc() and
xenbus_map_ring_vfree().  Use these to map the ring pages granted by
the frontend.

Acked-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-26 10:02:35 -04:00
Sage Weil 6ab00d465a libceph: create messenger with client
This simplifies the init/shutdown paths, and makes client->msgr available
during the rest of the setup process.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:15 -07:00
Linus Torvalds 59e5253417 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (59 commits)
  MAINTAINERS: linux-m32r is moderated for non-subscribers
  linux@lists.openrisc.net is moderated for non-subscribers
  Drop default from "DM365 codec select" choice
  parisc: Kconfig: cleanup Kernel page size default
  Kconfig: remove redundant CONFIG_ prefix on two symbols
  cris: remove arch/cris/arch-v32/lib/nand_init.S
  microblaze: add missing CONFIG_ prefixes
  h8300: drop puzzling Kconfig dependencies
  MAINTAINERS: microblaze-uclinux@itee.uq.edu.au is moderated for non-subscribers
  tty: drop superfluous dependency in Kconfig
  ARM: mxc: fix Kconfig typo 'i.MX51'
  Fix file references in Kconfig files
  aic7xxx: fix Kconfig references to READMEs
  Fix file references in drivers/ide/
  thinkpad_acpi: Fix printk typo 'bluestooth'
  bcmring: drop commented out line in Kconfig
  btmrvl_sdio: fix typo 'btmrvl_sdio_sd6888'
  doc: raw1394: Trivial typo fix
  CIFS: Don't free volume_info->UNC until we are entirely done with it.
  treewide: Correct spelling of successfully in comments
  ...
2011-10-25 12:11:02 +02:00
Linus Torvalds 31018acd4c Merge branches 'stable/bug.fixes-3.2' and 'stable/mmu.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/p2m/debugfs: Make type_name more obvious.
  xen/p2m/debugfs: Fix potential pointer exception.
  xen/enlighten: Fix compile warnings and set cx to known value.
  xen/xenbus: Remove the unnecessary check.
  xen/irq: If we fail during msi_capability_init return proper error code.
  xen/events: Don't check the info for NULL as it is already done.
  xen/events: BUG() when we can't allocate our event->irq array.

* 'stable/mmu.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: Fix selfballooning and ensure it doesn't go too far
  xen/gntdev: Fix sleep-inside-spinlock
  xen: modify kernel mappings corresponding to granted pages
  xen: add an "highmem" parameter to alloc_xenballooned_pages
  xen/p2m: Use SetPagePrivate and its friends for M2P overrides.
  xen/p2m: Make debug/xen/mmu/p2m visible again.
  Revert "xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set."
2011-10-25 09:17:47 +02:00
Jens Axboe 83157223de Merge branch 'for-linus' into for-3.2/core 2011-10-24 16:24:38 +02:00
Mike Miller ab5dbebe33 cciss: add small delay when using PCI Power Management to reset for kump
The P600 requires a small delay when changing states. Otherwise we may think
the board did not reset and we bail. This for kdump only and is particular
to the P600.

Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-20 22:21:52 +02:00
Jens Axboe b8d8bdfe31 Merge branch 'stable/for-jens-3.2' of git://oss.oracle.com/git/kwilk/xen into for-3.2/drivers 2011-10-20 15:10:59 +02:00
Jens Axboe 5c04b426f2 Merge branch 'v3.1-rc10' into for-3.2/core
Conflicts:
	block/blk-core.c
	include/linux/blkdev.h

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-19 14:30:42 +02:00
Konrad Rzeszutek Wilk 6927d92091 xen/blkback: Fix two races in the handling of barrier requests.
There are two windows of opportunity to cause a race when
processing a barrier request. This patch fixes this.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-17 14:28:57 -04:00
Christoph Hellwig 456be1484f loop: remove the incorrect write_begin/write_end shortcut
Currently the loop device tries to call directly into write_begin/write_end
instead of going through ->write if it can.  This is a fairly nasty shortcut
as write_begin and write_end are only callbacks for the generic write code
and expect to be called with filesystem specific locks held.

This code currently causes various issues for clustered filesystems as it
doesn't take the required cluster locks, and it also causes issues for XFS
as it doesn't properly lock against the swapext ioctl as called by the
defragmentation tools.  This in case causes data corruption if
defragmentation hits a busy loop device in the wrong time window, as
reported by RH QA.

The reason why we have this shortcut is that it saves a data copy when
doing a transformation on the loop device, which is the technical term
for using cryptoloop (or an XOR transformation).  Given that cryptoloop
has been deprecated in favour of dm-crypt my opinion is that we should
simply drop this shortcut instead of finding complicated ways to to
introduce a formal interface for this shortcut.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-17 12:57:20 +02:00
Konrad Rzeszutek Wilk dda1852802 xen/blkback: Check for proper operation.
The patch titled: "xen/blkback: Fix the inhibition to map pages
when discarding sector ranges." had the right idea except that
it used the wrong comparison operator. It had == instead of !=.

This fixes the bug where all (except discard) operations would
have been ignored.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-14 12:29:55 -04:00
Konrad Rzeszutek Wilk 64391b2536 xen/blkback: Fix the inhibition to map pages when discarding sector ranges.
The 'operation' parameters are the ones provided to the bio layer while
the req->operation are the ones passed in between the backend and
frontend. We used the wrong 'operation' value to squash the
call to map pages when processing the discard operation resulting
in an hypercall that did nothing. Lets guard against going in the
mapping function by checking for the proper operation type.

CC: Li Dongyang <lidongyang@novell.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-13 09:48:38 -04:00
Konrad Rzeszutek Wilk 5c62cb4860 xen/blkback: Report VBD_WSECT (wr_sect) properly.
We did not increment the amount of sectors written to disk
b/c we tested for the == WRITE which is incorrect - as the
operations are more of WRITE_FLUSH, WRITE_ODIRECT. This patch
fixes it by doing a & WRITE check.

CC: stable@kernel.org
Reported-by: Andy Burns <xen.lists@burns.me.uk>
Suggested-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-13 09:48:37 -04:00
Konrad Rzeszutek Wilk 29bde09378 xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
We emulate the barrier requests by draining the outstanding bio's
and then sending the WRITE_FLUSH command. To drain the I/Os
we use the refcnt that is used during disconnect to wait for all
the I/Os before disconnecting from the frontend. We latch on its
value and if it reaches either the threshold for disconnect or when
there are no more outstanding I/Os, then we have drained all I/Os.

Suggested-by: Christopher Hellwig <hch@infradead.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-13 09:48:36 -04:00
Laszlo Ersek 469738e675 xen-blkfront: plug device number leak in xlblk_init() error path
... though after a failed xenbus_register_frontend() all may be lost.

Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-13 09:48:35 -04:00
Konrad Rzeszutek Wilk d11e615830 xen-blkfront: If no barrier or flush is supported, use invalid operation.
Guard against issuing BLKIF_OP_WRITE_BARRIER or BLKIF_OP_FLUSH_CACHE
by checking whether we successfully negotiated with the backend.
The negotiation with the backend also sets the q->flush_flags which
fortunately for us is also used when submitting an bio to us. If
we don't support barriers or flushes it would be set to zero so
we should never end up having to deal with REQ_FLUSH | REQ_FUA.

However, other third party implementations of __make_request that
might be stacked on top of us might not be so smart, so lets fix this up.

Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-13 09:48:35 -04:00
Jan Beulich 8e6dc6fe51 xen-blkback: use kzalloc() in favor of kmalloc()+memset()
This fixes the problem of three of those four memset()-s having
improper size arguments passed: Sizeof a pointer-typed expression
returns the size of the pointer, not that of the pointed to data.

It also reverts using kmalloc() instead of kzalloc() for the allocation
of the pending grant handles array, as that array gets fully
initialized in a subsequent loop.

Reported-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-13 09:48:34 -04:00
Joe Jin c555aab97d xen-blkback: fixed indentation and comments
This patch fixes belows:

1. Fix code style issue.
2. Fix incorrect functions name in comments.

Signed-off-by: Joe Jin <joe.jin@oracle.com>
Cc: Jens Axboe <jaxboe@fusionio.com>
Cc: Ian Campbell <Ian.Campbell@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-13 09:48:33 -04:00
Li Dongyang 69ef68cef9 xen-blkfront: fix a deadlock while handling discard response
When we get -EOPNOTSUPP response for a discard request, we will clear
the discard flag on the request queue so we won't attempt to send discard
requests to backend again, and this should be protected under rq->queue_lock.
However, when we setup the request queue, we pass blkif_io_lock to
blk_init_queue so rq->queue_lock is blkif_io_lock indeed, and this lock
is already taken when we are in blkif_interrpt, so remove the
spin_lock/spin_unlock when we clear the discard flag or we will end up
with deadlock here

Signed-off-by: Li Dongyang <lidongyang@novell.com>
[v1: Updated description a bit and removed comment from source]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-13 09:48:32 -04:00
Li Dongyang ed30bf317c xen-blkfront: Handle discard requests.
If the backend advertises 'feature-discard', then interrogate
the backend for alignment and granularity. Setup the request
queue with the appropiate values and send the discard operation
as required.

Signed-off-by: Li Dongyang <lidongyang@novell.com>
[v1: Amended commit description]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-13 09:48:31 -04:00
Li Dongyang b3cb0d6adc xen-blkback: Implement discard requests ('feature-discard')
..aka ATA TRIM/SCSI UNMAP command to be passed through the frontend
and used as appropiately by the backend. We also advertise
certain granulity parameters to the frontend so it can plug them in.
If the backend is a realy device - we just end up using
'blkdev_issue_discard' while for loopback devices - we just punch
a hole in the image file.

Signed-off-by: Li Dongyang <lidongyang@novell.com>
[v1: Fixed up pr_debug and commit description]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-13 09:48:30 -04:00
Stefano Stabellini 0930bba674 xen: modify kernel mappings corresponding to granted pages
If we want to use granted pages for AIO, changing the mappings of a user
vma and the corresponding p2m is not enough, we also need to update the
kernel mappings accordingly.
Currently this is only needed for pages that are created for user usages
through /dev/xen/gntdev. As in, pages that have been in use by the
kernel and use the P2M will not need this special mapping.
However there are no guarantees that in the future the kernel won't
start accessing pages through the 1:1 even for internal usage.

In order to avoid the complexity of dealing with highmem, we allocated
the pages lowmem.
We issue a HYPERVISOR_grant_table_op right away in
m2p_add_override and we remove the mappings using another
HYPERVISOR_grant_table_op in m2p_remove_override.
Considering that m2p_add_override and m2p_remove_override are called
once per page we use multicalls and hypercall batching.

Use the kmap_op pointer directly as argument to do the mapping as it is
guaranteed to be present up until the unmapping is done.
Before issuing any unmapping multicalls, we need to make sure that the
mapping has already being done, because we need the kmap->handle to be
set correctly.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
[v1: Removed GRANT_FRAME_BIT usage]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-29 10:32:58 -04:00
Carsten Emde 6c4867f646 floppy: use del_timer_sync() in init cleanup
When no floppy is found the module code can be released while a timer
function is pending or about to be executed.

CPU0                                  CPU1
				      floppy_init()
timer_softirq()
   spin_lock_irq(&base->lock);
   detach_timer();
   spin_unlock_irq(&base->lock);
   -> Interrupt
					del_timer();
				        return -ENODEV;
                                      module_cleanup();
   <- EOI
   call_timer_fn();
   OOPS

Use del_timer_sync() to prevent this.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-09-21 10:22:11 +02:00
Ayan George 4c823cc3d5 drivers/block/loop.c: remove unnecessary bdev argument from loop_clr_fd()
If the loop device is associated (lo->lo_state == Lo_bound), it will have
a valid bdev pointed to by lo->lo_device.  There is no reason to ever pass
an additional block_device pointer.

Signed-off-by: Ayan George <ayan.george@canonical.com>
Cc: Phillip Susi <psusi@cfl.rr.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-09-21 10:02:13 +02:00
Phillip Susi 8a9c594422 drivers/block/loop.c: emit uevent on auto release
The loopback driver failed to emit the change uevent when auto releasing
the device.  Fixed lo_release() to pass the bdev to loop_clr_fd() so it
can emit the event.

Signed-off-by: Phillip Susi <psusi@cfl.rr.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ayan George <ayan@ayan.net>
Signed-off-by: Andrew Morton <akpm@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-09-21 10:02:13 +02:00
Sergei Shtylyov 5a3a76e6c3 drivers/block/cpqarray.c: use pci_dev->revision
This driver uses PCI_CLASS_REVISION instead of PCI_REVISION_ID, so it
wasn't converted by commit 44c10138fd ("PCI: Change all drivers to
use pci_device->revision").

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Cc: Chirag Kantharia <chirag.kantharia@hp.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-09-21 10:02:13 +02:00
Jiri Kosina e060c38434 Merge branch 'master' into for-next
Fast-forward merge with Linus to be able to merge patches
based on more recent version of the tree.
2011-09-15 15:08:18 +02:00
Jesper Juhl e5de063016 Remove unneeded version.h includes from drivers/block/
It was pointed out by 'make versioncheck' that some includes of
linux/version.h are not needed in drivers/block/.
This patch removes them.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-15 14:57:06 +02:00
Justin P. Mattock 699324871f treewide: remove extra semicolons from various parts of the kernel
This is a resend from the original, changing the title from PATCH to
RFC(since this is a review for commit, and I should have put that the first go around).
and also removing some of the commit's with ia64 and bash since it is significant.
let me know if I might have missed anything etc..

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-15 14:50:49 +02:00
Joe Perches 1d273b929c drbd: Use angle brackets for system includes
Use the normal include style.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-15 14:02:57 +02:00
Joe Perches 57f3224c3f drbd: Convert vmalloc/memset to vzalloc
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-15 13:55:02 +02:00
Christoph Hellwig 5a7bbad27a block: remove support for bio remapping from ->make_request
There is very little benefit in allowing to let a ->make_request
instance update the bios device and sector and loop around it in
__generic_make_request when we can archive the same through calling
generic_make_request from the driver and letting the loop in
generic_make_request handle it.

Note that various drivers got the return value from ->make_request and
returned non-zero values for errors.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-09-12 12:12:01 +02:00
Kay Sievers e03c8dd149 loop: always allow userspace partitions and optionally support automatic scanning
Automatic partition scanning can be requested individually per loop
device during its setup by setting LO_FLAGS_PARTSCAN. By default, no
partition tables are scanned.

Userspace can now always add and remove partitions from all loop
devices, regardless if the in-kernel partition scanner is enabled or
not.

The needed partition minor numbers are allocated from the extended
minors space, the main loop device numbers will continue to match the
loop minors, regardless of the number of partitions used.

  # grep . /sys/class/block/loop1/loop/*
  /sys/block/loop1/loop/autoclear:0
  /sys/block/loop1/loop/backing_file:/home/kay/data/stuff/part.img
  /sys/block/loop1/loop/offset:0
  /sys/block/loop1/loop/partscan:1
  /sys/block/loop1/loop/sizelimit:0

  # ls -l /dev/loop*
  brw-rw---- 1 root disk   7,   0 Aug 14 20:22 /dev/loop0
  brw-rw---- 1 root disk   7,   1 Aug 14 20:23 /dev/loop1
  brw-rw---- 1 root disk 259,   0 Aug 14 20:23 /dev/loop1p1
  brw-rw---- 1 root disk 259,   1 Aug 14 20:23 /dev/loop1p2
  brw-rw---- 1 root disk   7,  99 Aug 14 20:23 /dev/loop99
  brw-rw---- 1 root disk 259,   2 Aug 14 20:23 /dev/loop99p1
  brw-rw---- 1 root disk 259,   3 Aug 14 20:23 /dev/loop99p2
  crw------T 1 root root  10, 237 Aug 14 20:22 /dev/loop-control

Cc: Karel Zak  <kzak@redhat.com>
Cc: Davidlohr Bueso <dave@gnu.org>
Acked-By: Tejun Heo <tj@kernel.org>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-23 20:12:04 +02:00
Jens Axboe 89c63a8ef3 Merge branch 'stable/for-jens' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus 2011-08-23 15:09:13 +02:00
Joe Jin 1bc05b0ae6 xen-blkback: fixed indentation and comments
This patch fixes belows:

1. Fix code style issue.
2. Fix incorrect functions name in comments.

Signed-off-by: Joe Jin <joe.jin@oracle.com>
Cc: Jens Axboe <jaxboe@fusionio.com>
Cc: Ian Campbell <Ian.Campbell@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-08-22 11:35:36 -04:00
Joe Jin 6f5986bce5 xen-blkback: Don't disconnect backend until state switched to XenbusStateClosed.
When do block-attach/block-detach test with below steps, umount hangs
in the guest. Furthermore shutdown ends up being stuck when umounting file-systems.

1. start guest.
2. attach new block device by xm block-attach in Dom0.
3. mount new disk in guest.
4. execute xm block-detach to detach the block device in dom0 until timeout
5. Any request to the disk will hung.

Root cause:
This issue is caused when setting backend device's state to
'XenbusStateClosing', which sends to the frontend the XenbusStateClosing
notification. When frontend receives the notification it tries to release
the disk in blkfront_closing(), but at that moment the disk is still in use
by guest, so frontend refuses to close. Specifically it sets the disk state to
XenbusStateClosing and sends the notification to backend - when backend receives the
event, it disconnects the vbd from real device, and sets the vbd device state to
XenbusStateClosing. The backend disconnects the real device/file, and any IO
requests to the disk in guest will end up in ether, leaving disk DEAD and set to
XenbusStateClosing. When the guest wants to disconnect the disk, umount will
hang on blkif_release()->xlvbd_release_gendisk() as it is unable to send any IO
to the disk, which prevents clean system shutdown.

Solution:
Don't disconnect backend until frontend state switched to XenbusStateClosed.

Signed-off-by: Joe Jin <joe.jin@oracle.com>
Cc: Daniel Stodden <daniel.stodden@citrix.com>
Cc: Jens Axboe <jaxboe@fusionio.com>
Cc: Annie Li <annie.li@oracle.com>
Cc: Ian Campbell <Ian.Campbell@eu.citrix.com>
[v1: Modified description a bit]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-08-22 11:35:35 -04:00
Lukas Czerner dfaa2ef68e loop: add discard support for loop devices
This commit adds discard support for loop devices. Discard is usually
supported by SSD and thinly provisioned devices as a method for
reclaiming unused space. This is no different than trying to reclaim
back space which is not used by the file system on the image, but it
still occupies space on the host file system.

We can do the reclamation on file system which does support hole
punching. So when discard request gets to the loop driver we can
translate that to punch a hole to the underlying file, hence reclaim
the free space.

This is very useful for trimming down the size of the image to only what
is really used by the file system on that image. Fstrim may be used for
that purpose.

It has been tested on ext4, xfs and btrfs with the image file systems
ext4, ext3, xfs and btrfs. ext4, or ext6 image on ext4 file system has
some problems but it seems that ext4 punch hole implementation is
somewhat flawed and it is unrelated to this commit.

Also this is a very good method of validating file systems punch hole
implementation.

Note that when encryption is used, discard support is disabled, because
using it might leak some information useful for possible attacker.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-19 14:50:46 +02:00
Andrew Morton 548ef6cc26 nbd-replace-some-printk-with-dev_warn-and-dev_info-checkpatch-fixes
ERROR: code indent should use tabs where possible
#30: FILE: drivers/block/nbd.c:578:
+^I        dev_info(disk_to_dev(lo->disk), "NBD_DISCONNECT\n");$

total: 1 errors, 0 warnings, 35 lines checked

NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
      scripts/cleanfile

./patches/nbd-replace-some-printk-with-dev_warn-and-dev_info.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Paul Clements <Paul.Clements@steeleye.com>
Cc: WANG Cong <amwang@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-19 14:48:28 +02:00
WANG Cong 5eedf5415c nbd: replace some printk with dev_warn() and dev_info()
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Paul Clements <Paul.Clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-19 14:48:28 +02:00
WANG Cong 7742ce4ab4 nbd: lower the loglevel of an error message
This is only an error, no need to use KERN_CRIT log level.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Paul Clements <Paul.Clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-19 14:48:28 +02:00
WANG Cong 7f1b90f99a nbd: replace printk KERN_ERR with dev_err()
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Paul Clements <Paul.Clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-19 14:48:22 +02:00
WANG Cong 1695b87f7d nbd: replace sysfs_create_file() with device_create_file()
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Paul Clements <Paul.Clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-19 14:48:21 +02:00
WANG Cong 25ac0c2b97 nbd: use task_pid_nr() to get current pid
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Paul Clements <Paul.Clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-19 14:48:17 +02:00
Jens Axboe 40bb96ade4 Merge branch 'stable/for-jens' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus 2011-08-09 20:43:26 +02:00
Konrad Rzeszutek Wilk ea5e116162 xen/blkback: Make description more obvious.
With the frontend having Xen but the backend not, it just looks odd:

  <*>   Xen virtual block device support
  <*>   Block-device backend driver

Fix it to have the 'Xen' in front of it.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-08-09 11:12:14 -04:00
Joe Handzik f963d270cb cciss: add transport mode attribute to sys
Signed-off-by: Joseph Handzik <joseph.t.handzik@beardog.cce.hp.com>
Acked-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-08 11:40:17 +02:00
Joseph Handzik 1304953700 cciss: Adds simple mode functionality
Signed-off-by: Joseph Handzik <joseph.t.handzik@beardog.cce.hp.com>
Acked-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-08 11:40:15 +02:00
Axel Lin f41c53a569 block: swim3: fix unterminated of_device_id table
of_device_id structures need a NULL terminating entry, add it.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-03 15:02:55 +02:00
H Hartley Sweeten ddad9ef582 drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parse
The buffer 'sc.cpu_mask' is a kernel buffer.  If bitmap_parse is used
instead of __bitmap_parse the extra parameter that indicates a kernel
buffer is not needed.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-02 12:43:49 +02:00
Kay Sievers 05eb0f252b loop: fix deadlock when sysfs and LOOP_CLR_FD race against each other
LOOP_CLR_FD takes lo->lo_ctl_mutex and tries to remove the loop sysfs
files. Sysfs calls show() and waits for lo->lo_ctl_mutex. LOOP_CLR_FD
waits for show() to finish to remove the sysfs file.

  cat /sys/class/block/loop0/loop/backing_file
    mutex_lock_nested+0x176/0x350
    ? loop_attr_do_show_backing_file+0x2f/0xd0 [loop]
    ? loop_attr_do_show_backing_file+0x2f/0xd0 [loop]
    loop_attr_do_show_backing_file+0x2f/0xd0 [loop]
    dev_attr_show+0x1b/0x60
    ? sysfs_read_file+0x86/0x1a0
    ? __get_free_pages+0x12/0x50
    sysfs_read_file+0xaf/0x1a0

  ioctl(LOOP_CLR_FD):
    wait_for_common+0x12c/0x180
    ? try_to_wake_up+0x2a0/0x2a0
    wait_for_completion+0x18/0x20
    sysfs_deactivate+0x178/0x180
    ? sysfs_addrm_finish+0x43/0x70
    ? sysfs_addrm_start+0x1d/0x20
    sysfs_addrm_finish+0x43/0x70
    sysfs_hash_and_remove+0x85/0xa0
    sysfs_remove_group+0x59/0x100
    loop_clr_fd+0x1dc/0x3f0 [loop]
    lo_ioctl+0x223/0x7a0 [loop]

Instead of taking the lo_ctl_mutex from sysfs code, take the inner
lo->lo_lock, to protect the access to the backing_file data.

Thanks to Tejun for help debugging and finding a solution.

Cc: Milan Broz <mbroz@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-07-31 22:21:35 +02:00
Kay Sievers d134b00b9a loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop devices
Instead of unconditionally creating a fixed number of dead loop
devices which need to be investigated by storage handling services,
even when they are never used, we allow distros start with 0
loop devices and have losetup(8) and similar switch to the dynamic
/dev/loop-control interface instead of searching /dev/loop%i for free
devices.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-07-31 22:08:04 +02:00
Kay Sievers 770fe30a46 loop: add management interface for on-demand device allocation
Loop devices today have a fixed pre-allocated number of usually 8.
The number can only be changed at module init time. To find a free
device to use, /dev/loop%i needs to be scanned, and all devices need
to be opened until a free one is possibly found.

This adds a new /dev/loop-control device node, that allows to
dynamically find or allocate a free device, and to add and remove loop
devices from the running system:
 LOOP_CTL_ADD adds a specific device. Arg is the number
 of the device. It returns the device i or a negative
 error code.

 LOOP_CTL_REMOVE removes a specific device, Arg is the
 number the device. It returns the device i or a negative
 error code.

 LOOP_CTL_GET_FREE finds the next unbound device or allocates
 a new one. No arg is given. It returns the device i or a
 negative error code.

The loop kernel module gets automatically loaded when
/dev/loop-control is accessed the first time. The alias
specified in the module, instructs udev to create this
'dead' device node, even when the module is not loaded.

Example:
 cfd = open("/dev/loop-control", O_RDWR);

 # add a new specific loop device
 err = ioctl(cfd, LOOP_CTL_ADD, devnr);

 # remove a specific loop device
 err = ioctl(cfd, LOOP_CTL_REMOVE, devnr);

 # find or allocate a free loop device to use
 devnr = ioctl(cfd, LOOP_CTL_GET_FREE);

 sprintf(loopname, "/dev/loop%i", devnr);
 ffd = open("backing-file", O_RDWR);
 lfd = open(loopname, O_RDWR);
 err = ioctl(lfd, LOOP_SET_FD, ffd);

Cc: Tejun Heo <tj@kernel.org>
Cc: Karel Zak  <kzak@redhat.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-07-31 22:08:04 +02:00
Kay Sievers 34dd82afd2 loop: replace linked list of allocated devices with an idr index
Replace the linked list, that keeps track of allocated devices, with an
idr index to allow a more efficient lookup of devices.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-07-31 22:08:04 +02:00
Arun Sharma 60063497a9 atomic: use <linux/atomic.h>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:47 -07:00
Linus Torvalds ba5b56cb3e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits)
  ceph: document unlocked d_parent accesses
  ceph: explicitly reference rename old_dentry parent dir in request
  ceph: document locking for ceph_set_dentry_offset
  ceph: avoid d_parent in ceph_dentry_hash; fix ceph_encode_fh() hashing bug
  ceph: protect d_parent access in ceph_d_revalidate
  ceph: protect access to d_parent
  ceph: handle racing calls to ceph_init_dentry
  ceph: set dir complete frag after adding capability
  rbd: set blk_queue request sizes to object size
  ceph: set up readahead size when rsize is not passed
  rbd: cancel watch request when releasing the device
  ceph: ignore lease mask
  ceph: fix ceph_lookup_open intent usage
  ceph: only link open operations to directory unsafe list if O_CREAT|O_TRUNC
  ceph: fix bad parent_inode calc in ceph_lookup_open
  ceph: avoid carrying Fw cap during write into page cache
  libceph: don't time out osd requests that haven't been received
  ceph: report f_bfree based on kb_avail rather than diffing.
  ceph: only queue capsnap if caps are dirty
  ceph: fix snap writeback when racing with writes
  ...
2011-07-26 13:38:50 -07:00
Josh Durgin 029bcbd8b0 rbd: set blk_queue request sizes to object size
This improves performance since more requests can be merged.

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-07-26 11:29:35 -07:00
Yehuda Sadeh 79e3057c4c rbd: cancel watch request when releasing the device
We were missing this cleanup, so when a device was released
the osd didn't clean up its watchers list, so following notifications
could be slow as osd needed to timeout on the client.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2011-07-26 11:29:04 -07:00
Linus Torvalds 8ded371f81 Merge branch 'for-3.1/drivers' of git://git.kernel.dk/linux-block
* 'for-3.1/drivers' of git://git.kernel.dk/linux-block:
  cciss: do not attempt to read from a write-only register
  xen/blkback: Add module alias for autoloading
  xen/blkback: Don't let in-flight requests defer pending ones.
  bsg: fix address space warning from sparse
  bsg: remove unnecessary conditional expressions
  bsg: fix bsg_poll() to return POLLOUT properly
2011-07-25 10:38:18 -07:00
Linus Torvalds bbd9d6f7fb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (107 commits)
  vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp
  isofs: Remove global fs lock
  jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory
  fix IN_DELETE_SELF on overwriting rename() on ramfs et.al.
  mm/truncate.c: fix build for CONFIG_BLOCK not enabled
  fs:update the NOTE of the file_operations structure
  Remove dead code in dget_parent()
  AFS: Fix silly characters in a comment
  switch d_add_ci() to d_splice_alias() in "found negative" case as well
  simplify gfs2_lookup()
  jfs_lookup(): don't bother with . or ..
  get rid of useless dget_parent() in btrfs rename() and link()
  get rid of useless dget_parent() in fs/btrfs/ioctl.c
  fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers
  drivers: fix up various ->llseek() implementations
  fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek
  Ext4: handle SEEK_HOLE/SEEK_DATA generically
  Btrfs: implement our own ->llseek
  fs: add SEEK_HOLE and SEEK_DATA flags
  reiserfs: make reiserfs default to barrier=flush
  ...

Fix up trivial conflicts in fs/xfs/linux-2.6/xfs_super.c due to the new
shrinker callout for the inode cache, that clashed with the xfs code to
start the periodic workers later.
2011-07-22 19:02:39 -07:00
Linus Torvalds a99a7d1436 Merge branch 'timers-cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  mips: Fix i8253 clockevent fallout
  i8253: Cleanup outb/inb magic
  arm: Footbridge: Use common i8253 clockevent
  mips: Use common i8253 clockevent
  x86: Use common i8253 clockevent
  i8253: Create common clockevent implementation
  i8253: Export i8253_lock unconditionally
  pcpskr: MIPS: Make config dependencies finer grained
  pcspkr: Cleanup Kconfig dependencies
  i8253: Move remaining content and delete asm/i8253.h
  i8253: Consolidate definitions of PIT_LATCH
  x86: i8253: Consolidate definitions of global_clock_event
  i8253: Alpha, PowerPC: Remove unused asm/8253pit.h
  alpha: i8253: Cleanup remaining users of i8253pit.h
  i8253: Remove I8253_LOCK config
  i8253: Make pcsp sound driver use the shared i8253_lock
  i8253: Make pcspkr input driver use the shared i8253_lock
  i8253: Consolidate all kernel definitions of i8253_lock
  i8253: Unify all kernel declarations of i8253_lock
  i8253: Create linux/i8253.h and use it in all 8253 related files
2011-07-22 16:51:56 -07:00
Linus Torvalds 8181780c16 Merge branch 'devicetree/next' of git://git.secretlab.ca/git/linux-2.6
* 'devicetree/next' of git://git.secretlab.ca/git/linux-2.6:
  dt: include linux/errno.h in linux/of_address.h
  of/address: Add of_find_matching_node_by_address helper
  dt: remove extra xsysace platform_driver registration
  tty/serial: Add devicetree support for nVidia Tegra serial ports
  dt: add empty of_property_read_u32[_array] for non-dt
  dt: bindings: move SEC node under new crypto/
  dt: add helper function to read u32 arrays
  tty/serial: change of_serial to use new of_property_read_u32() api
  dt: add 'const' for of_property_read_string parameter **out_string
  dt: add helper functions to read u32 and string property values
  tty: of_serial: support for 32 bit accesses
  dt: document the of_serial bindings
  dt/platform: allow device name to be overridden
  drivers/amba: create devices from device tree
  dt: add of_platform_populate() for creating device from the device tree
  dt: Add default match table for bus ids
2011-07-22 14:53:38 -07:00
Linus Torvalds 111ad119d1 Merge branch 'stable/drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/pciback: Have 'passthrough' option instead of XEN_PCIDEV_BACKEND_PASS and XEN_PCIDEV_BACKEND_VPCI
  xen/pciback: Remove the DEBUG option.
  xen/pciback: Drop two backends, squash and cleanup some code.
  xen/pciback: Print out the MSI/MSI-X (PIRQ) values
  xen/pciback: Don't setup an fake IRQ handler for SR-IOV devices.
  xen: rename pciback module to xen-pciback.
  xen/pciback: Fine-grain the spinlocks and fix BUG: scheduling while atomic cases.
  xen/pciback: Allocate IRQ handler for device that is shared with guest.
  xen/pciback: Disable MSI/MSI-X when reseting a device
  xen/pciback: guest SR-IOV support for PV guest
  xen/pciback: Register the owner (domain) of the PCI device.
  xen/pciback: Cleanup the driver based on checkpatch warnings and errors.
  xen/pciback: xen pci backend driver.
  xen: tmem: self-ballooning and frontswap-selfshrinking
  xen: Add module alias to autoload backend drivers
  xen: Populate xenbus device attributes
  xen: Add __attribute__((format(printf... where appropriate
  xen: prepare tmem shim to handle frontswap
  xen: allow enable use of VGA console on dom0
2011-07-22 13:45:15 -07:00
Al Viro e7f5909707 kill useless checks for sb->s_op == NULL
never is...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:44:21 -04:00
Grant Likely 8c11642a50 Merge commit 'v3.0-rc7' into devicetree/next 2011-07-15 20:11:34 -06:00
Stefan Bader 89153b5cae xen-blkfront: Fix one off warning about name clash
Avoid telling users to use xvde and onwards when using xvde.

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-07-14 14:19:51 -04:00
Stefan Bader 196cfe2ae8 xen-blkfront: Drop name and minor adjustments for emulated scsi devices
These were intended to avoid the namespace clash when representing
emulated IDE and SCSI devices. However that seems to confuse users
more than expected (a disk defined as sda becomes xvde).
So for now go back to the scheme which does no adjustments. This
will break when mixing IDE and SCSI names in the configuration of
guests but should be by now expected.

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-07-14 14:19:33 -04:00
Grant Likely 5d10302f46 dt: remove extra xsysace platform_driver registration
After commit 1c48a5c93, "dt: Eliminate
of_platform_{,un}register_driver", the xsysace driver attempts to
register two platform_drivers with the same name, which a) doesn't
work, and b) isn't necessary.  This patch merges the two
platform_drivers.

Reported-by: Daniel Hellstrom <daniel@gaisler.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-07-14 05:33:52 -06:00
Stephen M. Cameron 07d0c38e7d cciss: do not attempt to read from a write-only register
Most smartarrays will tolerate it, but some new ones don't.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>

Note: this is a regression caused by commit 1ddd5049
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-07-09 09:04:12 +02:00
Bastian Blank a7e9357f10 xen/blkback: Add module alias for autoloading
Add xen-backend:vbd module alias to the xen-blkback module. This allows
automatic loading of the module.

Signed-off-by: Bastian Blank <waldi@debian.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-06-30 12:48:25 -04:00
Daniel Stodden b4726a9df2 xen/blkback: Don't let in-flight requests defer pending ones.
Running RING_FINAL_CHECK_FOR_REQUESTS from make_response is a bad
idea. It means that in-flight I/O is essentially blocking continued
batches. This essentially kills throughput on frontends which unplug
(or even just notify) early and rightfully assume addtional requests
will be picked up on time, not synchronously.

Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
[v1: Rebased and fixed compile problems]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-06-30 12:48:06 -04:00
Joe Perches 08b8bfc1c6 xen: Add __attribute__((format(printf... where appropriate
Use the compiler to verify printf formats and arguments.

Fix fallout.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-06-30 12:14:40 -04:00
Jens Axboe 7b28afe01a Merge branch 'for-3.0-important' of git://git.drbd.org/linux-2.6-drbd into for-linus 2011-06-30 10:10:50 +02:00
Lars Ellenberg 86e1e98e5c drbd: we should write meta data updates with FLUSH FUA
We used to write these with BIO_RW_BARRIER aka REQ_HARDBARRIER (unless
disabled in the configuration). The correct semantic now would be to
write with FLUSH/FUA.
For example, with activity log transactions, FUA alone is not enough, we
need the corresponding bitmap update (and all related application
updates) on stable storage as well.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-06-30 09:23:46 +02:00
Lars Ellenberg cb6518cbef drbd: when receive times out on meta socket, also check last receive time on data socket
If we have an asymetrically congested network, we may send P_PING,
but due to congestion, the corresponding P_PING_ACK would time out,
and we would drop a (congested, but otherwise) healthy connection
("PingAck did not arrive in time.")

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-06-30 09:23:44 +02:00
Lars Ellenberg 5a8b424276 drbd: account bitmap IO during resync as resync-(related-)-io
If we have a good resync rate, we will frequently update the on-disk
bitmap, which, if not accounted for as resync io, may let an otherwise
idle device appear to be "busy", and cause us to throttle resync.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-06-30 09:23:43 +02:00
Lars Ellenberg 8ccee20e3e drbd: don't cond_resched_lock with IRQs disabled
The last commit, drbd: add missing spinlock to bitmap receive,
introduced a cond_resched_lock(), where the lock in question is taken
with irqs disabled.

As we must not schedule with IRQs disabled,
and cond_resched_lock_irq() does not exist, yet,
we re-aquire the spin_lock_irq() for each bitmap page processed in turn.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-06-30 09:23:42 +02:00
Lars Ellenberg 829c608786 drbd: add missing spinlock to bitmap receive
During bitmap exchange, when using the RLE bitmap compression scheme,
we have a code path that can set the whole bitmap at once.

To avoid holding spin_lock_irq() for too long, we used to lock out other
bitmap modifications during bitmap exchange by other means, and then,
knowing we have exclusive access to the bitmap, modify it without
the spinlock, and with IRQs enabled.

Since we now allow local IO to continue, potentially setting additional
bits during the bitmap receive phase, this is no longer true, and we get
uncoordinated updates of bitmap members, causing bm_set to no longer
accurately reflect the total number of set bits.

To actually see this, you'd need to have a large bitmap, use RLE bitmap
compression, and have busy IO during sync handshake and bitmap exchange.

Fix this by taking the spin_lock_irq() in this code path as well, but
calling cond_resched_lock() after each page worth of bits processed.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-06-30 09:23:41 +02:00
Philipp Reisner 0cfdd247d1 drbd: Use the correct max_bio_size when creating resync requests
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-06-30 09:23:40 +02:00
Ralf Baechle 334955ef96 i8253: Create linux/i8253.h and use it in all 8253 related files
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Link: http://lkml.kernel.org/r/20110601180610.054254048@duck.linux-mips.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

 arch/arm/mach-footbridge/isa-timer.c |    2 +-
 arch/mips/cobalt/time.c              |    2 +-
 arch/mips/jazz/irq.c                 |    2 +-
 arch/mips/kernel/i8253.c             |    2 +-
 arch/mips/mti-malta/malta-time.c     |    2 +-
 arch/mips/sgi-ip22/ip22-time.c       |    2 +-
 arch/mips/sni/time.c                 |    2 +-
 arch/x86/kernel/apic/apic.c          |    2 +-
 arch/x86/kernel/apm_32.c             |    2 +-
 arch/x86/kernel/hpet.c               |    2 +-
 arch/x86/kernel/i8253.c              |    2 +-
 arch/x86/kernel/time.c               |    2 +-
 drivers/block/hd.c                   |    2 +-
 drivers/clocksource/i8253.c          |    2 +-
 drivers/input/gameport/gameport.c    |    2 +-
 drivers/input/joystick/analog.c      |    2 +-
 drivers/input/misc/pcspkr.c          |    2 +-
 include/linux/i8253.h                |   11 +++++++++++
 sound/drivers/pcsp/pcsp.h            |    2 +-
 19 files changed, 29 insertions(+), 18 deletions(-)
2011-06-09 15:01:37 +02:00
Linus Torvalds 4f1ba49efa Merge branch 'for-linus' of git://git.kernel.dk/linux-block
* 'for-linus' of git://git.kernel.dk/linux-block:
  block: Use hlist_entry() for io_context.cic_list.first
  cfq-iosched: Remove bogus check in queue_fail path
  xen/blkback: potential null dereference in error handling
  xen/blkback: don't call vbd_size() if bd_disk is NULL
  block: blkdev_get() should access ->bd_disk only after success
  CFQ: Fix typo and remove unnecessary semicolon
  block: remove unwanted semicolons
  Revert "block: Remove extra discard_alignment from hd_struct."
  nbd: adjust 'max_part' according to part_shift
  nbd: limit module parameters to a sane value
  nbd: pass MSG_* flags to kernel_recvmsg()
  block: improve the bio_add_page() and bio_add_pc_page() descriptions
2011-06-04 08:11:26 +09:00
Linus Torvalds 0f48f26009 block: fix mismerge of the DISK_EVENT_MEDIA_CHANGE removal
Jens' back-merge commit 698567f3fa ("Merge commit 'v2.6.39' into
for-2.6.40/core") was incorrectly done, and re-introduced the
DISK_EVENT_MEDIA_CHANGE lines that had been removed earlier in commits

 - 9fd097b149 ("block: unexport DISK_EVENT_MEDIA_CHANGE for
   legacy/fringe drivers")

 - 7eec77a181 ("ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd
   and ide-cd")

because of conflicts with the "g->flags" updates near-by by commit
d4dc210f69 ("block: don't block events on excl write for non-optical
devices")

As a result, we re-introduced the hanging behavior due to infinite disk
media change reports.

Tssk, tssk, people! Don't do back-merges at all, and *definitely* don't
do them to hide merge conflicts from me - especially as I'm likely
better at merging them than you are, since I do so many merges.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-02 05:29:19 +09:00
Dan Carpenter 9b83c77121 xen/blkback: potential null dereference in error handling
blkbk->pending_pages can be NULL here so I added a check for it.

Signed-off-by: Dan Carpenter <error27@gmail.com>
[v1: Redid the loop a bit]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-06-01 09:28:21 -04:00