Commit graph

1070 commits

Author SHA1 Message Date
Jonathan Brassow
a159c1ac5f dm snapshot: extend exception store functions
Supply dm_add_exception as a callback to the read_metadata function.
Add a status function ready for a later patch and name the functions
consistently.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:05:19 +00:00
Alasdair G Kergon
4db6bfe02b dm snapshot: split out exception store implementations
Move the existing snapshot exception store implementations out into
separate files.  Later patches will place these behind a new
interface in preparation for alternative implementations.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:05:17 +00:00
Jonathan Brassow
1ae25f9c93 dm snapshot: rename struct exception_store
Rename struct exception_store to dm_exception_store.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:05:16 +00:00
Jonathan Brassow
aea53d92f7 dm snapshot: separate out exception store interface
Pull structures that bridge the gap between snapshot and
exception store out of dm-snap.h and put them in a new
.h file - dm-exception-store.h.  This file will define the
API for new exception stores.

Ultimately, dm-snap.h is unnecessary, since only dm-snap.c
should be using it.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:05:15 +00:00
Alasdair G Kergon
fe9cf30eb8 dm mpath: move trigger_event to system workqueue
The same workqueue is used both for sending uevents and processing queued I/O.
Deadlock has been reported in RHEL5 when sending a uevent was blocked waiting
for the queued I/O to be processed.  Use scheduled_work() for the asynchronous
uevents instead.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:05:13 +00:00
Milan Broz
784aae735d dm: add name and uuid to sysfs
Implement simple read-only sysfs entry for device-mapper block device.

This patch adds a simple sysfs directory named "dm" under block device
properties and implements
	- name attribute (string containing mapped device name)
	- uuid attribute (string containing UUID, or empty string if not set)

The kobject is embedded in mapped_device struct, so no additional
memory allocation is needed for initializing sysfs entry.

During the processing of sysfs attribute we need to lock mapped device
which is done by a new function dm_get_from_kobj, which returns the md
associated with kobject and increases the usage count.

Each 'show attribute' function is responsible for its own locking.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:05:12 +00:00
Mikulas Patocka
d58168763f dm table: rework reference counting
Rework table reference counting.

The existing code uses a reference counter. When the last reference is
dropped and the counter reaches zero, the table destructor is called.
Table reference counters are acquired/released from upcalls from other
kernel code (dm_any_congested, dm_merge_bvec, dm_unplug_all).
If the reference counter reaches zero in one of the upcalls, the table
destructor is called from almost random kernel code.

This leads to various problems:
* dm_any_congested being called under a spinlock, which calls the
  destructor, which calls some sleeping function.
* the destructor attempting to take a lock that is already taken by the
  same process.
* stale reference from some other kernel code keeps the table
  constructed, which keeps some devices open, even after successful
  return from "dmsetup remove". This can confuse lvm and prevent closing
  of underlying devices or reusing device minor numbers.

The patch changes reference counting so that the table destructor can be
called only at predetermined places.

The table has always exactly one reference from either mapped_device->map
or hash_cell->new_map. After this patch, this reference is not counted
in table->holders.  A pair of dm_create_table/dm_destroy_table functions
is used for table creation/destruction.

Temporary references from the other code increase table->holders. A pair
of dm_table_get/dm_table_put functions is used to manipulate it.

When the table is about to be destroyed, we wait for table->holders to
reach 0. Then, we call the table destructor.  We use active waiting with
msleep(1), because the situation happens rarely (to one user in 5 years)
and removing the device isn't performance-critical task: the user doesn't
care if it takes one tick more or not.

This way, the destructor is called only at specific points
(dm_table_destroy function) and the above problems associated with lazy
destruction can't happen.

Finally remove the temporary protection added to dm_any_congested().

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:05:10 +00:00
Andi Kleen
ab4c142488 dm: support barriers on simple devices
Implement barrier support for single device DM devices

This patch implements barrier support in DM for the common case of dm linear
just remapping a single underlying device. In this case we can safely
pass the barrier through because there can be no reordering between
devices.

 NB. Any DM device might cease to support barriers if it gets
     reconfigured so code must continue to allow for a possible
     -EOPNOTSUPP on every barrier bio submitted.  - agk

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:05:09 +00:00
Kiyoshi Ueda
8fbf26ad5b dm request: add caches
This patch prepares some kmem_caches for request-based dm.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:05:06 +00:00
Milan Broz
23d39f63aa dm ioctl: allow dm_copy_name_and_uuid to return only one field
Allow NULL buffer in dm_copy_name_and_uuid if you only want to return one of
the fields.

(Required by a following patch that adds these fields to sysfs.)

Signed-off-by: Milan Broz <mbroz@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:05:04 +00:00
Milan Broz
ac1f0ac22c dm log: ensure log bitmap fits on log device
Check that the log bitmap will fit within the log device.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:05:02 +00:00
Milan Broz
2045e88edb dm log: move region_size validation
Move log size validation from mirror target to log constructor.

Removed PAGE_SIZE restriction we no longer think necessary.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:05:01 +00:00
Takahiro Yasui
6f3af01cb0 dm log: avoid reinitialising io_req on every operation
rw_header function updates three members of io_req data every time
when I/O is processed. bi_rw and notify.fn are never modified once
they get initialized, and so they can be set in advance.

header_to_disk() can also be pulled out of write_header() since only one
caller needs it and write_header() can be replaced by rw_header()
directly.

Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:04:59 +00:00
Mikulas Patocka
10d3bd09a3 dm: consolidate target deregistration error handling
Change dm_unregister_target to return void and use BUG() for error
reporting.

dm_unregister_target can only fail because of programming bug in the
target driver. It can't fail because of user's behavior or disk errors.

This patch changes unregister_target to return void and use BUG if
someone tries to unregister non-registered target or unregister target
that is in use.

This patch removes code duplication (testing of error codes in all dm
targets) and reports bugs in just one place, in dm_unregister_target. In
some target drivers, these return codes were ignored, which could lead
to a situation where bugs could be missed.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:04:58 +00:00
Jonathan Brassow
d460c65a6a dm raid1: fix error count
Always increase the error count when I/O on a leg of a mirror fails.

The error count is used to decide whether to select an alternative
mirror leg.  If the target doesn't use the "handle_errors" feature, the
error count is not updated and the bio can get requeued forever by the
read callback.

Fix it by increasing error_count before the handle_errors feature
checking.

Cc: stable@kernel.org
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:04:57 +00:00
Takahiro Yasui
c7a2bd19b7 dm log: fix dm_io_client leak on error paths
In create_log_context function, dm_io_client_destroy function needs
to be called, when memory allocation of disk_header, sync_bits and
recovering_bits failed, but dm_io_client_destroy is not called.

Cc: stable@kernel.org
Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
Acked-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:04:56 +00:00
Mikulas Patocka
90fa1527bd dm snapshot: change yield to msleep
Change yield() to msleep(1). If the thread had realtime priority,
yield() doesn't really yield, so the yielding process would loop
indefinitely and cause machine lockup.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:04:54 +00:00
Mikulas Patocka
a1b51e9867 dm table: drop reference at unbind
Move one dm_table_put() so that the last reference in the thread
gets dropped in __unbind().

This is required for a following patch,
dm-table-rework-reference-counting.patch, which will change the logic in
such a way that table destructor is called only at specific points in
the code.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06 03:04:53 +00:00
Jens Axboe
bb799ca020 bio: allow individual slabs in the bio_set
Instead of having a global bio slab cache, add a reference to one
in each bio_set that is created. This allows for personalized slabs
in each bio_set, so that they can have bios of different sizes.

This means we can personalize the bios we return. File systems may
want to embed the bio inside another structure, to avoid allocation
more items (and stuffing them in ->bi_private) after the get a bio.
Or we may want to embed a number of bio_vecs directly at the end
of a bio, to avoid doing two allocations to return a bio. This is now
possible.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-12-29 08:29:23 +01:00
Ingo Molnar
db8862eafe Merge branch 'linus' into tracing/hw-branch-tracing 2008-12-24 21:08:26 +01:00
NeilBrown
a2ed9615e3 md: Don't read past end of bitmap when reading bitmap.
When we read the write-intent-bitmap off the device, we currently
read a whole number of pages.
When PAGE_SIZE is 4K, this works due to the alignment we enforce
on the superblock and bitmap.
When PAGE_SIZE is 64K, this case read past the end-of-device
which causes an error.

When we write the superblock, we ensure to clip the last page
to just be the required size.  Copy that code into the read path
to just read the required number of sectors.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: stable@kernel.org
2008-12-19 16:25:01 +11:00
Ingo Molnar
970987beb9 Merge branches 'tracing/ftrace', 'tracing/function-graph-tracer' and 'tracing/urgent' into tracing/core 2008-12-05 14:45:22 +01:00
Milan Broz
0e435ac26e block: fix setting of max_segment_size and seg_boundary mask
Fix setting of max_segment_size and seg_boundary mask for stacked md/dm
devices.

When stacking devices (LVM over MD over SCSI) some of the request queue
parameters are not set up correctly in some cases by default, namely
max_segment_size and and seg_boundary mask.

If you create MD device over SCSI, these attributes are zeroed.

Problem become when there is over this mapping next device-mapper mapping
- queue attributes are set in DM this way:

request_queue   max_segment_size  seg_boundary_mask
SCSI                65536             0xffffffff
MD RAID1                0                      0
LVM                 65536                 -1 (64bit)

Unfortunately bio_add_page (resp.  bio_phys_segments) calculates number of
physical segments according to these parameters.

During the generic_make_request() is segment cout recalculated and can
increase bio->bi_phys_segments count over the allowed limit.  (After
bio_clone() in stack operation.)

Thi is specially problem in CCISS driver, where it produce OOPS here

    BUG_ON(creq->nr_phys_segments > MAXSGENTRIES);

(MAXSEGENTRIES is 31 by default.)

Sometimes even this command is enough to cause oops:

  dd iflag=direct if=/dev/<vg>/<lv> of=/dev/null bs=128000 count=10

This command generates bios with 250 sectors, allocated in 32 4k-pages
(last page uses only 1024 bytes).

For LVM layer, it allocates bio with 31 segments (still OK for CCISS),
unfortunatelly on lower layer it is recalculated to 32 segments and this
violates CCISS restriction and triggers BUG_ON().

The patch tries to fix it by:

 * initializing attributes above in queue request constructor
   blk_queue_make_request()

 * make sure that blk_queue_stack_limits() inherits setting

 (DM uses its own function to set the limits because it
 blk_queue_stack_limits() was introduced later.  It should probably switch
 to use generic stack limit function too.)

 * sets the default seg_boundary value in one place (blkdev.h)

 * use this mask as default in DM (instead of -1, which differs in 64bit)

Bugs related to this:
https://bugzilla.redhat.com/show_bug.cgi?id=471639
http://bugzilla.kernel.org/show_bug.cgi?id=8672

Signed-off-by: Milan Broz <mbroz@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
Cc: Neil Brown <neilb@suse.de>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-12-03 12:55:55 +01:00
Ingo Molnar
0bfc24559d blktrace: port to tracepoints, update
Port to the new tracepoints API: split DEFINE_TRACE() and DECLARE_TRACE()
sites. Spread them out to the usage sites, as suggested by
Mathieu Desnoyers.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
2008-11-26 13:04:35 +01:00
Arnaldo Carvalho de Melo
5f3ea37c77 blktrace: port to tracepoints
This was a forward port of work done by Mathieu Desnoyers, I changed it to
encode the 'what' parameter on the tracepoint name, so that one can register
interest in specific events and not on classes of events to then check the
'what' parameter.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 12:13:34 +01:00
Chandra Seetharaman
8a57dfc6f9 dm: avoid destroying table in dm_any_congested
dm_any_congested() just checks for the DMF_BLOCK_IO and has no
code to make sure that suspend waits for dm_any_congested() to
complete.  This patch adds such a check.

Without it, a race can occur with dm_table_put() attempting to
destroying the table in the wrong thread, the one running
dm_any_congested() which is meant to be quick and return
immediately.

Two examples of problems:
1. Sleeping functions called from congested code, the caller
   of which holds a spin lock.
2. An ABBA deadlock between pdflush and multipathd. The two locks
   in contention are inode lock and kernel lock.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-11-13 23:39:14 +00:00
Mikulas Patocka
d221d2e776 dm: move pending queue wake_up end_io_acct
This doesn't fix any bug, just moves wake_up immediately after decrementing
md->pending, for better code readability.

It must be clear to anyone manipulating md->pending to wake up
the queue if md->pending reaches zero, so move the wakeup as close to
the decrementing as possible.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-11-13 23:39:10 +00:00
Chandra Seetharaman
14e98c5ca8 dm mpath: warn if args ignored
Currently dm ignores the parameters provided to hardware handlers
without providing any notifications to the user.

This patch just prints a warning message so that the user knows that
the arguments are ignored.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-11-13 23:39:06 +00:00
Chandra Seetharaman
b81aa1c792 dm mpath: avoid attempting to activate null path
Path activation code is called even when the pgpath is NULL. This could
lead to a panic in activate_path(). Such a panic is seen in -rt kernel.

This problem has been there before the pg_init() was moved to a
workqueue.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-11-13 23:39:00 +00:00
Heinz Mauelshagen
6edebdee48 dm stripe: fix init failure
Don't proceed if dm_stripe_init() fails to register itself as a dm target.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-11-13 23:38:56 +00:00
Mikulas Patocka
18776c7316 dm raid1: flush workqueue before destruction
We queue work on keventd queue --- so this queue must be flushed in the
destructor. Otherwise, keventd could access mirror_set after it was freed.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: stable@kernel.org
2008-11-13 23:38:52 +00:00
Andre Noll
f1cd14ae52 md: linear: Fix a division by zero bug for very small arrays.
We currently oops with a divide error on starting a linear software
raid array consisting of at least two very small (< 500K) devices.

The bug is caused by the calculation of the hash table size which
tries to compute sector_div(sz, base) with "base" being zero due to
the small size of the component devices of the array.

Fix this by requiring the hash spacing to be at least one which
implies that also "base" is non-zero.

This bug has existed since about 2.6.14.

Cc: stable@kernel.org
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2008-11-06 19:41:24 +11:00
NeilBrown
a53a6c8575 md: fix bug in raid10 recovery.
Adding a spare to a raid10 doesn't cause recovery to start.
This is due to an silly type in
  commit 6c2fce2ef6
and so is a bug in 2.6.27 and .28-rc.

Thanks to Thomas Backlund for bisecting to find this.

Cc: Thomas Backlund <tmb@mandriva.org>
Cc: stable@kernel.org

Signed-off-by: NeilBrown <neilb@suse.de>
2008-11-06 17:28:20 +11:00
NeilBrown
cb3ac42b8a md: revert the recent addition of a call to the BLKRRPART ioctl.
It turns out that it is only safe to call blkdev_ioctl when the device
is actually open (as ->bd_disk is set to NULL on last close).  And it
is quite possible for do_md_stop to be called when the device is not
open.  So discard the call to blkdev_ioctl(BLKRRPART) which was
added in
   commit 934d9c23b4

It is just as easy to call this ioctl from userspace when needed (on
mdadm -S) so leave it out of the kernel

Signed-off-by: NeilBrown <neilb@suse.de>
2008-11-06 17:28:01 +11:00
Linus Torvalds
721d5dfe7e Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
  md: destroy partitions and notify udev when md array is stopped.
2008-10-30 18:36:16 -07:00
Mikulas Patocka
879129d208 dm snapshot: wait for chunks in destructor
If there are several snapshots sharing an origin and one is removed
while the origin is being written to, the snapshot's mempool may get
deleted while elements are still referenced.

Prior to dm-snapshot-use-per-device-mempools.patch the pending
exceptions may still have been referenced after the snapshot was
destroyed, but this was not a problem because the shared mempool
was still there.

This patch fixes the problem by tracking the number of mempool elements
in use.

The scenario:
- You have an origin and two snapshots 1 and 2.
- Someone writes to the origin.
- It creates two exceptions in the snapshots, snapshot 1 will be primary
exception, snapshot 2's pending_exception->primary_pe will point to the
exception in snapshot 1.
- The exceptions are being relocated, relocation of exception 1 finishes
(but it's pending_exception is still allocated, because it is referenced
by an exception from snapshot 2)
- The user lvremoves snapshot 1 --- it calls just suspend (does nothing)
and destructor. md->pending is zero (there is no I/O submitted to the
snapshot by md layer), so it won't help us.
- The destructor waits for kcopyd jobs to finish on snapshot 1 --- but
there are none.
- The destructor on snapshot 1 cleans up everything.
- The relocation of exception on snapshot 2 finishes, it drops reference
on primary_pe. This frees its primary_pe pointer. Primary_pe points to
pending exception created for snapshot 1. So it frees memory into
non-existing mempool.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-10-30 13:33:16 +00:00
Mikulas Patocka
60c856c8e2 dm snapshot: fix register_snapshot deadlock
register_snapshot() performs a GFP_KERNEL allocation while holding
_origins_lock for write, but that could write out dirty pages onto a
device that attempts to acquire _origins_lock for read, resulting in
deadlock.

So move the allocation up before taking the lock.

This path is not performance-critical, so it doesn't matter that we
allocate memory and free it if we find that we won't need it.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-10-30 13:33:12 +00:00
Ilpo Jarvinen
b34578a484 dm raid1: fix do_failures
Missing braces.  Commit 1f965b1943 (dm raid1: separate region_hash interface
part1) broke it.

Signed-off-by: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: Heinz Mauelshagen <hjm@redhat.com>
2008-10-30 13:33:07 +00:00
NeilBrown
934d9c23b4 md: destroy partitions and notify udev when md array is stopped.
md arrays are not currently destroyed when they are stopped - they
remain in /sys/block.  Last time I tried this I tripped over locking
too much.

A consequence of this is that udev doesn't remove anything from /dev.
This is rather ugly.

As an interim measure until proper device removal can be achieved,
make sure all partitions are removed using the BLKRRPART ioctl, and
send a KOBJ_CHANGE when an md array is stopped.

Signed-off-by: NeilBrown <neilb@suse.de>
2008-10-28 17:01:23 +11:00
Linus Torvalds
f8d56f1771 Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
  md: allow extended partitions on md devices.
  md: use sysfs_notify_dirent to notify changes to md/dev-xxx/state
  md: use sysfs_notify_dirent to notify changes to md/array_state
2008-10-26 16:42:18 -07:00
Linus Torvalds
2248485640 Merge git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev
* git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: (66 commits)
  [PATCH] kill the rest of struct file propagation in block ioctls
  [PATCH] get rid of struct file use in blkdev_ioctl() BLKBSZSET
  [PATCH] get rid of blkdev_locked_ioctl()
  [PATCH] get rid of blkdev_driver_ioctl()
  [PATCH] sanitize blkdev_get() and friends
  [PATCH] remember mode of reiserfs journal
  [PATCH] propagate mode through swsusp_close()
  [PATCH] propagate mode through open_bdev_excl/close_bdev_excl
  [PATCH] pass fmode_t to blkdev_put()
  [PATCH] kill the unused bsize on the send side of /dev/loop
  [PATCH] trim file propagation in block/compat_ioctl.c
  [PATCH] end of methods switch: remove the old ones
  [PATCH] switch sr
  [PATCH] switch sd
  [PATCH] switch ide-scsi
  [PATCH] switch tape_block
  [PATCH] switch dcssblk
  [PATCH] switch dasd
  [PATCH] switch mtd_blkdevs
  [PATCH] switch mmc
  ...
2008-10-23 10:23:07 -07:00
Linus Torvalds
5ed487bc2c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (46 commits)
  [PATCH] fs: add a sanity check in d_free
  [PATCH] i_version: remount support
  [patch] vfs: make security_inode_setattr() calling consistent
  [patch 1/3] FS_MBCACHE: don't needlessly make it built-in
  [PATCH] move executable checking into ->permission()
  [PATCH] fs/dcache.c: update comment of d_validate()
  [RFC PATCH] touch_mnt_namespace when the mount flags change
  [PATCH] reiserfs: add missing llseek method
  [PATCH] fix ->llseek for more directories
  [PATCH vfs-2.6 6/6] vfs: add LOOKUP_RENAME_TARGET intent
  [PATCH vfs-2.6 5/6] vfs: remove LOOKUP_PARENT from non LOOKUP_PARENT lookup
  [PATCH vfs-2.6 4/6] vfs: remove unnecessary fsnotify_d_instantiate()
  [PATCH vfs-2.6 3/6] vfs: add __d_instantiate() helper
  [PATCH vfs-2.6 2/6] vfs: add d_ancestor()
  [PATCH vfs-2.6 1/6] vfs: replace parent == dentry->d_parent by IS_ROOT()
  [PATCH] get rid of on-stack dentry in udf
  [PATCH 2/2] anondev: switch to IDA
  [PATCH 1/2] anondev: init IDR statically
  [JFFS2] Use d_splice_alias() not d_add() in jffs2_lookup()
  [PATCH] Optimise NFS readdir hack slightly.
  ...
2008-10-23 10:22:40 -07:00
Christoph Hellwig
72e8264eda [PATCH] dm: kill lookup_device wrapper
Now that lookup_bdev is exported and used by dm just use it directly
instead of through a trivial wrapper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-23 05:12:57 -04:00
Kiyoshi Ueda
51157b4ab4 dm: tidy local_init
This patch tidies local_init() in preparation for request-based dm.
No functional change.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-10-21 17:45:08 +01:00
Kiyoshi Ueda
f431d9666f dm: remove unused flush_all
This patch removes the DM_WQ_FLUSH_ALL state that is unnecessary.

The dm_queue_flush(md, DM_WQ_FLUSH_ALL, NULL) in dm_suspend()
is never invoked because:
  - 'goto flush_and_out' is the same as 'goto out' because
    the 'goto flush_and_out' is called only when '!noflush'
  - If r is non-zero, then the code above will invoke 'goto out'
    and skip this code.

No functional change.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-10-21 17:45:07 +01:00
Heinz Mauelshagen
1f965b1943 dm raid1: separate region_hash interface part1
Separate the region hash code from raid1 so it can be shared by forthcoming
targets.  Use BUG_ON() for failed async dm_io() calls.

Signed-off-by: Heinz Mauelshagen <hjm@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-10-21 17:45:06 +01:00
Martin K. Petersen
f3e1d26ede dm: mark split bio as cloned
When a bio gets split, mark its fragments with the BIO_CLONED flag.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-10-21 17:45:04 +01:00
Milan Broz
0a4a1047a4 dm crypt: remove waitqueue
Remove waitqueue no longer needed with the async crypto interface.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-10-21 17:45:03 +01:00
Milan Broz
393b47ef23 dm crypt: fix async split
When writing io, dm-crypt has to allocate a new cloned bio
and encrypt the data into newly-allocated pages attached to this bio.
In rare cases, because of hw restrictions (e.g. physical segment limit)
or memory pressure, sometimes more than one cloned bio has to be used,
each processing a different fragment of the original.

Currently there is one waitqueue which waits for one fragment to finish
and continues processing the next fragment.

But when using asynchronous crypto this doesn't work, because several
fragments may be processed asynchronously or in parallel and there is
only one crypt context that cannot be shared between the bio fragments.
The result may be corruption of the data contained in the encrypted bio.

The patch fixes this by allocating new dm_crypt_io structs (with new
crypto contexts) and running them independently.

The fragments contains a pointer to the base dm_crypt_io struct to
handle reference counting, so the base one is properly deallocated
after all the fragments are finished.

In a low memory situation, this only uses one additional object from the
mempool.  If the mempool is empty, the next allocation simple waits for
previous fragments to complete.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-10-21 17:45:02 +01:00
Milan Broz
b635b00e0e dm crypt: tidy sector
Prepare local sector variable (offset) for later patch.
Do not update io->sector for still-running I/O.

No functional change.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-10-21 17:45:00 +01:00