Commit Graph

177 Commits (0b56306e56784d0513e1193d58c05a6bd97bd1a9)

Author SHA1 Message Date
Adrian Bunk 0b56306e56 [PATCH] drivers/md/kcopyd.c: #if 0 kcopyd_cancel()
This patch #if 0's the not yet implemented global function kcopyd_cancel().

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:01 -08:00
Alasdair G Kergon 6da487dcc0 [PATCH] device-mapper ioctl: add skip lock_fs flag
Add ioctl DM_SKIP_LOCKFS_FLAG for userspace to request that lock_fs is
bypassed when suspending a device.

There's no change to the behaviour of existing code that doesn't know about
the new flag.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:01 -08:00
Alasdair G Kergon aa8d7c2fbe [PATCH] device-mapper: make lock_fs optional
Devices only needs syncing when creating snapshots, so make this optional when
suspending a device.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:01 -08:00
Alasdair G Kergon e39e2e95eb [PATCH] device-mapper: rename frozen_bdev
Rename frozen_bdev to suspended_bdev and move the bdget outside lockfs.  (This
prepares for making lockfs optional.)

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:00 -08:00
Jonathan E Brassow a1a1908070 [PATCH] device-mapper raid1: add default mirror
This patch introduces a new field to the mirror_set (default_mirror) to store
the default mirror.

(A subsequent patch will allow us to change the default mirror in the event of
a failure.)

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:00 -08:00
Alasdair G Kergon 2d5fe68987 [PATCH] device-mapper: scanf sector format change
Use %llu not %Lu in sscanf/printf format strings.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:00 -08:00
Andrew Stribblehill e6c276159c [PATCH] device-mapper: remove unused definition
This patch removes an unused #define.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:00 -08:00
Alasdair G Kergon 2d38fe2044 [PATCH] device-mapper snapshot: metadata reading separation
More snapshot metadata reading into separate function, to prepare for changing
the place it gets called from.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:00 -08:00
goggin, edward 81f1777a55 [PATCH] device-mapper ioctl: event on rename
After changing the name of a mapped device, trigger a dm event.  (For
userspace multipath tools.)

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:00 -08:00
David Teigland d229a9589f [PATCH] device-mapper: add dm_get_md
Add dm_get_dev() to get a mapped device given its dev_t.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:00 -08:00
David Teigland 637842cfdb [PATCH] device-mapper: add dm_find_md
Abstract dm_find_md() from dm_get_mdptr() to allow use elsewhere.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:59 -08:00
Linus Torvalds 25c862cc9e Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild 2006-01-04 16:36:52 -08:00
Linus Torvalds f61ea1b0c8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 2006-01-04 16:30:12 -08:00
Brian Gerst 352dd1df32 gitignore: misc files
Ignore all files generated from *_shipped files, plus a few others.

Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2006-01-01 22:21:50 +01:00
Neil Brown bcb97940f3 [PATCH] md: Change case of raid level reported in sys/mdX/md/level
I had thought that keeping the reported tail level clearly different
from the module name was a good idea, but I've changed my mind.

'raid5' is better and probably less confusing than 'RAID-5'.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-19 16:47:50 -08:00
Mike Christie defd94b754 [SCSI] seperate max_sectors from max_hw_sectors
- export __blk_put_request and blk_execute_rq_nowait
needed for async REQ_BLOCK_PC requests
- seperate max_hw_sectors and max_sectors for block/scsi_ioctl.c and
SG_IO bio.c helpers per Jens's last comments. Since block/scsi_ioctl.c SG_IO was
already testing against max_sectors and SCSI-ml was setting max_sectors and
max_hw_sectors to the same value this does not change any scsi SG_IO behavior. It only
prepares ll_rw_blk.c, scsi_ioctl.c and bio.c for when SCSI-ml begins to set
a valid max_hw_sectors for all LLDs. Today if a LLD does not set it
SCSI-ml sets it to a safe default and some LLDs set it to a artificial low
value to overcome memory and feedback issues.

Note: Since we now cap max_sectors to BLK_DEF_MAX_SECTORS, which is 1024,
drivers that used to call blk_queue_max_sectors with a large value of
max_sectors will now see the fs requests capped to BLK_DEF_MAX_SECTORS.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2005-12-15 15:11:40 -08:00
NeilBrown 5036805be7 [PATCH] md: use correct size of raid5 stripe cache when measuring how full it is
The raid5 stripe cache was recently changed from fixed size (NR_STRIPES) to
variable size (conf->max_nr_stripes).  However there are two places that still
use the constant and as a result, reducing the size of the stripe cache can
result in a deadlock.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12 09:06:04 -08:00
NeilBrown 3795bb0fc5 [PATCH] md: fix a use-after-free bug in raid1
Who would submit code with a FIXME like that in it !!!!

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12 09:06:04 -08:00
NeilBrown 6aea114a72 [PATCH] md: fix --re-add for raid1 and raid6
If you have an array with a write-intent-bitmap, and you remove a device, then
re-add it, a full recovery isn't needed.  We detect a re-add by looking at
saved_raid_disk.  For raid1, it doesn't matter which disk it was, only whether
or not it was an active device.  The old code being removed set a value of
'mirror' which was then ignored, so it can go.  The changed code performs the
correct check.

For raid6, if there are two missing devices, make sure we chose the right slot
on --re-add rather than always the first slot.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 14:42:26 -08:00
NeilBrown b2a2703c28 [PATCH] md: set default_bitmap_offset properly in set_array_info
If an array is created using set_array_info, default_bitmap_offset isn't set
properly meaning that an internal bitmap cannot be hot-added until the array
is stopped and re-assembled.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 14:42:25 -08:00
NeilBrown b5ab28a3b8 [PATCH] md: fix problem with raid6 intent bitmap
When doing a recovery, we need to know whether the array will still be
degraded after the recovery has finished, so we can know whether bits can be
clearred yet or not.  This patch performs the required check.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 14:42:25 -08:00
NeilBrown 700e432d83 [PATCH] md: fix locking problem in r5/r6
bitmap_unplug actually writes data (bits) to storage, so we shouldn't be
holding a spinlock...

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 14:42:25 -08:00
NeilBrown 22dfdf5212 [PATCH] md: improve read speed to raid10 arrays using 'far copies'
raid10 has two different layouts.  One uses near-copies (so multiple
copies of a block are at the same or similar offsets of different
devices) and the other uses far-copies (so multiple copies of a block
are stored a greatly different offsets on different devices).  The point
of far-copies is that it allows the first section (normally first half)
to be layed out in normal raid0 style, and thus provide raid0 sequential
read performance.

Unfortunately, the read balancing in raid10 makes some poor decisions
for far-copies arrays and you don't get the desired performance.  So
turn off that bad bit of read_balance for far-copies arrays.

With this patch, read speed of an 'f2' array is comparable with a raid0
with the same number of devices, though write speed is ofcourse still
very slow.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 14:42:25 -08:00
Jonathan E Brassow 7692c5dd48 [PATCH] device-mapper raid1: drop mark_region spinlock fix
The spinlock region_lock is held while calling mark_region which can sleep.
Drop the spinlock before calling that function.

A region's state and inclusion in the clean list are altered by rh_inc and
rh_dec.  The state variable is set to RH_CLEAN in rh_dec, but only if
'pending' is zero.  It is set to RH_DIRTY in rh_inc, but not if it is already
so.  The changes to 'pending', the state, and the region's inclusion in the
clean list need to be atomicly.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22 09:14:31 -08:00
jblunck@suse.de 233886dd32 [PATCH] device-mapper snapshot: bio_list fix
bio_list_merge() should do nothing if the second list is empty - not oops.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22 09:14:31 -08:00
Stefan Bader 640eb3b045 [PATCH] device-mapper dm-mpath: endio spinlock fix
do_end_io() can be called without interrupts blocked.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22 09:14:31 -08:00
Alasdair G Kergon 0e56822d30 [PATCH] device-mapper: mirror log bitset fix
The linux bitset operators (test_bit, set_bit etc) work on arrays of "unsigned
long".  dm-log uses such bitsets but treats them as arrays of uint32_t, only
allocating and zeroing a multiple of 4 bytes (as 'clean_bits' is a uint32_t).

The patch below fixes this problem.

The problem is specific to 64-bit big endian machines such as s390x or ppc-64
and can prevent pvmove terminating.

In the simplest case, if "region_count" were (say) 30, then
bitset_size (below) would be 4 and bitset_uint32_count would be 1.
Thus the memory for this butset, after allocation and zeroing would
be
   0 0 0 0 X X X X
On a bigendian 64bit machine, bit 0 for this bitset is in the 8th
byte! (and every bit that dm-log would use would be in the X area).

   0 0 0 0 X X X X
                 ^
                 here

which hasn't been cleared properly.

As the dm-raid1 code only syncs and counts regions which have a 0 in the
'sync_bits' bitset, and only finishes when it has counted high enough, a large
number of 1's among those 'X's will cause the sync to not complete.

It is worth noting that the code uses the same bitsets for in-memory and
on-disk logs.  As these bitsets are host-endian and host-sized, this means
that they cannot safely be moved between computers with

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22 09:14:31 -08:00
Alasdair G Kergon c4cc66351a [PATCH] device-mapper: list_versions fix
In some circumstances the LIST_VERSIONS output is truncated because the size
calculation forgets about a 'uint32_t' in each structure - but the inclusion
of the whole of ALIGN_MASK frequently compensates for the omission.

This is a quick workaround to use an upper bound.  (The code ought to be fixed
to supply the actual size.)

Running 'dmsetup targets' may demonstrate the problem: when I run it, the last
line comes out as 'erro' instead of 'error'.  Consequently, 'lvcreate --type
error' doesn't work.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22 09:14:31 -08:00
Kiyoshi Ueda b6fcc80d03 [PATCH] device-mapper dm-ioctl: missing put in table load error case
An error path in table_load() forgets to release a table that won't now be
referenced.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22 09:14:30 -08:00
NeilBrown c0e485216d [PATCH] md: fix is_mddev_idle calculation now that disk/sector accounting happens when request completes
md needs to monitor the rate of requests to its devices when doing
resync/recovery so that it can back-off when there is non-resync IO.  It
does this by comparing resync IO, which it counts, with total IO which is
taken from disk_stats.

disk_stats were recently changed to account sectors when a request
completes instead of when it is queued.  This upsets md's calculations.

We could do the sync_io accounting at the end of requests too, but that has
problems.  If an underlying device is an md array, the accounting will
still be done when the request is submitted.  This could be changed for
some raid levels, but it cannot be changed for raid0 or linear without
substantial code changes.

So instead, we increase the error that is_mddev_idle allows, up to the
maximum amount of resync IO that can be in flight at any time.  The
calculation is current fragile as each personality as different limits for
in-flight resync.  This should be fixed up.

For now, this simple patch fixes the problem.

Increasing the error margin decreases the sensitivity to non-resync IO.  To
partially compensate for this, the time to wait when non-resync IO is
detected is increased so that less steady IO is required to keep the resync
at bay.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-18 07:49:46 -08:00
Neil Brown 34ef75f09f [PATCH] md: don't pass a NULL file* into ->prepare_write()
Some filesystems go oops.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-18 07:49:46 -08:00
NeilBrown 93588e2284 [PATCH] md: make md threads interruptible again
Despite the fact that md threads don't need to be signalled, and won't
respond to signals anyway, we need to have an 'interruptible' wait, else
they stay in 'D' state and add to the load average.

(akpm: the signal_pending() test is unneeded - we'll fix that up in the next
round.  For now, leave it there because that's how the code used to be).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-15 08:59:19 -08:00
NeilBrown e8a0033451 [PATCH] md: mark START_ARRAY deprecated with a date
This was marked deprecated "after 2.6" back in the 2.5 days.  But now it
seems there isn't going to be any "after 2.6", and we deprecate by date
now.  So set a date.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-15 08:59:19 -08:00
NeilBrown bb636547b0 [PATCH] md: document sysfs usage of md, and make a couple of small refinements
Document in Documentation/md.txt the files that now appear in sysfs, and make
a couple of small refinements to exactly when 'level' and 'raid_disks' are
empty, to make it match the documentation.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:40 -08:00
NeilBrown 7eec314d75 [PATCH] md: improve 'scan_mode' and rename it to 'sync_action'
The current sync_action for an array can be one of

   idle  - nothing happening
   resync - reduncancy being recalcualted
   recover - missing device being recoverred to spare
   check   - user initiated check of redundancy
   repair  - like resync but user-initiated and ignores
             bitmap optimisation.

Each of these strings can also be written to the 'sync_action' file to cause
that action to happen (if appropriate).

While 'sync' is not technically correct, as a recovery is *not* a 'sync', I
think it is the most servicable word here.  Also 'action' is a strong word
than 'mode'.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:40 -08:00
NeilBrown 787453c239 [PATCH] md: complete conversion of md to use kthreads
There are a few loose ends following the conversion of md to use kthreads:

- Some fields in mdk_thread_t that aren't needed (kthreads does it's own
  completion and manages it's own name).

- thread->run is now never NULL, so no need to check

- Some tests for signal_pending that aren't needed (As we don't use signals
  to stop threads any more)

- Some flush_signals are not needed

- Some waits are interruptible and don't need to be.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:40 -08:00
NeilBrown fd9d49cac4 [PATCH] md: ignore auto-readonly flag for arrays where it isn't meaningful
The 'auto-readonly' flag (which suppresses resync and superblock updates until
the first write) is not meaningful for personalities that don't support resync
or superblock writes (raid0, linear, etc).

So clear the setting early to avoid it confusing anything - e.g.  appearing in
/proc/mdstat

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:39 -08:00
NeilBrown 8e1b39d623 [PATCH] md: only try to print recovery/resync status for personalities that support recovery
The introduction of 'resync=PENDING' (for read-only devices) caused that
message to appear for non-syncable arrays like raid0 and linear.  Simplest
thing is to not try to print any resync info unless the personality clearly
supports it.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:39 -08:00
NeilBrown 411036fa19 [PATCH] md: split off some md attributes in sysfs to a separate group
Some, but not all, md array support data redundancy and hence support checking
and restoring that redundancy (resync, rebuild).

Some attributes apply specifically to functions involving this redundancy, and
so should only appear for md arrays for which they are meaningful.  i.e.  they
should not appear for raid0, linear, multpath, faulty.

This patch separates these into a distinct group and creates the group only if
the personality supports sync_request.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:39 -08:00
NeilBrown 96de1e663c [PATCH] md: fix some locking and module refcounting issues with md's use of sysfs
1/ I really should be using the __ATTR macros for defining attributes, so
   that the .owner field get set properly, otherwise modules can be removed
   while sysfs files are open.  This also involves some name changes of _show
   routines.

2/ Always lock the mddev (against reconfiguration) for all sysfs attribute
   access.  This easily avoid certain races and is completely consistant with
   other interfaces (ioctl and /proc/mdstat both always lock against
   reconfiguration).

3/ raid5 attributes must check that the 'conf' structure actually exists
   (the array could have been stopped while an attribute file was open).

4/ A missing 'kfree' from when the raid5_conf_t was converted to have a
   kobject embedded, and then converted back again.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:39 -08:00
NeilBrown 3855ad9f39 [PATCH] md: make sure a user-request sync of raid5 ignores intent bitmap
A sync of raid5 usually ignore blocks which the bitmap says are in-sync.  But
a user-request check or repair should not ignore these.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:39 -08:00
NeilBrown e5de485f00 [PATCH] md: make manual repair work for raid1
Raid1 currently optimises resync using the intent bitmap etc.  This
optimisation is not wanted when we explicitly request a repair through sysfs,
so add appropriate checks.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:39 -08:00
NeilBrown f637b9f9fc [PATCH] md: make sure /block link in /sys/.../md/ goes to correct devices
If a block_device is a partition, then it's kobject is
  bdev->bd_part->kobj
otherwise (if it is a full device), the kobject is
  bdev->bd_disk->kobj

As md wants back-links to the correct object (whether partition or not), we
need to respect this difference...  (Thus current code shows a link to the
whole device, whether we are using a partition or not, which is wrong).

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:39 -08:00
NeilBrown f91de92ed6 [PATCH] md: allow md arrays to be started read-only (module parameter).
When an md array is started, the superblock will be written, and resync may
commense.  This is not good if you want to be completely read-only as, for
example, when preparing to resume from a suspend-to-disk image.

So introduce a module parameter "start_ro" which can be set
to '1' at boot, at module load, or via
  /sys/module/md_mod/parameters/start_ro

When this is set, new arrays get an 'auto-ro' mode, which disables all
internal io (superblock updates, resync, recovery) and is automatically
switched to 'rw' when the first write request arrives.

The array can be set to true 'ro' mode using 'mdadm -r' before the first
write request, or resync can be started without a write using 'mdadm -w'.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00
NeilBrown 19133a4298 [PATCH] md: Remove attempt to use dynamic names in sysfs for component devices on an MD array.
With version-0.90 superblock, component devices on an md device to not have
any stable name related to the array -(version-1 assigns a fixed index when
a device is added to an array, and this remains despit any hot-swap).

The intial code for making these devices appear in sysfs used dynamic
names, which would change whenever a hot-spare was swapped for a failed or
missing device.  This turns out not to be practical in sysfs for a number
of reasons.

This patch changes then naming of component devices to be based on the
result of 'bdevname'.  This is stable and should be unique.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00
NeilBrown a9701a3047 [PATCH] md: support BIO_RW_BARRIER for md/raid1
We can only accept BARRIER requests if all slaves handle
barriers, and that can, of course, change with time....

So we keep track of whether the whole array seems safe for barriers,
and also whether each individual rdev handles barriers.

We initially assumes barriers are OK.

When writing the superblock we try a barrier, and if that fails, we flag
things for no-barriers.  This will usually clear the flags fairly quickly.

If writing the superblock finds that BIO_RW_BARRIER is -ENOTSUPP, we need to
resubmit, so introduce function "md_super_wait" which waits for requests to
finish, and retries ENOTSUPP requests without the barrier flag.

When writing the real raid1, write requests which were BIO_RW_BARRIER but
which aresn't supported need to be retried.  So raid1d is enhanced to do this,
and when any bio write completes (i.e.  no retry needed) we remove it from the
r1bio, so that devices needing retry are easy to find.

We should hardly ever get -ENOTSUPP errors when writing data to the raid.
It should only happen if:
  1/ the device used to support BARRIER, but now doesn't.  Few devices
     change like this, though raid1 can!
or
  2/ the array has no persistent superblock, so there was no opportunity to
     pre-test for barriers when writing the superblock.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00
NeilBrown bd926c63b7 [PATCH] md: make md on-disk bitmaps not host-endian
Current bitmaps use set_bit et.al and so are host-endian, which means
not-portable.  Oops.

Define a new version number (4) for which bitmaps are little-endian.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00
NeilBrown b2d444d7ad [PATCH] md: convert 'faulty' and 'in_sync' fields to bits in 'flags' field
This has the advantage of removing the confusion caused by 'rdev_t' and
'mddev_t' both having 'in_sync' fields.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00
NeilBrown ba22dcbf10 [PATCH] md: improvements to raid5 handling of read errors
Two refinements to the 'attempt-overwrite-on-read-error' mechanism.
1/ If the array is read-only, don't attempt an over-write.
2/ If there are more than max_nr_stripes read errors on a device with
   no success, fail the drive.  This will make sure a dead
   drive will be eventually kicked even when we aren't trying
   to rewrite (which would normally kick a dead drive more quickly.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00
NeilBrown 007583c925 [PATCH] md: change raid5 sysfs attribute to not create a new directory
There isn't really a need for raid5 attributes to be an a subdirectory,
so this patch moves them from
  /sys/block/mdX/md/raid5/attribute
to
  /sys/block/mdX/md/attribute

This suggests that all md personalities should co-operate about
namespace usage, but that shouldn't be a problem.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00