Commit graph

293 commits

Author SHA1 Message Date
Philipp Reisner
f91ab6282d drbd: Implemented side-stepping in drbd_res_begin_io()
Before:
  drbd_rs_begin_io() locked app-IO out of an RS extent, and
  waited then until all previous app-IO in that area finished.
  (But not only until the disk-IO was finished but until the
   barrier/epoch ack came in for that == round trip time latency ++)

After:
  As soon as a new app-IO waits wants to start new IO on that
  RS extent, drbd_rs_begin_io() steps aside (clearing the
  BME_NO_WRITES flag again). It retries after 100ms.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:34:56 +01:00
Philipp Reisner
9d77a5fee9 drbd: Make some functions static
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:34:54 +01:00
Philipp Reisner
e3555d8545 drbd: Implemented priority inheritance for resync requests
We only issue resync requests if there is no significant application IO
going on. = Application IO has higher priority than resnyc IO.

If application IO can not be started because the resync process locked
an resync_lru entry, start the IO operations necessary to release the
lock ASAP.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:34:53 +01:00
Philipp Reisner
59817f4fab drbd: Do not cleanup resync LRU for the Ahead/Behind SyncSource/SyncTarget transitions
This one should be replaced with moving this cleanup to the
'right' position.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:34:51 +01:00
Philipp Reisner
c4752ef128 drbd: When proxy's buffer drained off go into regular resync mode
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:34:49 +01:00
Philipp Reisner
73a01a18b9 drbd: New packet for Ahead/Behind mode: P_OUT_OF_SYNC
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:34:48 +01:00
Philipp Reisner
67531718d8 drbd: Implemented two new connection states Ahead/Behind
In this connection mode, the ahead node no longer replicates
application IO. The behind's disk becomes out dated.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:34:46 +01:00
Philipp Reisner
422028b1ca drbd: New configuration parameters for dealing with network congestion
net {
    on_congestion {block|pull-ahead|disconnect};
    congestion-fill {sectors};
    congestion-extents {al-extents};
}

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:34:45 +01:00
Philipp Reisner
759fbdfba6 drbd: Track the numbers of sectors in flight
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:34:43 +01:00
Lars Ellenberg
688593c5a8 drbd: Renamed write_flags_to_bio() to wire_flags_to_bio()
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:34:32 +01:00
Lars Ellenberg
4896e8c1b8 drbd: restore compatibility with 32bit kernels
With commit
drbd: further converge progress display of resync and online-verify
accidentally an u64/u64 div was introduced, causing an unresolvable
symbol __udivdi3 to be reference. Actually for that division, 32bit are
still suficient for now, so we can revert to unsigned long instead.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:19:13 +01:00
Lars Ellenberg
1816a2b47a drbd: properly use max_hw_sectors to limit the our bio size
To ease tracking of bios in some hash tables, we want it to
not cross certain boundaries (128k, used to be 32k).
We limit the maximum bio size using queue parameters.

Historically some defines and variables we use there have been named
max_segment_size, which was misguided. Rename them to max_bio_size,
and use [blk_]queue_max_hw_sectors where appropriate.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:19:11 +01:00
Lars Ellenberg
3129b1b9ae drbd: debug: limit nelink-broadcast of request on digest mismatch to 32k
We used to be limited to 32k requests,
but have increased that limit to 128k now.

This part of the code can only deal with 32k,
it would scramble arbitrary pages for larger requests.

As it is used for debugging only anyways,
it is ok to simply truncate the dumped data here.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:19:09 +01:00
Lars Ellenberg
470be44ab1 drbd: detect modification of in-flight buffers
With data-integrity digest enabled, double-check on the sending side
for modifications by upper layers of buffers under write back,
so we can tell it appart from corruption on the "wire".

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:19:08 +01:00
Lars Ellenberg
5f9915bbb8 drbd: further converge progress display of resync and online-verify
Show progressbar and ETA always, with proc_details >= 1 also show the
current sector position for both resync and online-verify on both nodes.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:19:06 +01:00
Lars Ellenberg
18edc0b9d7 drbd: fix potential wrap of 32bit oos:%lu display in /proc/drbd
When converting bits (4k resolution, still) to kB, we shift left.  If it
was a large number of bits on a 32bit box (>= 4 TiB storage), we may
wrap the 32bit unsigned long base type, resulting in incorrect display.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:19:04 +01:00
Lars Ellenberg
2649f0809f drbd: use the resync controller for online-verify requests as well
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:19:03 +01:00
Lars Ellenberg
e65f440d47 drbd: factor out drbd_rs_number_requests
Preparation patch to be able to use the auto-throttling resync controller
for online-verify requests as well.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:19:01 +01:00
Lars Ellenberg
9bd28d3c90 drbd: factor out drbd_rs_controller_reset
Preparation patch to be able to use the auto-throttling resync controller
for online-verify requests as well.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:18:59 +01:00
Lars Ellenberg
439d595379 drbd: show progress bar and ETA for online-verify
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:18:58 +01:00
Lars Ellenberg
ea5442aff6 drbd: advance progress step marks for online-verify
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:18:56 +01:00
Lars Ellenberg
c6ea14dfa3 drbd: factor out advancement of resync marks for progress reporting
This is in preparation to unify progress reporting of
online-verify and resync requests.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:18:54 +01:00
Lars Ellenberg
de228bba67 drbd: initialize online-verify progress tracking on verify target
For partial (resumed) online verify, initialize the resync step marks
once we know what the online verify start sector is.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:18:53 +01:00
Lars Ellenberg
30b743a2d5 drbd: improve online-verify progress tracking
For a partial (resumed) online-verify, initialize rs_total not to total
bits, but to number of bits to check in this run, to match the meaning
rs_total has for actual resync.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:18:51 +01:00
Lars Ellenberg
2652561886 drbd: only reset online-verify start sector if verify completed
For network hickups during online-verify, on the next verify
triggered, we by default want to resume where it left off.

After any replication link interruption, there will be a (possibly
empty) resync.  Do not reset online-verify start sector if some resync
completed, that would defeats the purpose.

Only reset the start sector once a verify run is completed.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10 11:18:49 +01:00
Jens Axboe
721a9602e6 block: kill off REQ_UNPLUG
With the plugging now being explicitly controlled by the
submitter, callers need not pass down unplugging hints
to the block layer. If they want to unplug, it's because they
manually plugged on their own - in which case, they should just
unplug at will.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:52:27 +01:00
Jens Axboe
7eaceaccab block: remove per-queue plugging
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:52:07 +01:00
Linus Torvalds
275220f0fc Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
  block: ensure that completion error gets properly traced
  blktrace: add missing probe argument to block_bio_complete
  block cfq: don't use atomic_t for cfq_group
  block cfq: don't use atomic_t for cfq_queue
  block: trace event block fix unassigned field
  block: add internal hd part table references
  block: fix accounting bug on cross partition merges
  kref: add kref_test_and_get
  bio-integrity: mark kintegrityd_wq highpri and CPU intensive
  block: make kblockd_workqueue smarter
  Revert "sd: implement sd_check_events()"
  block: Clean up exit_io_context() source code.
  Fix compile warnings due to missing removal of a 'ret' variable
  fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
  block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
  cfq-iosched: don't check cfqg in choose_service_tree()
  fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
  cdrom: export cdrom_check_events()
  sd: implement sd_check_events()
  sr: implement sr_check_events()
  ...
2011-01-13 10:45:01 -08:00
Lars Ellenberg
a115413de1 drbd: fix for spin_lock_irqsave in endio callback
In commit 9b7f76dc37919ea36caa9680a3f765e5b19b25fb,
 Author: Lars Ellenberg <lars.ellenberg@linbit.com>
 Date:   Wed Aug 11 23:40:24 2010 +0200

    drbd: new configuration parameter c-min-rate

a bad chunk slipped through, which is now reverted as well,
restoring the correct irqsave for the endio callback.

This patch also add comments at both req_mod()
and in the endio callback so it should not happen again.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-11-27 19:50:43 +01:00
Lars Ellenberg
c13f7e1a94 drbd: don't recvmsg with zero length
This should fix a performance degradation we observed recently.

If we don't expect any subheader, we should not call into the tcp stack,
as that may add considerable latency if there is no data available at
this point.

For a synthetic synchronous write load with single outstanding writes,
this additional latency when processing the "unplug remote" packet
added up to a performance degradation factor >= 10.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-11-27 19:50:43 +01:00
Jens Axboe
f30195c502 Merge branch 'cleanup-bd_claim' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into for-2.6.38/core 2010-11-27 19:49:18 +01:00
Arnd Bergmann
451a3c24b0 BKL: remove extraneous #include <smp_lock.h>
The big kernel lock has been removed from all these files at some point,
leaving only the #include.

Remove this too as a cleanup.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-17 08:59:32 -08:00
Tejun Heo
d4d7762995 block: clean up blkdev_get() wrappers and their users
After recent blkdev_get() modifications, open_by_devnum() and
open_bdev_exclusive() are simple wrappers around blkdev_get().
Replace them with blkdev_get_by_dev() and blkdev_get_by_path().

blkdev_get_by_dev() is identical to open_by_devnum().
blkdev_get_by_path() is slightly different in that it doesn't
automatically add %FMODE_EXCL to @mode.

All users are converted.  Most conversions are mechanical and don't
introduce any behavior difference.  There are several exceptions.

* btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no
  reason to OR it explicitly on blkdev_put().

* gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in
  sb->s_mode.

* With the above changes, sb->s_mode now always should contain
  FMODE_EXCL.  WARN_ON_ONCE() added to kill_block_super() to detect
  errors.

The new blkdev_get_*() functions are with proper docbook comments.
While at it, add function description to blkdev_get() too.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Joern Engel <joern@lazybastard.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: reiserfs-devel@vger.kernel.org
Cc: xfs-masters@oss.sgi.com
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
2010-11-13 11:55:18 +01:00
Tejun Heo
e525fd89d3 block: make blkdev_get/put() handle exclusive access
Over time, block layer has accumulated a set of APIs dealing with bdev
open, close, claim and release.

* blkdev_get/put() are the primary open and close functions.

* bd_claim/release() deal with exclusive open.

* open/close_bdev_exclusive() are combination of open and claim and
  the other way around, respectively.

* bd_link/unlink_disk_holder() to create and remove holder/slave
  symlinks.

* open_by_devnum() wraps bdget() + blkdev_get().

The interface is a bit confusing and the decoupling of open and claim
makes it impossible to properly guarantee exclusive access as
in-kernel open + claim sequence can disturb the existing exclusive
open even before the block layer knows the current open if for another
exclusive access.  Reorganize the interface such that,

* blkdev_get() is extended to include exclusive access management.
  @holder argument is added and, if is @FMODE_EXCL specified, it will
  gain exclusive access atomically w.r.t. other exclusive accesses.

* blkdev_put() is similarly extended.  It now takes @mode argument and
  if @FMODE_EXCL is set, it releases an exclusive access.  Also, when
  the last exclusive claim is released, the holder/slave symlinks are
  removed automatically.

* bd_claim/release() and close_bdev_exclusive() are no longer
  necessary and either made static or removed.

* bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
  is no longer necessary and removed.

* open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
  and blkdev_get().  It also has an unexpected extra bdev_read_only()
  test which probably should be moved into blkdev_get().

* open_by_devnum() is modified to take @holder argument and pass it to
  blkdev_get().

Most of bdev open/close operations are unified into blkdev_get/put()
and most exclusive accesses are tested atomically at the open time (as
it should).  This cleans up code and removes some, both valid and
invalid, but unnecessary all the same, corner cases.

open_bdev_exclusive() and open_by_devnum() can use further cleanup -
rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
special features.  Well, let's leave them for another day.

Most conversions are straight-forward.  drbd conversion is a bit more
involved as there was some reordering, but the logic should stay the
same.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Brown <neilb@suse.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Alex Elder <aelder@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: dm-devel@redhat.com
Cc: drbd-dev@lists.linbit.com
Cc: Leo Chen <leochen@broadcom.com>
Cc: Scott Branden <sbranden@broadcom.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Joern Engel <joern@logfs.org>
Cc: reiserfs-devel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
2010-11-13 11:55:17 +01:00
Jens Axboe
00e375e7e9 Merge branch 'for-2.6.37/drivers' into for-linus
Conflicts:
	drivers/block/cciss.c

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-11-10 14:51:27 +01:00
Mike Snitzer
77304d2aba block: read i_size with i_size_read()
Convert direct reads of an inode's i_size to using i_size_read().

i_size_{read,write} use a seqcount to protect reads from accessing
incomple writes.  Concurrent i_size_write()s require mutual exclussion
to protect the seqcount that is used by i_size_{read,write}.  But
i_size_read() callers do not need to use additional locking.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: NeilBrown <neilb@suse.de>
Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-11-10 14:40:53 +01:00
Nicolas Kaiser
2027ae1fa9 drivers/block/drbd/drbd_main.c: fix error path
Failure to create drbd_ee_mempool appears not to get checked.  Looks like
a copy-and-paste problem to me.

Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-10-28 06:15:26 -06:00
Jens Axboe
53c2eb24ff Merge branch 'for-jens' of git://git.drbd.org/linux-2.6-drbd into for-2.6.37/drivers 2010-10-23 18:43:55 +02:00
Philipp Reisner
650789c87f drbd: Removed checks for REQ_HARDBARRIER on incomming BIOs
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-23 13:02:34 +02:00
Philipp Reisner
a8a4e51e69 drbd: REQ_HARDBARRIER -> REQ_FUA transition for meta data accesses
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-23 13:01:45 +02:00
Philipp Reisner
2451fc3b2b drbd: Removed the BIO_RW_BARRIER support form the receiver/epoch code
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-23 13:00:48 +02:00
Linus Torvalds
a2887097f2 Merge branch 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
  xen-blkfront: disable barrier/flush write support
  Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c
  block: remove BLKDEV_IFL_WAIT
  aic7xxx_old: removed unused 'req' variable
  block: remove the BH_Eopnotsupp flag
  block: remove the BLKDEV_IFL_BARRIER flag
  block: remove the WRITE_BARRIER flag
  swap: do not send discards as barriers
  fat: do not send discards as barriers
  ext4: do not send discards as barriers
  jbd2: replace barriers with explicit flush / FUA usage
  jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
  jbd: replace barriers with explicit flush / FUA usage
  nilfs2: replace barriers with explicit flush / FUA usage
  reiserfs: replace barriers with explicit flush / FUA usage
  gfs2: replace barriers with explicit flush / FUA usage
  btrfs: replace barriers with explicit flush / FUA usage
  xfs: replace barriers with explicit flush / FUA usage
  block: pass gfp_mask and flags to sb_issue_discard
  dm: convey that all flushes are processed as empty
  ...
2010-10-22 17:07:18 -07:00
Linus Torvalds
8abfc6e7a4 Merge branch 'for-2.6.37/drivers' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.37/drivers' of git://git.kernel.dk/linux-2.6-block: (95 commits)
  cciss: fix PCI IDs for new Smart Array controllers
  drbd: add race-breaker to drbd_go_diskless
  drbd: use dynamic_dev_dbg to optionally log uuid changes
  dynamic_debug.h: Fix dynamic_dev_dbg() macro if CONFIG_DYNAMIC_DEBUG not set
  drbd: cleanup: change "<= 0" to "== 0"
  drbd: relax the grace period of the md_sync timer again
  drbd: add some more explicit drbd_md_sync
  drbd: drop wrong debug asserts, fix recently introduced race
  drbd: cleanup useless leftover warn/error printk's
  drbd: add explicit drbd_md_sync to drbd_resync_finished
  drbd: Do not log an ASSERT for P_OV_REQUEST packets while C_CONNECTED
  drbd: fix for possible deadlock on IO error during resync
  drbd: fix unlikely access after free and list corruption
  drbd: fix for spurious fullsync (uuids rotated too fast)
  drbd: allow for explicit resync-finished notifications
  drbd: preparation commit, using full state in receive_state()
  drbd: drbd_send_ack_dp must not rely on header information
  drbd: Fix regression in recv_bm_rle_bits (compressed bitmap)
  drbd: Fixed a stupid copy and paste error
  drbd: Allow larger values for c-fill-target.
  ...

Fix up trivial conflict in drivers/block/ataflop.c due to BKL removal
2010-10-22 17:03:12 -07:00
Linus Torvalds
e9dd2b6837 Merge branch 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block: (39 commits)
  cfq-iosched: Fix a gcc 4.5 warning and put some comments
  block: Turn bvec_k{un,}map_irq() into static inline functions
  block: fix accounting bug on cross partition merges
  block: Make the integrity mapped property a bio flag
  block: Fix double free in blk_integrity_unregister
  block: Ensure physical block size is unsigned int
  blkio-throttle: Fix possible multiplication overflow in iops calculations
  blkio-throttle: limit max iops value to UINT_MAX
  blkio-throttle: There is no need to convert jiffies to milli seconds
  blkio-throttle: Fix link failure failure on i386
  blkio: Recalculate the throttled bio dispatch time upon throttle limit change
  blkio: Add root group to td->tg_list
  blkio: deletion of a cgroup was causes oops
  blkio: Do not export throttle files if CONFIG_BLK_DEV_THROTTLING=n
  block: set the bounce_pfn to the actual DMA limit rather than to max memory
  block: revert bad fix for memory hotplug causing bounces
  Fix compile error in blk-exec.c for !CONFIG_DETECT_HUNG_TASK
  block: set the bounce_pfn to the actual DMA limit rather than to max memory
  block: Prevent hang_check firing during long I/O
  cfq: improve fsync performance for small files
  ...

Fix up trivial conflicts due to __rcu sparse annotation in include/linux/genhd.h
2010-10-22 17:00:32 -07:00
Philipp Reisner
8825f7c3e5 drbd: Silenced an assert
That assertion's condition needed adjustment for today's semantics

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-22 15:55:22 +02:00
Lars Ellenberg
fb2c7a10ee drbd: rate limit an error message
If we don't rate limit it, and you happen to log err level messages via
serial console, an IO error on a disconnected Primary may cause serious
unresponsiveness.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-22 15:53:10 +02:00
Lars Ellenberg
bc571b8cb9 drbd: fix a misleading printk
This codepath used to be called only for failed kmalloc GFP_ATOMIC,
but is now also triggered by other things.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-22 15:51:22 +02:00
Lars Ellenberg
6719fb036c drbd: fix potential data divergence after multiple failures
If we get an IO-error during an activity log transaction,
if we failed to write the bitmap of the evicted extent,
we must not write the transaction itself.
If we failed to write the transaction,
we must not even submit the corresponding bio,
as its extent is not yet marked in the activity log.

Otherwise, if this was a disconneted Primary (degraded cluster), which
now lost its disk as well, and we later re-attach the same backend
storage, we possibly "forget" to resync some parts of the disk that
potentially have been changed.

On the receiving side, when receiving from a peer with unhealthy disk,
checking for pdsk == D_DISKLESS is not enough, we need to set out of
sync and do AL transactions for everything pdsk < D_INCONSISTENT on the
receiving side.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-22 15:50:27 +02:00
Lars Ellenberg
82f59cc635 drbd: fix potential deadlock on detach
If we have contention in drbd_al_begin_iod (heavy randon IO),
an administrative request to detach the disk may deadlock
for similar reasons as the recently fixed deadlock if detaching
because of IO-error.

The approach taken here is to either go through the intermediate
cleanup state D_FAILED, or first lock out application io,
don't just go directly to D_DISKLESS.

We need an additional state bit (WAS_IO_ERROR) to distinguish
the -> D_FAILED because of IO-error from other failures.

Sanitize D_ATTACHING -> D_FAILED to D_ATTACHING -> D_DISKLESS.
If only attaching, ldev may be missing still, but would be referenced
from within the after_state_ch for -> D_FAILED, potentially
dereferencing a NULL pointer.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-22 15:46:11 +02:00
Lars Ellenberg
3beec1d446 drbd: tag a few error messages with "assert failed"
If those messages ever get logged, clearly state that they are
actually failed ASSERTS, so our regression tests can pick them up
from the logs more easily.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-22 15:41:20 +02:00
Lars Ellenberg
aaa8e2b34c drbd: consolidate explicit drbd_md_sync into drbd_create_new_uuid
Every code path changing the current UUID needs to get it on stable
storage anyways. Flush it to disk right there, remove the now obsolte
explicit drbd_md_sync statements in the other code paths.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-22 15:36:56 +02:00
Lars Ellenberg
5dbfe7aedf drbd: add race-breaker to drbd_go_diskless
This adds a necessary race breaker to these commits:
    drbd: fix for possible deadlock on IO error during resync
    drbd: drop wrong debug asserts, fix recently introduced race

What we do is get a refcount, check the state, then depending on the
state and the requested minimum disk state, either hold it (success),
or give it back immediately (failed "try lock").

Some code paths (flushing of drbd metadata) may still grab and hold a
refcount even if we are D_FAILED (application IO won't).
So even if we hit local_cnt == 0 once after being D_FAILED,
we still need to wait for that again after we changed to D_DISKLESS.
Once local_cnt reaches 0 while we are D_DISKLESS, we can be sure that
no one will look at the protected members anymore, so only then is it
safe to free them.

We cannot easily convert to standard locking primitives here, as we want
to be able to use it in atomic context (we always do a "try lock"),
as well as hold references for a "long time" (from IO submission to
completion callback).

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-15 14:06:53 +02:00
Lars Ellenberg
ac7241211d drbd: use dynamic_dev_dbg to optionally log uuid changes
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-15 10:52:42 +02:00
Dan Carpenter
2265769531 drbd: cleanup: change "<= 0" to "== 0"
dt is unsigned so it's never less than zero.  We are calculating the
elapsed time, and that's never less than zero (unless there is a bug or
we invent time travel).  The comparison here is just to guard against
divide by zero bugs.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2010-10-14 19:17:23 +02:00
Lars Ellenberg
ca0e6098aa drbd: relax the grace period of the md_sync timer again
Consolidate the ifdef's for the debug level, accidentally the used both
DEBUG and DRBD_DEBUG_MD_SYNC.  Default to off.

For production, we can safely reduce the grace period for this timer
again the the value we used to have.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 19:15:38 +02:00
Lars Ellenberg
856c50c7b6 drbd: add some more explicit drbd_md_sync
It sometimes may take a while for the after state change work to be
scheduled, which does drbd_md_sync. At convenient places, we should do
explicit drbd_md_sync to have the new state information on disk as soon
as possible.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 19:08:58 +02:00
Lars Ellenberg
9d282875d8 drbd: drop wrong debug asserts, fix recently introduced race
commit 2372c38caadeaebc68a5ee190782c2a0df01edc3
 drbd: fix for possible deadlock on IO error during resync

introduced a new ASSERT, which turns out to be wrong. Drop it.

Also serialize the state change to D_DISKLESS with the after state
change work of the -> D_FAILED transition, don't open a new race.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 19:08:32 +02:00
Lars Ellenberg
0f8488e160 drbd: cleanup useless leftover warn/error printk's
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:53 +02:00
Lars Ellenberg
13d42685be drbd: add explicit drbd_md_sync to drbd_resync_finished
As we usually update the generation UUIDs here, we should explicitly
sync them to disk.  So far this has been done only implicitly by related
code paths.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:52 +02:00
Philipp Reisner
b18b37befb drbd: Do not log an ASSERT for P_OV_REQUEST packets while C_CONNECTED
This might happen if on the VERIFY_S node the disk gets dropped.
Although this is an cluster wide state transition, the VERIFY_T node,
updates it connection state first. Then the ack packet for the
cluster wide state transition travels back, and the VERIFY_S node
stops to produce the P_OV_REQUEST packets.

There is absolutely nothing wrong with that.

Further, do not log "Can not satisfy peer's..." on the VERIFY_S
node in this case, but pretend that they had equal checksum.

[Bugz 327]

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:51 +02:00
Lars Ellenberg
e9e6f3ec53 drbd: fix for possible deadlock on IO error during resync
Scenario:

Something (say, flush-147:0) is in drbd_al_begin_io,
holding a local_cnt, waiting for the resync to make progress.

Disk fails, worker in after_state_ch does drbd_rs_cancel_all,
then waits for local_cnt to drop to zero.

flush-147:0 is woken by drbd_rs_cancel_all, needs to write an AL
transaction, and queues that on the worker.

Deadlock.

Fix: do not wait in the worker, have put_ldev() trigger the
state change D_FAILED -> D_DISKLESS when necessary.
put_ldev() cannot do the state change directly, as it may or may not
already hold various spinlocks. We queue a short work instead.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:50 +02:00
Lars Ellenberg
22cc37a943 drbd: fix unlikely access after free and list corruption
Various cleanup paths have been incomplete, for the very unlikely case
that we cannot allocate enough bios from process context when submitting
on behalf of the peer or resync process.

Never observed.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:49 +02:00
Lars Ellenberg
af85e8e83d drbd: fix for spurious fullsync (uuids rotated too fast)
If it was an "empty" resync, the SyncSource may have already "finished"
the resync and rotated the UUIDs, before noticing the connection loss
(and generating a new uuid, if Primary, rotating again), while the
SyncTarget did not change its uuids at all, or only got to the previous
sync-uuid.
This would then again lead to a full sync on next handshake
(see also Bug #251).

Fix:
Use explicit resync finished notification even for empty resyncs,
do not finish an empty resync implicitly on the SyncSource.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:48 +02:00
Lars Ellenberg
e9ef7bb6f9 drbd: allow for explicit resync-finished notifications
Preparation patch so more drbd_send_state() usage on the peer
will not confuse drbd in receive_state().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:47 +02:00
Lars Ellenberg
4ac4aadacb drbd: preparation commit, using full state in receive_state()
no functional change, just using full state instead of just the .conn
part of it for comparisons.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:46 +02:00
Lars Ellenberg
2b2bf2148f drbd: drbd_send_ack_dp must not rely on header information
drbd commit 17c854fea474a5eb3cfa12e4fb019e46debbc4ec
drbd: receiving of big packets, for payloads between 64kByte and 4GByte
introduced a new on-the-wire packet header format.  We must no longer
assume either format, but use the result of whatever drbd_recv_header
has decoded.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:45 +02:00
Lars Ellenberg
004352fa60 drbd: Fix regression in recv_bm_rle_bits (compressed bitmap)
We used to be16_to_cpu the length field in our received packet header.
drbd commit 17c854fea474a5eb3cfa12e4fb019e46debbc4ec
    drbd: receiving of big packets, for payloads between 64kByte and 4GByte
changed this, but forgot to adjust a few places where we relied on
h->length being in native byte order.

This broke the receiving side of the RLE compressed bitmap exchange.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:44 +02:00
Philipp Reisner
f10f262349 drbd: Fixed a stupid copy and paste error
This caused rs_planed to be not in sync with the content of the fifo.
That in turn could cause that the resync comes to a complete halt.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:43 +02:00
Philipp Reisner
00b425377d drbd: Allow larger values for c-fill-target.
Connections through a compressing proxy might have more bits
on the fly. 500MByte instead of 50MByte

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:42 +02:00
Lars Ellenberg
f65363cfa0 drbd: fix possible access after free
If we release the page pointed to by md_io_tmpp, we need to zero out the
pointer, too, as that may be used later to decide whether we need to
allocate a new page again.

Impact: a previously freed page may be used and clobbered.  Depending on
what that particular page is being used for meanwhile, this may result
in silent data corruption of completely unrelated things.

Only of concern on devices with logical_block_size != 512 byte,
if you re-attach after becoming diskless once.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:41 +02:00
Lars Ellenberg
8979d9c9e0 drbd: protocol compatibility for maximum packet sizes
Two missing corner cases to the "maximum packet size" handshake.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:41 +02:00
Philipp Reisner
fb22c402ff drbd: Track the reasons to suspend IO in dedicated state bits
There are three ways to get IO suspended:

 * Loss of any access to data
 * Fence-peer-handler running
 * User requested to suspend IO

Track those in different bits, so that one condition clearing its
state bit does not interfere with the other two conditions.

Only when the user resumes IO he overrules all three bits.

The fact is hidden from the user, he sees only a single suspend
bit.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:40 +02:00
Lars Ellenberg
78db89287c drbd: DIV_ROUND_UP not needed here
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:39 +02:00
Philipp Reisner
5a75cc7cfb drbd: Fixed compatibility with protocol versions smaller than 95
Forgot to consider the max size for the resync requests.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:38 +02:00
Lars Ellenberg
f2906e183f drbd: fix for spurious full sync (becoming sync target looked like invalidate)
If a synctarget lost connection while being WFSyncUUID,
due to "state sanitizing", the attempted state change to SyncTarget
looked like an "invalidate" to after_state_ch() later,
thus caused a full sync on next handshake (Bug #318).

drbd0: PingAck did not arrive in time.
drbd0: peer( Primary -> Unknown ) conn( WFSyncUUID -> NetworkFailure ) pdsk( UpToDate -> DUnknown )

        from  : { cs:NetworkFailure ro:Secondary/Unknown ds:UpToDate/DUnknown r--- }
        to    : { cs:SyncTarget ro:Secondary/Unknown ds:Inconsistent/DUnknown r--- }
        after sanizising, resulted in
        state: { cs:NetworkFailure ro:Secondary/Unknown ds:Inconsistent/DUnknown r--- }
        drbd0: disk( UpToDate -> Inconsistent )

Fix:
don't mask state transition errors in "sanitizing",
so the requested state change to SyncTarget fails,
instead of being implicitly "remaped" to invalidate.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:37 +02:00
Lars Ellenberg
02bc7174ae drbd: cosmetic, don't report resync for online-verify
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:36 +02:00
Lars Ellenberg
a821cc4a9a drbd: fix spurious protocol error
If we cannot satisfy a request (because our disk just broke),
we still need to drain the payload.  Or we'll get a protocol error
when interpreting the payload as DRBD packet header.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:35 +02:00
Lars Ellenberg
1d53f09e17 drbd: fix potential kernel BUG (NULL deref)
BUG trace would look like:
 lc_find
 drbd_rs_complete_io
 got_OVResult
 drbd_asender

Could be triggered by explicit, or IO-error policy based,
detach during online-verify.

We may only dereference mdev->resync, if we first get_ldev(), as the
disk may break any time, causing mdev->resync to disappear once all
ldev references have been returned.
Already in flight online-verify requests or replies may still come in,
which we then need to ignore.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:34 +02:00
Lars Ellenberg
435f07402b drbd: don't count sendpage()d pages only referenced by tcp as in use
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:33 +02:00
Philipp Reisner
76d2e7eca8 drbd: Adding support for BIO/Request flags: REQ_FUA, REQ_FLUSH and REQ_DISCARD
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:32 +02:00
Lars Ellenberg
1090c056c5 drbd: drbd_md_sync before calling user space helpers
Just in case we have some pending meta data changes to sync, do it
before we call our userland helper, as that may take some time,
or even cause a hard reboot.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:31 +02:00
Lars Ellenberg
ee15b03816 drbd: fix race on meta-data update, addendum
addendum to baa33ae4eaa4477b60af7c434c0ddd1d182c1ae7

The race:
    drbd_md_sync()
	if (!test_and_clear_bit(MD_DIRTY, &mdev->flags))
		return;
    ==> RACE with drbd_md_mark_dirty() rearming the timer.
	del_timer(&mdev->md_sync_timer);

    Fixed by moving the del_timer before the test_and_clear_bit.

Additionally only rearm the timer in drbd_md_mark_dirty, if MD_DIRTY was
not already set, reduce the grace period from five to one second, and
add an ifdef'ed debuging aid to find code paths missing an explicit
drbd_md_sync, if any, as those are the only relevant ones for this race.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:30 +02:00
Philipp Reisner
63106d3c6c drbd: Removed a race that could cause unexpected execution of w_make_resync_request()
The actual race happened int the drbd_start_resync() function. Where
drbd_resync_finished() -> __drbd_set_state() set STOP_SYNC_TIMER and
armed the timer.

If the timer fired before execution reaches the mod_timer statement
at the end of drbd_start_resync() the latter would cause an
unexpected call to w_make_resync_request().

Removed the STOP_SYNC_TIMER bit, and base it on the connection state.

The STOP_SYNC_TIMER bit probably originates probably the time before
the state engine.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:29 +02:00
Lars Ellenberg
ef50a3e34f drbd: implicitly create unconfigured devices on sync-after dependencies
If pacemaker (for example) decided to initialize minor devices not in
the exact sync-after dependency order, the configuration partially
failed with an error "The sync-after minor number is invalid". (Bugz. #322)

We can avoid that by implicitly creating unconfigured minor devices,
if others depend on them.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:28 +02:00
Lars Ellenberg
3f3a9b849d drbd: fix race on meta-data update
The race:
	drbd_md_mark_dirty()
	drbd_md_sync()
		if (!test_and_clear_bit(MD_DIRTY, &mdev->flags))
			return;
		drbd_md_sync_page_io(mdev, mdev->ldev, sector, WRITE)
  ==> RACE
		clear_bit(MD_DIRTY, &mdev->flags); <== spurious

Fixed by removing the spurious clear_bit.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:28 +02:00
Lars Ellenberg
c518d04fde drbd: fix race between deconfiguring and reconfiguring network
If a drbd_nl_net_conf hits the small window between the state change
to C_STANDALONE and the corresponding cleanup in after_state_ch,
that cleanup would throw away stuff we now need again,
and later trigger BUG_ON()s.

Fixed by properly serializing the new config request with
any pending cleanup.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:27 +02:00
Philipp Reisner
0778286a13 drbd: Disable activity log updates when the whole device is out of sync
When the complete device is marked as out of sync, we can disable
updates of the on disk AL. Currently AL updates are only disabled
if one uses the "invalidate-remote" command on an unconnected,
primary device, or when at attach time all bits in the bitmap are
set.

As of now, AL updated do not get disabled when a all bits becomes
set due to application writes to an unconnected DRBD device.
While this is a missing feature, it is not considered important,
and might get added later.

BTW, after initializing a "one legged" DRBD device
drbdadm create-md resX
drbdadm -- --force primary resX
AL updates also get disabled, until the first connect.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:26 +02:00
Philipp Reisner
d53733893d drbd: Actually allow BIOs up to 128k (was 32k).
Now we have multiple BIOs per ee, packets with a 32 bit length field,
it gets time to use these goodies.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:25 +02:00
Philipp Reisner
02918be227 drbd: receiving of big packets, for payloads between 64kByte and 4GByte
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:24 +02:00
Philipp Reisner
0b70a13dac drbd: Sending of big packets, for payloads from 64KByte to 4GByte
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:23 +02:00
Philipp Reisner
204bba9965 drbd: Bugfix for regression introduced with f9bc8913c06022e
If we intent to use the block_id member of an epoch entry,
we may not use the digest member.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:22 +02:00
Philipp Reisner
48acf86898 drbd: Microfix: Assigning sector once is sufficient
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:21 +02:00
Lars Ellenberg
0f0601f4ea drbd: new configuration parameter c-min-rate
We now track the data rate of locally submitted resync related requests,
and can thus detect non-resync activity on the lower level device.

If the current sync rate is above c-min-rate, and the lower level device
appears to be busy, we throttle the resyncer.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:20 +02:00
Lars Ellenberg
80a40e439e drbd: reduce code duplication when receiving data requests
also canonicalize the return values of read_for_csum
and drbd_rs_begin_io to return -ESOMETHING, or 0 for success.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:19 +02:00
Lars Ellenberg
1d7734a0df drbd: use rolling marks for resync speed calculation
The current resync speed as displayed in /proc/drbd fluctuates a lot.
Using an array of rolling marks makes this calculation much more stable.
We used to have this (a long time ago with 0.7), but it got lost somehow.

If "stalled", do not discard the rest of the information, just add a
" (stalled)" tag to the progress line.

This patch also shortens a spinlock critical section somewhat, and
reduces the number of atomic operations in put_ldev.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:18 +02:00
Lars Ellenberg
0bb70bf601 drbd: remove outdated comment and dead code
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:17 +02:00
Lars Ellenberg
c36c3ced69 drbd: let drbd_free_ee implicitly free any digest
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:16 +02:00
Philipp Reisner
85719573dd drbd: Replaced some casts by an union. Improved comments
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:15 +02:00
Philipp Reisner
d207450cf2 drbd: Bugfix: rs_in_flight could become wrong if read_for_csum() requested reschedule later
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:14 +02:00
Philipp Reisner
778f271dfe drbd: The new, smarter resync speed controller
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14 18:38:14 +02:00