Commit Graph

8421 Commits (2e4b7fcd926006531935a4c79a5e9349fe51125b)

Author SHA1 Message Date
Dave Hansen 2e4b7fcd92 [PATCH] r/o bind mounts: honor mount writer counts at remount
Originally from: Herbert Poetzl <herbert@13thfloor.at>

This is the core of the read-only bind mount patch set.

Note that this does _not_ add a "ro" option directly to the bind mount
operation.  If you require such a mount, you must first do the bind, then
follow it up with a 'mount -o remount,ro' operation:

If you wish to have a r/o bind mount of /foo on bar:

	mount --bind /foo /bar
	mount -o remount,ro /bar

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:27 -04:00
Dave Hansen 3d733633a6 [PATCH] r/o bind mounts: track numbers of writers to mounts
This is the real meat of the entire series.  It actually
implements the tracking of the number of writers to a mount.
However, it causes scalability problems because there can be
hundreds of cpus doing open()/close() on files on the same mnt at
the same time.  Even an atomic_t in the mnt has massive scalaing
problems because the cacheline gets so terribly contended.

This uses a statically-allocated percpu variable.  All want/drop
operations are local to a cpu as long that cpu operates on the same
mount, and there are no writer count imbalances.  Writer count
imbalances happen when a write is taken on one cpu, and released
on another, like when an open/close pair is performed on two

Upon a remount,ro request, all of the data from the percpu
variables is collected (expensive, but very rare) and we determine
if there are any outstanding writers to the mount.

I've written a little benchmark to sit in a loop for a couple of
seconds in several cpus in parallel doing open/write/close loops.

http://sr71.net/~dave/linux/openbench.c

The code in here is a a worst-possible case for this patch.  It
does opens on a _pair_ of files in two different mounts in parallel.
This should cause my code to lose its "operate on the same mount"
optimization completely.  This worst-case scenario causes a 3%
degredation in the benchmark.

I could probably get rid of even this 3%, but it would be more
complex than what I have here, and I think this is getting into
acceptable territory.  In practice, I expect writing more than 3
bytes to a file, as well as disk I/O to mask any effects that this
has.

(To get rid of that 3%, we could have an #defined number of mounts
in the percpu variable.  So, instead of a CPU getting operate only
on percpu data when it accesses only one mount, it could stay on
percpu data when it only accesses N or fewer mounts.)

[AV] merged fix for __clear_mnt_mount() stepping on freed vfsmount

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:27 -04:00
Dave Hansen 2c463e9548 [PATCH] r/o bind mounts: check mnt instead of superblock directly
If we depend on the inodes for writeability, we will not catch the r/o mounts
when implemented.

This patches uses __mnt_want_write().  It does not guarantee that the mount
will stay writeable after the check.  But, this is OK for one of the checks
because it is just for a printk().

The other two are probably unnecessary and duplicate existing checks in the
VFS.  This won't make them better checks than before, but it will make them
detect r/o mounts.

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:27 -04:00
Dave Hansen ec82687f29 [PATCH] r/o bind mounts: elevate count for xfs timestamp updates
Elevate the write count during the xfs m/ctime updates.

XFS has to do it's own timestamp updates due to an unfortunate VFS
design limitation, so it will have to track writers by itself aswell.

[hch: split out from the touch_atime patch as it's not related to it at all]

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:26 -04:00
Dave Hansen 2f676cbc0d [PATCH] r/o bind mounts: make access() use new r/o helper
It is OK to let access() go without using a mnt_want/drop_write() pair because
it doesn't actually do writes to the filesystem, and it is inherently racy
anyway.  This is a rare case when it is OK to use __mnt_is_readonly()
directly.

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:26 -04:00
Dave Hansen 9ac9b8474c [PATCH] r/o bind mounts: write counts for truncate()
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:25 -04:00
Dave Hansen 2af482a7ed [PATCH] r/o bind mounts: elevate write count for chmod/chown callers
chown/chmod,etc...  don't call permission in the same way that the normal
"open for write" calls do.  They still write to the filesystem, so bump the
write count during these operations.

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:25 -04:00
Dave Hansen 4a3fd211cc [PATCH] r/o bind mounts: elevate write count for open()s
This is the first really tricky patch in the series.  It elevates the writer
count on a mount each time a non-special file is opened for write.

We used to do this in may_open(), but Miklos pointed out that __dentry_open()
is used as well to create filps.  This will cover even those cases, while a
call in may_open() would not have.

There is also an elevated count around the vfs_create() call in open_namei().
See the comments for more details, but we need this to fix a 'create, remount,
fail r/w open()' race.

Some filesystems forego the use of normal vfs calls to create
struct files.   Make sure that these users elevate the mnt
writer count because they will get __fput(), and we need
to make sure they're balanced.

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:25 -04:00
Dave Hansen 42a74f206b [PATCH] r/o bind mounts: elevate write count for ioctls()
Some ioctl()s can cause writes to the filesystem.  Take these, and make them
use mnt_want/drop_write() instead.

[AV: updated]

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:24 -04:00
Dave Hansen 20ddee2c75 [PATCH] r/o bind mounts: write count for file_update_time()
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:24 -04:00
Dave Hansen 74f9fdfa1f [PATCH] r/o bind mounts: elevate write count for do_utimes()
Now includes fix for oops seen by akpm.

"never let a libc developer write your kernel code" - hch

"nor, apparently, a kernel developer" - akpm

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:24 -04:00
Dave Hansen cdb70f3f74 [PATCH] r/o bind mounts: write counts for touch_atime()
Remove handling of NULL mnt while we are at it - that can't happen these days.

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:23 -04:00
Dave Hansen a761a1c03a [PATCH] r/o bind mounts: elevate write count for ncp_ioctl()
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:23 -04:00
Dave Hansen 18f335aff8 [PATCH] r/o bind mounts: elevate write count for xattr_permission() callers
This basically audits the callers of xattr_permission(), which calls
permission() and can perform writes to the filesystem.

[AV: add missing parts - removexattr() and nfsd posix acls, plug for a leak
spotted by Miklos]

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:15 -04:00
Dave Hansen 9079b1eb17 [PATCH] r/o bind mounts: get write access for vfs_rename() callers
This also uses the little helper in the NFS code to make an if() a little bit
less ugly.  We introduced the helper at the beginning of the series.

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:25:34 -04:00
Dave Hansen 75c3f29de7 [PATCH] r/o bind mounts: write counts for link/symlink
[AV: add missing nfsd pieces]

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:25:34 -04:00
Dave Hansen 463c319726 [PATCH] r/o bind mounts: get callers of vfs_mknod/create/mkdir()
This takes care of all of the direct callers of vfs_mknod().
Since a few of these cases also handle normal file creation
as well, this also covers some calls to vfs_create().

So that we don't have to make three mnt_want/drop_write()
calls inside of the switch statement, we move some of its
logic outside of the switch and into a helper function
suggested by Christoph.

This also encapsulates a fix for mknod(S_IFREG) that Miklos
found.

[AV: merged mkdir handling, added missing nfsd pieces]

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:25:34 -04:00
Dave Hansen 0622753b80 [PATCH] r/o bind mounts: elevate write count for rmdir and unlink.
Elevate the write count during the vfs_rmdir() and vfs_unlink().

[AV: merged rmdir and unlink parts, added missing pieces in nfsd]

Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:25:33 -04:00
Dave Hansen 49e0d02cf0 [PATCH] r/o bind mounts: drop write during emergency remount
The emergency remount code forcibly removes FMODE_WRITE from
filps.  The r/o bind mount code notices that this was done
without a proper mnt_drop_write() and properly gives a
warning.

This patch does a mnt_drop_write() to keep everything
balanced.

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:25:33 -04:00
Dave Hansen aceaf78da9 [PATCH] r/o bind mounts: create helper to drop file write access
If someone decides to demote a file from r/w to just
r/o, they can use this same code as __fput().

NFS does just that, and will use this in the next
patch.

AV: drop write access in __fput() only after we evict from file list.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Cc: Erez Zadok <ezk@cs.sunysb.edu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J Bruce Fields" <bfields@fieldses.org>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:25:32 -04:00
Dave Hansen 8366025eb8 [PATCH] r/o bind mounts: stub functions
This patch adds two function mnt_want_write() and mnt_drop_write().  These are
used like a lock pair around and fs operations that might cause a write to the
filesystem.

Before these can become useful, we must first cover each place in the VFS
where writes are performed with a want/drop pair.  When that is complete, we
can actually introduce code that will safely check the counts before allowing
r/w<->r/o transitions to occur.

Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:25:32 -04:00
Christoph Hellwig a70e65df88 [PATCH] merge open_namei() and do_filp_open()
open_namei() will, in the future, need to take mount write counts
over its creation and truncation (via may_open()) operations.  It
needs to keep these write counts until any potential filp that is
created gets __fput()'d.

This gets complicated in the error handling and becomes very murky
as to how far open_namei() actually got, and whether or not that
mount write count was taken.  That makes it a bad interface.

All that the current do_filp_open() really does is allocate the
nameidata on the stack, then call open_namei().

So, this merges those two functions and moves filp_open() over
to namei.c so it can be close to its buddy: do_filp_open().  It
also gets a kerneldoc comment in the process.

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:25:32 -04:00
Dave Hansen d57999e152 [PATCH] do namei_flags calculation inside open_namei()
My end goal here is to make sure all users of may_open()
return filps.  This will ensure that we properly release
mount write counts which were taken for the filp in
may_open().

This patch moves the sys_open flags to namei flags
calculation into fs/namei.c.  We'll shortly be moving
the nameidata_to_filp() calls into namei.c, and this
gets the sys_open flags to a place where we can get
at them when we need them.

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:25:31 -04:00
Linus Torvalds 334d094504 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26: (1090 commits)
  [NET]: Fix and allocate less memory for ->priv'less netdevices
  [IPV6]: Fix dangling references on error in fib6_add().
  [NETLABEL]: Fix NULL deref in netlbl_unlabel_staticlist_gen() if ifindex not found
  [PKT_SCHED]: Fix datalen check in tcf_simp_init().
  [INET]: Uninline the __inet_inherit_port call.
  [INET]: Drop the inet_inherit_port() call.
  SCTP: Initialize partial_bytes_acked to 0, when all of the data is acked.
  [netdrvr] forcedeth: internal simplifications; changelog removal
  phylib: factor out get_phy_id from within get_phy_device
  PHY: add BCM5464 support to broadcom PHY driver
  cxgb3: Fix __must_check warning with dev_dbg.
  tc35815: Statistics cleanup
  natsemi: fix MMIO for PPC 44x platforms
  [TIPC]: Cleanup of TIPC reference table code
  [TIPC]: Optimized initialization of TIPC reference table
  [TIPC]: Remove inlining of reference table locking routines
  e1000: convert uint16_t style integers to u16
  ixgb: convert uint16_t style integers to u16
  sb1000.c: make const arrays static
  sb1000.c: stop inlining largish static functions
  ...
2008-04-18 18:02:35 -07:00
Linus Torvalds e675349e2b Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (64 commits)
  ocfs2/net: Add debug interface to o2net
  ocfs2: Only build ocfs2/dlm with the o2cb stack module
  ocfs2/cluster: Get rid of arguments to the timeout routines
  ocfs2: Put tree in MAINTAINERS
  ocfs2: Use BUG_ON
  ocfs2: Convert ocfs2 over to unlocked_ioctl
  ocfs2: Improve rename locking
  fs/ocfs2/aops.c: test for IS_ERR rather than 0
  ocfs2: Add inode stealing for ocfs2_reserve_new_inode
  ocfs2: Add ac_alloc_slot in ocfs2_alloc_context
  ocfs2: Add a new parameter for ocfs2_reserve_suballoc_bits
  ocfs2: Enable cross extent block merge.
  ocfs2: Add support for cross extent block
  ocfs2: Move /sys/o2cb to /sys/fs/o2cb
  sysfs: Allow removal of symlinks in the sysfs root
  ocfs2:  Reconnect after idle time out.
  ocfs2/dlm: Cleanup lockres print
  ocfs2/dlm: Fix lockname in lockres print function
  ocfs2/dlm: Move dlm_print_one_mle() from dlmmaster.c to dlmdebug.c
  ocfs2/dlm: Dumps the purgelist into a debugfs file
  ...
2008-04-18 10:15:22 -07:00
Linus Torvalds ef38ff9d37 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (49 commits)
  [GFS2] fix assertion in log_refund()
  [GFS2] fix GFP_KERNEL misuses
  [GFS2] test for IS_ERR rather than 0
  [GFS2] Invalidate cache at correct point
  [GFS2] fs/gfs2/recovery.c: suppress warnings
  [GFS2] Faster gfs2_bitfit algorithm
  [GFS2] Streamline quota lock/check for no-quota case
  [GFS2] Remove drop of module ref where not needed
  [GFS2] gfs2_adjust_quota has broken unstuffing code
  [GFS2] possible null pointer dereference fixup
  [GFS2] Need to ensure that sector_t is 64bits for GFS2
  [GFS2] re-support special inode
  [GFS2] remove gfs2_dev_iops
  [GFS2] fix file_system_type leak on gfs2meta mount
  [GFS2] Allow bmap to allocate extents
  [GFS2] Fix a page lock / glock deadlock
  [GFS2] proper extern for gfs2/locking/dlm/mount.c:gdlm_ops
  [GFS2] gfs2/ops_file.c should #include "ops_inode.h"
  [GFS2] be*_add_cpu conversion
  [GFS2] Fix bug where we called drop_bh incorrectly
  ...
2008-04-18 10:02:46 -07:00
Sunil Mushran 2309e9e040 ocfs2/net: Add debug interface to o2net
This patch exposes o2net information via debugfs. The information includes
the list of sockets (sock_containers) as well as the list of outstanding
messages (send_tracking). Useful for o2dlm debugging.

(This patch is derived from an earlier one written by Zach Brown that
exposed the same information via /proc.)

[Mark: checkpatch fixes]

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:20 -07:00
Mark Fasheh 93b06edb51 ocfs2: Only build ocfs2/dlm with the o2cb stack module
fs/ocfs2/dlm/ocfs2_dlm.ko and fs/ocfs2/dlm/ocfs2_dlmfs.ko get built if
CONFIG_FS_OCFS2 is specified. This isn't quite how it should happen any more
- the "o2cb" dlm modules should only be built if CONFIG_FS_OCFS2_O2CB is
set, so update the dlm Makefile accordingly.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
2008-04-18 08:56:12 -07:00
Jeff Mahoney 409753bf6d ocfs2/cluster: Get rid of arguments to the timeout routines
We keep seeing bug reports related to NULL pointer derefs in
o2net_set_nn_state(). When I originally wrote up the configurable timeout
patch, I had tried to plan for multiple clusters. This was silly.

The timeout routines all use o2nm_single_cluster so there's no point in
passing an argument at all. This patch removes the arguments and kills those
bugs dead.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:12 -07:00
Julia Lawall b1f3550fa1 ocfs2: Use BUG_ON
if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
side-effects to allow a definition of BUG_ON that drops the code completely.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@ disable unlikely @ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (unlikely(E)) { BUG(); }
+ BUG_ON(E);
)

@@ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (E) { BUG(); }
+ BUG_ON(E);
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:11 -07:00
Andi Kleen c9ec14884d ocfs2: Convert ocfs2 over to unlocked_ioctl
As far as I can see there is nothing in ocfs2_ioctl that requires the BKL,
so use unlocked_ioctl

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:11 -07:00
Jan Kara 5dabd69515 ocfs2: Improve rename locking
ocfs2_rename() was being too aggressive with the rename lock - we only need
it for certain forms of directory rename.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:11 -07:00
Julia Lawall 58dadcdbc2 fs/ocfs2/aops.c: test for IS_ERR rather than 0
The function ocfs2_start_trans always returns either a valid pointer or a
value made with ERR_PTR, so its result should be tested with IS_ERR, not
with a test for 0.

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:11 -07:00
Tao Ma 4d0ddb2ce2 ocfs2: Add inode stealing for ocfs2_reserve_new_inode
Inode allocation is modified to look in other nodes allocators during
extreme out of space situations. We retry our own slot when space is freed
back to the global bitmap, or whenever we've allocated more than 1024 inodes
from another slot.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:10 -07:00
Tao Ma a4a4891164 ocfs2: Add ac_alloc_slot in ocfs2_alloc_context
In inode stealing, we no longer restrict the allocation to
happen in the local node. So it is neccessary for us to add
a new member in ocfs2_alloc_context to indicate which slot
we are using for allocation. We also modify the process of
local alloc so that this member can be used there also.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:10 -07:00
Tao Ma ffda89a3bf ocfs2: Add a new parameter for ocfs2_reserve_suballoc_bits
In some cases(Inode stealing from other nodes), we may not want
ocfs2_reserve_suballoc_bits to allocate new groups from the
global_bitmap since it may already be full. So add a new parameter
for this.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:10 -07:00
Tao Ma ad5a4d7093 ocfs2: Enable cross extent block merge.
In ocfs2_figure_merge_contig_type, we judge whether there exists
a cross extent block merge and enable it by setting CONTIG_LEFT
and CONTIG_RIGHT accordingly.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:10 -07:00
Tao Ma 677b975282 ocfs2: Add support for cross extent block
In ocfs2_merge_rec_left, when we find the merge extent is "CONTIG_RIGHT"
with the first extent record of the next extent block, we will merge it to
the next extent block and change all the related extent blocks accordingly.

In ocfs2_merge_rec_right, when we find the merge extent is "CONTIG_LEFT"
with the last extent record of the previous extent block, we will merge
it to the prevoius extent block and change all the related extent blocks
accordingly.

As for CONTIG_LEFTRIGHT, we will handle CONTIG_RIGHT first so that when
the index is zero, the merge process will be more efficient and easier.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:10 -07:00
Mark Fasheh 52f7c21b61 ocfs2: Move /sys/o2cb to /sys/fs/o2cb
/sys/fs is where we really want file system specific sysfs objects.

Ocfs2-tools has been updated to look in /sys/fs/o2cb. We can maintain
backwards compatibility with old ocfs2-tools by using a sysfs symlink. After
some time (2 years), the symlink can be safely removed. This patch also adds
documentation to make it easier for people to figure out what /sys/fs/o2cb
is used for.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:10 -07:00
Mark Fasheh a839c5afcd sysfs: Allow removal of symlinks in the sysfs root
Allow callers of sysfs_remove_link() to pass a NULL kobj, in which case
sysfs_root will be used as the parent directory. This allows us to tear down
top level symlinks created via sysfs_create_link(), which already has
similar handling of a NULL parent object.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-18 08:56:10 -07:00
Tao Ma 5cc3bf2786 ocfs2: Reconnect after idle time out.
Currently, o2net connects to a node on hb_up and disconnects on
hb_down and net timeout.

It disconnects on net timeout is ok, but it should attempt to
reconnect back. This is because sometimes nodes get overloaded
enough that the network connection breaks but the disk hb does not.
And if we get into that situation, we either fence (unnecessarily)
or wait for its disk hb to die (and sometimes hang in the process).

So in this updated scheme, when the network disconnects, we keep
attempting to reconnect till we succeed or we get a disk hb down
event.

If the other node is really dead, then we will eventually get a
node down event. If not, we should be able to connect again and
continue.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:10 -07:00
Sunil Mushran 8f50eb9789 ocfs2/dlm: Cleanup lockres print
A previous patch added KERN_NOTICE to printks printing the lockres that
cluttered the output. This patch removes the log level. For people concerned
with syslog clutter, please note we now use this facility to print lockres
only during an error.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:09 -07:00
Sunil Mushran c834cdb157 ocfs2/dlm: Fix lockname in lockres print function
__dlm_print_one_lock_resource was printing lockname incorrectly.
Also, we now use printk directly instead of mlog as the latter prints
the line context which is not useful for this print.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:09 -07:00
Sunil Mushran e5a0334cbd ocfs2/dlm: Move dlm_print_one_mle() from dlmmaster.c to dlmdebug.c
This patch helps in consolidating debugging related functions in dlmdebug.c.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:09 -07:00
Sunil Mushran 7209300a9b ocfs2/dlm: Dumps the purgelist into a debugfs file
This patch dumps all the lockres' on the purgelist it can fit in one page
into a debugfs file. Useful for debugging.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:09 -07:00
Sunil Mushran d0129aceae ocfs2/dlm: Dumps the mles into a debugfs file
This patch dumps all mles it can fit in one page into a debugfs file.
Useful for debugging.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:09 -07:00
Sunil Mushran 751155a953 ocfs2/dlm: Move struct dlm_master_list_entry to dlmcommon.h
This patch moves some mle related definitions from dlmmaster.c
to dlmcommon.h. Future patches need these definitions to dump mle
debugging information.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.beckeroracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:09 -07:00
Sunil Mushran 4e3d24ed1a ocfs2/dlm: Dumps the lockres' into a debugfs file
This patch dumps all the lockres' alongwith all the locks into
a debugfs file. Useful for debugging.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:09 -07:00
Sunil Mushran 007dce53a2 ocfs2/dlm: Dump the dlm state in a debugfs file
This patch dumps the dlm state (dlm_ctxt) into a debugfs file.
Useful for debugging.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:08 -07:00
Sunil Mushran 6325b4a22b ocfs2/dlm: Create debugfs dirs
This patch creates the debugfs directories that will hold the
files to be used to dump the dlm state.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18 08:56:08 -07:00