Commit graph

16720 commits

Author SHA1 Message Date
Al Viro
ac278a9c50 fix LOOKUP_FOLLOW on automount "symlinks"
Make sure that automount "symlinks" are followed regardless of LOOKUP_FOLLOW;
it should have no effect on them.

Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-19 03:56:42 -05:00
David S. Miller
2bb4646fce Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-02-16 22:09:29 -08:00
Eric W. Biederman
7c0ff870d1 sysfs: sysfs_sd_setattr set iattrs unconditionally
There is currently a bug in sysfs_sd_setattr inherited from
sysfs_setattr in 2.6.32 where the first time we set the attributes
on a sysfs file we allocate backing store but do not set the
backing store attributes.  Resulting in overly restrictive
permissions on sysfs files.

The fix is to simply modify the code so that it always executes
when we update the sysfs attributes, as we did in 2.6.31 and earlier.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Tested-by: Jean Delvare <khali@linux-fr.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-16 15:42:42 -08:00
Linus Torvalds
0813e22d4e Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: btrfs_mark_extent_written uses the wrong slot
2010-02-15 19:56:21 -08:00
Chuck Lever
65d269538a NFS: Too many GETATTR and ACCESS calls after direct I/O
The cached read and write paths initialize fattr->time_start in their
setup procedures.  The value of fattr->time_start is propagated to
read_cache_jiffies by nfs_update_inode().  Subsequent calls to
nfs_attribute_timeout() will then use a good time stamp when
computing the attribute cache timeout, and squelch unneeded GETATTR
calls.

Since the direct I/O paths erroneously leave the inode's
fattr->time_start field set to zero, read_cache_jiffies for that inode
is set to zero after any direct read or write operation.  This
triggers an otw GETATTR or ACCESS call to update the file's attribute
and access caches properly, even when the NFS READ or WRITE replies
have usable post-op attributes.

Make sure the direct read and write setup code performs the same fattr
initialization as the cached I/O paths to prevent unnecessary GETATTR
calls.

This was likely introduced by commit 0e574af1 in 2.6.15, which appears
to add new nfs_fattr_init() call sites in the cached read and write
paths, but not in the equivalent places in fs/nfs/direct.c.  A
subsequent commit in the same series, 33801147, introduces the
fattr->time_start field.

Interestingly, the direct write reschedule path already has a call to
nfs_fattr_init() in the right place.

Reported-by: Quentin Barnes <qbarnes@yahoo-inc.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-15 19:53:43 -08:00
Linus Torvalds
0aa2ca9ae1 Merge branch 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing
* 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
  reiserfs: Fix softlockup while waiting on an inode
2010-02-15 19:51:45 -08:00
Frederic Weisbecker
175359f89d reiserfs: Fix softlockup while waiting on an inode
When we wait for an inode through reiserfs_iget(), we hold
the reiserfs lock. And waiting for an inode may imply waiting
for its writeback. But the inode writeback path may also require
the reiserfs lock, which leads to a deadlock.

We just need to release the reiserfs lock from reiserfs_iget()
to fix this.

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Chris Mason <chris.mason@oracle.com>
2010-02-14 19:07:56 +01:00
Jeremy Kerr
7c540d9e3d proc_devtree: fix THIS_MODULE without module.h
Commit e22f628395 introduced a build
breakage for ARM devtree work: the THIS_MODULE macro was added, but we
don't have module.h

This change adds the necessary #include to get THIS_MODULE defined.
While we could just replace it with NULL (PROC_FS is a bool, not a
tristate), using THIS_MODULE will prevent unexpected breakage if we
ever do compile this as a module.

Signed-off-by: Jeremy Kerr <jeremy.kerr@canonical.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Michal Simek <monstr@monstr.eu>
2010-02-14 07:13:41 -07:00
Julia Lawall
d67b1b0325 fs/xfs: Correct NULL test
Test the value that was just allocated rather than the previously tested one.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r@
expression *x;
expression e;
identifier l;
@@

if (x == NULL || ...) {
    ... when forall
    return ...; }
... when != goto l;
    when != x = e
    when != &x
*x == NULL
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-02-13 13:22:53 -06:00
Shaohua Li
3f6fae9559 Btrfs: btrfs_mark_extent_written uses the wrong slot
My test do: fallocate a big file and do write. The file is 512M, but
after file write is done btrfs-debug-tree shows:
item 6 key (257 EXTENT_DATA 0) itemoff 3516 itemsize 53
                extent data disk byte 1103101952 nr 536870912
                extent data offset 0 nr 399634432 ram 536870912
                extent compression 0
Looks like a regression introducted by
6c7d54ac87, where we set wrong slot.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-02-12 16:47:19 -05:00
Christoph Hellwig
180040b89e xfs: optimize log flushing in xfs_fsync
If we have a pinned inode it must have a log item attached to it.
Usually that log item will have ili_last_lsn already set, in which
case we only need to flush the log up to that LSN instead of doing a
full log force.  This gives speedups of about 5% in some fsync heavy
workloads.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-02-12 13:45:14 -06:00
Christoph Hellwig
87185517de xfs: only clear the suid bit once in xfs_write
file_remove_suid already calls into ->setattr to clear the suid and
sgid bits if needed, no need to start a second transaction to do it
ourselves.

Note that xfs_write_clear_setuid issues a sync transaction while the
path through ->setattr doesn't, but that is consistant with the
other filesystems.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-02-12 13:43:57 -06:00
Steven Whitehouse
07ccb7bf2c GFS2: Fix bmap allocation corner-case bug
This patch solves a corner case during allocation which occurs if both
metadata (indirect) and data blocks are required but there is an
obstacle in the filesystem (e.g. a resource group header or another
allocated block) such that when the allocation is requested only
enough blocks for the metadata are returned.

By changing the exit condition of this loop, we ensure that a
minimum of one data block will always be returned.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-02-12 10:16:14 +00:00
Abhijith Das
0e5a9fb042 GFS2: Fix error code
We need this one-liner to signal the mount helper of the 'insufficient journals' condition.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-02-12 10:15:51 +00:00
Linus Torvalds
efa82bab8e Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Fix the mapping of the NFSERR_SERVERFAULT error
  NFS: Remove a redundant check for PageFsCache in nfs_migrate_page()
  NFS: Fix a bug in nfs_fscache_release_page()
2010-02-11 14:06:28 -08:00
Linus Torvalds
fd48d6c888 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] qla2xxx: Obtain proper host structure during response-queue processing.
  [SCSI] compat_ioct: fix bsg SG_IO
  [SCSI] qla2xxx: make msix interrupt handler safe for irq
  [SCSI] zfcp: Report FC BSG errors in correct field
  [SCSI] mptfusion : mptscsih_abort return value should be SUCCESS instead of value 0.
2010-02-11 14:05:55 -08:00
Michael Neuling
803bf5ec25 fs/exec.c: restrict initial stack space expansion to rlimit
When reserving stack space for a new process, make sure we're not
attempting to expand the stack by more than rlimit allows.

This fixes a bug caused by b6a2fea393 ("mm:
variable length argument support") and unmasked by
fc63cf2370 ("exec: setup_arg_pages() fails
to return errors").

This bug means that when limiting the stack to less the 20*PAGE_SIZE (eg.
80K on 4K pages or 'ulimit -s 79') all processes will be killed before
they start.  This is particularly bad with 64K pages, where a ulimit below
1280K will kill every process.

To test, do:

  'ulimit -s 15; ls'

before and after the patch is applied.  Before it's applied, 'ls' should
be killed.  After the patch is applied, 'ls' should no longer be killed.

A stack limit of 15KB since it's small enough to trigger 20*PAGE_SIZE.
Also 15KB not a multiple of PAGE_SIZE, which is a trickier case to handle
correctly with this code.

4K pages should be fine to test with.

[kosaki.motohiro@jp.fujitsu.com: cleanup]
[akpm@linux-foundation.org: cleanup cleanup]
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-11 13:59:43 -08:00
Andreas Schwab
4cfbafd33f compat_ioctl: add compat handler for TIOCGSID ioctl
This is used by tcgetsid(3).

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-11 13:59:42 -08:00
Li Zefan
66655de6d1 seq_file: Add helpers for iteration over a hlist
Some places in kernel need to iterate over a hlist in seq_file,
so provide some common helpers.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-10 11:12:06 -08:00
Arnd Bergmann
f79f118528 compat_ioctl: ignore RAID_VERSION ioctl
md ioctls are now handled by the md driver itself, but mdadm
may call RAID_VERSION on other devices as well. Mark the command
as IGNORE_IOCTL so this fails silently rather than printing
an annoying message.

Reported-by: "Michael S. Tsirkin" <m.s.tsirkin@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-10 07:36:16 -08:00
Linus Torvalds
5551638acb Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: fix dentry hash calculation for case-insensitive mounts
  [CIFS] Don't cache timestamps on utimes due to coarse granularity
  [CIFS] Maximum username length check in session setup does not match
  cifs: fix length calculation for converted unicode readdir names
  [CIFS] Add support for TCP_NODELAY
2010-02-10 07:16:44 -08:00
Trond Myklebust
fdcb45777a NFS: Fix the mapping of the NFSERR_SERVERFAULT error
It was recently pointed out that the NFSERR_SERVERFAULT error, which is
designed to inform the user of a serious internal error on the server, was
being mapped to an error value that is internal to the kernel.

This patch maps it to the error EREMOTEIO, which is exported to userland
through errno.h.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-02-09 14:29:29 -05:00
Trond Myklebust
7549ad5f9b NFS: Remove a redundant check for PageFsCache in nfs_migrate_page()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: David Howells <dhowells@redhat.com>
2010-02-09 14:29:21 -05:00
Trond Myklebust
2c1740098c NFS: Fix a bug in nfs_fscache_release_page()
Not having an fscache cookie is perfectly valid if the user didn't mount
with the fscache option.

This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=15234

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: stable@kernel.org
2010-02-09 14:29:10 -05:00
Linus Torvalds
3af9cf11b6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: fix p9_client_destroy unconditional calling v9fs_put_trans
  9p: fix memory leak in v9fs_parse_options()
  9p: Fix the kernel crash on a failed mount
  9p: fix option parsing
  9p: Include fsync support for 9p client
  net/9p: fix statsize inside twstat
  net/9p: fail when user specifies a transport which we can't find
  net/9p: fix virtio transport to correctly update status on connect
2010-02-09 11:19:06 -08:00
Jeremy Kerr
50ab2fe147 proc_devtree: include linux/of.h
Currenly, proc_devtree.c depends on asm/prom.h to include linux/of.h, to
provide some device-tree definitions (eg, struct property).

Instead, include linux/of.h directly. We still need asm/prom.h for
HAVE_ARCH_DEVTREE_FIXUPS.

Signed-off-by: Jeremy Kerr <jeremy.kerr@canonical.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-02-09 08:34:10 -07:00
Jeremy Kerr
8cfb3343f7 of: make set_node_proc_entry private to proc_devtree.c
We only need set_node_proc_entry in proc_devtree.c, so move it there.

This fixes the !HAVE_ARCH_DEVTREE_FIXUPS build, as we can't make make
the definition in linux/of.h conditional on this #define (definitions in
asm/prom.h can't be exposed to linux/of.h, due to the enforced #include
ordering).

Signed-off-by: Jeremy Kerr <jeremy.kerr@canonical.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-02-09 08:34:10 -07:00
Linus Torvalds
deb0c98c7f Merge branch 'for-2.6.33' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.33' of git://linux-nfs.org/~bfields/linux:
  Revert "nfsd4: fix error return when pseudoroot missing"
2010-02-08 17:08:01 -08:00
Linus Torvalds
a5f28ae4df Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2/cluster: Make o2net connect messages KERN_NOTICE
  ocfs2/dlm: Fix printing of lockname
  ocfs2: Fix contiguousness check in ocfs2_try_to_merge_extent_map()
  ocfs2/dlm: Remove BUG_ON in dlm recovery when freeing locks of a dead node
  ocfs2: Plugs race between the dc thread and an unlock ast message
  ocfs2: Remove overzealous BUG_ON during blocked lock processing
  ocfs2: Do not downconvert if the lock level is already compatible
  ocfs2: Prevent a livelock in dlmglue
  ocfs2: Fix setting of OCFS2_LOCK_BLOCKED during bast
  ocfs2: Use compat_ptr in reflink_arguments.
  ocfs2/dlm: Handle EAGAIN for compatibility - v2
  ocfs2: Add parenthesis to wrap the check for O_DIRECT.
  ocfs2: Only bug out when page size is larger than cluster size.
  ocfs2: Fix memory overflow in cow_by_page.
  ocfs2/dlm: Print more messages during lock migration
  ocfs2/dlm: Ignore LVBs of locks in the Blocked list
  ocfs2/trivial: Remove trailing whitespaces
  ocfs2: fix a misleading variable name
  ocfs2: Sync max_inline_data_with_xattr from tools.
  ocfs2: Fix refcnt leak on ocfs2_fast_follow_link() error path
2010-02-08 16:05:50 -08:00
Eric Van Hensbergen
bf2d29c64d 9p: fix memory leak in v9fs_parse_options()
If match_strdup() fail this function exits without freeing the options string.

Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com>
Sigend-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08 17:59:34 -06:00
Eric Van Hensbergen
d8c8a9e365 9p: fix option parsing
Options pointer is being moved before calling kfree() which seems
to cause problems.  This uses a separate pointer to track and free
original allocation.

Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>w
2010-02-08 16:23:23 -06:00
M. Mohan Kumar
7a4439c406 9p: Include fsync support for 9p client
Implement the fsync in the client side by marking stat field values to 'don't touch' so that server may 
interpret it as a request to guarantee that the contents of the associated file are committed to stable 
storage before the Rwstat message is returned.

Without this patch, calling fsync on a 9p file results in "Invalid argument" error. Please check the attached 
C program.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> 
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> 
Acked-by: Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08 15:36:48 -06:00
Sunil Mushran
6efd806634 ocfs2/cluster: Make o2net connect messages KERN_NOTICE
Connect and disconnect messages are more than informational as they are required
during root cause analysis for failures. This patch changes them from KERN_INFO
to KERN_NOTICE.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Faseh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-08 13:02:28 -08:00
Sunil Mushran
86a06abab0 ocfs2/dlm: Fix printing of lockname
The debug call printing the name of the lock resource was chopping
off the last character. This patch fixes the problem.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-08 13:01:31 -08:00
J. Bruce Fields
260c64d235 Revert "nfsd4: fix error return when pseudoroot missing"
Commit f39bde24b2 fixed the error return from PUTROOTFH in the
case where there is no pseudofilesystem.

This is really a case we shouldn't hit on a correctly configured server:
in the absence of a root filehandle, there's no point accepting version
4 NFS rpc calls at all.

But the shared responsibility between kernel and userspace here means
the kernel on its own can't eliminate the possiblity of this happening.
And we have indeed gotten this wrong in distro's, so new client-side
mount code that attempts to negotiate v4 by default first has to work
around this case.

Therefore when commit f39bde24b2 arrived at roughly the same
time as the new v4-default mount code, which explicitly checked only for
the previous error, the result was previously fine mounts suddenly
failing.

We'll fix both sides for now: revert the error change, and make the
client-side mount workaround more robust.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-02-08 15:25:23 -05:00
FUJITA Tomonori
84eb8fb42c [SCSI] compat_ioct: fix bsg SG_IO
bsg's SG_IO doesn't work on 32-bit userspace and 64-bit kernelspace.

The problem is that both sg and bsg drivers use SG_IO
ioctl. sg_ioctl_trans() does 32/64-bit conversion even against bsg
header. It messes up bsg header. bsg driver gets garbage.

This patch fixes sg_ioctl_trans to handle only sg header (struct
sg_io_hdr).

Reported-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-02-08 13:43:18 -06:00
Jeff Layton
05507fa2ac cifs: fix dentry hash calculation for case-insensitive mounts
case-insensitive mounts shouldn't use full_name_hash(). Make sure we
use the parent dentry's d_hash routine when one is set.

Reported-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-02-08 17:52:34 +00:00
Steve French
ccd4bb1beb [CIFS] Don't cache timestamps on utimes due to coarse granularity
force revalidate of the file when any of the timestamps are set since
some filesytem types do not have finer granularity timestamps and
we can not always detect which file systems round timestamps down
to determine whether we can cache the mtime on setattr
samba bugzilla 3775

Acked-by: Shirish Pargaonkar <sharishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-02-08 17:39:58 +00:00
Linus Torvalds
6339204ecc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  Take ima_file_free() to proper place.
  ima: rename PATH_CHECK to FILE_CHECK
  ima: rename ima_path_check to ima_file_check
  ima: initialize ima before inodes can be allocated
  fix ima breakage
  Take ima_path_check() in nfsd past dentry_open() in nfsd_open()
  freeze_bdev: don't deactivate successfully frozen MS_RDONLY sb
  befs: fix leak
2010-02-07 11:18:28 -08:00
Linus Torvalds
80e1e82398 Fix race in tty_fasync() properly
This reverts commit 7036251180 ("tty: fix race in tty_fasync") and
commit b04da8bfdf ("fnctl: f_modown should call write_lock_irqsave/
restore") that tried to fix up some of the fallout but was incomplete.

It turns out that we really cannot hold 'tty->ctrl_lock' over calling
__f_setown, because not only did that cause problems with interrupt
disables (which the second commit fixed), it also causes a potential
ABBA deadlock due to lock ordering.

Thanks to Tetsuo Handa for following up on the issue, and running
lockdep to show the problem.  It goes roughly like this:

 - f_getown gets filp->f_owner.lock for reading without interrupts
   disabled, so an interrupt that happens while that lock is held can
   cause a lockdep chain from f_owner.lock -> sighand->siglock.

 - at the same time, the tty->ctrl_lock -> f_owner.lock chain that
   commit 7036251180 introduced, together with the pre-existing
   sighand->siglock -> tty->ctrl_lock chain means that we have a lock
   dependency the other way too.

So instead of extending tty->ctrl_lock over the whole __f_setown() call,
we now just take a reference to the 'pid' structure while holding the
lock, and then release it after having done the __f_setown.  That still
guarantees that 'struct pid' won't go away from under us, which is all
we really ever needed.

Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Américo Wang <xiyou.wangcong@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-07 10:26:01 -08:00
Al Viro
89068c576b Take ima_file_free() to proper place.
Hooks: Just Say No.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-07 03:07:29 -05:00
Mimi Zohar
9bbb6cad01 ima: rename ima_path_check to ima_file_check
ima_path_check actually deals with files!  call it ima_file_check instead.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-07 03:06:22 -05:00
Mimi Zohar
8eb988c70e fix ima breakage
The "Untangling ima mess, part 2 with counters" patch messed
up the counters.  Based on conversations with Al Viro, this patch
streamlines ima_path_check() by removing the counter maintaince.
The counters are now updated independently, from measuring the file,
in __dentry_open() and alloc_file() by calling ima_counts_get().
ima_path_check() is called from nfsd and do_filp_open().
It also did not measure all files that should have been measured.
Reason: ima_path_check() got bogus value passed as mask.
[AV: mea culpa]
[AV: add missing nfsd bits]

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-07 03:06:22 -05:00
Al Viro
1e41568d73 Take ima_path_check() in nfsd past dentry_open() in nfsd_open()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-07 03:06:22 -05:00
Jun'ichi Nomura
4b06e5b9ad freeze_bdev: don't deactivate successfully frozen MS_RDONLY sb
Thanks Thomas and Christoph for testing and review.
I removed 'smp_wmb()' before up_write from the previous patch,
since up_write() should have necessary ordering constraints.
(I.e. the change of s_frozen is visible to others after up_write)
I'm quite sure the change is harmless but if you are uncomfortable
with Tested-by/Reviewed-by on the modified patch, please remove them.

If MS_RDONLY, freeze_bdev should just up_write(s_umount) instead of
deactivate_locked_super().
Also, keep sb->s_frozen consistent so that remount can check the frozen state.

Otherwise a crash reported here can happen:
http://lkml.org/lkml/2010/1/16/37
http://lkml.org/lkml/2010/1/28/53

This patch should be applied for 2.6.32 stable series, too.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Thomas Backlund <tmb@mandriva.org>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-07 03:06:21 -05:00
Al Viro
8dd5ca532c befs: fix leak
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-07 03:06:21 -05:00
Steve French
301a6a3177 [CIFS] Maximum username length check in session setup does not match
Fix length check reported by D. Binderman (see below)

d binderman <dcb314@hotmail.com> wrote:
>
> I just ran the sourceforge tool cppcheck over the source code of the
> new Linux kernel 2.6.33-rc6
>
> It said
>
> [./cifs/sess.c:250]: (error) Buffer access out-of-bounds

May turn out to be harmless, but best to be safe. Note max
username length is defined to 32 due to Linux (Windows
maximum is 20).

Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-02-06 07:08:53 +00:00
Jeff Layton
f12f98dba6 cifs: fix length calculation for converted unicode readdir names
cifs_from_ucs2 returns the length of the converted name, including the
length of the NULL terminator. We don't want to include the NULL
terminator in the dentry name length however since that'll throw off the
hash calculation for the dentry cache.

I believe that this is the root cause of several problems that have
cropped up recently that seem to be papered over with the "noserverino"
mount option. More confirmation of that would be good, but this is
clearly a bug and it fixes at least one reproducible problem that
was reported.

This patch fixes at least this reproducer in this kernel.org bug:

    http://bugzilla.kernel.org/show_bug.cgi?id=15088#c12

Reported-by: Bjorn Tore Sund <bjorn.sund@it.uib.no>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-02-06 06:25:16 +00:00
Roel Kluin
bd6b0bf87d ocfs2: Fix contiguousness check in ocfs2_try_to_merge_extent_map()
The wrong member was compared in the continguousness check.

Acked-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-05 15:06:21 -08:00
James Bottomley
73c77e2ccc xfs: fix xfs to work with Virtually Indexed architectures
xfs_buf.c includes what is essentially a hand rolled version of
blk_rq_map_kern().  In order to work properly with the vmalloc buffers
that xfs uses, this hand rolled routine must also implement the flushing
API for vmap/vmalloc areas.

[style updates from hch@lst.de]
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-02-05 12:32:35 -06:00
Linus Torvalds
adbfbcd12a Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: apply updated fallocate i_size fix
  Btrfs: do not try and lookup the file extent when finishing ordered io
  Btrfs: Fix oopsen when dropping empty tree.
  Btrfs: remove BUG_ON() due to mounting bad filesystem
  Btrfs: make error return negative in btrfs_sync_file()
  Btrfs: fix race between allocate and release extent buffer.
2010-02-05 07:23:03 -08:00
Linus Torvalds
a9861b5037 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Don't clobber the attribute type in nfs_update_inode()
  NFS: Fix a umount race
  NFS: Fix an Oops when truncating a file
  NFS: Ensure that we handle NFS4ERR_STALE_STATEID correctly
  NFSv4.1: Don't call nfs4_schedule_state_recovery() unnecessarily
  NFSv4: Don't allow posix locking against servers that don't support it
  NFSv4: Ensure that the NFSv4 locking can recover from stateid errors
  NFS: Avoid warnings when CONFIG_NFS_V4=n
  NFS: Make nfs_commitdata_release static
  NFS: Try to commit unstable writes in nfs_release_page()
  NFS: Fix a reference leak in nfs_wb_cancel_page()
2010-02-04 16:08:15 -08:00
Linus Torvalds
a3a71ca9a7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
  GFS2: Extend umount wait coverage to full glock lifetime
  GFS2: Wait for unlock completion on umount
2010-02-04 16:06:48 -08:00
Aneesh Kumar K.V
23b5c50945 Btrfs: apply updated fallocate i_size fix
This version of the i_size fix for fallocate makes sure we only update
the i_size when the current fallocate is really operating outside of
i_size.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-02-04 11:33:03 -05:00
Josef Bacik
efd049fb26 Btrfs: do not try and lookup the file extent when finishing ordered io
When running the following fio job

[torrent]
filename=torrent-test
rw=randwrite
size=4g
filesize=4g
bs=4k
ioengine=sync

you would see long stalls where no work was being done.  That is because we were
doing all this extra work to read in the file extent outside of the transaction,
however in the random io case this ends up hurting us because the file extents
are not there to begin with.  So axe this logic, since we end up reading in the
file extent when we go to update it anyway.  This took the fio job from 11 mb/s
with several ~10 second stalls to 24 mb/s to a couple of 1-2 second stalls.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-02-04 11:31:45 -05:00
Yan, Zheng
7a7965f83e Btrfs: Fix oopsen when dropping empty tree.
When dropping a empty tree, walk_down_tree() skips checking
extent information for the tree root. This will triggers a
BUG_ON in walk_up_proc().

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-02-04 11:31:45 -05:00
Miao Xie
d7ce5843bb Btrfs: remove BUG_ON() due to mounting bad filesystem
Mounting a bad filesystem caused a BUG_ON(). The following is steps to
reproduce it.
 # mkfs.btrfs /dev/sda2
 # mount /dev/sda2 /mnt
 # mkfs.btrfs /dev/sda1 /dev/sda2
 (the program says that /dev/sda2 was mounted, and then exits. )
 # umount /mnt
 # mount /dev/sda1 /mnt

At the third step, mkfs.btrfs exited in the way of make filesystem. So the
initialization of the filesystem didn't finish. So the filesystem was bad, and
it caused BUG_ON() when mounting it. But BUG_ON() should be called by the wrong
code, not user's operation, so I think it is a bug of btrfs.

This patch fixes it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-02-04 11:31:44 -05:00
Roel Kluin
014e4ac4f7 Btrfs: make error return negative in btrfs_sync_file()
It appears the error return should be negative

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-02-04 11:31:44 -05:00
Yan, Zheng
f044ba7835 Btrfs: fix race between allocate and release extent buffer.
Increase extent buffer's reference count while holding the lock.
Otherwise it can race with try_release_extent_buffer.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-02-04 11:31:44 -05:00
Sunil Mushran
cda70ba8c0 ocfs2/dlm: Remove BUG_ON in dlm recovery when freeing locks of a dead node
During recovery, the dlm frees the locks for the dead node. If it finds a
lock in a resource for the dead node, it expects that node to also have a
ref in that lock resource. If not, it BUGs.

ossbz#1175 was filed with the above BUG. Now, while it is correct that we
should be expecting the ref, I see no reason why we have to BUG. After all,
we are freeing up the lock and clearing the ref.

This patch replaces the BUG_ON with a printk(). Hopefully, that will give
us more clues next time this happens.

http://oss.oracle.com/bugzilla/show_bug.cgi?id=1175

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-03 17:51:41 -08:00
Sunil Mushran
079b805782 ocfs2: Plugs race between the dc thread and an unlock ast message
This patch plugs a race between the downconvert thread and an unlock ast message.
Specifically, after the downconvert worker has done its task, the dc thread needs
to check whether an unlock ast made the downconvert moot.

Reported-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Fasheh <mfasheh@sus.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-03 17:26:03 -08:00
Dave Chinner
5322892d86 xfs: kill xfs_bawrite
There are no more users of this function left in the XFS code
now that we've switched everything to delayed write flushing.
Remove it.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-02-04 10:09:14 +11:00
Christoph Hellwig
07fec73625 xfs: log changed inodes instead of writing them synchronously
When an inode has already be flushed delayed write,
xfs_inode_clean() returns true and hence xfs_fs_write_inode() can
return on a synchronous inode write without having written the
inode. Currently these sycnhronous writes only come sync(1),
unmount, a sycnhronous NFS export and cachefiles so should be
relatively rare and out of common performance paths.

Realistically, a synchronous inode write is not necessary here; we
can avoid writing the inode by logging any non-transactional changes
that are pending.  This needs to be done with synchronous
transactions, but it avoids seeking between the log and inode
clusters as we do now. We don't force the log if the inode is
pinned, though, so this differs from the fsync case.  For normal
sys_sync and unmount behaviour this is fine because we do a
synchronous log force in xfs_sync_data which is called from the
->sync_fs code.

It does however break the NFS synchronous export guarantees for now,
but work is under way to fix this at a higher level or for the
higher level to provide an additional flag in the writeback control
to tell us that a log force is needed.

Portions of this patch are based on work from Dave Chinner.

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
2010-02-09 11:43:49 +11:00
Linus Torvalds
c1c0cbb878 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: fix potential leak of dirty data on umount
2010-02-03 08:47:15 -08:00
Trond Myklebust
9b4b351346 NFS: Don't clobber the attribute type in nfs_update_inode()
If the NFS_ATTR_FATTR_TYPE field isn't set in fattr->valid, then we should
not set the S_IFMT part of inode->i_mode.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-03 08:27:35 -05:00
Trond Myklebust
387c149b54 NFS: Fix a umount race
Ensure that we unregister the bdi before kill_anon_super() calls
ida_remove() on our device name.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-02-03 08:27:35 -05:00
Trond Myklebust
9f557cd807 NFS: Fix an Oops when truncating a file
The VM/VFS does not allow mapping->a_ops->invalidatepage() to fail.
Unfortunately, nfs_wb_page_cancel() may fail if a fatal signal occurs.
Since the NFS code assumes that the page stays mapped for as long as the
writeback is active, we can end up Oopsing (among other things).

The only safe fix here is to convert nfs_wait_on_request(), so as to make
it uninterruptible (as is already the case with wait_on_page_writeback()).


Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-02-03 08:27:22 -05:00
Steven Whitehouse
8f05228ee7 GFS2: Extend umount wait coverage to full glock lifetime
Although all glocks are, by the time of the umount glock wait,
scheduled for demotion, some of them haven't made it far
enough through the process for the original set of waiting
code to wait for them.

This extends the ref count to the whole glock lifetime in order
to ensure that the waiting does catch all glocks. It does make
it a bit more invasive, but it seems the only sensible solution
at the moment.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-02-03 09:56:21 +00:00
Steven Whitehouse
e402746a94 GFS2: Wait for unlock completion on umount
This patch adds a wait on umount between the point at which we
dispose of all glocks and the point at which we unmount the
lock protocol. This ensures that we've received all the replies
to our unlock requests before we stop the locking.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2010-02-03 09:47:04 +00:00
Sunil Mushran
db0f6ce697 ocfs2: Remove overzealous BUG_ON during blocked lock processing
During blocked lock processing, we should consider the possibility that the
lock is no longer blocking.

Joel Becker <joel.becker@oracle.com> assisted in fixing this issue.

Reported-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02 23:51:16 -08:00
Sunil Mushran
0d74125a6a ocfs2: Do not downconvert if the lock level is already compatible
During upconvert, if the master were to send a BAST, dlmglue will detect the
upconversion in process and send a cancel convert to the master. Upon receiving
the AST for the cancel convert, it will re-process the lock resource to determine
whether it needs downconverting. Say, the up was from PR to EX and the BAST was
for EX. After the cancel convert, it will need to downconvert to NL.

However, if the node was originally upconverting from NL to EX, then there would
be no reason to downconvert (assuming the same message sequence).

This patch makes dlmglue consider the possibility that the current lock level
is already compatible and that downconverting is not required.

Joel Becker <joel.becker@oracle.com> assisted in fixing this issue.

Fixes ossbz#1178
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1178

Reported-by: Coly Li <coly.li@suse.de>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02 23:51:14 -08:00
Sunil Mushran
a191282601 ocfs2: Prevent a livelock in dlmglue
There is possibility of a livelock in __ocfs2_cluster_lock(). If a node were
to get an ast for an upconvert request, followed immediately by a bast,
there is a small window where the fs may downconvert the lock before the
process requesting the upconvert is able to take the lock.

This patch adds a new flag to indicate that the upconvert is still in
progress and that the dc thread should not downconvert it right now.

Wengang Wang <wen.gang.wang@oracle.com> and Joel Becker
<joel.becker@oracle.com> contributed heavily to this patch.

Reported-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02 23:51:13 -08:00
Wengang Wang
0b94a909eb ocfs2: Fix setting of OCFS2_LOCK_BLOCKED during bast
During bast, set the OCFS2_LOCK_BLOCKED flag only if the lock needs to
downconverted.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02 23:50:55 -08:00
Tao Ma
34e6c59af0 ocfs2: Use compat_ptr in reflink_arguments.
Although we use u64 to pass userspace pointers to the kernel
to avoid compat_ioctl, it doesn't work in some ppc platform.
So wrap them with compat_ptr and add compat_ioctl.

The detailed discussion about compat_ptr can be found in thread
http://lkml.org/lkml/2009/10/27/423.

We indeed met with a bug when testing on ppc(-EFAULT is returned
when using old_path). This patch try to fix this.
I have tested in ppc64(with 32 bit reflink) and x86_64(with i686
reflink), both works.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02 18:56:37 -08:00
Sunil Mushran
cd34edd8cf ocfs2/dlm: Handle EAGAIN for compatibility - v2
Mainline commit aad1b15310 made the
dlm_begin_reco_handler() return -EAGAIN instead of EAGAIN.

As this error is transmitted over the wire, we want the receiver,
dlm_send_begin_reco_message(), to understand both the older EAGAIN and
the newer -EAGAIN, to allow rolling upgrade of the cluster nodes.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02 18:56:34 -08:00
Tao Ma
60c486744c ocfs2: Add parenthesis to wrap the check for O_DIRECT.
Add parenthesis to wrap the check for O_DIRECT.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02 18:15:37 -08:00
Tao Ma
0a1ea437d8 ocfs2: Only bug out when page size is larger than cluster size.
In CoW, we have to make sure that the page is already written
out to the disk. So we have a BUG_ON(PageDirty(page)).

In ppc platform we have pagesize=64K, so if the cs=4K, if the
file have fragmented clusters, we will map the page many times.
See this file as an example.
Tree Depth: 0   Count: 19   Next Free Rec: 14
	## Offset        Clusters       Block#          Flags
	0  0             4              2164864         0x2 Refcounted
	1  4             2              9302792         0x2 Refcounted
...

We have to replace the extent recs one by one, so the page with index 0
will be mapped and dirtied twice.

I'd like to leave the BUG_ON there while adding a check so that in
case we meet with an error in other platforms, we can find it easily.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02 18:15:35 -08:00
Tao Ma
d622b89a2f ocfs2: Fix memory overflow in cow_by_page.
In ocfs2_duplicate_clusters_by_page, we calculate map_end
by shifting page_index. But actually in case we meet with
a large offset(say in a i686 box, poff_t is only 32 bits
and page_index=2056240), we will overflow. So change the
type of page_index to loff_t.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02 18:14:20 -08:00
anfei zhou
931e80e4b3 mm: flush dcache before writing into page to avoid alias
The cache alias problem will happen if the changes of user shared mapping
is not flushed before copying, then user and kernel mapping may be mapped
into two different cache line, it is impossible to guarantee the coherence
after iov_iter_copy_from_user_atomic.  So the right steps should be:

	flush_dcache_page(page);
	kmap_atomic(page);
	write to page;
	kunmap_atomic(page);
	flush_dcache_page(page);

More precisely, we might create two new APIs flush_dcache_user_page and
flush_dcache_kern_page to replace the two flush_dcache_page accordingly.

Here is a snippet tested on omap2430 with VIPT cache, and I think it is
not ARM-specific:

	int val = 0x11111111;
	fd = open("abc", O_RDWR);
	addr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	*(addr+0) = 0x44444444;
	tmp = *(addr+0);
	*(addr+1) = 0x77777777;
	write(fd, &val, sizeof(int));
	close(fd);

The results are not always 0x11111111 0x77777777 at the beginning as expected.  Sometimes we see 0x44444444 0x77777777.

Signed-off-by: Anfei <anfei.zhou@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: <linux-arch@vger.kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-02 18:11:21 -08:00
Linus Torvalds
1a45dcfe25 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  cfq-iosched: Do not idle on async queues
  blk-cgroup: Fix potential deadlock in blk-cgroup
  block: fix bugs in bio-integrity mempool usage
  block: fix bio_add_page for non trivial merge_bvec_fn case
  drbd: null dereference bug
  drbd: fix max_segment_size initialization
2010-02-02 12:54:37 -08:00
Linus Torvalds
4dab75ec3e Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
  GFS2: Use GFP_NOFS for alloc structure
  GFS2: Fix previous patch
  GFS2: Don't withdraw on partial rindex entries
  GFS2: Fix refcnt leak on gfs2_follow_link() error path
2010-02-02 12:48:26 -08:00
Linus Torvalds
7ab02af428 Fix 'flush_old_exec()/setup_new_exec()' split
Commit 221af7f87b ("Split 'flush_old_exec' into two functions") split
the function at the point of no return - ie right where there were no
more error cases to check.  That made sense from a technical standpoint,
but when we then also combined it with the actual personality setting
going in between flush_old_exec() and setup_new_exec(), it needs to be a
bit more careful.

In particular, we need to make sure that we really flush the old
personality bits in the 'flush' stage, rather than later in the 'setup'
stage, since otherwise we might be flushing the _new_ personality state
that we're just setting up.

So this moves the flags and personality flushing (and 'flush_thread()',
which is the arch-specific function that generally resets lazy FP state
etc) of the old process into flush_old_exec(), so that it doesn't affect
any state that execve() is setting up for the new process environment.

This was reported by Michal Simek as breaking his Microblaze qemu
environment.

Reported-and-tested-by: Michal Simek <michal.simek@petalogix.com>
Cc: Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-02 12:37:44 -08:00
Christoph Hellwig
e8b217e753 xfs: remove invalid barrier optimization from xfs_fsync
We always need to flush the disk write cache and can't skip it just because
the no inode attributes have changed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
2010-02-02 10:16:26 +11:00
Dave Chinner
20026d9201 xfs: kill the unused XFS_QMOPT_* flush flags V2
dquots are never flushed asynchronously. Remove the flag and the
async write support from the flush function. Make the default flush
a delwri flush to make the inode flush code, which leaves the
XFS_QMOPT_SYNC the only flag remaining.  Convert that to use
SYNC_WAIT instead, just like the inode flush code.

V2:
- just pass flush flags straight through

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-02-04 09:48:58 +11:00
Linus Torvalds
13af75740f Merge branch 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing
* 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
  reiserfs: Fix vmalloc call under reiserfs lock
2010-02-01 10:46:18 -08:00
Steven Whitehouse
ea8d62dadd GFS2: Use GFP_NOFS for alloc structure
This is called under a glock, so its a good plan to use GFP_NOFS

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-02-01 10:01:34 +00:00
Steven Whitehouse
7fe3ec6fe5 GFS2: Fix previous patch
The do_div() call needs to remain.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-02-01 10:00:23 +00:00
Benjamin Marzinski
55f0b4c546 GFS2: Don't withdraw on partial rindex entries
ince gfs2 writes the rindex file a block at a time, and releases the
exclusive lock after each block, it is possible that another process
will grab the lock in the middle of the write.  Since rindex entries are
not an even divisor of blocks, that other process may see partial
entries.  On grows, this is fine.  The process can simply ignore the the
partial entires. Previously, the code withdrew when it saw partial
entries. Now it simply ignores them.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-02-01 09:59:54 +00:00
Ryusuke Konishi
3256a05531 nilfs2: fix potential leak of dirty data on umount
This fixes incorrect usage of nilfs_segctor_confirm() test function in
nilfs_segctor_destroy(); nilfs_segctor_confirm() returns zero if the
filesystem is not clean, so its use in nilfs_segctor_destroy() needs
inversion.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-01-31 14:57:31 +09:00
Chuck Ebbert
9e9432c267 block: fix bugs in bio-integrity mempool usage
Fix two bugs in the bio integrity code:

 use_bip_pool() always returns 0 because it checks against the wrong limit,
 causing the mempool to be used only when regular allocation fails.

 When the mempool is used as a fallback we don't free the data properly.

Signed-Off-By: Chuck Ebbert <cebbert@redhat.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-01-30 20:28:19 +01:00
Linus Torvalds
67f15b06c1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: check total number of devices when removing missing
  Btrfs: check return value of open_bdev_exclusive properly
  Btrfs: do not mark the chunk as readonly if in degraded mode
  Btrfs: run orphan cleanup on default fs root
  Btrfs: fix a memory leak in btrfs_init_acl
  Btrfs: Use correct values when updating inode i_size on fallocate
  Btrfs: remove tree_search() in extent_map.c
  Btrfs: Add mount -o compress-force
2010-01-29 10:27:37 -08:00
Linus Torvalds
221af7f87b Split 'flush_old_exec' into two functions
'flush_old_exec()' is the point of no return when doing an execve(), and
it is pretty badly misnamed.  It doesn't just flush the old executable
environment, it also starts up the new one.

Which is very inconvenient for things like setting up the new
personality, because we want the new personality to affect the starting
of the new environment, but at the same time we do _not_ want the new
personality to take effect if flushing the old one fails.

As a result, the x86-64 '32-bit' personality is actually done using this
insane "I'm going to change the ABI, but I haven't done it yet" bit
(TIF_ABI_PENDING), with SET_PERSONALITY() not actually setting the
personality, but just the "pending" bit, so that "flush_thread()" can do
the actual personality magic.

This patch in no way changes any of that insanity, but it does split the
'flush_old_exec()' function up into a preparatory part that can fail
(still called flush_old_exec()), and a new part that will actually set
up the new exec environment (setup_new_exec()).  All callers are changed
to trivially comply with the new world order.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-29 08:22:01 -08:00
Josef Bacik
035fe03a7a Btrfs: check total number of devices when removing missing
If you have a disk failure in RAID1 and then add a new disk to the
array, and then try to remove the missing volume, it will fail.  The
reason is the sanity check only looks at the total number of rw devices,
which is just 2 because we have 2 good disks and 1 bad one.  Instead
check the total number of devices in the array to make sure we can
actually remove the device.  Tested this with a failed disk setup and
with this test we can now run

btrfs-vol -r missing /mount/point

and it works fine.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28 16:20:39 -05:00
Josef Bacik
7f59203abe Btrfs: check return value of open_bdev_exclusive properly
Hit this problem while testing RAID1 failure stuff.  open_bdev_exclusive
returns ERR_PTR(), not NULL.  So change the return value properly.  This
is important if you accidently specify a device that doesn't exist when
trying to add a new device to an array, you will panic the box
dereferencing bdev.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28 16:20:39 -05:00
Josef Bacik
f48b90756b Btrfs: do not mark the chunk as readonly if in degraded mode
If a RAID setup has chunks that span multiple disks, and one of those
disks has failed, btrfs_chunk_readonly will return 1 since one of the
disks in that chunk's stripes is dead and therefore not writeable.  So
instead if we are in degraded mode, return 0 so we can go ahead and
allocate stuff.  Without this patch all of the block groups in a RAID1
setup will end up read-only, which will mean we can't add new disks to
the array since we won't be able to make allocations.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28 16:20:39 -05:00
Josef Bacik
e3acc2a685 Btrfs: run orphan cleanup on default fs root
This patch revert's commit

6c090a11e1

Since it introduces this problem where we can run orphan cleanup on a
volume that can have orphan entries re-added.  Instead of my original
fix, Yan Zheng pointed out that we can just revert my original fix and
then run the orphan cleanup in open_ctree after we look up the fs_root.
I have tested this with all the tests that gave me problems and this
patch fixes both problems.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28 16:20:39 -05:00
Yang Hongyang
f858153c36 Btrfs: fix a memory leak in btrfs_init_acl
In btrfs_init_acl() cloned acl is not released

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28 16:20:39 -05:00
Aneesh Kumar K.V
d1ea6a6145 Btrfs: Use correct values when updating inode i_size on fallocate
commit f2bc9dd07e3424c4ec5f3949961fe053d47bc825
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date:   Wed Jan 20 12:57:53 2010 +0530

    Btrfs: Use correct values when updating inode i_size on fallocate

    Even though we allocate more, we should be updating inode i_size
    as per the arguments passed

    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28 16:20:38 -05:00
Miao Xie
b8d9bfeb18 Btrfs: remove tree_search() in extent_map.c
This patch removes tree_search() in extent_map.c because it is not called by
anything.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28 16:20:38 -05:00
Chris Mason
a555f810af Btrfs: Add mount -o compress-force
The default btrfs mount -o compress mode will quickly back off
compressing a file if it notices that compression does not reduce the
size of the data being written.  This can save considerable CPU because
all future writes to the file go through uncompressed.

But some files are both very large and have mixed data stored in
them.  In that case, we want to add the ability to always try
compressing data before writing it.

This commit adds mount -o compress-force.  A later commit will add
a new inode flag that does the same thing.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28 16:18:15 -05:00