* 'nfs-for-2.6.36' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (42 commits)
NFS: NFSv4.1 is no longer a "developer only" feature
NFS: NFS_V4 is no longer an EXPERIMENTAL feature
NFS: Fix /proc/mount for legacy binary interface
NFS: Fix the locking in nfs4_callback_getattr
SUNRPC: Defer deleting the security context until gss_do_free_ctx()
SUNRPC: prevent task_cleanup running on freed xprt
SUNRPC: Reduce asynchronous RPC task stack usage
SUNRPC: Move the bound cred to struct rpc_rqst
SUNRPC: Clean up of rpc_bindcred()
SUNRPC: Move remaining RPC client related task initialisation into clnt.c
SUNRPC: Ensure that rpc_exit() always wakes up a sleeping task
SUNRPC: Make the credential cache hashtable size configurable
SUNRPC: Store the hashtable size in struct rpc_cred_cache
NFS: Ensure the AUTH_UNIX credcache is allocated dynamically
NFS: Fix the NFS users of rpc_restart_call()
SUNRPC: The function rpc_restart_call() should return success/failure
NFSv4: Get rid of the bogus RPC_ASSASSINATED(task) checks
NFSv4: Clean up the process of renewing the NFSv4 lease
NFSv4.1: Handle NFS4ERR_DELAY on SEQUENCE correctly
NFS: nfs_rename() should not have to flush out writebacks
...
Mark it as 'experimental' instead, since in practice, NFSv4.1 should now be
relatively stable.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a flag so we know if we mounted the NFS server using the legacy
binary interface. If we used the legacy interface, then we should not
show the mountd options.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The delegation is protected by RCU now, so we need to replace the
nfsi->rwsem protection with an rcu protected section.
Reported-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This will allow us to save the original generic cred in rpc_message, so
that if we migrate from one server to another, we can generate a new bound
cred without having to punt back to the NFS layer.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix up those functions that depend on knowing whether or not
rpc_restart_call is successful or not.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
There is no real reason to have RPC_ASSASSINATED() checks in the NFS code.
As far as it is concerned, this is just an RPC error...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In RFC5661, an NFS4ERR_DELAY error on a SEQUENCE operation has the special
meaning that the server is not finished processing the request. In this
case we want to just retry the request without touching the slot.
Also fix a bug whereby we would fail to update the sequence id if the
server returned any error other than NFS_OK/NFS4ERR_DELAY.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We don't really support nfs servers that invalidate the file handle after a
rename, so precautions such as flushing out dirty data before renaming the
file are superfluous.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Christoph points out that the VFS will always flush out data before calling
nfs_fsync(), so we can dispense with a full call to nfs_wb_all(), and
replace that with a simpler call to nfs_commit_inode().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently MAY_ACCESS means that filesystems must check the permissions
right then and not rely on cached results or the results of future
operations on the object. This can be because of a call to sys_access() or
because of a call to chdir() which needs to check search without relying on
any future operations inside that dir. I plan to use MAY_ACCESS for other
purposes in the security system, so I split the MAY_ACCESS and the
MAY_CHDIR cases.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
nfs_commit_inode() needs to be defined irrespectively of whether or not
we are supporting NFSv3 and NFSv4.
Allow the compiler to optimise away code in the NFSv2-only case by
converting it into an inlined stub function.
Reported-and-tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
See https://bugzilla.kernel.org/show_bug.cgi?id=16056
If other processes are blocked waiting for kswapd to free up some memory so
that they can make progress, then we cannot allow kswapd to block on those
processes.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
In root_nfs_name() it does the following:
if (strlen(buf) + strlen(cp) > NFS_MAXPATHLEN) {
printk(KERN_ERR "Root-NFS: Pathname for remote directory too long.\n");
return -1;
}
sprintf(nfs_export_path, buf, cp);
In the original code if (strlen(buf) + strlen(cp) == NFS_MAXPATHLEN)
then the sprintf() would lead to an overflow. Generally the rest of the
code assumes that the path can have NFS_MAXPATHLEN (1024) characters and
a NUL terminator so the fix is to add space to the nfs_export_path[]
buffer.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
flock locks want to be labelled using the process pid, while posix locks
want to be labelled using the fl_owner.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This is needed by NFSv4.0 servers in order to keep the number of locking
stateids at a manageable level.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The current shrinker implementation requires the registered callback
to have global state to work from. This makes it difficult to shrink
caches that are not global (e.g. per-filesystem caches). Pass the shrinker
structure to the callback so that users can embed the shrinker structure
in the context the shrinker needs to operate on and get back to it in the
callback via container_of().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The 'so_delegations' list appears to be unused.
Also eliminate so_client. If we already have so_server, we can get to the
nfs_client structure.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
There is no reason to change the nfs_client state every time we allocate a
new session. Move that line into nfs4_init_client_minor_version.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Instead of testing if the nfs_client has a session, we should be testing if
the struct nfs4_sequence_res was set up with one.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In anticipation of the day when we have per-filesystem sessions, and also
in order to allow the session to change in the event of a filesystem
migration event.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Nobody uses the rpc_status parameter.
It is not obvious why we need the struct nfs_client argument either, when
we already have that information in the session.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Firstly, there is little point in first zeroing out the entire struct
nfs4_sequence_res, and then initialising all fields save one. Just
initialise the last field to zero...
Secondly, nfs41_setup_sequence() has only 2 possible return values: 0, or
-EAGAIN, so there is no 'terminate rpc task' case.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the call to rpc_call_async() fails, then the arguments will not be
freed, since there will be no call to nfs41_sequence_call_done
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Apparently, we have never been able to set the atime correctly from the
NFSv4 client.
Reported-by: 小倉一夫 <ka-ogura@bd6.so-net.ne.jp>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Currently, we do not display the minor version mount parameter in the
/proc mount info.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Put the code that is common to both the referral and ordinary mount cases
into a common helper routine.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
S_ISDIR(fsinfo.fattr->mode) checks the file type rather than the mode bits,
so we should be checking for the NFS_ATTR_FATTR_TYPE fattr property.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
J.R. Okajima reports that the call to sync_inode() in nfs_wb_page() can
deadlock with other writeback flush calls. It boils down to the fact
that we cannot ever call writeback_single_inode() while holding a page
lock (even if we do set nr_to_write to zero) since another process may
already be waiting in the call to do_writepages(), and so will deny us
the I_SYNC lock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If we exit from nfs_commit_inode() without ensuring that the COMMIT rpc
call has been completed, we must re-mark the inode as dirty. Otherwise,
future calls to sync_inode() with the WB_SYNC_ALL flag set will fail to
ensure that the data is on the disk.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Commit 9c7e7e2337 (NFS: Don't call iput() in
nfs_access_cache_shrinker) unintentionally removed the spin unlock for the
inode->i_lock.
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not
USHORT_MAX/SHORT_MAX/SHORT_MIN.
- Make SHRT_MIN of type s16, not int, for consistency.
[akpm@linux-foundation.org: fix drivers/dma/timb_dma.c]
[akpm@linux-foundation.org: fix security/keys/keyring.c]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
iput() can potentially attempt to allocate memory, so we should avoid
calling it in a memory shrinker. Instead, rely on the fact that iput() will
call nfs_access_zap_cache().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Both iput() and put_rpccred() might allocate memory under certain
circumstances, so make sure that we don't recurse and deadlock...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We do not want to have the state recovery thread kick off and wait for a
memory reclaim, since that may deadlock when the writebacks end up
waiting for the state recovery thread to complete.
The safe thing is therefore to use GFP_NOFS in all open, close,
delegation return, lock, etc. operations that may be called by the
state recovery thread.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
I'm about to change task->tk_start from a jiffies value to a ktime_t
value in order to make RPC RTT reporting more precise.
Recently (commit dc96aef9) nfs4_renew_done() started to reference
task->tk_start so that a jiffies value no longer had to be passed
from nfs4_proc_async_renew(). This allowed the calldata to point to
an nfs_client instead.
Changing task->tk_start to a ktime_t value makes it effectively
useless for renew timestamps, so we need to restore the pre-dc96aef9
logic that provided a jiffies "start" timestamp to nfs4_renew_done().
Both an nfs_client pointer and a timestamp need to be passed to
nfs4_renew_done(), so create a new nfs_renewdata structure that
contains both, resembling what is already done for delegreturn,
lock, and unlock.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up:
fs/nfs/iostat.h: In function ‘nfs_add_server_stats’:
fs/nfs/iostat.h:41: warning: comparison between signed and unsigned integer expressions
fs/nfs/iostat.h:41: warning: comparison between signed and unsigned integer expressions
fs/nfs/iostat.h:41: warning: comparison between signed and unsigned integer expressions
fs/nfs/iostat.h:41: warning: comparison between signed and unsigned integer expressions
Commit fce22848 replaced the open-coded per-cpu logic in several
functions in fs/nfs/iostat.h with a single invocation of
this_cpu_ptr(). This macro assumes its second argument is signed,
not unsigned.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: fscache_uniq takes a string, so it should be included
with the other string mount option definitions, by convention.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Seen with -Wextra:
/home/cel/linux/fs/nfs/fscache.c: In function ‘__nfs_readpages_from_fscache’:
/home/cel/linux/fs/nfs/fscache.c:479: warning: comparison between signed and unsigned integer expressions
The comparison implicitly converts "int" to "unsigned", making it
safe. But there's no need for the implicit type conversions here, and
the dfprintk() already uses a "%u" formatter for "npages." Better to
reduce confusion.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the server has given us a delegation on a file, we _know_ that we can
cache the attribute information even when the user has specified 'noac'.
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Keep a global count of how many referrals that the current task has
traversed on a path lookup. Return ELOOP if the count exceeds
MAX_NESTED_LINKS.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Move the O_EXCL open handling into _nfs4_do_open() where it belongs. Doing
so also allows us to reuse the struct fattr from the opendata.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
All we really want is the ability to retrieve the root file handle. We no
longer need the ability to walk down the path, since that is now done in
nfs_follow_remote_path().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS Filehandles and struct fattr are really too large to be allocated on
the stack. This patch adds in a couple of helper functions to allocate them
dynamically instead.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix a number of RCU issues in the NFSv4 delegation code.
(1) delegation->cred doesn't need to be RCU protected as it's essentially an
invariant refcounted structure.
By the time we get to nfs_free_delegation(), the delegation is being
released, so no one else should be attempting to use the saved
credentials, and they can be cleared.
However, since the list of delegations could still be under traversal at
this point by such as nfs_client_return_marked_delegations(), the cred
should be released in nfs_do_free_delegation() rather than in
nfs_free_delegation(). Simply using rcu_assign_pointer() to clear it is
insufficient as that doesn't stop the cred from being destroyed, and nor
does calling put_rpccred() after call_rcu(), given that the latter is
asynchronous.
(2) nfs_detach_delegation_locked() and nfs_inode_set_delegation() should use
rcu_derefence_protected() because they can only be called if
nfs_client::cl_lock is held, and that guards against anyone changing
nfsi->delegation under it. Furthermore, the barrier imposed by
rcu_dereference() is superfluous, given that the spin_lock() is also a
barrier.
(3) nfs_detach_delegation_locked() is now passed a pointer to the nfs_client
struct so that it can issue lockdep advice based on clp->cl_lock for (2).
(4) nfs_inode_return_delegation_noreclaim() and nfs_inode_return_delegation()
should use rcu_access_pointer() outside the spinlocked region as they
merely examine the pointer and don't follow it, thus rendering unnecessary
the need to impose a partial ordering over the one item of interest.
These result in an RCU warning like the following:
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
fs/nfs/delegation.c:332 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
2 locks held by mount.nfs4/2281:
#0: (&type->s_umount_key#34){+.+...}, at: [<ffffffff810b25b4>] deactivate_super+0x60/0x80
#1: (iprune_sem){+.+...}, at: [<ffffffff810c332a>] invalidate_inodes+0x39/0x13a
stack backtrace:
Pid: 2281, comm: mount.nfs4 Not tainted 2.6.34-rc1-cachefs #110
Call Trace:
[<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
[<ffffffffa00b4591>] nfs_inode_return_delegation_noreclaim+0x5b/0xa0 [nfs]
[<ffffffffa0095d63>] nfs4_clear_inode+0x11/0x1e [nfs]
[<ffffffff810c2d92>] clear_inode+0x9e/0xf8
[<ffffffff810c3028>] dispose_list+0x67/0x10e
[<ffffffff810c340d>] invalidate_inodes+0x11c/0x13a
[<ffffffff810b1dc1>] generic_shutdown_super+0x42/0xf4
[<ffffffff810b1ebe>] kill_anon_super+0x11/0x4f
[<ffffffffa009893c>] nfs4_kill_super+0x3f/0x72 [nfs]
[<ffffffff810b25bc>] deactivate_super+0x68/0x80
[<ffffffff810c6744>] mntput_no_expire+0xbb/0xf8
[<ffffffff810c681b>] release_mounts+0x9a/0xb0
[<ffffffff810c689b>] put_mnt_ns+0x6a/0x79
[<ffffffffa00983a1>] nfs_follow_remote_path+0x5a/0x146 [nfs]
[<ffffffffa0098334>] ? nfs_do_root_mount+0x82/0x95 [nfs]
[<ffffffffa00985a9>] nfs4_try_mount+0x75/0xaf [nfs]
[<ffffffffa0098874>] nfs4_get_sb+0x291/0x31a [nfs]
[<ffffffff810b2059>] vfs_kern_mount+0xb8/0x177
[<ffffffff810b2176>] do_kern_mount+0x48/0xe8
[<ffffffff810c810b>] do_mount+0x782/0x7f9
[<ffffffff810c8205>] sys_mount+0x83/0xbe
[<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b
Also on:
fs/nfs/delegation.c:215 invoked rcu_dereference_check() without protection!
[<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
[<ffffffffa00b4223>] nfs_inode_set_delegation+0xfe/0x219 [nfs]
[<ffffffffa00a9c6f>] nfs4_opendata_to_nfs4_state+0x2c2/0x30d [nfs]
[<ffffffffa00aa15d>] nfs4_do_open+0x2a6/0x3a6 [nfs]
...
And:
fs/nfs/delegation.c:40 invoked rcu_dereference_check() without protection!
[<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
[<ffffffffa00b3bef>] nfs_free_delegation+0x3d/0x6e [nfs]
[<ffffffffa00b3e71>] nfs_do_return_delegation+0x26/0x30 [nfs]
[<ffffffffa00b406a>] __nfs_inode_return_delegation+0x1ef/0x1fe [nfs]
[<ffffffffa00b448a>] nfs_client_return_marked_delegations+0xc9/0x124 [nfs]
...
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Ensure that we correctly rcu-dereference the delegation itself, and that we
protect against removal while we're changing the contents.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
nfs: fix memory leak in nfs_get_sb with CONFIG_NFS_V4
nfs: fix some issues in nfs41_proc_reclaim_complete()
NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clear
NFS: Fix an unstable write data integrity race
nfs: testing for null instead of ERR_PTR()
NFS: rsize and wsize settings ignored on v4 mounts
NFSv4: Don't attempt an atomic open if the file is a mountpoint
SUNRPC: Fix a bug in rpcauth_prune_expired
If dentry found stale happens to be a root of disconnected tree, we
can't d_drop() it; its d_hash is actually part of s_anon and d_drop()
would simply hide it from shrink_dcache_for_umount(), leading to
all sorts of fun, including busy inodes on umount and oopsen after
that.
Bug had been there since at least 2006 (commit c636eb already has it),
so it's definitely -stable fodder.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The original code passed an ERR_PTR() to rpc_put_task() and instead of
returning zero on success it returned -ENOMEM.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Neil Brown reports that he is seeing the BUG_ON(ret == 0) trigger in
nfs_page_async_flush. According to the trace in
https://bugzilla.novell.com/show_bug.cgi?id=599628
the problem appears to be due to nfs_wb_page() not waiting for the
PG_writeback flag to clear.
There is a ditto problem in nfs_wb_page_cancel()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Commit 2c61be0a94 (NFS: Ensure that the WRITE
and COMMIT RPC calls are always uninterruptible) exposed a race on file
close. In order to ensure correct close-to-open behaviour, we want to wait
for all outstanding background commit operations to complete.
This patch adds an inode flag that indicates if a commit operation is under
way, and provides a mechanism to allow ->write_inode() to wait for its
completion if this is a data integrity flush.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFSv4 mounts ignore the rsize and wsize mount options, and always use
the default transfer size for both. This seems to be because all
NFSv4 mounts are now cloned, and the cloning logic doesn't copy the
rsize and wsize settings from the parent nfs_server.
I tested Fedora's 2.6.32.11-99 and it seems to have this problem as
well, so I'm guessing that .33, .32, and perhaps older kernels have
this issue as well.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Stable <stable@kernel.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Arnaud Giersch reports that NFSv4 locking is broken when we hold a
delegation since commit 8e469ebd6d (NFSv4:
Don't allow posix locking against servers that don't support it).
According to Arnaud, the lock succeeds the first time he opens the file
(since we cannot do a delegated open) but then fails after we start using
delegated opens.
The following patch fixes it by ensuring that locking behaviour is
governed by a per-filesystem capability flag that is initially set, but
gets cleared if the server ever returns an OPEN without the
NFS4_OPEN_RESULT_LOCKTYPE_POSIX flag being set.
Reported-by: Arnaud Giersch <arnaud.giersch@iut-bm.univ-fcomte.fr>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org