Commit Graph

5847 Commits (8d7282036f82244c5a1146a1a7edf03c50d278d9)

Author SHA1 Message Date
Tejun Heo fb6896da37 sysfs: restructure add/remove paths and fix inode update
The original add/remove code had the following problems.

* parent's timestamps are updated on dentry instantiation.  this is
  incorrect with reclaimable files.

* updating parent's timestamps isn't synchronized.

* parent nlink update assumes the inode is accessible which won't be
  true once directory dentries are made reclaimable.

This patch restructures add/remove paths to resolve the above
problems.  Add/removal are done in the following steps.

1. sysfs_addrm_start() : acquire locks including sysfs_mutex and other
   resources.

2-a. sysfs_add_one() : add new sd.  linking the new sd into the
     children list is caller's responsibility.

2-b. sysfs_remove_one() : remove a sd.  unlinking the sd from the
     children list is caller's responsibility.

3. sysfs_addrm_finish() : release all resources and clean up.

Steps 2-a and/or 2-b can be repeated multiple times.

Parent's inode is looked up during sysfs_addrm_start().  If available
(always at the moment), it's pinned and nlink is updated as sd's are
added and removed.  Timestamps are updated during finish if any sd has
been added or removed.  If parent's inode is not available during
start, sysfs_mutex ensures that parent inode is not created till
add/remove is complete.

All the complexity is contained inside the helper functions.
Especially, dentry/inode handling is properly hidden from the rest of
sysfs which now mostly operate on sysfs_dirents.  As an added bonus,
codes which use these helpers to add and remove sysfs_dirents are now
more structured and simpler.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:09 -07:00
Tejun Heo 3007e997de sysfs: use sysfs_mutex to protect the sysfs_dirent tree
As kobj sysfs dentries and inodes are gonna be made reclaimable,
i_mutex can't be used to protect sysfs_dirent tree.  Use sysfs_mutex
globally instead.  As the whole tree is protected with sysfs_mutex,
there is no reason to keep sysfs_rename_sem.  Drop it.

While at it, add docbook comments to functions which require
sysfs_mutex locking.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:08 -07:00
Tejun Heo 5f9953237f sysfs: consolidate sysfs spinlocks
Replace sysfs_lock and kobj_sysfs_assoc_lock with sysfs_assoc_lock.
sysfs_lock was originally to be used to protect sysfs_dirent tree but
mutex seems better choice, so there is no reason to keep sysfs_lock
separate.  Merge the two spinlocks into one.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:08 -07:00
Tejun Heo 608e266a2d sysfs: make kobj point to sysfs_dirent instead of dentry
As kobj sysfs dentries and inodes are gonna be made reclaimable,
dentry can't be used as naming token for sysfs file/directory, replace
kobj->dentry with kobj->sd.  The only external interface change is
shadow directory handling.  All other changes are contained in kobj
and sysfs.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:08 -07:00
Tejun Heo f0b0af4792 sysfs: implement sysfs_find_dirent() and sysfs_get_dirent()
Implement sysfs_find_dirent() and sysfs_get_dirent().
sysfs_dirent_exist() is replaced by sysfs_find_dirent().  These will
be used to make directory entries reclamiable.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:08 -07:00
Tejun Heo 380e6fbb72 sysfs: implement SYSFS_FLAG_REMOVED flag
Implement SYSFS_FLAG_REMOVED flag which currently is used only to
improve sanity check in sysfs_deactivate().  The flag will be used to
make directory entries reclamiable.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:08 -07:00
Tejun Heo b402d72cf7 sysfs: rename sysfs_dirent->s_type to s_flags and make room for flags
Rename sysfs_dirent->s_type to s_flags, pack type into lower eight
bits and reserve the rest for flags.  sysfs_type() can used to access
the type.  All existing sd->s_type accesses are converted to use
sysfs_type().  While at it, type test is changed to equality test
instead of bit-and test where appropriate.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:08 -07:00
Tejun Heo d0bcb5689a sysfs: make sysfs_drop_dentry() access inodes using ilookup()
sysfs_drop_dentry() used to go through sd->s_dentry and
sd->s_parent->s_dentry to access the inodes.  This is incorrect
because inode can be cached without dentry.

This patch makes sysfs_drop_dentry() access inodes using ilookup() on
sd->s_ino.  This is both correct and simpler.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:08 -07:00
Rafael J. Wysocki 9d9307dabb sysfs: Fix oops in sysfs_drop_dentry on x86_64
Fix oops on x86_64 caused by the dereference of dir in
sysfs_drop_dentry() made before checking if dir is not NULL
(cf. http://marc.info/?l=linux-kernel&m=118151626704924&w=2).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:07 -07:00
Tejun Heo 0c73f18b7d sysfs: use singly-linked list for sysfs_dirent tree
Make sysfs_dirent use singly linked list for its tree structure.
sysfs_link_sibling() and sysfs_unlink_sibling() functions are added to
handle simpler cases.  It adds some complexity and cpu cycle overhead
but reduced memory footprint is worthwhile on big machines.

This change reduces the sizeof sysfs_dirent from 104 to 88 on 64bit
and from 60 to 52 on 32bit.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:07 -07:00
Tejun Heo 8619f97989 sysfs: slim down sysfs_dirent->s_active
Make sysfs_dirent->s_active an atomic_t instead of rwsem.  This
reduces the size of sysfs_dirent from 136 to 104 on 64bit and from 76
to 60 on 32bit with lock debugging turned off.  With lock debugging
turned on the reduction is much larger.

s_active starts at zero and each active reference increments s_active.
Putting a reference decrements s_active.  Deactivation subtracts
SD_DEACTIVATED_BIAS which is currently INT_MIN and assumed to be small
enough to make s_active negative.  If s_active is negative,
sysfs_get() no longer grants new references.  Deactivation succeeds
immediately if there is no active user; otherwise, it waits using a
completion for the last put.

Due to the removal of lockdep tricks, this change makes things less
trickier in release_sysfs_dirent().  As all the complexity is
contained in three s_active functions, I think it's more readable this
way.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:07 -07:00
Tejun Heo b6b4a4399c sysfs: move s_active functions to fs/sysfs/dir.c
These functions are about to receive more complexity and doesn't
really need to be inlined in the first place.  Move them from
fs/sysfs/sysfs.h to fs/sysfs/dir.c.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:07 -07:00
Tejun Heo 0b8ead82f5 sysfs: fix root sysfs_dirent -> root dentry association
The root sysfs_dirent didn't point to the root dentry fix it.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:07 -07:00
Tejun Heo 8312a8d7c1 sysfs: use iget_locked() instead of new_inode()
After dentry is reclaimed, sysfs always used to allocate new dentry
and inode if the file is accessed again.  This causes problem with
operations which only pin the inode.  For example, if inotify watch is
added to a sysfs file and the dentry for the file is reclaimed, the
next update event creates new dentry and new inode making the inotify
watch miss all the events from there on.

This patch fixes it by using iget_locked() instead of new_inode().
sysfs_new_inode() is renamed to sysfs_get_inode() and inode is
initialized iff the inode is newly allocated.  sysfs_instantiate() is
responsible for unlocking new inodes.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:07 -07:00
Tejun Heo fc9f54b998 sysfs: reorganize sysfs_new_indoe() and sysfs_create()
Reorganize/clean up sysfs_new_inode() and sysfs_create().

* sysfs_init_inode() is separated out from sysfs_new_inode() and is
  responsible for basic initialization.
* sysfs_instantiate() replaces the last step of sysfs_create() and is
  responsible for dentry instantitaion.
* type-specific initialization is moved out to the callers.
* mode is specified only once when creating a sysfs_dirent.
* spurious list_del_init(&sd->s_sibling) dropped from create_dir()

This change is to

* prepare for inode allocation fix.
* separate alloc and init code for synchronization update.
* make dentry/inode initialization more flexible for later changes.

This patch doesn't introduce visible behavior change.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:07 -07:00
Tejun Heo 7f7cfffe60 sysfs: fix parent refcounting during rename and move
Parent reference wasn't properly transferred during rename and move.
Fix it.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:07 -07:00
Tejun Heo 42b37df6ab sysfs: make sysfs_alloc_ino() static
sysfs_alloc_ino() isn't used out side of fs/sysfs/dir.c.  Make it
static.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:06 -07:00
Tejun Heo 7b595756ec sysfs: kill unnecessary attribute->owner
sysfs is now completely out of driver/module lifetime game.  After
deletion, a sysfs node doesn't access anything outside sysfs proper,
so there's no reason to hold onto the attribute owners.  Note that
often the wrong modules were accounted for as owners leading to
accessing removed modules.

This patch kills now unnecessary attribute->owner.  Note that with
this change, userland holding a sysfs node does not prevent the
backing module from being unloaded.

For more info regarding lifetime rule cleanup, please read the
following message.

  http://article.gmane.org/gmane.linux.kernel/510293

(tweaked by Greg to not delete the field just yet, to make it easier to
merge things properly.)

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:06 -07:00
Tejun Heo dbde0fcf9f sysfs: reimplement sysfs_drop_dentry()
This patch reimplements sysfs_drop_dentry() such that remove_dir() can
use it to drop dentry instead of using a separate mechanism.  With
this change, making directories reclaimable is much easier.

This patch used to contain fixes for two race conditions around
sd->s_dentry but that part has been separated out and included into
mainline early as commit 6aa054aadf and
dd14cbc994.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:06 -07:00
Tejun Heo 198a2a8470 sysfs: separate out sysfs_attach_dentry()
Consolidate sd <-> dentry association into sysfs_attach_dentry() and
call it after dentry and inode are properly set up.  This is in
preparation of sysfs_drop_dentry() updates.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:05 -07:00
Tejun Heo 73107cb3ad sysfs: kill attribute file orphaning
Now that sysfs_dirent can be disconnected from kobject on deletion,
there is no need to orphan each attribute files.  All [bin_]attribute
nodes are automatically orphaned when the parent node is deleted.
Kill attribute file orphaning.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:05 -07:00
Tejun Heo 0ab66088c8 sysfs: implement sysfs_dirent active reference and immediate disconnect
sysfs: implement sysfs_dirent active reference and immediate disconnect

Opening a sysfs node references its associated kobject, so userland
can arbitrarily prolong lifetime of a kobject which complicates
lifetime rules in drivers.  This patch implements active reference and
makes the association between kobject and sysfs immediately breakable.

Now each sysfs_dirent has two reference counts - s_count and s_active.
s_count is a regular reference count which guarantees that the
containing sysfs_dirent is accessible.  As long as s_count reference
is held, all sysfs internal fields in sysfs_dirent are accessible
including s_parent and s_name.

The newly added s_active is active reference count.  This is acquired
by invoking sysfs_get_active() and it's the caller's responsibility to
ensure sysfs_dirent itself is accessible (should be holding s_count
one way or the other).  Dereferencing sysfs_dirent to access objects
out of sysfs proper requires active reference.  This includes access
to the associated kobjects, attributes and ops.

The active references can be drained and denied by calling
sysfs_deactivate().  All active sysfs_dirents must be deactivated
after deletion but before the default reference is dropped.  This
enables immediate disconnect of sysfs nodes.  Once a sysfs_dirent is
deleted, it won't access any entity external to sysfs proper.

Because attr/bin_attr ops access both the node itself and its parent
for kobject, they need to hold active references to both.
sysfs_get/put_active_two() helpers are provided to help grabbing both
references.  Parent's is acquired first and released last.

Unlike other operations, mmapped area lingers on after mmap() is
finished and the module implement implementing it and kobj need to
stay referenced till all the mapped pages are gone.  This is
accomplished by holding one set of active references to the bin_attr
and its parent if there have been any mmap during lifetime of an
openfile.  The references are dropped when the openfile is released.

This change makes sysfs lifetime rules independent from both kobject's
and module's.  It not only fixes several race conditions caused by
sysfs not holding onto the proper module when referencing kobject, but
also helps fixing and simplifying lifetime management in driver model
and drivers by taking sysfs out of the equation.

Please read the following message for more info.

  http://article.gmane.org/gmane.linux.kernel/510293

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:05 -07:00
Tejun Heo eb36165353 sysfs: implement bin_buffer
Implement bin_buffer which contains a mutex and pointer to PAGE_SIZE
buffer to properly synchronize accesses to per-openfile buffer and
prepare for immediate-kobj-disconnect.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:04 -07:00
Tejun Heo 2b29ac252a sysfs: reimplement symlink using sysfs_dirent tree
sysfs symlink is implemented by referencing dentry and kobject from
sysfs_dirent - symlink entry references kobject, dentry is used to
walk the tree.  This complicates object lifetimes rules and is
dangerous - for example, there is no way to tell to which module the
target of a symlink belongs and referencing that kobject can make it
linger after the module is gone.

This patch reimplements symlink using only sysfs_dirent tree.  sd for
a symlink points and holds reference to the target sysfs_dirent and
all walking is done using sysfs_dirent tree.  Simpler and safer.

Please read the following message for more info.

  http://article.gmane.org/gmane.linux.kernel/510293

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:04 -07:00
Tejun Heo aecdcedaab sysfs: implement kobj_sysfs_assoc_lock
kobj->dentry can go away anytime unless the user controls when the
associated sysfs node is deleted.  This patch implements
kobj_sysfs_assoc_lock which protects kobj->dentry.  This will be used
to maintain kobj based API when converting sysfs to use sysfs_dirent
tree instead of dentry/kobject.

Note that this lock belongs to kobject/driver-model not sysfs.  Once
sysfs is converted to not use kobject in its interface, this can be
removed from sysfs.

This is in preparation of object reference simplification.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:04 -07:00
Tejun Heo 3e5190380e sysfs: make sysfs_dirent->s_element a union
Make sd->s_element a union of sysfs_elem_{dir|symlink|attr|bin_attr}
and rename it to s_elem.  This is to achieve...

* some level of type checking : changing symlink to point to
  sysfs_dirent instead of kobject is much safer and less painful now.
* easier / standardized dereferencing
* allow sysfs_elem_* to contain more than one entry

Where possible, pointer is obtained by directly deferencing from sd
instead of going through other entities.  This reduces dependencies to
dentry, inode and kobject.  to_attr() and to_bin_attr() are unused now
and removed.

This is in preparation of object reference simplification.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:04 -07:00
Tejun Heo 0c096b507f sysfs: add sysfs_dirent->s_name
Add s_name to sysfs_dirent.  This is to further reduce dependency to
the associated dentry.  Name is copied for directories and symlinks
but not for attributes.

Where possible, name dereferences are converted to use sd->s_name.
sysfs_symlink->link_name and sysfs_get_name() are unused now and
removed.

This change allows symlink to be implemented using sysfs_dirent tree
proper, which is the last remaining dentry-dependent sysfs walk.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:04 -07:00
Tejun Heo 13b3086d2e sysfs: add sysfs_dirent->s_parent
Add sysfs_dirent->s_parent.  With this patch, each sd points to and
holds a reference to its parent.  This allows walking sysfs tree
without referencing sd->s_dentry which can go away anytime if the user
doesn't control when it's deleted.

sd->s_parent is initialized and parent is referenced in
sysfs_attach_dirent().  Reference to parent is released when the sd is
released, so as long as reference to a sd is held, s_parent can be
followed.

dentry walk in sysfs_readdir() is convereted to s_parent walk.

This will be used to reimplement symlink such that it uses only
sysfs_dirent tree.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:04 -07:00
Tejun Heo a26cd7226c sysfs: consolidate sysfs_dirent creation functions
Currently there are four functions to create sysfs_dirent -
__sysfs_new_dirent(), sysfs_new_dirent(), __sysfs_make_dirent() and
sysfs_make_dirent().  Other than sysfs_make_dirent(), no function has
two users if calls to implement other functions are excluded.

This patch consolidates sysfs_dirent creation functions into the
following two.

* sysfs_new_dirent() : allocate and initialize
* sysfs_attach_dirent() : attach to sysfs_dirent hierarchy and/or
			  associate with dentry

This simplifies interface and gives callers more flexibility.  This is
in preparation of object reference simplification.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:04 -07:00
Tejun Heo 996b73764e sysfs: flatten and fix sysfs_rename_dir() error handling
Error handling in sysfs_rename_dir() was broken.

* When lookup_one_len() fails, 0 is returned.

* If parent inode check fails, returns with inode mutex and rename
  rwsem held.

This patch fixes the above bugs and flattens error handling such that
it's more readable and easier to modify.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:03 -07:00
Tejun Heo dfeb9fb034 sysfs: flatten cleanup paths in sysfs_add_link() and create_dir()
Flatten cleanup paths in sysfs_add_link() and create_dir() to improve
readability and ease further changes to these functions.  This is in
preparation of object reference simplification.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:03 -07:00
Tejun Heo 93e3cd8270 sysfs: fix error handling in binattr write()
Error handling in fs/sysfs/bin.c:write() was wrong because size_t
count is used to receive return value from flush_write() which is
negative on failure.

This patch updates write() such that int variable is used instead.
read() is updated the same way for consistency.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:03 -07:00
Tejun Heo 7a23ad4404 sysfs: make sysfs_put() ignore NULL sd
Make sysfs_put() ignore NULL sd instead of oopsing.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:03 -07:00
Tejun Heo 2b611bb7ab sysfs: allocate inode number using ida
sysfs used simple incrementing allocator which is not guaranteed to be
unique.  This patch makes sysfs use ida to give each sd a unique and
packed inode number.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:03 -07:00
Tejun Heo fa7f912ad4 sysfs: move release_sysfs_dirent() to dir.c
There is no reason this function should be inlined and soon to follow
sysfs object reference simplification will make it heavier.  Move it
to dir.c.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:03 -07:00
Jan Kara cfc94cdf8e debugfs: add rename for debugfs files
Implement debugfs_rename() to allow renaming files/directories in debugfs.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:00 -07:00
Frank Filz 137d6acaa6 NFSv4: Make sure unlock is really an unlock when cancelling a lock
I ran into a curious issue when a lock is being canceled. The
cancellation results in a lock request to the vfs layer instead of an
unlock request. This is particularly insidious when the process that
owns the lock is exiting. In that case, sometimes the erroneous lock is
applied AFTER the process has entered zombie state, preventing the lock
from ever being released. Eventually other processes block on the lock
causing a slow degredation of the system. In the 2.6.16 kernel this was
investigated on, the problem is compounded by the fact that the cl_sem
is held while blocking on the vfs lock, which results in most processes
accessing the nfs file system in question hanging.

In more detail, here is how the situation occurs:

first _nfs4_do_setlk():

static int _nfs4_do_setlk(struct nfs4_state *state, int cmd, struct file_lock *fl, int reclaim)
...
        ret = nfs4_wait_for_completion_rpc_task(task);
        if (ret == 0) {
...
        } else
                data->cancelled = 1;

then nfs4_lock_release():

static void nfs4_lock_release(void *calldata)
...
        if (data->cancelled != 0) {
                struct rpc_task *task;
                task = nfs4_do_unlck(&data->fl, data->ctx, data->lsp,
                                data->arg.lock_seqid);

The problem is the same file_lock that was passed in to _nfs4_do_setlk()
gets passed to nfs4_do_unlck() from nfs4_lock_release(). So the type is
still F_RDLCK or FWRLCK, not F_UNLCK. At some point, when cancelling the
lock, the type needs to be changed to F_UNLCK. It seemed easiest to do
that in nfs4_do_unlck(), but it could be done in nfs4_lock_release().
The concern I had with doing it there was if something still needed the
original file_lock, though it turns out the original file_lock still
needs to be modified by nfs4_do_unlck() because nfs4_do_unlck() uses the
original file_lock to pass to the vfs layer, and a copy of the original
file_lock for the RPC request.

It seems like the simplest solution is to force all situations where
nfs4_do_unlck() is being used to result in an unlock, so with that in
mind, I made the following change:

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:49 -04:00
Frank van Maarseveen c98451bdb2 NLM: fix source address of callback to client
Use the destination address of the original NLM request as the
source address in callbacks to the client.

Signed-off-by: Frank van Maarseveen <frankvm@frankvm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:49 -04:00
Trond Myklebust 6f2e64d3e1 NFSv4: Make the NFS state model work with the nosharedcache mount option
Consider the case where the user has mounted the remote filesystem
server:/foo on the two local directories /bar and /baz using the
nosharedcache mount option. The files /bar/file and /baz/file are
represented by different inodes in the local namespace, but refer to the
same file /foo/file on the server.
Consider the case where a process opens both /bar/file and /baz/file, then
closes /bar/file: because the nfs4_state is not shared between /bar/file
and /baz/file, the kernel will see that the nfs4_state for /bar/file is no
longer referenced, so it will send off a CLOSE rpc call. Unless the
open_owners differ, then that CLOSE call will invalidate the open state on
/baz/file too.

Conclusion: we cannot share open state owners between two different
non-shared mount instances of the same filesystem.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:48 -04:00
Trond Myklebust 275a5d24bf NFS: Error when mounting the same filesystem with different options
Unless the user sets the NFS_MOUNT_NOSHAREDCACHE mount flag, we should
return EBUSY if the filesystem is already mounted on a superblock that
has set conflicting mount options.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:48 -04:00
Trond Myklebust 75180df2ed NFS: Add the mount option "nosharecache"
Prior to David Howell's mount changes in 2.6.18, users who mounted
different directories which happened to be from the same filesystem on the
server would get different super blocks, and hence could choose different
mount options. As long as there were no hard linked files that crossed from
one subtree to another, this was quite safe.
Post the changes, if the two directories are on the same filesystem (have
the same 'fsid'), they will share the same super block, and hence the same
mount options.

Add a flag to allow users to elect not to share the NFS super block with
another mount point, even if the fsids are the same. This will allow
users to set different mount options for the two different super blocks, as
was previously possible. It is still up to the user to ensure that there
are no cache coherency issues when doing this, however the default
behaviour will be to share super blocks whenever two paths result in
the same fsid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:48 -04:00
Chuck Lever 8007122520 NFS: Add support for mounting NFSv4 file systems with string options
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:48 -04:00
Chuck Lever 136d558ce7 NFS: Add final pieces to support in-kernel mount option parsing
Hook in final components required for supporting in-kernel mount option
parsing for NFSv2 and NFSv3 mounts.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:48 -04:00
Chuck Lever 0076d7b7ba NFS: Introduce generic mount client API
For NFSv2 and v3 mounts, the first step is to contact the server's MOUNTD
and request the file handle for the root of the mounted share.  Add a
function to the NFS client that handles this operation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:47 -04:00
Chuck Lever bf0fd7680f NFS: Add enums and match tables for mount option parsing
This generic infrastructure works for both NFS and NFSv4 mounts.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:47 -04:00
Chuck Lever 013a8c1ab5 NFS: Improve debugging output in NFS in-kernel mount client
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:47 -04:00
Chuck Lever 19207231c9 NFS: Clean up in-kernel NFS mount
Clean up white space and coding conventions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:46 -04:00
Chuck Lever 3ea97309e6 NFS: Remake nfsroot_mount as a permanent part of NFS client
In preparation for supporting NFSv2 and NFSv3 mount option handling in the
kernel NFS client, convert mount_clnt.c to be a permanent part of the NFS
client, instead of built only when CONFIG_ROOT_NFS is enabled.

In addition, we also replace the "struct sockaddr_in *" argument with
something more generic, to help support IPv6 at some later point.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:46 -04:00
Chuck Lever 43780b87fa SUNRPC: Add a convenient default for the hostname when calling rpc_create()
A couple of callers just use a stringified IP address for the rpc client's
hostname.  Move the logic for constructing this into rpc_create(), so it can
be shared.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:46 -04:00
Chuck Lever cce63cd637 SUNRPC: Rename rpcb_getport_external routine
In preparation for handling NFS mount option parsing in the kernel,
rename rpcb_getport_external as rpcb_get_port_sync, and make it available
always (instead of only when CONFIG_ROOT_NFS is enabled).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:46 -04:00
Chuck Lever f0768ebd09 NFS: Introduce nfs4_validate_mount_options
Refactor NFSv4 mount processing to break out mount data validation
in the same way it's broken out in the NFSv2/v3 mount path.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:45 -04:00
Chuck Lever 5df36e78da NFS: Clean up nfs_validate_mount_data
Move error handling code out of the main code path.  The switch statement
was also improperly indented, according to Documentation/CodingStyle.  This
prepares nfs_validate_mount_data for the addition of option string parsing.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:45 -04:00
Chuck Lever fc50d58fd0 NFS: Clean-up: Refactor IP address sanity checks in NFS client
NFS and NFSv4 mounts can now share server address sanity checking.  And, it
provides an easy mechanism for adding IPv6 address checking at some later
point.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:44 -04:00
Chuck Lever 4d81cd1611 NFS: Clean-up: fix a compiler warning in fs/nfs/super.c
/home/cel/linux/fs/nfs/super.c: In function 'nfs_pseudoflavour_to_name':
/home/cel/linux/fs/nfs/super.c:270: warning: comparison between signed and unsigned

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:44 -04:00
Chuck Lever 0655960f76 NFS: Clean up error handling in nfs_get_sb
The error return logic in nfs_get_sb now matches nfs4_get_sb, and is more maintainable.
A subsequent patch will take advantage of this simplification.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:44 -04:00
Chuck Lever 29eb981a3b NFS: Clean-up: Replace nfs_copy_user_string with strndup_user
The new string utility function strndup_user can be used instead of
nfs_copy_user_string, eliminating an unnecessary duplication of function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:44 -04:00
Chuck Lever 5680d48be8 NFS: Clean-up: Define macros for maximum host and export path name lengths
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:44 -04:00
Chuck Lever 9eaa67c6a5 NFS: Clean-up: use correct type when converting NFS blocks to local blocks
inode->i_blocks is a blkcnt_t these days, which can be a u64 or unsigned
long, depending on the setting of CONFIG_LSF.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:44 -04:00
Trond Myklebust 8bda4e4c98 NFSv4: Fix up stateid locking...
We really don't need to grab both the state->so_owner and the
inode->i_lock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:43 -04:00
Trond Myklebust 1ac7e2fd35 NFSv4: Clean up the callers of nfs4_open_recover_helper()
Rely on nfs4_try_open_cached() when appropriate.

Also fix an RCU violation in _nfs4_do_open_reclaim()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:43 -04:00
Trond Myklebust 6ee4126890 NFSv4: Don't call OPEN if we already have an open stateid for a file
If we already have a stateid with the correct open mode for a given file,
then we can reuse that stateid instead of re-issuing an OPEN call without
violating the close-to-open caching semantics.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:43 -04:00
Trond Myklebust aac00a8d0a NFSv4: Check for the existence of a delegation in nfs4_open_prepare()
We should not be calling open() on an inode that has a delegation unless
we're doing a reclaim.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:43 -04:00
Trond Myklebust 3e309914a1 NFSv4: Clean up _nfs4_proc_open()
Use a flag instead of the 'data->rpc_status = -ENOMEM hack.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:42 -04:00
Trond Myklebust 1b370bc28f NFSv4: Allow nfs4_opendata_to_nfs4_state to return errors.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:42 -04:00
Trond Myklebust 6f43ddccb3 NFSv4: Improve the debugging of bad sequence id errors...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:42 -04:00
Trond Myklebust 003707c722 NFSv4: Always use the delegation if we have one
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:41 -04:00
Trond Myklebust 0f9f95e0ad NFSv4: Clean up confirmation of sequence ids...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:41 -04:00
Trond Myklebust 412c77cee6 NFSv4: Defer inode revalidation when setting up a delegation
Currently we force a synchronous call to __nfs_revalidate_inode() in
nfs_inode_set_delegation(). This not only ensures that we cannot call
nfs_inode_set_delegation from an asynchronous context, but it also slows
down any call to open().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:41 -04:00
Trond Myklebust 8383e4602c NFSv4: Use RCU to protect delegations
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:41 -04:00
Trond Myklebust 13437e12fb NFSv4: Support recalling delegations by stateid part 2
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:41 -04:00
Trond Myklebust 9016302784 NFSv4: Support recalling delegations by stateid
There appear to be some rogue servers out there that issue multiple
delegations with different stateids for the same file. Ensure that when we
return delegations, we do so on a per-stateid basis rather than a per-file
basis.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:40 -04:00
Trond Myklebust 2ced46c270 NFSv4: Fix up a bug in nfs4_open_recover()
Don't clobber the delegation info...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:40 -04:00
Trond Myklebust 549d6ed5e8 NFSv4: set the delegation in nfs4_opendata_to_nfs4_state
This ensures that nfs4_open_release() and nfs4_open_confirm_release()
can now handle an eventual delegation that was returned with out open.
As such, it fixes a delegation "leak" when the user breaks out of an open
call.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:40 -04:00
Trond Myklebust 1c816efa24 NFSv4: Fix a bug in __nfs4_find_state_byowner
The test for state->state == 0 does not tell you that the stateid is in the
process of being freed. It really tells you that the stateid is not yet
initialised...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:40 -04:00
Trond Myklebust 1b45c46cf7 NFSv4: Fix atomic open for execute...
Currently we do not check for the FMODE_EXEC flag as we should. For that
particular case, we need to perform an ACCESS call to the server in order
to check that the file is executable.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:40 -04:00
Trond Myklebust 9f958ab885 NFSv4: Reduce the chances of an open_owner identifier collision
Currently we just use a 32-bit counter.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:39 -04:00
Trond Myklebust 88d9093997 NFSv4: nfs_increment_open_seqid should not return a value
It is a void function...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:39 -04:00
Trond Myklebust e6889620e8 NFSv4: Fix underestimate of NFSv4 lookup request size
Also fix up the underestimate of fs_locations

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:39 -04:00
Trond Myklebust 2cebf82883 NFSv4: Fix the underestimate of NFSv4 open request size
The maximum size depends on the filename size and a number of other
elements which are currently not being counted.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:39 -04:00
Trond Myklebust bd625ba80d NFSv4: Fix the NFSv4 owner and owner_group size estimates
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:39 -04:00
Trond Myklebust 7af654f8d1 NFSv4: Don't reuse expired nfs4_state_owner structs
That just confuses certain NFSv4 servers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:38 -04:00
Trond Myklebust 27b3f949b7 NFSv4: Fix a credential reference leak in nfs4_get_state_owner()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:38 -04:00
Trond Myklebust 587142f85f NFS: Replace NFS_I(inode)->req_lock with inode->i_lock
There is no justification for keeping a special spinlock for the exclusive
use of the NFS writeback code.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:38 -04:00
Trond Myklebust 4e56e082dd NFSv4: Clean up _nfs4_proc_lookup() vs _nfs4_proc_lookupfh()
They differ only slightly in the arguments they take. Why have they not
been merged?

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:38 -04:00
Trond Myklebust 1be27f3660 SUNRPC: Remove the tk_auth macro...
We should almost always be deferencing the rpc_auth struct by means of the
credential's cr_auth field instead of the rpc_clnt->cl_auth anyway. Fix up
that historical mistake, and remove the macro that propagated it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:37 -04:00
Trond Myklebust f61534dfd3 SUNRPC: Remove redundant calls to rpciod_up()/rpciod_down()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:30 -04:00
Trond Myklebust 90c5755ff5 SUNRPC: Kill rpc_clnt->cl_oneshot
Replace it with explicit calls to rpc_shutdown_client() or
rpc_destroy_client() (for the case of asynchronous calls).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:29 -04:00
Trond Myklebust 34f52e3591 SUNRPC: Convert rpc_clnt->cl_users to a kref
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:28 -04:00
Trond Myklebust c6d00e639b NFSv4: Convert struct nfs4_opendata to use struct kref
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:28 -04:00
Trond Myklebust 3bec63db55 NFS: Convert struct nfs_open_context to use a kref
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:27 -04:00
Trond Myklebust edc05fc1c2 NFS: reduce latency by using conditional rescheduling in nfs_scan_list
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:27 -04:00
Trond Myklebust dce34ce298 NFS: Prevent integer overflow in nfs_scan_list()
Also ensure that nfs_inode ncommit and npages are large enough to represent
all possible values for the number of pages.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:27 -04:00
Trond Myklebust 2aefa10431 NFS: Remove the redundant 'dirty' and 'commit' lists from nfs_inode
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:26 -04:00
Trond Myklebust 5c36968343 NFS cleanup: speed up nfs_scan_commit using radix tree tags
Add a tag for requests that are waiting for a COMMIT

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:26 -04:00
Trond Myklebust 9fd367f0f3 NFS cleanup: Rename NFS_PAGE_TAG_WRITEBACK to NFS_PAGE_TAG_LOCKED
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:26 -04:00
Trond Myklebust c03b402461 NFS: Convert struct nfs_page to use krefs
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:26 -04:00
Trond Myklebust a50f7951a3 NFS: Fix an Oops in the nfs_access_cache_shrinker()
The nfs_access_cache_shrinker may race with nfs_access_zap_cache().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:25 -04:00
Trond Myklebust e2f032e9ef NFS: nfs3_proc_create() should use nfs_post_op_update_inode()
Also get rid of a redundant call to nfs_setattr_update_inode(). The call to
nfs3_proc_setattr() already takes care of that.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:25 -04:00
Jeff Layton aa53ed541a NFS4: on a O_EXCL OPEN make sure SETATTR sets the fields holding the verifier
The Linux NFS4 client simply skips over the bitmask in an O_EXCL open
call and so it doesn't bother to reset any fields that may be holding
the verifier. This patch has us save the first two words of the bitmask
(which is all the current client has #defines for). The client then
later checks this bitmask and turns on the appropriate flags in the
sattr->ia_verify field for the following SETATTR call.

This patch only currently checks to see if the server used the atime
and mtime slots for the verifier (which is what the Linux server uses
for this). I'm not sure of what other fields the server could
reasonably use, but adding checks for others should be trivial.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:25 -04:00
Trond Myklebust fc6ae3cf48 NFS: Re-enable forced umounts
They disappeared some time around 2.6.18.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:25 -04:00