Commit Graph

14609 Commits (8516a500029890a72622d245f8ed32c4e30969b7)

Author SHA1 Message Date
Jan Kara cb25797d45 ocfs2: Add lockdep annotations
Add lockdep support to OCFS2. The support also covers all of the cluster
locks except for open locks, journal locks, and local quotafile locks. These
are special because they are acquired for a node, not for a particular process
and lockdep cannot deal with such type of locking.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22 14:34:26 -07:00
Jan Kara 9a7aa12f39 vfs: Set special lockdep map for dirs only if not set by fs
Some filesystems need to set lockdep map for i_mutex differently for
different directories. For example OCFS2 has system directories (for
orphan inode tracking and for gathering all system files like journal
or quota files into a single place) which have different locking
locking rules than standard directories. For a filesystem setting
lockdep map is naturaly done when the inode is read but we have to
modify unlock_new_inode() not to overwrite the lockdep map the filesystem
has set.

Acked-by: peterz@infradead.org
CC: mingo@redhat.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22 14:34:22 -07:00
Sunil Mushran df152c241d ocfs2: Disable orphan scanning for local and hard-ro mounts
Local and Hard-RO mounts do not need orphan scanning.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22 14:24:55 -07:00
Sunil Mushran 3211949f89 ocfs2: Do not initialize lvb in ocfs2_orphan_scan_lock_res_init()
We don't access the LVB in our ocfs2_*_lock_res_init() functions.

Since the LVB can become invalid during some cluster recovery
operations, the dlmglue must be able to handle an uninitialized
LVB.

For the orphan scan lock, we initialized an uninitialzed LVB with our
scan sequence number plus one.  This starts a normal orphan scan
cycle.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22 14:24:53 -07:00
Sunil Mushran 692684e19e ocfs2: Stop orphan scan as early as possible during umount
Currently if the orphan scan fires a tick before the user issues the umount,
the umount will wait for the queued orphan scan tasks to complete.

This patch makes the umount stop the orphan scan as early as possible so as
to reduce the probability of the queued tasks slowing down the umount.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22 14:24:51 -07:00
Sunil Mushran c3d38840ab ocfs2: Fix ocfs2_osb_dump()
Skip printing information that is not valid for local mounts.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22 14:24:49 -07:00
Sunil Mushran 94e41ecfe0 ocfs2: Pin journal head before accessing jh->b_committed_data
This patch adds jbd_lock_bh_state() and jbd_unlock_bh_state() around accessses
to jh->b_committed_data.

Fixes oss bugzilla#1131
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1131

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22 14:24:47 -07:00
Tao Ma 1962f39abb ocfs2: Update atime in splice read if necessary.
We should call ocfs2_inode_lock_atime instead of ocfs2_inode_lock
in ocfs2_file_splice_read like we do in ocfs2_file_aio_read so
that we can update atime in splice read if necessary.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22 14:24:45 -07:00
Joel Becker 1c520dfbf3 ocfs2: Provide the ocfs2_dlm_lvb_valid() stack API.
The Lock Value Block (LVB) of a DLM lock can be lost when nodes die and
the DLM cannot reconstruct its state.  Clients of the DLM need to know
this.

ocfs2's internal DLM, o2dlm, explicitly zeroes out the LVB when it loses
track of the state.  This is not a standard behavior, but ocfs2 has
always relied on it.  Thus, an o2dlm LVB is always "valid".

ocfs2 now supports both o2dlm and fs/dlm via the stack glue.  When
fs/dlm loses track of an LVBs state, it sets a flag
(DLM_SBF_VALNOTVALID) on the Lock Status Block (LKSB).  The contents of
the LVB may be garbage or merely stale.

ocfs2 doesn't want to try to guess at the validity of the stale LVB.
Instead, it should be checking the VALNOTVALID flag.  As this is the
'standard' way of treating LVBs, we will promote this behavior.

We add a stack glue API ocfs2_dlm_lvb_valid().  It returns non-zero when
the LVB is valid.  o2dlm will always return valid, while fs/dlm will
check VALNOTVALID.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
2009-06-22 14:24:30 -07:00
Linus Torvalds 7e0338c0de Merge branch 'for-2.6.31' of git://fieldses.org/git/linux-nfsd
* 'for-2.6.31' of git://fieldses.org/git/linux-nfsd: (60 commits)
  SUNRPC: Fix the TCP server's send buffer accounting
  nfsd41: Backchannel: minorversion support for the back channel
  nfsd41: Backchannel: cleanup nfs4.0 callback encode routines
  nfsd41: Remove ip address collision detection case
  nfsd: optimise the starting of zero threads when none are running.
  nfsd: don't take nfsd_mutex twice when setting number of threads.
  nfsd41: sanity check client drc maxreqs
  nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct
  NFS: kill off complicated macro 'PROC'
  sunrpc: potential memory leak in function rdma_read_xdr
  nfsd: minor nfsd_vfs_write cleanup
  nfsd: Pull write-gathering code out of nfsd_vfs_write
  nfsd: track last inode only in use_wgather case
  sunrpc: align cache_clean work's timer
  nfsd: Use write gathering only with NFSv2
  NFSv4: kill off complicated macro 'PROC'
  NFSv4: do exact check about attribute specified
  knfsd: remove unreported filehandle stats counters
  knfsd: fix reply cache memory corruption
  knfsd: reply cache cleanups
  ...
2009-06-22 12:55:50 -07:00
Linus Torvalds df36b439c5 Merge branch 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (128 commits)
  nfs41: sunrpc: xprt_alloc_bc_request() should not use spin_lock_bh()
  nfs41: Move initialization of nfs4_opendata seq_res to nfs4_init_opendata_res
  nfs: remove unnecessary NFS_INO_INVALID_ACL checks
  NFS: More "sloppy" parsing problems
  NFS: Invalid mount option values should always fail, even with "sloppy"
  NFS: Remove unused XDR decoder functions
  NFS: Update MNT and MNT3 reply decoding functions
  NFS: add XDR decoder for mountd version 3 auth-flavor lists
  NFS: add new file handle decoders to in-kernel mountd client
  NFS: Add separate mountd status code decoders for each mountd version
  NFS: remove unused function in fs/nfs/mount_clnt.c
  NFS: Use xdr_stream-based XDR encoder for MNT's dirpath argument
  NFS: Clean up MNT program definitions
  lockd: Don't bother with RPC ping for NSM upcalls
  lockd: Update NSM state from SM_MON replies
  NFS: Fix false error return from nfs_callback_up() if ipv6.ko is not available
  NFS: Return error code from nfs_callback_up() to user space
  NFS: Do not display the setting of the "intr" mount option
  NFS: add support for splice writes
  nfs41: Backchannel: CB_SEQUENCE validation
  ...
2009-06-22 12:53:06 -07:00
Hitoshi Mitake e38be994b9 Making fs/minix/minix.h double including safe
I happened to find that fs/minix/minix.h doesn't guard double include.

Yes, I know this never cause something destructive because this is
self-evidence that no source file includes minix.h twice, but I think
fixing this is better than disregarding it.

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 11:34:42 -07:00
Benny Halevy 578e458568 nfs41: Move initialization of nfs4_opendata seq_res to nfs4_init_opendata_res
nfs4_open_recover_helper clears opendata->o_res
before calling nfs4_init_opendata_res, thus causing
NFSv4.0 OPEN operations to be sent rather than nfsv4.1.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-20 14:55:12 -04:00
Linus Torvalds 12e24f34cb Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
  perfcounter: Handle some IO return values
  perf_counter: Push perf_sample_data through the swcounter code
  perf_counter tools: Define and use our own u64, s64 etc. definitions
  perf_counter: Close race in perf_lock_task_context()
  perf_counter, x86: Improve interactions with fast-gup
  perf_counter: Simplify and fix task migration counting
  perf_counter tools: Add a data file header
  perf_counter: Update userspace callchain sampling uses
  perf_counter: Make callchain samples extensible
  perf report: Filter to parent set by default
  perf_counter tools: Handle lost events
  perf_counter: Add event overlow handling
  fs: Provide empty .set_page_dirty() aop for anon inodes
  perf_counter: tools: Makefile tweaks for 64-bit powerpc
  perf_counter: powerpc: Add processor back-end for MPC7450 family
  perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels
  perf_counter: powerpc: Change how processor-specific back-ends get selected
  perf_counter: powerpc: Use unsigned long for register and constraint values
  perf_counter: powerpc: Enable use of software counters on 32-bit powerpc
  perf_counter tools: Add and use isprint()
  ...
2009-06-20 11:29:32 -07:00
OGAWA Hirofumi 3e107603ae fat: Fix the removal of opts->fs_dmask
(ce3b0f8d5c2203301fc87f3aaaed73e5819e2a48: New helper - current_umask())
is removing the opts->fs_dmask, probably it's a cut-and-paste
miss or something.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2009-06-20 21:50:47 +09:00
Linus Torvalds bee89ab228 Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
  inotify: inotify_destroy_mark_entry could get called twice
2009-06-19 17:46:44 -07:00
Linus Torvalds 31583d6acf Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  Fix kernel-doc parameter name typo in blk-settings.c:
  block: rename CONFIG_LBD to CONFIG_LBDAF
  block: Fix bounce_pfn setting
  hd: stop defining MAJOR_NR
2009-06-19 17:43:04 -07:00
Eric Paris 528da3e9e2 inotify: inotify_destroy_mark_entry could get called twice
inotify_destroy_mark_entry could get called twice for the same mark since it
is called directly in inotify_rm_watch and when the mark is being destroyed for
another reason.  As an example assume that the file being watched was just
deleted so inotify_destroy_mark_entry would get called from the path
fsnotify_inoderemove() -> fsnotify_destroy_marks_by_inode() ->
fsnotify_destroy_mark_entry() -> inotify_destroy_mark_entry().  If this
happened at the same time as userspace tried to remove a watch via
inotify_rm_watch we could attempt to remove the mark from the idr twice and
could thus double dec the ref cnt and potentially could be in a use after
free/double free situation.  The fix is to have inotify_rm_watch use the
generic recursive safe fsnotify_destroy_mark_by_entry() so we are sure the
inotify_destroy_mark_entry() function can only be called one.

This patch also renames the function to inotify_ingored_remove_idr() so it is
clear what is actually going on in the function.

Hopefully this fixes:
[   20.342058] idr_remove called for id=20 which is not allocated.
[   20.348000] Pid: 1860, comm: udevd Not tainted 2.6.30-tip #1077
[   20.353933] Call Trace:
[   20.356410]  [<ffffffff811a82b7>] idr_remove+0x115/0x18f
[   20.361737]  [<ffffffff8134259d>] ? _spin_lock+0x6d/0x75
[   20.367061]  [<ffffffff8111640a>] ? inotify_destroy_mark_entry+0xa3/0xcf
[   20.373771]  [<ffffffff8111641e>] inotify_destroy_mark_entry+0xb7/0xcf
[   20.380306]  [<ffffffff81115913>] inotify_freeing_mark+0xe/0x10
[   20.386238]  [<ffffffff8111410d>] fsnotify_destroy_mark_by_entry+0x143/0x170
[   20.393293]  [<ffffffff811163a3>] inotify_destroy_mark_entry+0x3c/0xcf
[   20.399829]  [<ffffffff811164d1>] sys_inotify_rm_watch+0x9b/0xc6
[   20.405850]  [<ffffffff8100bcdb>] system_call_fastpath+0x16/0x1b

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
Tested-by: Peter Ziljlstra <peterz@infradead.org>
2009-06-19 12:42:48 -04:00
Bartlomiej Zolnierkiewicz 90c699a9ee block: rename CONFIG_LBD to CONFIG_LBDAF
Follow-up to "block: enable by default support for large devices
and files on 32-bit archs".

Rename CONFIG_LBD to CONFIG_LBDAF to:
- allow update of existing [def]configs for "default y" change
- reflect that it is used also for large files support nowadays

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-19 08:08:50 +02:00
Andy Adamson ab52ae6db0 nfsd41: Backchannel: minorversion support for the back channel
Prepare to share backchannel code with NFSv4.1.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
[nfsd41: use nfsd4_cb_sequence for callback minorversion]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-18 18:33:57 -07:00
Andy Adamson ef52bff840 nfsd41: Backchannel: cleanup nfs4.0 callback encode routines
Mimic the client and prepare to share the back channel xdr with NFSv4.1.
Bump the number of operations in each encode routine, then backfill the
number of operations.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-18 18:33:57 -07:00
Trond Myklebust 1f84603c09 Merge branch 'devel-for-2.6.31' into for-2.6.31
Conflicts:
	fs/nfs/client.c
	fs/nfs/super.c
2009-06-18 18:13:44 -07:00
Mike Sager 6ddbbbfe52 nfsd41: Remove ip address collision detection case
Verified that cthon and pynfs exchange id tests pass (except for the
two expected fails: EID8 and EID50)

Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-06-18 17:43:53 -07:00
Linus Torvalds 0732f87761 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: clean up jbd2_journal_try_to_free_buffers()
  ext4: Don't update ctime for non-extent-mapped inodes
  ext4: Fix up whitespace issues in fs/ext4/inode.c
  ext4: Fix 64-bit block type problem on 32-bit platforms
  ext4: teach the inode allocator to use a goal inode number
  ext4: Use a hash of the topdir directory name for the Orlov parent group
  ext4: document the "abort" mount option
  ext4: move the abort flag from s_mount_opts to s_mount_flags
  ext4: update the s_last_mounted field in the superblock
  ext4: change s_mount_opt to be an unsigned int
  ext4: online defrag -- Add EXT4_IOC_MOVE_EXT ioctl
  ext4: avoid unnecessary spinlock in critical POSIX ACL path
  ext3: avoid unnecessary spinlock in critical POSIX ACL path
  ext4: convert instrumentation from markers to tracepoints
  jbd2: convert instrumentation from markers to tracepoints
2009-06-18 14:07:46 -07:00
Peter Oberparleiter 0b923606e7 seq_file: add function to write binary data
seq_write() can be used to construct seq_files containing arbitrary data.
Required by the gcov-profiling interface to synthesize binary profiling
data files.

Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Li Wei <W.Li@Sun.COM>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:57 -07:00
Oleg Nesterov 3b34fc5880 elf_core_dump: use rcu_read_lock() to access ->real_parent
In theory it is not safe to dereference ->parent/real_parent without
tasklist or rcu lock, we can race with re-parenting.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:52 -07:00
Jeff Mahoney 1d965fe0eb reiserfs: fix warnings with gcc 4.4
Several code paths in reiserfs have a construct like:

 if (is_direntry_le_ih(ih = B_N_PITEM_HEAD(src, item_num))) ...

which, in addition to being ugly, end up causing compiler warnings with
gcc 4.4.0.  Previous compilers didn't issue a warning.

fs/reiserfs/do_balan.c:1273: warning: operation on `aux_ih' may be undefined
fs/reiserfs/lbalance.c:393: warning: operation on `ih' may be undefined
fs/reiserfs/lbalance.c:421: warning: operation on `ih' may be undefined
fs/reiserfs/lbalance.c:777: warning: operation on `ih' may be undefined

I believe this is due to the ih being passed to macros which evaluate the
argument more than once.  This is old code and we haven't seen any
problems with it, but this patch eliminates the warnings.

It converts the multiple evaluation macros to static inlines and does a
preassignment for the cases that were causing the warnings because that
code is just ugly.

Reported-by: Chris Mason <mason@oracle.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:46 -07:00
Roel Kluin 37044c86ba ufs: sector_t cannot be negative
unsigned i_block,fragment cannot be negative.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:46 -07:00
Jan Kara 5404ac8e44 isofs: cleanup mount option processing
Remove unused variables from isofs_sb_info (used to be some mount
options), unify variables for option to use 0/1 (some options used
'y'/'n'), use bit fields for option flags in superblock.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:45 -07:00
Jan Kara 5c4a656b7e isofs: fix setting of uid and gid to 0
isofs allows setting of default uid and gid of files but value 0 was used
to indicate that user did not specify any uid/gid mount option.  Since
this option also overrides uid/gid set in Rock Ridge extension, it makes
sense to allow forcing uid/gid 0.  Fix option processing to allow this.

Cc: <Hans-Joachim.Baader@cjt.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:45 -07:00
Jan Kara 52b680c812 isofs: let mode and dmode mount options override rock ridge mode setting
So far, permissions set via 'mode' and/or 'dmode' mount options were
effective only if the medium had no rock ridge extensions (or was mounted
without them).  Add 'overriderockmode' mount option to indicate that these
options should override permissions set in rock ridge extensions.  Maybe
this should be default but the current behavior is there since mount
options were created so I think we should not change how they behave.

Cc: <Hans-Joachim.Baader@cjt.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:45 -07:00
Jan Kara ef43618a47 ext3: make sure inode is deleted from orphan list after truncate
As Ted pointed out, it can happen that ext3_truncate() returns without
removing inode from orphan list.  This way we could in some rare cases
(like when we get ENOMEM from an allocation in ext3_truncate called
because of failed ext3_write_begin) leave the inode on orphan list and
that triggers assertion failure on umount.

So make ext3_truncate() always remove inode from in-memory orphan list.

Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:45 -07:00
Hisashi Hifumi 6f3f1cb21f jbd: clean up journal_try_to_free_buffers()
I delete the following patch
"commit 3f31fddfa2
Author: Mingming Cao <cmm@us.ibm.com>
Date:   Fri Jul 25 01:46:22 2008 -0700

    jbd: fix race between free buffer and commit transaction

This patch is no longer needed because if race between freeing buffer and
committing transaction functionality occurs and dio gets error, currently
dio falls back to buffered IO by the following patch.

	commit 6ccfa806a9
	Author: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
	Date:   Tue Sep 2 14:35:40 2008 -0700

   	VFS: fix dio write returning EIO when try_to_release_page fails

Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Mingming Cao <cmm@us.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:45 -07:00
Jan Kara e8ef7aaea7 ext3: fix chain verification in ext3_get_blocks()
Chain verification in ext3_get_blocks() has been hosed since it called
verify_chain(chain, NULL) which always returns success.  As a result
readers could in theory race with truncate.  On the other hand the race
probably cannot happen with the current locking scheme, since by the
time ext3_truncate() is called all the pages are already removed and
hence get_block() shouldn't be called on such pages...

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:45 -07:00
Jan Kara 39fe7557b4 ext2: Do not update mtime of a moved directory
One of our users is complaining that his backup tool is upset on ext2
(while it's happy on ext3, xfs, ...) because of the mtime change.

The problem is:

    mkdir foo
    mkdir bar
    mkdir foo/a

Now under ext2:
    mv foo/a foo/b

changes mtime of 'foo/a' (foo/b after the move).  That does not really
make sense and it does not happen under any other filesystem I've seen.

More complicated is:
    mv foo/a bar/a

This changes mtime of foo/a (bar/a after the move) and it makes some
sense since we had to update parent directory pointer of foo/a.  But
again, no other filesystem does this.  So after some thoughts I'd vote
for consistency and change ext2 to behave the same as other filesystems.

Do not update mtime of a moved directory.  Specs don't say anything
about it (neither that it should, nor that it should not be updated) and
other common filesystems (ext3, ext4, xfs, reiserfs, fat, ...) don't do
it.  So let's become more consistent.

Spotted by ronny.pretzsch@dfs.de, initial fix by Jörn Engel.

Reported-by: <ronny.pretzsch@dfs.de>
Cc: <hare@suse.de>
Cc: Jörn Engel <joern@logfs.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:44 -07:00
Cyrill Gorcunov 2f6d311080 proc: vmcore - use kzalloc in get_new_element()
Instead of kmalloc+memset better use straight kzalloc

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:41 -07:00
Michal Simek bcac2b1b7d procfs: remove sparse errors in proc_devtree.c
CHECK   fs/proc/proc_devtree.c
fs/proc/proc_devtree.c:197:14: warning: Using plain integer as NULL pointer
fs/proc/proc_devtree.c:203:34: warning: Using plain integer as NULL pointer
fs/proc/proc_devtree.c:210:14: warning: Using plain integer as NULL pointer
fs/proc/proc_devtree.c:223:26: warning: Using plain integer as NULL pointer
fs/proc/proc_devtree.c:226:14: warning: Using plain integer as NULL pointer

Signed-off-by: Michal Simek <monstr@monstr.eu>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:41 -07:00
Davide Libenzi 3fe4a975d6 epoll: fix nested calls support
This fixes a regression in 2.6.30.

I unfortunately accepted a patch time ago, to drop the "current" usage
from possible IRQ context, w/out proper thought over it.  The patch
switched to using the CPU id by bounding the nested call callback with a
get_cpu()/put_cpu().

Unfortunately the ep_call_nested() function can be called with a callback
that grabs sleepy locks (from own f_op->poll()), that results in epic
fails.  The following patch uses the proper "context" depending on the
path where it is called, and on the kind of callback.

This has been reported by Stefan Richter, that has also verified the patch
is his previously failing environment.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:41 -07:00
Keika Kobayashi d3d64df21d proc: export statistics for softirq to /proc
Export statistics for softirq in /proc/softirqs and /proc/stat.

1. /proc/softirqs
Implement /proc/softirqs which shows the number of softirq
for each CPU like /proc/interrupts.

2. /proc/stat
Add the "softirq" line to /proc/stat.
This line shows the number of softirq for all cpu.
The first column is the total of all softirqs and
each subsequent column is the total for particular softirq.

[kosaki.motohiro@jp.fujitsu.com: remove redundant for_each_possible_cpu() loop]
Signed-off-by: Keika Kobayashi <kobayashi.kk@ncos.nec.co.jp>
Reviewed-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:41 -07:00
NeilBrown 671e1fcf63 nfsd: optimise the starting of zero threads when none are running.
Currently, if we ask to set then number of nfsd threads to zero when
there are none running, we set up all the sockets and register the
service, and then tear it all down again.
This is pointless.

So detect that case and exit promptly.
(also remove an assignment to 'error' which was never used.

Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Jeff Layton <jlayton@redhat.com>
2009-06-18 09:42:41 -07:00
NeilBrown 82e12fe924 nfsd: don't take nfsd_mutex twice when setting number of threads.
Currently when we write a number to 'threads' in nfsdfs,
we take the nfsd_mutex, update the number of threads, then take the
mutex again to read the number of threads.

Mostly this isn't a big deal.  However if we are write '0', and
portmap happens to be dead, then we can get unpredictable behaviour.
If the nfsd threads all got killed quickly and the last thread is
waiting for portmap to respond, then the second time we take the mutex
we will block waiting for the last thread.
However if the nfsd threads didn't die quite that fast, then there
will be no contention when we try to take the mutex again.

Unpredictability isn't fun, and waiting for the last thread to exit is
pointless, so avoid taking the lock twice.
To achieve this, get nfsd_svc return a non-negative number of active
threads when not returning a negative error.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-18 09:40:31 -07:00
Peter Zijlstra d3a9262e59 fs: Provide empty .set_page_dirty() aop for anon inodes
.set_page_dirty() is one of those a_ops that defaults to the
buffer implementation when not set. Therefore provide a dummy
function to make it do nothing.

(Uncovered by perfcounters fd's which can now be writable-mmap-ed.)

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-18 14:46:10 +02:00
Jan Kara 24a5d59f34 udf: Use device size when drive reported bogus number of written blocks
Some drives report 0 as the number of written blocks when there are some blocks
recorded. Use device size in such case so that we can automagically mount such
media.

Signed-off-by: Jan Kara <jack@suse.cz>
2009-06-18 12:33:16 +02:00
James Morris 4bf259e3ae nfs: remove unnecessary NFS_INO_INVALID_ACL checks
Unless I'm mistaken, NFS_INO_INVALID_ACL is being checked twice during
getacl calls (i.e. first via nfs_revalidate_inode() and then by each all
site).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:14 -07:00
Chuck Lever a5a16bae70 NFS: More "sloppy" parsing problems
Specifying "port=-5" with the kernel's current mount option parser
generates "unrecognized mount option".  If "sloppy" is set, this
causes the mount to succeed and use the default values; the desired
behavior is that, since this is a valid option with an invalid value,
the mount should fail, even with "sloppy."

To properly handle "sloppy" parsing, we need to distinguish between
correct options with invalid values, and incorrect options.  We will
need to parse integer values by hand, therefore, and not rely on
match_token().

For instance, these must all fail with "invalid value":

	port=12345678
	port=-5
	port=samuel

and not with "unrecognized option," as they do currently.

Thus, for the sake of match_token() we need to treat the values for
these options as strings, and do the conversion to integers using
strict_strtol().

This is basically the same solution we used for the earlier "retry="
fix (commit ecbb3845), except in this case the kernel actually has to
parse the value, rather than ignore it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:14 -07:00
Chuck Lever d23c45fd84 NFS: Invalid mount option values should always fail, even with "sloppy"
Ian Kent reports:

"I've noticed a couple of other regressions with the options vers
and proto option of mount.nfs(8).

The commands:

mount -t nfs -o vers=<invalid version> <server>:/<path> /<mountpoint>
mount -t nfs -o proto=<invalid proto> <server>:/<path> /<mountpoint>

both immediately fail.

But if the "-s" option is also used they both succeed with the
mount falling back to defaults (by the look of it).

In the past these failed even when the sloppy option was given, as
I think they should. I believe the sloppy option is meant to allow
the mount command to still function for mount options (for example
in shared autofs maps) that exist on other Unix implementations but
aren't present in the Linux mount.nfs(8). So, an invalid value
specified for a known mount option is different to an unknown mount
option and should fail appropriately."

See RH bugzilla 486266.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:13 -07:00
Chuck Lever 065015e5ef NFS: Remove unused XDR decoder functions
Clean up: Remove xdr_decode_fhstatus() and xdr_decode_fhstatus3(), now
that they are unused.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:13 -07:00
Chuck Lever 8e02f6b9aa NFS: Update MNT and MNT3 reply decoding functions
Solder xdr_stream-based XDR decoding functions into the in-kernel mountd
client that are more careful about checking data types and watching for
buffer overflows.  The new MNT3 decoder includes support for auth-flavor
list decoding.

The "_sz" macro for MNT3 replies was missing the size of the file handle.
I've added this back, and included the size of the auth flavor array.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:13 -07:00
Chuck Lever a14017db28 NFS: add XDR decoder for mountd version 3 auth-flavor lists
Introduce an xdr_stream-based XDR decoder that can unpack the auth-
flavor list returned in a MNT3 reply.

The nfs_mount() function's caller allocates an array, and passes the
size and a pointer to it.  The decoder decodes all the flavors it can
into the array, and returns the number of decoded flavors.

If the caller is not interested in the auth flavors, it can pass a
value of zero as the size of the pre-allocated array.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:12 -07:00
Chuck Lever 4fdcd9966d NFS: add new file handle decoders to in-kernel mountd client
Introduce xdr_stream-based XDR file handle decoders to the in-kernel
mountd client.  These are more careful than the existing decoder
functions about buffer overflows and data type and range checking.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:12 -07:00
Chuck Lever fb12529577 NFS: Add separate mountd status code decoders for each mountd version
Introduce data structures and xdr_stream-based decoding functions for
unmarshalling mountd status codes properly.

Mountd version 3 uses specific standard error return codes that are
not errno values and not NFS3ERR_ values.  These have a well-defined
standard mapping to local errno values.  Introduce data structures
and a decoder function that map these status codes to local errno
values properly.  This is new functionality (but not used yet).

Version 1 mountd status values are defined by RFC 1094 as UNIX error
values (errno values).  Errno values on heterogeneous systems do not
necessarily match each other.  To avoid exposing possibly incorrect
errno values to upper layers, the current XDR decoder converts all
non-zero MNT version 1 status codes to -EACCES.

The OpenGroup XNFS standard provides a mapping similar to but smaller
than the version 3 error codes.  Implement a decoder that uses the XNFS
error codes, replacing the current decoder.

For both mountd protocol versions, map unrecognized errors to -EACCES.

Finally we introduce a replacement data structure for mnt_fhstatus
at this time, which is used by the new XDR decoders.  In addition to
documenting that the status value returned by the XDR decoders is
always an errno, this new structure will be expanded in subsequent
patches.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:12 -07:00
Chuck Lever 99835db430 NFS: remove unused function in fs/nfs/mount_clnt.c
Clean up: remove xdr_encode_dirpath() now that it has been replaced.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:11 -07:00
Chuck Lever 29a1bd6bf8 NFS: Use xdr_stream-based XDR encoder for MNT's dirpath argument
Check the length of the supplied dirpath, and see that it fits
properly in the RPC buffer.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:11 -07:00
Chuck Lever 2ad780978b NFS: Clean up MNT program definitions
Clean up:  Relocate MNT program procedure number definitions to the
only file that uses them.  Relocate the version number definitions,
which are shared, to nfs.h.  Remove duplicate program number
definitions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:11 -07:00
Chuck Lever 0e5c2632e1 lockd: Don't bother with RPC ping for NSM upcalls
Cut NSM upcall RPC traffic in half -- don't do a NULL call first.
The cases where a ping would be helpful are rare.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:11 -07:00
Chuck Lever 6c9dc42551 lockd: Update NSM state from SM_MON replies
When rpc.statd starts up in user space at boot time, it attempts to
write the latest NSM local state number into
/proc/sys/fs/nfs/nsm_local_state.

If lockd.ko isn't loaded yet (as is the case in most configurations),
that file doesn't exist, thus the kernel's NSM state remains set to
its initial value of zero during lockd operation.

This is a problem because rpc.statd and lockd use the NSM state number
to prevent repeated lock recovery on rebooted hosts.  If lockd sends
a zero NSM state, but then a delayed SM_NOTIFY with a real NSM state
number is received, there is no way for lockd or rpc.statd to
distinguish that stale SM_NOTIFY from an actual reboot.  Thus lock
recovery could be performed after the rebooted host has already
started reclaiming locks, and those locks will be lost.

We could change /etc/init.d/nfslock so it always modprobes lockd.ko
before starting rpc.statd.  However, if lockd.ko is ever unloaded
and reloaded, we are back at square one, since the NSM state is not
preserved across an unload/reload cycle.  This may happen frequently
on clients that use automounter.  A period of NFS inactivity causes
lockd.ko to be unloaded, and the kernel loses its NSM state setting.

Instead, let's use the fact that rpc.statd plants the local system's
NSM state in every SM_MON (and SM_UNMON) reply.  lockd performs a
synchronous SM_MON upcall to the local rpc.statd _before_ sending its
first NLM request to a new remote.  This would permit rpc.statd to
provide the current NSM state to lockd, even after lockd.ko had been
unloaded and reloaded.

Note that NLMPROC_LOCK arguments are constructed before the
nsm_monitor() call, so we have to rearrange argument construction very
slightly to make this all work out.

And, the kernel appears to treat NSM state as a u32 (see struct
nlm_args and nsm_res).  Make nsm_local_state a u32 as well, to ensure
we don't get bogus comparison results.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:10 -07:00
Chuck Lever 18fc316419 NFS: Fix false error return from nfs_callback_up() if ipv6.ko is not available
Clear "ret" if the error return from svc_create_xprt(AF_INET6) was
-EAFNOSUPORT.  Otherwise, callback start-up will succeed, but
nfs_callback_up() will return -EAFNOSUPPORT anyway, and the first
NFSv4 mount attempt after a reboot will fail.

Bug introduced by commit f738f517 in 2.6.30-rc1.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:10 -07:00
Chuck Lever a21bdd9b96 NFS: Return error code from nfs_callback_up() to user space
If the kernel cannot start the NFSv4 callback service during a mount
request, it returns -ENOMEM to user space, resulting in this message:

   mount.nfs4: Cannot allocate memory

Adjust nfs_alloc_client() and nfs_get_client() to pass NFSv4 callback
start-up errors back to user space so a less mysterious error message
can be displayed by the mount command.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:10 -07:00
Chuck Lever c381ad2cf2 NFS: Do not display the setting of the "intr" mount option
The "intr" mount option has been deprecated for a while, but
/proc/mounts continues to display "nointr" whether "intr" or "nointr"
has been specified for a mount point.

Since these options do not have any effect, simply do not display
them.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:09 -07:00
Suresh Jayaraman bf40d3435c NFS: add support for splice writes
Adds support for splice writes. It effectively calls
generic_file_splice_write() to do the writes.

We need not worry about O_APPEND case as the combination of splice()
writes and O_APPEND is disallowed. This patch propagates NFS write
errors back to the caller. The number of bytes written via splice are
being added to NFSIO_NORMALWRITTENBYTES as these are effectively
cached writes.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:09 -07:00
Trond Myklebust 301933a0ac Merge commit 'linux-pnfs/nfs41-for-2.6.31' into nfsv41-for-2.6.31 2009-06-17 17:59:58 -07:00
Hisashi Hifumi 536fc240e7 jbd2: clean up jbd2_journal_try_to_free_buffers()
This patch reverts 3f31fddf, which is no longer needed because if a
race between freeing buffer and committing transaction functionality
occurs and dio gets error, currently dio falls back to buffered IO due
to the commit 6ccfa806.

Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Cc: Mingming Cao <cmm@us.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-17 20:08:51 -04:00
Ricardo Labiaga 68f3f90133 nfs41: Backchannel: CB_SEQUENCE validation
Validates the callback's sessionID, the slot number, and the sequence ID.
Increments the slot's sequence.

Detects replays, but simply prints a debug message (if debugging is enabled
since we don't yet implement a duplicate request cache for the backchannel.
This should not present a problem, since only idempotent callbacks are
currently implemented.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Backchannel: Be more obvious about the return value]
[nfs41: Backchannel: dprink in host order]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:43 -07:00
Ricardo Labiaga 963891ac43 nfs41: Backchannel: New find_client_with_session()
Finds the 'struct nfs_client' that matches the server's address, major
version number, and session ID.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:43 -07:00
Ricardo Labiaga f8625a6a4b nfs41: Backchannel: Add a backchannel slot table to the session
Defines a new 'struct nfs4_slot_table' in the 'struct nfs4_session'
for use by the backchannel.  Initializes, resets, and destroys the backchannel
slot table in the same manner the forechannel slot table is initialized,
reset, and destroyed.

The sequenceid for each slot in the backchannel slot table is initialized
to 0, whereas the forechannel slotid's sequenceid is set to 1.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:42 -07:00
Ricardo Labiaga 050047ce71 nfs41: Backchannel: Refactor nfs4_init_slot_table()
Generalize nfs4_init_slot_table() so it can be used to initialize the
backchannel slot table in addition to the forechannel slot table.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:42 -07:00
Ricardo Labiaga b73dafa7ac nfs41: Backchannel: Refactor nfs4_reset_slot_table()
Generalize nfs4_reset_slot_table() so it can be used to reset the
backchannel slot table in addition to the forechannel slot table.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:41 -07:00
Ricardo Labiaga 65fc64e547 nfs41: Backchannel: update cb_sequence args and results
Change the type of cs_addr and csr_status to 'struct sockaddr' and
'__be32' since the cb_sequence processing function will use existing
functionality that expects these types.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:40 -07:00
Benny Halevy 281fe15dc1 nfs41: verify CB_SEQUENCE position in callback compound
CB_SEQUENCE must appear first in the callback compound RPC.
If it is not the first operation NFS4ERR_SEQUENCE_POS must be returned.
If the first operation ni the CB_COMPOUND is not CB_SEQUENCE then
NFS4ERR_OP_NOT_IN_SESSION must be returned.

Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: refactor op preprocessing out of process_op]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:39 -07:00
Benny Halevy 4aece6a19c nfs41: cb_sequence xdr implementation
[nfs41: get rid of READMEM and COPYMEM for callback_xdr.c]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: get rid of READ64 in callback_xdr.c]
See http://linux-nfs.org/pipermail/pnfs/2009-June/007846.html
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:38 -07:00
Benny Halevy d49433e1e3 nfs41: cb_sequence proc implementation
Currently, just free up any referring calls information.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: fix csr_{,target}highestslotid]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:38 -07:00
Benny Halevy 2d9b9ec344 nfs41: cb_sequence protocol level data structures
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:37 -07:00
Benny Halevy 34bc47c941 nfs41: consider minorversion in callback_xdr:process_op
Note that this patch changes the nfsv4.0 behavior also when
CONFIG_NFS_V4_1 is not defined where NFS4ERR_MINOR_VERS_MISMATCH
will be returned if the client received a CB_COMPOUND
with minorversion != 0.  Previously, it would have
returned NFS4ERR_OP_ILLEGAL for CB_SEQUENCE.
(or if the server is broken and sent OP_CB_GETATTR or OP_CB_RECALL
with minorversion!=0, they would have been processed normally.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: refactor op preprocessing out of process_op]
See http://linux-nfs.org/pipermail/pnfs/2009-June/007845.html
[nfs41: define CB_NOTIFY_DEVICEID as not supported]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:37 -07:00
Benny Halevy 45377b94ed nfs41: callback numbers definitions
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:36 -07:00
Benny Halevy 48a9e2d228 nfs41: decode minorversion 1 cb_compound header
decode cb_compound header conforming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26

Get rid of cb_compound_hdr_arg.callback_ident

callback_ident is not used anywhere so we shouldn't waste any memory to
store it.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: no need to break read_buf in decode_compound_hdr_arg]
See http://linux-nfs.org/pipermail/pnfs/2009-June/007844.html
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:35 -07:00
Benny Halevy b8f2ef84b0 nfs41: store minorversion in cb_compound_hdr_arg
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:35 -07:00
Andy Adamson 5a0ffe544c nfs41: Release backchannel resources associated with session
Frees the preallocated backchannel resources that are associated with
this session when the session is destroyed.

A backchannel is currently created once per session. Destroy the backchannel
only when the session is destroyed.

Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:34 -07:00
Andy Adamson 0f91421e8e nfs41: Client indicates presence of NFSv4.1 callback channel.
Set the SESSION4_BACK_CHAN flag to indicate the client supports a backchannel.

Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:33 -07:00
Andy Adamson 0b5b7ae0a8 nfs41: Setup the backchannel
The NFS v4.1 callback service has already been setup, and
rpc_xprt->serv points to the svc_serv structure describing it.
Invoke the xprt_setup_backchannel() initialization to pre-
allocate the necessary backchannel structures.

Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: change nfs4_put_session(nfs4_session**) to nfs4_destroy_session(nfs_session*)]
Signed-off-by: Alexandros Batsakis <Alexandros.Batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[moved xprt_setup_backchannel from nfs4_init_session to nfs4_init_backchannel]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:32 -07:00
Andy Adamson e82dc22dac nfs41: Allow NFSv4 and NFSv4.1 callback services to coexist
Tracks the nfs_callback_info for both versions, enabling the callback
service for v4 and v4.1 to run concurrently and be stopped independently
of each other.

Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:32 -07:00
Benny Halevy 8f97524235 nfs41: create a svc_xprt for nfs41 callback thread and use for incoming callbacks
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:31 -07:00
Ricardo Labiaga a43cde94fe nfs41: Implement NFSv4.1 callback service process.
nfs41_callback_up() initializes the necessary queues and creates the new
nfs41_callback_svc thread.  This thread executes the callback service which
waits for requests to arrive on the svc_serv->sv_cb_list.

NFS41_BC_MIN_CALLBACKS is set to 1 because we expect callbacks to not
cause substantial latency.

The actual processing of the callback will be implemented as a separate patch.

There is only one NFSv4.1 callback service.  The first caller of
nfs4_callback_up() creates the service, subsequent callers increment a
reference count on the service.  The service is destroyed when the last
caller invokes nfs_callback_down().

The transport needs to hold a reference to the callback service in order
to invoke it during callback processing.  Currently this reference is only
obtained when the service is first created.  This is incorrect, since
subsequent registrations for other transports will leave the xprt->serv
pointer uninitialized, leading to an oops when a callback arrives on
the "unreferenced" transport.

This patch fixes the problem by ensuring that a reference to the service
is saved in xprt->serv, either because the service is created by this
invocation to nfs4_callback_up() or by a prior invocation.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Add a reference to svc_serv during callback service bring up]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Type check arguments of nfs_callback_up]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: save svc_serv in nfs_callback_info]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Removal of ugly #ifdefs]
[nfs41: Update to removal of ugly #ifdefs]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:29 -07:00
Trond Myklebust 5cd973c44a NFSv4/NLM: Push file locking BKL dependencies down into the NLM layer
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 13:23:01 -07:00
Trond Myklebust 3f09df70e3 NFS: Ensure we always hold the BKL when dereferencing inode->i_flock
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 13:23:00 -07:00
Trond Myklebust 965b5d6791 NFSv4: Handle more errors when recovering open file and locking state
It is possible for servers to return NFS4ERR_BAD_STATEID when
the state management code is recovering locks or is reclaiming state when
returning a delegation. Ensure that we handle that case.
While we're at it, add in handlers for NFS4ERR_STALE,
NFS4ERR_ADMIN_REVOKED, NFS4ERR_OPENMODE, NFS4ERR_DENIED and
NFS4ERR_STALE_STATEID, since the protocol appears to allow for them too.

Also handle ENOMEM...

Finally, rather than add new NFSv4.0-specific errors and error handling into
the generic delegation code, move that open file and locking state error
handling into the NFSv4 layer.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 13:22:59 -07:00
Trond Myklebust d5122201a7 NFSv4: Move error handling out of the delegation generic code
The NFSv4 delegation recovery code is required by the protocol to handle
more errors. Rather than add NFSv4.0 specific errors into 'generic'
delegation code, we should move the error handling into the NFSv4 layer.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 13:22:58 -07:00
Trond Myklebust 01c3f05228 NFSv4: Fix the 'nolock' option regression
NFSv4 should just ignore the 'nolock' option. It is an NFSv2/v3 thing...
This fixes the Oops in http://bugzilla.kernel.org/show_bug.cgi?id=13330

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 13:22:58 -07:00
Benny Halevy 7146851376 nfs41: minorversion support for nfs4_{init,destroy}_callback
move nfs4_init_callback into nfs4_init_client_minor_version
and nfs4_destroy_callback into nfs4_clear_client_minor_version

as these need to happen also when auto-negotiating the minorversion
once the callback service for nfs41 becomes different than for nfs4.0

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Fix checkpatch warning]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Type check arguments of nfs_callback_up]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Backchannel: Remove FIXME comment]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 13:06:01 -07:00
Benny Halevy 9bdaa86d2a nfs41: Refactor nfs4_{init,destroy}_callback for nfs4.0
Refactor-out code to bring the callback service up and down.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 12:43:47 -07:00
Benny Halevy 34dc1ad752 nfs41: increment_{open,lock}_seqid
Unlike minorversion0, in nfsv4.1 the open and lock seqids need
not be incremented by the client and should always be set to zero.

This is implemented using a new nfs_rpc_ops methods -
increment_open_seqid and increment_lock_seqid

Signed-off-by: Rahul Iyer <iyer@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: check for session not minorversion]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:43:45 -07:00
Andy Adamson 78722e9c92 nfs41: only retry EXCHANGE_ID on recoverable errors
Stops an infinite loop of EXCHANGE_ID.

Signed-off-by: Andy Adamson <andros@netapp.com>
[fixed checkpatch warnings]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 12:43:44 -07:00
Benny Halevy 008f55d0e0 nfs41: recover lease in _nfs4_lookup_root
This creates the nfsv4.1 session on mount.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:13 -07:00
Andy Adamson b4b82607ff nfs41: get_clid_cred for EXCHANGE_ID
Unlike SETCLIENTID, EXCHANGE_ID requires a machine credential. Do not search
for credentials other than the machine credential.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:13 -07:00
Andy Adamson 90a16617ee nfs41: add a get_clid_cred function to nfs4_state_recovery_ops
EXCHANGE_ID has different credential requirements than SETCLIENTID.
Prepare for a separate credential function.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:12 -07:00
Andy Adamson 591d71cbde nfs41: establish sessions-based clientid
nfsv4.1 clientid is established via EXCHANGE_ID rather than
SETCLIENTID{,_CONFIRM}

This is implemented using a new establish_clid method in
nfs4_state_recovery_ops.

nfs41: establish clientid via exchange id only if cred != NULL

>From 2.6.26 reclaimer() uses machine cred for setting up the client id
therefore it is never expected to be NULL.

Signed-off-by: Rahul Iyer <iyer@netapp.com>
[removed dprintk]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: lease renewal]
[revamped patch for new nfs4_state_manager design]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:11 -07:00
Andy Adamson a7b721037f nfs41: introduce get_state_renewal_cred
Use the machine cred for sending SEQUENCE to renew
the client's lease.

[revamp patch for new state management design starting 2.6.29]
[nfs41: support minorversion 1 for nfs4_check_lease]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: get cred in exchange_id when cred arg is NULL]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: use cl_machined_cred instead of cl_ex_cred]
    Since EXCHANGE_ID insists on using the machine credential, cl_ex_cred is
    not needed. nfs4_proc_exchange_id() is only called if the machine credential
    is available. Remove the credential logic from nfs4_proc_exchange_id.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:11 -07:00
Benny Halevy 8e69514f29 nfs41: support minorversion 1 for nfs4_check_lease
[moved nfs4_get_renew_cred related changes to
 "nfs41: introduce get_state_renewal_cred"]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:10 -07:00
Benny Halevy 29fba38b79 nfs41: lease renewal
Send a NFSv4.1 SEQUENCE op rather than RENEW that was deprecated in
minorversion 1.
Use the nfs_client minorversion to select reboot_recover/
network_partition_recovery/state_renewal ops.

Note: we use reclaimer to create the nfs41 session before there are any
cl_superblocks for the nfs_client.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: check for session not minorversion]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[revamped patch for new nfs4_state_manager design]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: obliterate nfs4_state_recovery_ops.renew_lease method]
    moved to nfs4_state_maintenance_ops
[also undid per-minorversion nfs4_state_recovery_ops here]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:09 -07:00
Andy Adamson b069d94af7 nfs41: schedule async session reset
Define a new session reset state which is set upon a sequence operation error
in both the sync and async error handlers.

Place all new requests and all but the last outstanding rpc on the
slot_tbl_waitq. Spawn the recovery thread when the last slot is free.
Call nfs4_proc_destroy_session, reinitialize the session, call
nfs4_proc_create_session, clear the session reset state, and wake up the next
task on the slot_tbl_waitq.

Return the nfs4_proc_destroy_session status to the session reclaimer and
check for NFS4ERR_BADSESSION and NFS4ERR_DEADSESSION. Other destroy session
errors should be handled in nfs4_proc_destroy_session where the call can
be retried with adjusted arguments.

Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
nfs41: make nfs4_wait_bit_killable public]
    nfs4_wait_bit_killable to be used by NFSv4.1 session recover logic.
Signed-off-by: Rahul Iyer <iyer@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: have create_session work on nfs_client]
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfs41: trigger the state manager for session reset]
    Replace the session reset state with the NFS4CLNT_SESSION_SETUP cl_state.
    Place all rpc tasks to sleep on the slot table waitqueue until the slot
    table is drained, then schedule state recovery and wait for it to complete.
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfs41: remove nfs41_session_recovery [ch]
Replaced by using the nfs4_state_manager.
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfs41: nfs4_wait_bit_killable only used locally]
[nfs41: keep nfs4_wait_bit_killable static]
[nfs41: keep const nfs_server in nfs4_handle_exception]
[nfs41: remove session parameter from nfs4_find_slot]
Signed-off-by: Andy Adamson <andros@netapp.com
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: resset the session from nfs41_setup_sequence]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:09 -07:00
Andy Adamson 4745e3154b nfs41: kick start nfs41 session recovery when handling errors
Remove checking for any errors that the SEQUENCE operation does not return.
-NFS4ERR_STALE_CLIENTID, NFS4ERR_EXPIRED, NFS4ERR_CB_PATH_DOWN, NFS4ERR_BACK_CHAN_BUSY, NFS4ERR_OP_NOT_IN_SESSION.

SEQUENCE operation error recovery is very primative, we only reset the session.

Remove checking for any errors that are returned by the SEQUENCE operation, but
that resetting the session won't address.
NFS4ERR_RETRY_UNCACHED_REP, NFS4ERR_SEQUENCE_POS,NFS4ERR_TOO_MANY_OPS.

Add error checking for missing SEQUENCE errors that a session reset will
address.
NFS4ERR_BAD_HIGH_SLOT, NFS4ERR_DEADSESSION, NFS4ERR_SEQ_FALSE_RETRY.

A reset of the session is currently our only response to a SEQUENCE operation
error. Don't reset the session on errors where a new session won't help.

Don't reset the session on errors where a new session won't help.

[nfs41: nfs4_async_handle_error update error checking]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: trigger the state manager for session reset]
    Replace session state bit with nfs_client state bit.  Set the
    NFS4CLNT_SESSION_SETUP bit upon a session related error in the sync/async
    error handlers.
[nfs41: _nfs4_async_handle_error fix session reset error list]
Sequence operation errors that session reset could help.
NFS4ERR_BADSESSION
NFS4ERR_BADSLOT
NFS4ERR_BAD_HIGH_SLOT
NFS4ERR_DEADSESSION
NFS4ERR_CONN_NOT_BOUND_TO_SESSION
NFS4ERR_SEQ_FALSE_RETRY
NFS4ERR_SEQ_MISORDERED

Sequence operation errors that a session reset would not help

NFS4ERR_BADXDR
NFS4ERR_DELAY
NFS4ERR_REP_TOO_BIG
NFS4ERR_REP_TOO_BIG_TO_CACHE
NFS4ERR_REQ_TOO_BIG
NFS4ERR_RETRY_UNCACHED_REP
NFS4ERR_SEQUENCE_POS
NFS4ERR_TOO_MANY_OPS

Signed-off-by: Andy Adamson <andros@netapp.com>
[nfs41 nfs4_handle_exception fix session reset error list]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[moved nfs41_sequece_call_done code to nfs41: sequence operation]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:08 -07:00