Commit graph

27090 commits

Author SHA1 Message Date
Al Viro
66f8f50920 affs: bury unused macros
... unused since 2.4.4.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:34 -04:00
Al Viro
af569596a9 kill v9fs_dentry_from_dir_inode()
In *all* callers we have a dentry of child of that directory.
Just use ->d_parent of that one, for fsck sake...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:34 -04:00
Sage Weil
c862868bb4 ceph: move encode_fh to new API
Use parent_inode has a flag for whether nfsd wants a connectable fh, but
generate one opportunistically so that we can take advantage of the
additional info in there.

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:33 -04:00
Al Viro
b0b0382bb4 ->encode_fh() API change
pass inode + parent's inode or NULL instead of dentry + bool saying
whether we want the parent or not.

NOTE: that needs ceph fix folded in.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:33 -04:00
Al Viro
6d42e7e9f6 ubifs: use generic_fillattr()
don't open-code it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:32 -04:00
Al Viro
77ba78776e xfs: switch to proper __bitwise type for KM_... flags
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:32 -04:00
Al Viro
c217a2a004 switch utimes() to fget_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:32 -04:00
Al Viro
0aa2ee5f0a switch statfs to fget_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:31 -04:00
Al Viro
bdc689594b switch flock to fget_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:31 -04:00
Al Viro
20ba5d736f switch signalfd4() to fget_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:30 -04:00
Al Viro
545ec2c794 switch fcntl to fget_raw_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:30 -04:00
Al Viro
7449af1e8b switch xattr syscalls to fget_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:30 -04:00
Al Viro
863ced7fe7 switch readdir/getdents to fget_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:29 -04:00
Al Viro
c2bd6c11cd switch do_fsync() to fget_light()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:29 -04:00
Linus Torvalds
442a9ffabb Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS updates from Steve French.

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6: (29 commits)
  cifs: fix oops while traversing open file list (try #4)
  cifs: Fix comment as d_alloc_root() is replaced by d_make_root()
  CIFS: Introduce SMB2 mounts as vers=2.1
  CIFS: Introduce SMB2 Kconfig option
  CIFS: Move add/set_credits and get_credits_field to ops structure
  CIFS: Move protocol specific demultiplex thread calls to ops struct
  CIFS: Move protocol specific part from cifs_readv_receive to ops struct
  CIFS: Move header_size/max_header_size to ops structure
  CIFS: Move protocol specific part from SendReceive2 to ops struct
  cifs: Include backup intent search flags during searches {try #2)
  CIFS: Separate protocol specific part from setlk
  CIFS: Separate protocol specific part from getlk
  CIFS: Separate protocol specific lock type handling
  CIFS: Convert lock type to 32 bit variable
  CIFS: Move locks to cifsFileInfo structure
  cifs: convert send_nt_cancel into a version specific op
  cifs: add a smb_version_operations/values structures and a smb_version enum
  cifs: remove the vers= and version= synonyms for ver=
  cifs: add warning about change in default cache semantics in 3.7
  cifs: display cache= option in /proc/mounts
  ...
2012-05-29 12:42:10 -07:00
Linus Torvalds
53f2c4a8fd NFS client updates for Linux 3.5
New features include:
 - Rewrite the O_DIRECT code so that it can share the same coalescing and
   pNFS functionality as the page cache code.
 - Allow the server to provide hints as to when we should use pNFS, and
   when it is more efficient to read and write through the metadata
   server.
 - NFS cache consistency updates:
   - Use the ctime to emulate a change attribute for NFSv2/v3 so that
     all NFS versions can share the same cache management code.
   - New cache management code will only look at the change attribute
     and size attribute when deciding whether or not our cached data
     is still valid or not.
   - Don't request NFSv4 post-op attributes on writes in cases such as
     O_DIRECT, where we don't care about data cache consistency, or
     when we have a write delegation, and know that our cache is
     still consistent.
   - Don't request NFSv4 post-op attributes on operations such as
     COMMIT, where there are no expected metadata updates.
   - Don't request NFSv4 directory post-op attributes in cases where
     the operations themselves already return change attribute updates:
     i.e.  operations such as OPEN, CREATE, REMOVE, LINK and RENAME.
 - Speed up 'ls' and friends by using READDIR rather than READDIRPLUS
   if we detect no attempts to lookup filenames.
 - Improve the code sharing between NFSv2/v3 and v4 mounts
 - NFSv4.1 state management efficiency improvements
 - More patches in preparation for NFSv4/v4.1 migration functionality.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPw/MNAAoJEGcL54qWCgDyxU8P/2kKqhAlhoLEArBqo9FT3/OK
 YrNs5uO/erTgnCG8L0XQvTKjHB9F7TAeFXqTmBZuPlb1afRpHHt2vzPqzIvUCeOC
 ZXm8vzZf4nxWZgEFoTDdUBvqQi9lLdIzCRhSaVCKcRnNwiuaKDd/iwykbWGcHqmv
 jtR4lzXPllJdKCUL3yb3juVrpq6Vvn254ID2pqdnYcEtIJIHgaRZpwdp4Iz9+8b5
 Moishiw2rgCBJIhf+VCYd8B2oYfMgSDPxG1o3etkwY46qo+4s+CIls9Vu/6YzGXK
 3+NdLatRDqKhQpLm0/R+dI3rntnTZ8x6LgWnTGxUsiqb6pAaHZPK284rf2eh/s7M
 Q4G4203r0uw539kIt6eKOGqC9c8kZAPCHlQSPCaImZyCJsz+6OMShNlGB5bZpFPr
 tbdxaxudrhCF7UVKXicJCWgv2nIHtek6fNwey1jqFoYgZP5ipiBKymvXQC5WAMBw
 7RHJor/JEC+UJkVg/7Mkpg0UNw3E36CTYLeRJKlNCS6YO9NJQseCDxhhMNAy/ab7
 RGO8DVMkUsOUH20S+a19LyeFQtveWFIE0DiDqRn0KnNGhGwHrv2t4xFukjlrf4Sw
 8FQUBRdtFxfmspfA1IdoTY49XZQda5eagvTy1MyaWEh+jPSJ4G5j3sSjFiaKAJqw
 79iQKFGkxPOSHx2yCdAF
 =suVW
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "New features include:
   - Rewrite the O_DIRECT code so that it can share the same coalescing
     and pNFS functionality as the page cache code.
   - Allow the server to provide hints as to when we should use pNFS,
     and when it is more efficient to read and write through the
     metadata server.
   - NFS cache consistency updates:
     * Use the ctime to emulate a change attribute for NFSv2/v3 so that
       all NFS versions can share the same cache management code.
     * New cache management code will only look at the change attribute
       and size attribute when deciding whether or not our cached data
       is still valid or not.
     * Don't request NFSv4 post-op attributes on writes in cases such as
       O_DIRECT, where we don't care about data cache consistency, or
       when we have a write delegation, and know that our cache is still
       consistent.
     * Don't request NFSv4 post-op attributes on operations such as
       COMMIT, where there are no expected metadata updates.
     * Don't request NFSv4 directory post-op attributes in cases where
       the operations themselves already return change attribute
       updates: i.e. operations such as OPEN, CREATE, REMOVE, LINK and
       RENAME.
   - Speed up 'ls' and friends by using READDIR rather than READDIRPLUS
     if we detect no attempts to lookup filenames.
   - Improve the code sharing between NFSv2/v3 and v4 mounts
   - NFSv4.1 state management efficiency improvements
   - More patches in preparation for NFSv4/v4.1 migration functionality."

Fix trivial conflict in fs/nfs/nfs4proc.c that was due to the dcache
qstr name initialization changes (that made the length/hash a 64-bit
union)

* tag 'nfs-for-3.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (146 commits)
  NFSv4: Add debugging printks to state manager
  NFSv4: Map NFS4ERR_SHARE_DENIED into an EACCES error instead of EIO
  NFSv4: update_changeattr does not need to set NFS_INO_REVAL_PAGECACHE
  NFSv4.1: nfs4_reset_session should use nfs4_handle_reclaim_lease_error
  NFSv4.1: Handle other occurrences of NFS4ERR_CONN_NOT_BOUND_TO_SESSION
  NFSv4.1: Handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION in the state manager
  NFSv4.1: Handle errors in nfs4_bind_conn_to_session
  NFSv4.1: nfs4_bind_conn_to_session should drain the session
  NFSv4.1: Don't clobber the seqid if exchange_id returns a confirmed clientid
  NFSv4.1: Add DESTROY_CLIENTID
  NFSv4.1: Ensure we use the correct credentials for bind_conn_to_session
  NFSv4.1: Ensure we use the correct credentials for session create/destroy
  NFSv4.1: Move NFSPROC4_CLNT_BIND_CONN_TO_SESSION to the end of the operations
  NFSv4.1: Handle NFS4ERR_SEQ_MISORDERED when confirming the lease
  NFSv4: When purging the lease, we must clear NFS4CLNT_LEASE_CONFIRM
  NFSv4: Clean up the error handling for nfs4_reclaim_lease
  NFSv4.1: Exchange ID must use GFP_NOFS allocation mode
  nfs41: Use BIND_CONN_TO_SESSION for CB_PATH_DOWN*
  nfs4.1: add BIND_CONN_TO_SESSION operation
  NFSv4.1 test the mdsthreshold hint parameters
  ...
2012-05-29 10:43:51 -07:00
Trond Myklebust
cc0a984368 NFSv4: Add debugging printks to state manager
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-28 17:21:57 -04:00
Trond Myklebust
fb13bfa7e1 NFSv4: Map NFS4ERR_SHARE_DENIED into an EACCES error instead of EIO
If a file OPEN is denied due to a share lock, the resulting
NFS4ERR_SHARE_DENIED is currently mapped to the default EIO.
This patch adds a more appropriate mapping, and brings Linux
into line with what Solaris 10 does.

See https://bugzilla.kernel.org/show_bug.cgi?id=43286

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-05-28 17:21:48 -04:00
Linus Torvalds
a01ee165a1 Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
Pull exofs updates from Boaz Harrosh:
 "Just a couple of patches.  The first is a BUG fix destined for stable
  which missed the 3.4-rc7 Kernel.  The second is just a fixture
  addition so exofs is able to be better exported as a cluster file
  system via pNFS."

* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  exofs: Add SYSFS info for autologin/pNFS export
  exofs: Fix CRASH on very early IO errors.
2012-05-28 13:10:41 -07:00
Linus Torvalds
90324cc1b1 avoid iput() from flusher thread
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPw2J/AAoJECvKgwp+S8Ja5jkP/3uMxkhf8XQpXCI3O1QVfaQr
 uZFfM8sINqIPDVm1dtFjFj7f8Bw9mhE2KAnnJ1rKT8tQwqq9yAse1QPlhCG1ZqoP
 +AnMDDXHtx7WmQZXhBvS9b+unpZ7Jr6r6pO5XrmTL2kRL3YJPUhZ2+xbTT5belTB
 KoAu4WqORZRxfXoC76S7U8K+D4NcAGhAOxCClsIjmY+oocCiCag4FZOyzYIFViqc
 ghUN/+rLQ3fqGGv2yO7Ylx1gUM7sxIwkZQ/h962jFAtxz9czImr2NmRoMliOaOkS
 tvcnIf+E3u0n/zIjzFvzhxKgHJPP8PkcPMk60d3jKmFngBkqFTzNUeVTP8md7HrV
 4DlXisWr+z7YVyWUCFaNcJLmjiWSwQ8DV/clRLobeBf9EJKan5F1PjFgl6PLJM5F
 Qr1+LHMNaetdulBwMRTyveZTzYqw9RmDnD9dWMo4mX/kTpvtC4jTPVV7hkRD+Qlv
 5vTRR+VXL3Q50yClLf0AQMSKTnH2gBuepM/b+7cShLGfsMln8DtUjmbigv+niL63
 BibcCIbIlP2uWGnl37VhsC34AT+RKt3lggrBOpn/7XJMq/wKR7IRP/7V9TfYgaUN
 NBa+wtnLDa1pZEn/X7izdcQP62PzDtmB+ObvYT0Yb40A4+2ud3qF/lB53c1A1ewF
 /9c4zxxekjHZnn2oooEa
 =oLXf
 -----END PGP SIGNATURE-----

Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux

Pull writeback tree from Wu Fengguang:
 "Mainly from Jan Kara to avoid iput() in the flusher threads."

* tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: Avoid iput() from flusher thread
  vfs: Rename end_writeback() to clear_inode()
  vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
  writeback: Refactor writeback_single_inode()
  writeback: Remove wb->list_lock from writeback_single_inode()
  writeback: Separate inode requeueing after writeback
  writeback: Move I_DIRTY_PAGES handling
  writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
  writeback: Move clearing of I_SYNC into inode_sync_complete()
  writeback: initialize global_dirty_limit
  fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
  mm: page-writeback.c: local functions should not be exposed globally
2012-05-28 09:54:45 -07:00
Trond Myklebust
359d7d1c97 NFSv4: update_changeattr does not need to set NFS_INO_REVAL_PAGECACHE
We're already invalidating the data cache, and setting the new change
attribute. Since directories don't care about the i_size field, there
is no need to be forcing any extra revalidation of the page cache.

We do keep the NFS_INO_INVALID_ATTR flag, in order to force an
attribute cache revalidation on stat() calls since we do not
update the mtime and ctime fields.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-28 10:05:47 -04:00
Trond Myklebust
f2c1b5100d NFSv4.1: nfs4_reset_session should use nfs4_handle_reclaim_lease_error
The results from a call to nfs4_proc_create_session() should always
be fed into nfs4_handle_reclaim_lease_error, so that we can
handle errors such as NFS4ERR_SEQ_MISORDERED correctly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-27 14:50:44 -04:00
Trond Myklebust
9f594791dd NFSv4.1: Handle other occurrences of NFS4ERR_CONN_NOT_BOUND_TO_SESSION
Let nfs4_schedule_session_recovery() handle the details of choosing
between resetting the session, and other session related recovery.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-27 14:33:07 -04:00
Trond Myklebust
7c5d725684 NFSv4.1: Handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION in the state manager
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-27 14:33:07 -04:00
Trond Myklebust
bf674c8228 NFSv4.1: Handle errors in nfs4_bind_conn_to_session
Ensure that we handle NFS4ERR_DELAY errors separately, and then
let nfs4_recovery_handle_error() handle all other cases.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-27 14:32:06 -04:00
Trond Myklebust
43ac544cb3 NFSv4.1: nfs4_bind_conn_to_session should drain the session
In order to avoid races with other RPC calls that end up setting the
NFS4CLNT_BIND_CONN_TO_SESSION flag.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-27 14:00:08 -04:00
Linus Torvalds
36126f8f2e word-at-a-time: make the interfaces truly generic
This changes the interfaces in <asm/word-at-a-time.h> to be a bit more
complicated, but a lot more generic.

In particular, it allows us to really do the operations efficiently on
both little-endian and big-endian machines, pretty much regardless of
machine details.  For example, if you can rely on a fast population
count instruction on your architecture, this will allow you to make your
optimized <asm/word-at-a-time.h> file with that.

NOTE! The "generic" version in include/asm-generic/word-at-a-time.h is
not truly generic, it actually only works on big-endian.  Why? Because
on little-endian the generic algorithms are wasteful, since you can
inevitably do better. The x86 implementation is an example of that.

(The only truly non-generic part of the asm-generic implementation is
the "find_zero()" function, and you could make a little-endian version
of it.  And if the Kbuild infrastructure allowed us to pick a particular
header file, that would be lovely)

The <asm/word-at-a-time.h> functions are as follows:

 - WORD_AT_A_TIME_CONSTANTS: specific constants that the algorithm
   uses.

 - has_zero(): take a word, and determine if it has a zero byte in it.
   It gets the word, the pointer to the constant pool, and a pointer to
   an intermediate "data" field it can set.

   This is the "quick-and-dirty" zero tester: it's what is run inside
   the hot loops.

 - "prep_zero_mask()": take the word, the data that has_zero() produced,
   and the constant pool, and generate an *exact* mask of which byte had
   the first zero.  This is run directly *outside* the loop, and allows
   the "has_zero()" function to answer the "is there a zero byte"
   question without necessarily getting exactly *which* byte is the
   first one to contain a zero.

   If you do multiple byte lookups concurrently (eg "hash_name()", which
   looks for both NUL and '/' bytes), after you've done the prep_zero_mask()
   phase, the result of those can be or'ed together to get the "either
   or" case.

 - The result from "prep_zero_mask()" can then be fed into "find_zero()"
   (to find the byte offset of the first byte that was zero) or into
   "zero_bytemask()" (to find the bytemask of the bytes preceding the
   zero byte).

   The existence of zero_bytemask() is optional, and is not necessary
   for the normal string routines.  But dentry name hashing needs it, so
   if you enable DENTRY_WORD_AT_A_TIME you need to expose it.

This changes the generic strncpy_from_user() function and the dentry
hashing functions to use these modified word-at-a-time interfaces.  This
gets us back to the optimized state of the x86 strncpy that we lost in
the previous commit when moving over to the generic version.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-26 11:33:40 -07:00
Trond Myklebust
32b0131069 NFSv4.1: Don't clobber the seqid if exchange_id returns a confirmed clientid
If the EXCHGID4_FLAG_CONFIRMED_R flag is set, the client is in theory
supposed to already know the correct value of the seqid, in which case
RFC5661 states that it should ignore the value returned.

Also ensure that if the sanity check in nfs4_check_cl_exchange_flags
fails, then we must not change the nfs_client fields.

Finally, clean up the code: we don't need to retest the value of
'status' unless it can change.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-26 14:17:31 -04:00
Trond Myklebust
6624553910 NFSv4.1: Add DESTROY_CLIENTID
Ensure that we destroy our lease on last unmount

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-26 14:17:30 -04:00
Trond Myklebust
2cf047c994 NFSv4.1: Ensure we use the correct credentials for bind_conn_to_session
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Weston Andros Adamson <dros@netapp.com>
2012-05-25 18:02:10 -04:00
Trond Myklebust
848f5bda54 NFSv4.1: Ensure we use the correct credentials for session create/destroy
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-25 18:02:09 -04:00
Trond Myklebust
ad24ecfbcd NFSv4.1: Move NFSPROC4_CLNT_BIND_CONN_TO_SESSION to the end of the operations
For backward compatibility with nfs-utils.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Weston Andros Adamson <dros@netapp.com>
2012-05-25 18:02:09 -04:00
Trond Myklebust
89a217360e NFSv4.1: Handle NFS4ERR_SEQ_MISORDERED when confirming the lease
Apparently the patch "NFS: Always use the same SETCLIENTID boot verifier"
is tickling a Linux nfs server bug, and causing a regression: the server
can get into a situation where it keeps replying NFS4ERR_SEQ_MISORDERED
to our CREATE_SESSION request even when we are sending the correct
sequence ID.

Fix this by purging the lease and then retrying.

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-25 16:17:42 -04:00
Trond Myklebust
be0bfed002 NFSv4: When purging the lease, we must clear NFS4CLNT_LEASE_CONFIRM
Otherwise we can end up not sending a new exchange-id/setclientid

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-25 16:17:13 -04:00
Trond Myklebust
2a6ee6aa2f NFSv4: Clean up the error handling for nfs4_reclaim_lease
Try to consolidate the error handling for nfs4_reclaim_lease into
a single function instead of doing a bit here, and a bit there...

Also ensure that NFS4CLNT_PURGE_STATE handles errors correctly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-25 16:17:13 -04:00
Linus Torvalds
ece78b7df7 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, ext3 and quota fixes from Jan Kara:
 "Interesting bits are:
   - removal of a special i_mutex locking subclass (I_MUTEX_QUOTA) since
     quota code does not need i_mutex anymore in any unusual way.
   - backport (from ext4) of a fix of a checkpointing bug (missing cache
     flush) that could lead to fs corruption on power failure

  The rest are just random small fixes & cleanups."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: trivial fix to comment for ext2_free_blocks
  ext2: remove the redundant comment for ext2_export_ops
  ext3: return 32/64-bit dir name hash according to usage type
  quota: Get rid of nested I_MUTEX_QUOTA locking subclass
  quota: Use precomputed value of sb_dqopt in dquot_quota_sync
  ext2: Remove i_mutex use from ext2_quota_write()
  reiserfs: Remove i_mutex use from reiserfs_quota_write()
  ext4: Remove i_mutex use from ext4_quota_write()
  ext3: Remove i_mutex use from ext3_quota_write()
  quota: Fix double lock in add_dquot_ref() with CONFIG_QUOTA_DEBUG
  jbd: Write journal superblock with WRITE_FUA after checkpointing
  jbd: protect all log tail updates with j_checkpoint_mutex
  jbd: Split updating of journal superblock and marking journal empty
  ext2: do not register write_super within VFS
  ext2: Remove s_dirt handling
  ext2: write superblock only once on unmount
  ext3: update documentation with barrier=1 default
  ext3: remove max_debt in find_group_orlov()
  jbd: Refine commit writeout logic
2012-05-25 08:14:59 -07:00
Linus Torvalds
b5f4035adf Features:
* Extend the APIC ops implementation and add IRQ_WORKER vector support so that 'perf' can work properly.
  * Fix self-ballooning code, and balloon logic when booting as initial domain.
  * Move array printing code to generic debugfs
  * Support XenBus domains.
  * Lazily free grants when a domain is dead/non-existent.
  * In M2P code use batching calls
 Bug-fixes:
  * Fix NULL dereference in allocation failure path (hvc_xen)
  * Fix unbinding of IRQ_WORKER vector during vCPU hot-unplug
  * Fix HVM guest resume - we would leak an PIRQ value instead of reusing the existing one.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJPu9MpAAoJEFjIrFwIi8fJaNQH/RylThiO+O+LBpPrO8VRUw+2
 /Io98T7ZK2ggoUeaJx0C8irM0JMFAkxGMcfX3w9fwNt/BTec4s++4JhbN1jYN0da
 6a0PqINo+M8y73So6CBfuJDCunaRLGKVG/ibIO3Y3WAff51/H+DMvO7uYYDAE0aA
 mikyOxnaty0DiG5i4JGDHGmCzDASfK/jgGccZ03m6522mDx5ZIbTzZWONLfz8dqT
 rbxnn9vrNLgEYWuzyLMwW0GymToUtt01xBQvwJLAbhn8lr1WBRBLpxXA+5iYNQrn
 Ri25G7keYJhG4uwZfaHnR+4HTrmhlGzK1Z96dkqpGUaeIcdyWmPMp22VtBBiwG8=
 =uyRr
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen updates from Konrad Rzeszutek Wilk:
 "Features:
   * Extend the APIC ops implementation and add IRQ_WORKER vector
     support so that 'perf' can work properly.
   * Fix self-ballooning code, and balloon logic when booting as initial
     domain.
   * Move array printing code to generic debugfs
   * Support XenBus domains.
   * Lazily free grants when a domain is dead/non-existent.
   * In M2P code use batching calls
  Bug-fixes:
   * Fix NULL dereference in allocation failure path (hvc_xen)
   * Fix unbinding of IRQ_WORKER vector during vCPU hot-unplug
   * Fix HVM guest resume - we would leak an PIRQ value instead of
     reusing the existing one."

Fix up add-add onflicts in arch/x86/xen/enlighten.c due to addition of
apic ipi interface next to the new apic_id functions.

* tag 'stable/for-linus-3.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: do not map the same GSI twice in PVHVM guests.
  hvc_xen: NULL dereference on allocation failure
  xen: Add selfballoning memory reservation tunable.
  xenbus: Add support for xenbus backend in stub domain
  xen/smp: unbind irqworkX when unplugging vCPUs.
  xen: enter/exit lazy_mmu_mode around m2p_override calls
  xen/acpi/sleep: Enable ACPI sleep via the __acpi_os_prepare_sleep
  xen: implement IRQ_WORK_VECTOR handler
  xen: implement apic ipi interface
  xen/setup: update VA mapping when releasing memory during setup
  xen/setup: Combine the two hypercall functions - since they are quite similar.
  xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
  xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
  xen/gnttab: add deferred freeing logic
  debugfs: Add support to print u32 array in debugfs
  xen/p2m: An early bootup variant of set_phys_to_machine
  xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
  xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
  xen/p2m: Move code around to allow for better re-usage.
2012-05-24 16:02:08 -07:00
Linus Torvalds
ce004178be Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc changes from David S. Miller:
 "This has the generic strncpy_from_user() implementation architectures
  can now use, which we've been developing on linux-arch over the past
  few days.

  For good measure I ran both a 32-bit and a 64-bit glibc testsuite run,
  and the latter of which pointed out an adjustment I needed to make to
  sparc's user_addr_max() definition.  Linus, you were right, STACK_TOP
  was not the right thing to use, even on sparc itself :-)

  From Sam Ravnborg, we have a conversion of sparc32 over to the common
  alloc_thread_info_node(), since the aspect which originally blocked
  our doing so (sun4c) has been removed."

Fix up trivial arch/sparc/Kconfig and lib/Makefile conflicts.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc: Fix user_addr_max() definition.
  lib: Sparc's strncpy_from_user is generic enough, move under lib/
  kernel: Move REPEAT_BYTE definition into linux/kernel.h
  sparc: Increase portability of strncpy_from_user() implementation.
  sparc: Optimize strncpy_from_user() zero byte search.
  sparc: Add full proper error handling to strncpy_from_user().
  sparc32: use the common implementation of alloc_thread_info_node()
2012-05-24 15:10:28 -07:00
Linus Torvalds
9978306e31 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
Pull XFS update from Ben Myers:

 - Removal of xfsbufd
 - Background CIL flushes have been moved to a workqueue.
 - Fix to xfs_check_page_type applicable to filesystems where
   blocksize < page size
 - Fix for stale data exposure when extsize hints are used.
 - A series of xfs_buf cache cleanups.
 - Fix for XFS_IOC_ALLOCSP
 - Cleanups for includes and removal of xfs_lrw.[ch].
 - Moved all busy extent handling to it's own file so that it is easier
   to merge with userspace.
 - Fix for log mount failure.
 - Fix to enable inode reclaim during quotacheck at mount time.
 - Fix for delalloc quota accounting.
 - Fix for memory reclaim deadlock on agi buffer.
 - Fixes for failed writes and to clean up stale delalloc blocks.
 - Fix to use GFP_NOFS in blkdev_issue_flush
 - SEEK_DATA/SEEK_HOLE support

* 'for-linus' of git://oss.sgi.com/xfs/xfs: (57 commits)
  xfs: add trace points for log forces
  xfs: fix memory reclaim deadlock on agi buffer
  xfs: fix delalloc quota accounting on failure
  xfs: protect xfs_sync_worker with s_umount semaphore
  xfs: introduce SEEK_DATA/SEEK_HOLE support
  xfs: make xfs_extent_busy_trim not static
  xfs: make XBF_MAPPED the default behaviour
  xfs: flush outstanding buffers on log mount failure
  xfs: Properly exclude IO type flags from buffer flags
  xfs: clean up xfs_bit.h includes
  xfs: move xfs_do_force_shutdown() and kill xfs_rw.c
  xfs: move xfs_get_extsz_hint() and kill xfs_rw.h
  xfs: move xfs_fsb_to_db to xfs_bmap.h
  xfs: clean up busy extent naming
  xfs: move busy extent handling to it's own file
  xfs: move xfsagino_t to xfs_types.h
  xfs: use iolock on XFS_IOC_ALLOCSP calls
  xfs: kill XBF_DONTBLOCK
  xfs: kill xfs_read_buf()
  xfs: kill XBF_LOCK
  ...
2012-05-24 14:14:46 -07:00
Trond Myklebust
bbafffd293 NFSv4.1: Exchange ID must use GFP_NOFS allocation mode
Exchange ID can be called in a lease reclaim situation, so it
will deadlock if it then tries to write out dirty NFS pages.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:31:39 -04:00
Weston Andros Adamson
a9e64442f1 nfs41: Use BIND_CONN_TO_SESSION for CB_PATH_DOWN*
The state manager can handle SEQ4_STATUS_CB_PATH_DOWN* flags with a
BIND_CONN_TO_SESSION instead of destroying the session and creating a new one.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:26:21 -04:00
Weston Andros Adamson
7c44f1ae4a nfs4.1: add BIND_CONN_TO_SESSION operation
This patch adds the BIND_CONN_TO_SESSION operation which is needed for
upcoming SP4_MACH_CRED work and useful for recovering from broken connections
without destroying the session.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:22:19 -04:00
Andy Adamson
d23d61c8d3 NFSv4.1 test the mdsthreshold hint parameters
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:15:49 -04:00
Andy Adamson
2701d086db NFSv4.1 add nfs_inode book keeping for mdsthreshold
Keep track of the number of bytes read or written via buffered, direct, and
mem-mapped i/o for use by mdsthreshold size_io hints.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:15:48 -04:00
Andy Adamson
82be417aa3 NFSv4.1 cache mdsthreshold values on OPEN
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:15:48 -04:00
Andy Adamson
88034c3d88 NFSv4.1 mdsthreshold attribute xdr
We only support one layout type per file system, so one threshold_item4 per
mdsthreshold4.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:15:47 -04:00
David S. Miller
446969084d kernel: Move REPEAT_BYTE definition into linux/kernel.h
And make sure that everything using it explicitly includes
that header file.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-24 13:10:05 -07:00
Linus Torvalds
28f3d71761 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull more networking updates from David Miller:
 "Ok, everything from here on out will be bug fixes."

1) One final sync of wireless and bluetooth stuff from John Linville.
   These changes have all been in his tree for more than a week, and
   therefore have had the necessary -next exposure.  John was just away
   on a trip and didn't have a change to send the pull request until a
   day or two ago.

2) Put back some defines in user exposed header file areas that were
   removed during the tokenring purge.  From Stephen Hemminger and Paul
   Gortmaker.

3) A bug fix for UDP hash table allocation got lost in the pile due to
   one of those "you got it..  no I've got it.." situations.  :-)

   From Tim Bird.

4) SKB coalescing in TCP needs to have stricter checks, otherwise we'll
   try to coalesce overlapping frags and crash.  Fix from Eric Dumazet.

5) RCU routing table lookups can race with free_fib_info(), causing
   crashes when we deref the device pointers in the route.  Fix by
   releasing the net device in the RCU callback.  From Yanmin Zhang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (293 commits)
  tcp: take care of overlaps in tcp_try_coalesce()
  ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
  mm: add a low limit to alloc_large_system_hash
  ipx: restore token ring define to include/linux/ipx.h
  if: restore token ring ARP type to header
  xen: do not disable netfront in dom0
  phy/micrel: Fix ID of KSZ9021
  mISDN: Add X-Tensions USB ISDN TA XC-525
  gianfar:don't add FCB length to hard_header_len
  Bluetooth: Report proper error number in disconnection
  Bluetooth: Create flags for bt_sk()
  Bluetooth: report the right security level in getsockopt
  Bluetooth: Lock the L2CAP channel when sending
  Bluetooth: Restore locking semantics when looking up L2CAP channels
  Bluetooth: Fix a redundant and problematic incoming MTU check
  Bluetooth: Add support for Foxconn/Hon Hai AR5BBU22 0489:E03C
  Bluetooth: Fix EIR data generation for mgmt_device_found
  Bluetooth: Fix Inquiry with RSSI event mask
  Bluetooth: improve readability of l2cap_seq_list code
  Bluetooth: Fix skb length calculation
  ...
2012-05-24 11:54:29 -07:00
Tim Bird
31fe62b958 mm: add a low limit to alloc_large_system_hash
UDP stack needs a minimum hash size value for proper operation and also
uses alloc_large_system_hash() for proper NUMA distribution of its hash
tables and automatic sizing depending on available system memory.

On some low memory situations, udp_table_init() must ignore the
alloc_large_system_hash() result and reallocs a bigger memory area.

As we cannot easily free old hash table, we leak it and kmemleak can
issue a warning.

This patch adds a low limit parameter to alloc_large_system_hash() to
solve this problem.

We then specify UDP_HTABLE_SIZE_MIN for UDP/UDPLite hash table
allocation.

Reported-by: Mark Asselstine <mark.asselstine@windriver.com>
Reported-by: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-24 00:28:21 -04:00
Linus Torvalds
644473e9c6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace enhancements from Eric Biederman:
 "This is a course correction for the user namespace, so that we can
  reach an inexpensive, maintainable, and reasonably complete
  implementation.

  Highlights:
   - Config guards make it impossible to enable the user namespace and
     code that has not been converted to be user namespace safe.

   - Use of the new kuid_t type ensures the if you somehow get past the
     config guards the kernel will encounter type errors if you enable
     user namespaces and attempt to compile in code whose permission
     checks have not been updated to be user namespace safe.

   - All uids from child user namespaces are mapped into the initial
     user namespace before they are processed.  Removing the need to add
     an additional check to see if the user namespace of the compared
     uids remains the same.

   - With the user namespaces compiled out the performance is as good or
     better than it is today.

   - For most operations absolutely nothing changes performance or
     operationally with the user namespace enabled.

   - The worst case performance I could come up with was timing 1
     billion cache cold stat operations with the user namespace code
     enabled.  This went from 156s to 164s on my laptop (or 156ns to
     164ns per stat operation).

   - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
     Most uid/gid setting system calls treat these value specially
     anyway so attempting to use -1 as a uid would likely cause
     entertaining failures in userspace.

   - If setuid is called with a uid that can not be mapped setuid fails.
     I have looked at sendmail, login, ssh and every other program I
     could think of that would call setuid and they all check for and
     handle the case where setuid fails.

   - If stat or a similar system call is called from a context in which
     we can not map a uid we lie and return overflowuid.  The LFS
     experience suggests not lying and returning an error code might be
     better, but the historical precedent with uids is different and I
     can not think of anything that would break by lying about a uid we
     can't map.

   - Capabilities are localized to the current user namespace making it
     safe to give the initial user in a user namespace all capabilities.

  My git tree covers all of the modifications needed to convert the core
  kernel and enough changes to make a system bootable to runlevel 1."

Fix up trivial conflicts due to nearby independent changes in fs/stat.c

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
  userns:  Silence silly gcc warning.
  cred: use correct cred accessor with regards to rcu read lock
  userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
  userns: Convert cgroup permission checks to use uid_eq
  userns: Convert tmpfs to use kuid and kgid where appropriate
  userns: Convert sysfs to use kgid/kuid where appropriate
  userns: Convert sysctl permission checks to use kuid and kgids.
  userns: Convert proc to use kuid/kgid where appropriate
  userns: Convert ext4 to user kuid/kgid where appropriate
  userns: Convert ext3 to use kuid/kgid where appropriate
  userns: Convert ext2 to use kuid/kgid where appropriate.
  userns: Convert devpts to use kuid/kgid where appropriate
  userns: Convert binary formats to use kuid/kgid where appropriate
  userns: Add negative depends on entries to avoid building code that is userns unsafe
  userns: signal remove unnecessary map_cred_ns
  userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
  userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
  userns: Convert stat to return values mapped from kuids and kgids
  userns: Convert user specfied uids and gids in chown into kuids and kgid
  userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
  ...
2012-05-23 17:42:39 -07:00