Commit Graph

27671 Commits (f8310c59201b183ebee2e3fe0c7242f5729be0af)

Author SHA1 Message Date
Trond Myklebust 2a4c8994ee NFSv4.1: Fix umount when filelayout DS is also the MDS
Currently there is a 'chicken and egg' issue when the DS is also the mounted
MDS. The nfs_match_client() reference from nfs4_set_ds_client bumps the
cl_count, the nfs_client is not freed at umount, and nfs4_deviceid_purge_client
is not called to dereference the MDS usage of a deviceid which holds a
reference to the DS nfs_client.  The result is the umount program returns,
but the nfs_client is not freed, and the cl_session hearbeat continues.

The MDS (and all other nfs mounts) lose their last nfs_client reference in
nfs_free_server when the last nfs_server (fsid) is umounted.
The file layout DS lose their last nfs_client reference in destroy_ds
when the last deviceid referencing the data server is put and destroy_ds is
called. This is triggered by a call to nfs4_deviceid_purge_client which
removes references to a pNFS deviceid used by an MDS mount.

The fix is to track how many pnfs enabled filesystems are mounted from
this server, and then to purge the device id cache once that count reaches
zero.

Reported-by: Jorge Mora <Jorge.Mora@netapp.com>
Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-18 08:45:16 -04:00
Dan Carpenter 1cfb727107 UBIFS: fix assertion
The asserts here never check anything because it uses '|' instead of
'&'.  Now if the flags are not set it prints a warning a a stack trace.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-06-18 14:17:08 +03:00
Matthew Garrett 7dea9665fe hfsplus: fix bless ioctl when used with hardlinks
HFS+ doesn't really implement hard links - instead, hardlinks are indicated
by a magic file type which refers to an indirect node in a hidden
directory. The spec indicates that stat() should return the inode number
of the indirect node, but it turns out that this doesn't satisfy the
firmware when it's looking for a bootloader - it wants the catalog ID of
the hardlink file instead. Fix up this case.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-17 14:39:59 -07:00
Janne Kalliomäki a6dc8c0421 hfsplus: fix overflow in sector calculations in hfsplus_submit_bio
The variable io_size was unsigned int, which caused the wrong sector number
to be calculated after aligning it. This then caused mount to fail with big
volumes, as backup volume header information was searched from a
wrong sector.

Signed-off-by: Janne Kalliomäki <janne@tuxera.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-17 14:39:45 -07:00
Linus Torvalds d865983292 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs compile warning fixes from Chris Mason.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: cast devid to unsigned long long for printk %llu
  Btrfs: init old_generation in get_old_root
2012-06-16 17:01:41 -07:00
Linus Torvalds 873b779d99 NFS client bugfixes for Linux 3.5
Highlights include:
 
  - Fix a couple of mount regressions due to the recent cleanups.
  - Fix an Oops in the open recovery code
  - Fix an rpc_pipefs upcall hang that results from some of the
    net namespace work from 3.4.x (stable kernel candidate).
  - Fix a couple of write and o_direct regressions that were found
    at last weeks Bakeathon testing event in Ann Arbor.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJP2gmaAAoJEGcL54qWCgDyrBMP/RY/T++He8y5k3M9aEqiIv0q
 D8ZVMwzID6f4Zgw4xRg96aYr02sBTw0q+0mP5x1EZmg8mK29rnBiVeKHE1iwSfXq
 10/SYISlpIjhJC4I4kHXGd2KClgj7qRRCbDKFRWwoIIwYU+kJn8MRnPa9XqdL8kP
 q68lrtayW8THSJDR8bk1GQn+ARxGeoY++qzHxm3vpQCbZVVb19VqKMWAWSN4VKqb
 epWehOSAzB3iA7HrLRbf8Y8/sDdXewxCQpr9CC/wxuu++l5ifPphR0ToX+k9VZXI
 BKFLUojCUZHTMAgCxuxjrFYehMeyClbzL2lLkz5Pgj0gQhOX6Myj+WMXoEg/uWfo
 XNf51FH3yBbnfayTaOUs6Y50iuU+dQO7TUTAoWTPpW9V/iT5z/fWAKUVJhDtrPk5
 DVDkR6SEgb4P1RqkehZKLq5k5GSAcTR+MZr452eDrFYXJrY8ORDE6o6kP4Rr3Nnd
 n8gap0gHxzIYlhBghem6+nLN+HhpZQopWeD8mNub20VuXsChRDr9/+XWuMCSJaZF
 2kleVdt2+rTDzi9bJTRYlsX397oaThL0NbRvshHAwnXIDtIQrzxx6+dUyOsEWMEu
 go/EdSUUESXGNlsWTqewCBsOjPeE4L5ijI/QglfDkF+CzD5dDjrxl+5i57iMKVfc
 Ydste3pQJkS7PiZu1sWA
 =unbu
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

   - Fix a couple of mount regressions due to the recent cleanups.
   - Fix an Oops in the open recovery code
   - Fix an rpc_pipefs upcall hang that results from some of the net
     namespace work from 3.4.x (stable kernel candidate).
   - Fix a couple of write and o_direct regressions that were found at
     last weeks Bakeathon testing event in Ann Arbor."

* tag 'nfs-for-3.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: add an endian notation for sparse
  NFSv4.1: integer overflow in decode_cb_sequence_args()
  rpc_pipefs: allow rpc_purge_list to take a NULL waitq pointer
  NFSv4 do not send an empty SETATTR compound
  NFSv2: EOF incorrectly set on short read
  NFS: Use the NFS_DEFAULT_VERSION for v2 and v3 mounts
  NFS: fix directio refcount bug on commit
  NFSv4: Fix unnecessary delegation returns in nfs4_do_open
  NFSv4.1: Convert another trivial printk into a dprintk
  NFS4: Fix open bug when pnfs module blacklisted
  NFS: Remove incorrect BUG_ON in nfs_found_client
  NFS: Map minor mismatch error to protocol not support error.
  NFS: Fix a commit bug
  NFS4: Set parsed mount data version to 4
  NFSv4.1: Ensure we clear session state flags after a session creation
  NFSv4.1: Convert a trivial printk into a dprintk
  NFSv4: Fix up decode_attr_mdsthreshold
  NFSv4: Fix an Oops in the open recovery code
  NFSv4.1: Fix a request leak on the back channel
2012-06-15 17:37:23 -07:00
Linus Torvalds 93dd048dbd Merge branch 'for-3.5' of git://linux-nfs.org/~bfields/linux
Pull two nfsd bugfixes from J. Bruce Fields.

* 'for-3.5' of git://linux-nfs.org/~bfields/linux:
  nfsd4: BUG_ON(!is_spin_locked()) no good on UP kernels
  NFS: hard-code init_net for NFS callback transports
2012-06-15 17:27:31 -07:00
Chris Mason a8c4a33b98 Btrfs: cast devid to unsigned long long for printk %llu
Avoid warning in 32 bit machines

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 20:07:17 -04:00
Chris Mason 4325edd078 Btrfs: init old_generation in get_old_root
gcc was giving an uninit variable warning here.  Strictly
speaking we don't need to init it, but this will make things
much less error prone.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 20:06:54 -04:00
Linus Torvalds 718f58ad61 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs update from Chris Mason:
 "The dates look like I had to rebase this morning because there was a
  compiler warning for a printk arg that I had missed earlier.

  These are all fixes, including one to prevent using stale pointers for
  device names, and lots of fixes around transaction abort cleanups
  (Josef, Liu Bo).

  Jan Schmidt also sent in a number of fixes for the new reference
  number tracking code.

  Liu Bo beat me to updating the MAINTAINERS file.  Since he thought to
  also fix the git url, I kept his commit."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (24 commits)
  Btrfs: update MAINTAINERS info for BTRFS FILE SYSTEM
  Btrfs: destroy the items of the delayed inodes in error handling routine
  Btrfs: make sure that we've made everything in pinned tree clean
  Btrfs: avoid memory leak of extent state in error handling routine
  Btrfs: do not resize a seeding device
  Btrfs: fix missing inherited flag in rename
  Btrfs: fix incompat flags setting
  Btrfs: fix defrag regression
  Btrfs: call filemap_fdatawrite twice for compression
  Btrfs: keep inode pinned when compressing writes
  Btrfs: implement ->show_devname
  Btrfs: use rcu to protect device->name
  Btrfs: unlock everything properly in the error case for nocow
  Btrfs: fix btrfs_destroy_marked_extents
  Btrfs: abort the transaction if the commit fails
  Btrfs: wake up transaction waiters when aborting a transaction
  Btrfs: fix locking in btrfs_destroy_delayed_refs
  Btrfs: pass locked_page into extent_clear_unlock_delalloc if theres an error
  Btrfs: fix race in tree mod log addition
  Btrfs: add btrfs_next_old_leaf
  ...
2012-06-15 16:04:37 -07:00
Kay Sievers e2ae715d66 kmsg - kmsg_dump() use iterator to receive log buffer content
Provide an iterator to receive the log buffer content, and convert all
kmsg_dump() users to it.

The structured data in the kmsg buffer now contains binary data, which
should no longer be copied verbatim to the kmsg_dump() users.

The iterator should provide reliable access to the buffer data, and also
supports proper log line-aware chunking of data while iterating.

Signed-off-by: Kay Sievers <kay@vrfy.org>
Tested-by: Tony Luck <tony.luck@intel.com>
Reported-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Tested-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-15 14:53:59 -07:00
Miao Xie 67cde3448d Btrfs: destroy the items of the delayed inodes in error handling routine
the items of the delayed inodes were forgotten to be freed, this patch
fixes it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 11:42:28 -04:00
Liu Bo ed0eaa1498 Btrfs: make sure that we've made everything in pinned tree clean
Since we have two trees for recording pinned extents, we need to go through
both of them to make sure that we've done everything clean.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 11:42:27 -04:00
Liu Bo 6e841e32b1 Btrfs: avoid memory leak of extent state in error handling routine
We've forgotten to clear extent states in pinned tree, which will results in
space counter mismatch and memory leak:

WARNING: at fs/btrfs/extent-tree.c:7537 btrfs_free_block_groups+0x1f3/0x2e0 [btrfs]()
...
space_info 2 has 8380416 free, is not full
space_info total=12582912, used=4096, pinned=4096, reserved=0, may_use=0, readonly=4194304
btrfs state leak: start 29364224 end 29376511 state 1 in tree ffff880075f20090 refs 1
...

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 11:42:27 -04:00
Liu Bo 4e42ae1bdc Btrfs: do not resize a seeding device
Seeding devices are not supposed to change any more.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 11:42:26 -04:00
Liu Bo bc1782374b Btrfs: fix missing inherited flag in rename
When we move a file into a directory with compression flag, we need to
inherite BTRFS_INODE_COMPRESS and clear BTRFS_INODE_NOCOMPRESS as well.
But if we move a file into a directory without compression flag, we need
to clear both of them.

It is the way how our setflags deals with compression flag, so keep
the same behaviour here.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15 11:33:30 -04:00
Chris Mason acbcabd2de Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into for-linus 2012-06-15 11:33:16 -04:00
Li Zefan 69e380d176 Btrfs: fix incompat flags setting
It's a bug, but it happens to work, as BTRFS_COMPRESS_LZO == 2, which
has only one bit set.

Signed-off-by: Li Zefan <lizefan@huawei.com>
2012-06-14 21:30:57 -04:00
Li Zefan 6c282eb40e Btrfs: fix defrag regression
If a file has 3 small extents:

| ext1 | ext2 | ext3 |

Running "btrfs fi defrag" will only defrag the last two extents, if those
extent mappings hasn't been read into memory from disk.

This bug was introduced by commit 17ce6ef8d7
("Btrfs: add a check to decide if we should defrag the range")

The cause is, that commit looked into previous and next extents using
lookup_extent_mapping() only.

While at it, remove the code that checks the previous extent, since
it's sufficient to check the next extent.

Signed-off-by: Li Zefan <lizefan@huawei.com>
2012-06-14 21:30:55 -04:00
Josef Bacik 7ddf5a42d3 Btrfs: call filemap_fdatawrite twice for compression
I removed this in an earlier commit and I was wrong.  Because compression
can return from filemap_fdatawrite() without having actually set any of it's
pages as writeback() it can make filemap_fdatawait() do essentially nothing,
and then we won't find any ordered extents because they may not have been
created yet.  So not only does this make fsync() completely useless, but it
will also screw up if you truncate on a non-page aligned offset since we
zero out the end and then wait on ordered extents and then call drop caches.
We can drop the cache before the io completes and then we try to unpin the
extent we just wrote we won't find it and everything goes sideways.  So fix
this by putting it back and put a giant comment there to keep me from trying
to remove it in the future.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:30:54 -04:00
Josef Bacik 8180ef8894 Btrfs: keep inode pinned when compressing writes
A user reported lots of problems using compression on the new code and it
turns out part of the problem was that igrab() was failing when we added a
new ordered extent.  This is because when writing out an inode under
compression we immediately return without actually doing anything to the
pages, and then in another thread at some point down the line actually do
the ordered dance.  The problem is between the point that we start writeback
and we actually add the ordered extent we could be trying to reclaim the
inode, which makes igrab() return NULL.  So we need to do an igrab() when we
create the async extent and then drop it when we are done with it.  This
makes sure we stay pinned in memory until the ordered extent can get a
reference on it and we are good to go.  With this patch we no longer panic
in btrfs_finish_ordered_io().  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:30:53 -04:00
Josef Bacik 9c5085c147 Btrfs: implement ->show_devname
Because btrfs can remove the device that was mounted we need to have a
->show_devname so that in this case we can print out some other device in
the file system to /proc/mount.  So if there are multiple devices in a btrfs
file system we will just print the device with the lowest devid that we can
find.  This will make everything consistent and deal with device removal
properly.  The drawback is if you mount with a device that is higher than
the lowest devicd it won't show up as the mounted device in /proc/mounts,
but this is a small price to pay. This was inspired by Miao Xie's patch.
Thanks,

Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:30:37 -04:00
Josef Bacik 606686eeac Btrfs: use rcu to protect device->name
Al pointed out that we can just toss out the old name on a device and add a
new one arbitrarily, so anybody who uses device->name in printk could
possibly use free'd memory.  Instead of adding locking around all of this he
suggested doing it with RCU, so I've introduced a struct rcu_string that
does just that and have gone through and protected all accesses to
device->name that aren't under the uuid_mutex with rcu_read_lock().  This
protects us and I will use it for dealing with removing the device that we
used to mount the file system in a later patch.  Thanks,

Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:16 -04:00
Josef Bacik 17ca04aff7 Btrfs: unlock everything properly in the error case for nocow
I was getting hung on umount when a transaction was aborted because a range
of one of the free space inodes was still locked.  This is because the nocow
stuff doesn't unlock anything on error.  This fixed the problem and I
verified that is what was happening.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:15 -04:00
Josef Bacik ee670f0af3 Btrfs: fix btrfs_destroy_marked_extents
So we're forcing the eb's to have their ref count set to 1 so invalidatepage
works but this breaks lots of things, for example root nodes, and is just
plain wrong, we don't need to just evict all of this stuff.  Also drop the
invalidatepage altogether and add a page_cache_release().  With this patch
we no longer hang when trying to access the root nodes after an aborted
transaction and we no longer leak memory.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:14 -04:00
Josef Bacik 7b8b92af58 Btrfs: abort the transaction if the commit fails
If a transaction commit fails we don't abort it so we don't set an error on
the file system.  This patch fixes that by actually calling the abort stuff
and then adding a check for a fs error in the transaction start stuff to
make sure it is caught properly.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:13 -04:00
Josef Bacik d7096fc3ef Btrfs: wake up transaction waiters when aborting a transaction
I was getting lots of hung tasks and a NULL pointer dereference because we
are not cleaning up the transaction properly when it aborts.  First we need
to reset the running_transaction to NULL so we don't get a bad dereference
for any start_transaction callers after this.  Also we cannot rely on
waitqueue_active() since it's just a list_empty(), so just call wake_up()
directly since that will do the barrier for us and such.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:12 -04:00
Josef Bacik b939d1ab76 Btrfs: fix locking in btrfs_destroy_delayed_refs
The transaction abort stuff was throwing warnings from the list debugging
code because we do a list_del_init outside of the delayed_refs spin lock.
The delayed refs locking makes baby Jesus cry so it's not hard to get wrong,
but we need to take the ref head mutex to make sure it's not being processed
currently, and so if it is we need to drop the spin lock and then take and
drop the mutex and do the search again.  If we can take the mutex then we
can safely remove the head from the list and carry on.  Now when the
transaction aborts I don't get the list debugging warnings.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:11 -04:00
Josef Bacik beb42dd793 Btrfs: pass locked_page into extent_clear_unlock_delalloc if theres an error
While doing my enospc work I got a transaction abortion that resulted in a
panic when we tried to unlock_page() an already unlocked page.  This is
because we aren't calling extent_clear_unlock_delalloc with the locked page
so it was unlocking all the pages in the range.  This is wrong since
__extent_writepage expects to have the page locked still unless we return
*page_started as 1.  This should keep us from panicing.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-14 21:29:09 -04:00
J. Bruce Fields bc2df47a40 nfsd4: BUG_ON(!is_spin_locked()) no good on UP kernels
Most frequent symptom was a BUG triggering in expire_client, with the
server locking up shortly thereafter.

Introduced by 508dc6e110 "nfsd41:
free_session/free_client must be called under the client_lock".

Cc: stable@kernel.org
Cc: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-14 13:54:08 -04:00
Stanislav Kinsbursky 12918b10d5 NFS: hard-code init_net for NFS callback transports
In case of destroying mount namespace on child reaper exit, nsproxy is zeroed
to the point already. So, dereferencing of it is invalid.
This patch hard-code "init_net" for all network namespace references for NFS
callback services. This will be fixed with proper NFS callback
containerization.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-14 13:53:43 -04:00
Jan Schmidt 3310c36eef Btrfs: fix race in tree mod log addition
When adding to the tree modification log, we grab two locks at different
stages. We must not drop the outer lock until we're done with section
protected by the inner lock. This moves the unlock call for the outer lock
to the appropriate position.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-14 18:52:39 +02:00
Jan Schmidt 3d7806eca4 Btrfs: add btrfs_next_old_leaf
To make sense of the tree mod log, the backref walker not only needs
btrfs_search_old_slot, but it also called btrfs_next_leaf, which in turn was
calling btrfs_search_slot. This obviously didn't give the correct result.

This commit adds btrfs_next_old_leaf, a drop-in replacement for
btrfs_next_leaf with a time_seq parameter. If it is zero, it behaves exactly
like btrfs_next_leaf. If it is non-zero, it will use btrfs_search_old_slot
with this time_seq parameter.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-14 18:52:09 +02:00
Jan Schmidt a95236d99f Btrfs: fix return value for __tree_mod_log_oldest_root
In __tree_mod_log_oldest_root() we must return the found operation even if
it's not a ROOT_REPLACE operation. Otherwise, the caller assumes that there
are no operations to be rewinded and returns immediately.

The code in the caller is modified to improve readability.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-14 18:44:22 +02:00
Jan Schmidt 8ba97a15e7 Btrfs: use btrfs_read_lock_root_node in get_old_root
get_old_root could race with root node updates because we weren't locking
the node early enough. Use btrfs_read_lock_root_node to grab the root locked
in the very beginning and release the lock as soon as possible (just like
btrfs_search_slot does).

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-14 18:44:21 +02:00
Jan Schmidt f617e2fd52 Btrfs: remove obsolete btrfs_next_leaf call from __resolve_indirect_ref
When resolving indirect refs, we used to call btrfs_next_leaf in case we
didn't find an exact match. While we should find exact matches most of the
time, in case we don't, we must continue searching. Treating those matches
differently depending on the level we're searching doesn't make sense.

Even worse, we might end up searching for a key larger than the largest, in
which case there is no next_leaf and subsequent jobs would fail. This commit
drops the bogous lines.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-14 18:44:20 +02:00
Anton Vorontsov 364ed2f465 pstore/inode: Make pstore_fill_super() static
There's no reason to extern it. The patch fixes the annoying sparse
warning:

CHECK   fs/pstore/inode.c
fs/pstore/inode.c:264:5: warning: symbol 'pstore_fill_super' was not
declared. Should it be static?

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-13 16:52:40 -07:00
Anton Vorontsov 93cce04968 pstore/ram: Should zap persistent zone on unlink
Otherwise, unlinked file will reappear on the next boot.

Reported-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-13 16:52:40 -07:00
Anton Vorontsov fce3979304 pstore/ram_core: Factor persistent_ram_zap() out of post_init()
A handy function that we will use outside of ram_core soon. But
so far just factor it out and start using it in post_init().

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-13 16:52:40 -07:00
Anton Vorontsov 25b63da647 pstore/ram_core: Do not reset restored zone's position and size
Otherwise, the files will survive just one reboot, and on a subsequent
boot they will disappear.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-13 16:52:39 -07:00
Anton Vorontsov 201e4aca5a pstore/ram: Should update old dmesg buffer before reading
Without the update, we'll only see the new dmesg buffer after the
reboot, but previously we could see it right away. Making an oops
visible in pstore filesystem before reboot is a somewhat dubious
feature, but removing it wasn't an intentional change, so let's
restore it.

For this we have to make persistent_ram_save_old() safe for calling
multiple times, and also extern it.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-13 16:52:39 -07:00
Eric Dumazet 047fe36052 splice: fix racy pipe->buffers uses
Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered
by splice_shrink_spd() called from vmsplice_to_pipe()

commit 35f3d14dbb (pipe: add support for shrinking and growing pipes)
added capability to adjust pipe->buffers.

Problem is some paths don't hold pipe mutex and assume pipe->buffers
doesn't change for their duration.

Fix this by adding nr_pages_max field in struct splice_pipe_desc, and
use it in place of pipe->buffers where appropriate.

splice_shrink_spd() loses its struct pipe_inode_info argument.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Tom Herbert <therbert@google.com>
Cc: stable <stable@vger.kernel.org> # 2.6.35
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-13 21:16:42 +02:00
Suresh Jayaraman e73f843a32 cifs: fix parsing of password mount option
The double delimiter check that allows a comma in the password parsing code is
unconditional. We set "tmp_end" to the end of the string and we continue to
check for double delimiter. In the case where the password doesn't contain a
comma we end up setting tmp_end to NULL and eventually setting "options" to
"end". This results in the premature termination of the options string and hence
the values of UNCip and UNC are being set to NULL. This results in mount failure
with "Connecting to DFS root not implemented yet" error.

This error is usually not noticable as we have password as the last option in
the superblock mountdata. But when we call expand_dfs_referral() from
cifs_mount() and try to compose mount options for the submount, the resulting
mountdata will be of the form

   ",ver=1,user=foo,pass=bar,ip=x.x.x.x,unc=\\server\share"

and hence results in the above error. This bug has been seen with older NAS
servers running Samba 3.0.24.

Fix this by moving the double delimiter check inside the conditional loop.

Changes since -v1

   - removed the wrong strlen() micro optimization.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.com>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
Cc: stable@vger.kernel.org [3.1+]
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-06-12 12:53:02 -05:00
Linus Torvalds 266ae4e615 fix unbalanced wb->list_lock in 3.5-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJP0tJwAAoJECvKgwp+S8JaSVAP/1FKw7u5E9QzOW8kPCj58nig
 FVF3BgaYgVrJglRjNOGTSfhJl3TVHGTzEMGICtsKUvXmgs7VmMudyWFuyokxfrZi
 z70DIbSuPDOMEt31MXiACq9z4E2IrZZ2kTYpRVd+zCV/2s4p789ejDLxOkk2ikpt
 M4TcIMPCAU2e4/ljRjNEQkg8d23ehdJsH9Hg8k56Jtb57e0LNpwnaZ0FzFKCzYh0
 Orm2dm9375vyzXZ2WNvgzl8ClJdEjkCyTq79i/39aQs4WeXsEP3nk+PFZYd+jrYA
 XwvV1sHJ43BOCCWwYS5dP0i9uGnRVlh/F2tyWWSSU2OizHzUXuOUWfXgY1JWyzaT
 uwEoCa8atYUOD7gp6ndnNR3uqBXtw0SihESst5fC7rKWuKOl62XToD6Pg6rPMex4
 ZuRaUw4Ts4efbX0dw3iDHMrS6Cf7a0JH52QOEYovrge4mDjiJzClHR69hsmVyoVb
 7s3j0q+mhur+NcnMkvC0C+pHn/m18q1i/yIBrD2K/UoLVGA17QMIZGoKfGnkeoaF
 a4fcoJ7fQE09tJ9KV1NLEdeD9PXQtETcNIAIZCfBCEJP9CPhHaUYkky+sOMcvKTr
 xEAe4f4WiHY7mTy8CRcDQjiOBigJe7Ksm+0qElauGcYw5Kqqrpdap1yPKbG4QWPL
 ABVYJyOvKSC3HLA6i7Gi
 =RBRw
 -----END PGP SIGNATURE-----

Merge tag 'writeback-lock-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux

Pull writeback locking fix from Wu Fengguang:
 "fix unbalanced wb->list_lock in 3.5-rc1"

* tag 'writeback-lock-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: Fix lock imbalance in writeback_sb_inodes()
2012-06-12 18:28:58 +03:00
Dan Carpenter e216c8c771 NFS: add an endian notation for sparse
This is supposed to be a __be32 value.  Sparse complains a lot:

fs/nfs/callback_xdr.c:699:30: warning: incorrect type in initializer (different base types)
fs/nfs/callback_xdr.c:699:30:    expected unsigned int [unsigned] status
fs/nfs/callback_xdr.c:699:30:    got restricted __be32 const [usertype] csr_status
fs/nfs/callback_xdr.c:715:9: warning: cast to restricted __be32
fs/nfs/callback_xdr.c:716:16: warning: incorrect type in return expression (different base types)
fs/nfs/callback_xdr.c:716:16:    expected restricted __be32
fs/nfs/callback_xdr.c:716:16:    got unsigned int [unsigned] status

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-12 09:54:40 -04:00
Dan Carpenter 0439f31c35 NFSv4.1: integer overflow in decode_cb_sequence_args()
This seems like it could overflow on 32 bits.  Use kmalloc_array() which
has overflow protection built in.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-12 09:54:36 -04:00
Randy Dunlap b84297197c exofs: fix sparse non-ANSI function warning
Fix sparse non-ANSI function warning:

  fs/exofs/sys.c:112:28: warning: non-ANSI function declaration of function 'exofs_sysfs_dbg_print'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-12 06:33:22 +03:00
Andy Adamson 2669940db8 NFSv4 do not send an empty SETATTR compound
Commit 536e43d12b ATTR_OPEN check can result in
an ia_valid with only ATTR_FILE set, and no NFS_VALID_ATTRS attributes to
request from the server.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-11 17:25:53 -04:00
Sachin Prabhu 64f9a83665 NFSv2: EOF incorrectly set on short read
In cases where the server returns fewer bytes then those requested, we
can incorrectly set the eof flag for the file. Fixing this allows the
request to be retried with updated offset and count arguments.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-11 17:25:00 -04:00
Bryan Schumaker c5afc8da5b NFS: Use the NFS_DEFAULT_VERSION for v2 and v3 mounts
Older versions of nfs utils don't always pass a "vers=" mount option for
NFS.  This chould lead to attempts at using NFS v0 due to a zeroed out
nfs_parsed_mount_data struct.  I solve this by setting the default NFS
version to NFS_DEFAULT_VERSION in the v2 and v3 cases (v4 has already been
taken care of by a similar patch).

Reported-by: Joerg Roedel <joro@&bytes.org>
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-09 14:38:59 -04:00
Fred Isaman 906369e43c NFS: fix directio refcount bug on commit
This reverts a hunk from commit 0427708657
"NFS: Clean up - Simplify reference counting in fs/nfs/direct.c"

The cleanups in that patch affect the write path, but by the time
processing hits commit the removed reference has been added back by
nfs_scan_commit_list().  Without this reversion, any page that is
sent to commit holds on to an unbalanced reference that is never
freed.  The immediate effect is an imbalance over the wire between
OPENs and CLOSEs.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-09 14:32:45 -04:00
Jan Kara ead188f9f9 writeback: Fix lock imbalance in writeback_sb_inodes()
Fix bug introduced by 169ebd90.  We have to have wb_list_lock locked when
restarting writeback loop after having waited for inode writeback.

Bug description by Ted Tso:

  I can reproduce this fairly easily by using ext4 w/o a journal, running
  under KVM with 1024megs memory, with fsstress (xfstests #13):

  [   45.153294] =====================================
  [   45.154784] [ BUG: bad unlock balance detected! ]
  [   45.155591] 3.5.0-rc1-00002-gb22b1f1 #124 Not tainted
  [   45.155591] -------------------------------------
  [   45.155591] flush-254:16/2499 is trying to release lock (&(&wb->list_lock)->rlock) at:
  [   45.155591] [<c022c3da>] writeback_sb_inodes+0x160/0x327
  [   45.155591] but there are no more locks to release!

Reported-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-06-09 08:32:15 +09:00
Linus Torvalds 77249539cd ext4 bug fixes for 3.5-rc2
This update contains two bug fixes, both destined for the stable tree.
 Perhaps the most important is one which fixes ext4 when used with file
 systems originally formatted for use with ext3, but then later
 converted to take advantage of ext4.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABCAAGBQJP0jyAAAoJENNvdpvBGATwGSIP/R0+4j/SqSncjLIEx6EvrCeY
 eyR9JYb0F44P1bKlZ6WErrupysmhPqHubnbxMg9bVMTxefN/m6tEyG+EG4MwMxFP
 mRtc5EVT0i/9k7U6s4Ey0kSnBnZqx6IBdFUR5lHrvAf7Ud920eY12Cx4oscI7Fj6
 E81lWASOkDTjzsJpzFIqcW7b257Ne1oxPAu1p290W5riSkDzle7qqT5xT+lBAbr0
 YM8ZFhznUdyv2aeuchf/dfys5YdVgSporplIjvP69bRyNr4x5B6QOUGxuX3RNhV3
 Hqf83GW8HycyAEdl4AtakNtxiMPhrIy0S/RamWqBTE2+nws3Wnkd1wN4c5HjZZad
 w5u2E/uSxedZfgGDZZriRwHCgR+2I2CEtMy45gL96o+2Cel6unN+vjVPTVbMpMUy
 3PsPoPjPVmNPLyn4vIfHlEQ5ZLhBzVpXFEsca9s+6JeHQ4dKP77fa8iq5AjzCjzA
 FYvmf1fNbDW1da81fRUOSnwuaC0JuUvOu5eQLLQ3jzz5gj+DvZvQqvleMHTXkQ3X
 hLpSXNTtXUBYeXIw5aJzh5VPJqqGzPkpJhAEbMRgCobRA1HtuAEJzZf4J7+jiWjV
 IrQpw0HIeZiK1oZW8iC8YYbIz8AakU+MnIVhvcPc3rIr5eeEtWPM/9ETx/mrO9s2
 /y0bwu84eQLIctFjzR2s
 =9oyU
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bug fixes from Theodore Ts'o:
 "This update contains two bug fixes, both destined for the stable tree.
  Perhaps the most important is one which fixes ext4 when used with file
  systems originally formatted for use with ext3, but then later
  converted to take advantage of ext4."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: don't set i_flags in EXT4_IOC_SETFLAGS
  ext4: fix the free blocks calculation for ext3 file systems w/ uninit_bg
2012-06-08 11:15:31 -07:00
Linus Torvalds e72643088f Fix UBI and UBIFS - they refuse to work without debugfs. This was
broken by the 3.5-rc1 UBI/UBIFS changes when we removed the debugging
 Kconfig switches.
 
 Also, correct locking in 'ubi_wl_flush()' - it was extended to support
 flushing a specific LEB in 3.5-rc1, and the locking was sub-optimal.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJP0jAHAAoJECmIfjd9wqK0qN8P/1JRuky3xsX8o5S5tqVLKkzO
 4naM8bH1AuUemg8f+K+2IUyHp1Mp0lsBj7zA46+3xWXvRcZBJaWgqZFkLOuWSZn0
 ChRGKKhdwRkXokq7JHgUJCrdTLyXsBV+uuBbJ+oTPQa2+7f1mxgZuLc+MJzpoKAs
 sNdE3A2pH3hhXDJKuT6hrM2wqD/2rW5bGurW4sYBxVUJEZ793HeXu30lTffZuWHo
 ClzDtC0iWIS5NmdN0sbQL6Os3p+uiR5mjJek0eABTHAmgXvfBUWpf9ewTXYS2ulK
 zY8brtIy2N8qmXFa8ZyguZKpCdulDqRLPz/2bmX5rOJSsyogoNxHI+UlZFCfq0u/
 txA+Wm4dQZtvSM3Stws6gxuIDCX83wOgHq/gUitMsVkxo+McwevjXpZb4Ha80OVl
 Iw7ERGrbxZn+B+lTh+9jgD6kstqpefO7fBP5gkLRvHToxDKY+6sdH6b3UuE6pDJI
 SA2D1uDyk33A/dyvYZThcvsq7KMdF0mhfLuxkScSfP17hc0hnh/FSzkj3fHrADfy
 oA4Oo/jKFbOFnmByNHjzvOaqeYHVsKwzq+feDcZ7sYA5GNrVMVViLZQljQBdJWo5
 E9howusrb81AGvjCIFobntBLE8DZF5OGfgt/n+60+jo3/qern/4jeuWfGL/Q9Wcw
 x27oVmOBLjuHuG3hXlGt
 =X7QS
 -----END PGP SIGNATURE-----

Merge tag 'upstream-3.5-rc2' of git://git.infradead.org/linux-ubifs

Pull UBI/UBIFS fixes from Artem Bityutskiy:
 "Fix UBI and UBIFS - they refuse to work without debugfs.  This was
  broken by the 3.5-rc1 UBI/UBIFS changes when we removed the debugging
  Kconfig switches.

  Also, correct locking in 'ubi_wl_flush()' - it was extended to support
  flushing a specific LEB in 3.5-rc1, and the locking was sub-optimal."

* tag 'upstream-3.5-rc2' of git://git.infradead.org/linux-ubifs:
  UBI: correct ubi_wl_flush locking
  UBIFS: fix debugfs-less systems support
  UBI: fix debugfs-less systems support
2012-06-08 11:04:06 -07:00
Linus Torvalds 32ba9c3fca Revert "vfs: stop d_splice_alias creating directory aliases"
This reverts commit 7732a557b1 (and commit
3f50fff4da, which was a follow-up
cleanup).

We're chasing an elusive bug that Dave Jones can apparently reproduce
using his system call fuzzer tool, and that looks like some kind of
locking ordering problem on the directory i_mutex chain.  Our i_mutex
locking is rather complex, and depends on the topological ordering of
the directories, which is why we have been very wary of splicing
directory entries around.

Of course, we really don't want to ever see aliased unconnected
directories anyway, so none of this should ever happen, but this revert
aims to basically get us back to a known older state.

Bruce points to some of the previous discussion at

       http://marc.info/?i=<20110310105821.GE22723@ZenIV.linux.org.uk>

and in particular a long post from Neil:

       http://marc.info/?i=<20110311150749.2fa2be66@notabene.brown>

It should be noted that it's possible that Dave's problems come from
other changes altohgether, including possibly just the fact that Dave
constantly is teachning his fuzzer new tricks.  So what appears to be a
new bug could in fact be an old one that just gets newly triggered, but
reverting these patches as "still under heavy discussion" is the right
thing regardless.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-08 10:34:03 -07:00
Trond Myklebust 2d0dbc6ae8 NFSv4: Fix unnecessary delegation returns in nfs4_do_open
While nfs4_do_open() expects the fmode argument to be restricted to
combinations of FMODE_READ and FMODE_WRITE, both nfs4_atomic_open()
and nfs4_proc_create will pass the nfs_open_context->mode,
which contains the full fmode_t.

This patch ensures that nfs4_do_open strips the other fmode_t bits,
fixing a problem in which the nfs4_do_open call would result in an
unnecessary delegation return.

Reported-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-06-08 11:08:42 -04:00
Linus Torvalds 48d212a2ee Revert "mm: correctly synchronize rss-counters at exit/exec"
This reverts commit 40af1bbdca.

It's horribly and utterly broken for at least the following reasons:

 - calling sync_mm_rss() from mmput() is fundamentally wrong, because
   there's absolutely no reason to believe that the task that does the
   mmput() always does it on its own VM.  Example: fork, ptrace, /proc -
   you name it.

 - calling it *after* having done mmdrop() on it is doubly insane, since
   the mm struct may well be gone now.

 - testing mm against NULL before you call it is insane too, since a
NULL mm there would have caused oopses long before.

.. and those are just the three bugs I found before I decided to give up
looking for me and revert it asap.  I should have caught it before I
even took it, but I trusted Andrew too much.

Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-07 17:54:07 -07:00
Tao Ma b22b1f178f ext4: don't set i_flags in EXT4_IOC_SETFLAGS
Commit 7990696 uses the ext4_{set,clear}_inode_flags() functions to
change the i_flags automatically but fails to remove the error setting
of i_flags.  So we still have the problem of trashing state flags.
Fix this by removing the assignment.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2012-06-07 19:04:19 -04:00
Theodore Ts'o b0dd6b70f0 ext4: fix the free blocks calculation for ext3 file systems w/ uninit_bg
Ext3 filesystems that are converted to use as many ext4 file system
features as possible will enable uninit_bg to speed up e2fsck times.
These file systems will have a native ext3 layout of inode tables and
block allocation bitmaps (as opposed to ext4's flex_bg layout).
Unfortunately, in these cases, when first allocating a block in an
uninitialized block group, ext4 would incorrectly calculate the number
of free blocks in that block group, and then errorneously report that
the file system was corrupt:

EXT4-fs error (device vdd): ext4_mb_generate_buddy:741: group 30, 32254 clusters in bitmap, 32258 in gd

This problem can be reproduced via:

    mke2fs -q -t ext4 -O ^flex_bg /dev/vdd 5g
    mount -t ext4 /dev/vdd /mnt
    fallocate -l 4600m /mnt/test

The problem was caused by a bone headed mistake in the check to see if a
particular metadata block was part of the block group.

Many thanks to Kees Cook for finding and bisecting the buggy commit
which introduced this bug (commit fd034a84e1, present since v3.2).

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Reported-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: stable@kernel.org
2012-06-07 18:56:06 -04:00
Konstantin Khlebnikov 40af1bbdca mm: correctly synchronize rss-counters at exit/exec
mm->rss_stat counters have per-task delta: task->rss_stat.  Before
changing task->mm pointer the kernel must flush this delta with
sync_mm_rss().

do_exit() already calls sync_mm_rss() to flush the rss-counters before
committing the rss statistics into task->signal->maxrss, taskstats,
audit and other stuff.  Unfortunately the kernel does this before
calling mm_release(), which can call put_user() for processing
task->clear_child_tid.  So at this point we can trigger page-faults and
task->rss_stat becomes non-zero again.  As a result mm->rss_stat becomes
inconsistent and check_mm() will print something like this:

| BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1

This patch moves sync_mm_rss() into mm_release(), and moves mm_release()
out of do_exit() and calls it earlier.  After mm_release() there should
be no pagefaults.

[akpm@linux-foundation.org: tweak comment]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>		[3.4.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-07 14:43:55 -07:00
Trond Myklebust 02c67525cf NFSv4.1: Convert another trivial printk into a dprintk
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-07 13:45:53 -04:00
Fred Isaman fb47ddc9d5 NFS4: Fix open bug when pnfs module blacklisted
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-07 13:44:24 -04:00
Trond Myklebust f07936f2c4 NFS: Remove incorrect BUG_ON in nfs_found_client
It is perfectly valid for nfs_get_client() to return a nfs_client that
is in the process of setting up the NFSv4.1 session.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-07 10:21:03 -04:00
Artem Bityutskiy 818039c7d5 UBIFS: fix debugfs-less systems support
Commit "f70b7e5 UBIFS: remove Kconfig debugging option" broke UBIFS and it
refuses to initialize if debugfs (CONFIG_DEBUG_FS) is disabled. I incorrectly
assumed that debugfs files creation function will return success if debugfs
is disabled, but they actually return -ENODEV. This patch fixes the issue.

Reported-by: Paul Parsons <lost.distance@yahoo.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Tested-by: Paul Parsons <lost.distance@yahoo.com>
2012-06-07 10:43:54 +03:00
Steve Dickson f25efd851c NFS: Map minor mismatch error to protocol not support error.
Sservers that only have NFSv4.1 support the
NFS4ERR_MINOR_VERS_MISMATCH error is return on
v4.0 mounts. Mapping that error to EPROTONOSUPPORT
will cause the mount to back off to v3 instead of
failing.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-06 14:32:40 -04:00
Trond Myklebust 9bce008bae NFS: Fix a commit bug
The new commit code fails to copy the verifier into the wb_verf field
of _all_ the nfs_page structures; it only copies it into the first entry.
The consequence is that most requests end up failing to match in
nfs_commit_release.

Fix is to copy the verifier into the req->wb_verf field in
nfs_write_completion.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-06-05 18:38:47 -04:00
Bryan Schumaker cdf66442fa NFS4: Set parsed mount data version to 4
This patch only affects mounting through "-t nfs4" since it doesn't set
up an nfs version to use in the mount data.  The nfs client was trying
to mount using NFS v0, causing either a BUG() or a protocol not
supported message.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-05 15:50:47 -04:00
Linus Torvalds f9ba7179ce Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: fix blksize calculation
  fuse: fix stat call on 32 bit platforms
  fuse: optimize fallocate on permanent failure
  fuse: add FALLOCATE operation
  fuse: Convert to kstrtoul_from_user
2012-06-05 10:11:11 -07:00
Trond Myklebust bda197f5d7 NFSv4.1: Ensure we clear session state flags after a session creation
Both nfs4_reset_session and nfs41_init_clientid need to clear all the
session related state flags on success.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-05 10:22:14 -04:00
Trond Myklebust 08106ac7c8 NFSv4.1: Convert a trivial printk into a dprintk
There is no need to bug the user about the server returning an error
on destroy_session. The error will be handled by the state manager,
without any need for further input from anyone else.
So convert that printk into a debugging dprintk.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-05 10:08:24 -04:00
Trond Myklebust 029c534737 NFSv4: Fix up decode_attr_mdsthreshold
Fix an incorrect use of 'likely()'. The FATTR4_WORD2_MDSTHRESHOLD
bit is only expected in NFSv4.1 OPEN calls, and so is actually
rather _unlikely_.

decode_attr_mdsthreshold needs to clear FATTR4_WORD2_MDSTHRESHOLD
from the attribute bitmap after it has decoded the data.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Andy Adamson <andros@netapp.com>
2012-06-05 10:00:47 -04:00
Trond Myklebust 1549210fcc NFSv4: Fix an Oops in the open recovery code
The open recovery code does not need to request a new value for the
mdsthreshold, and so does not allocate a struct nfs4_threshold.
The problem is that encode_getfattr_open() will still request an
mdsthreshold, and so we end up Oopsing in decode_attr_mdsthreshold.

This patch fixes encode_getfattr_open so that it doesn't request an
mdsthreshold when the caller isn't asking for one. It also fixes
decode_attr_mdsthreshold so that it errors if the server returns
an mdsthreshold that we didn't ask for (instead of Oopsing).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Andy Adamson <andros@netapp.com>
2012-06-05 10:00:14 -04:00
Linus Torvalds bf2785a818 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French.

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Move get_next_mid to ops struct
  CIFS: Make accessing is_valid_oplock/dump_detail ops struct field safe
  CIFS: Improve identation in cifs_unlock_range
  CIFS: Fix possible wrong memory allocation
2012-06-04 15:00:58 -07:00
Linus Torvalds 0640113be2 vfs: Fix /proc/<tid>/fdinfo/<fd> file handling
Cyrill Gorcunov reports that I broke the fdinfo files with commit
30a08bf2d3 ("proc: move fd symlink i_mode calculations into
tid_fd_revalidate()"), and he's quite right.

The tid_fd_revalidate() function is not just used for the <tid>/fd
symlinks, it's also used for the <tid>/fdinfo/<fd> files, and the
permission model for those are different.

So do the dynamic symlink permission handling just for symlinks, making
the fdinfo files once more appear as the proper regular files they are.

Of course, Al Viro argued (probably correctly) that we shouldn't do the
symlink permission games at all, and make the symlinks always just be
the normal 'lrwxrwxrwx'.  That would have avoided this issue too, but
since somebody noticed that the permissions had changed (which was the
reason for that original commit 30a08bf2d3 in the first place), people
do apparently use this feature.

[ Basically, you can use the symlink permission data as a cheap "fdinfo"
  replacement, since you see whether the file is open for reading and/or
  writing by just looking at st_mode of the symlink.  So the feature
  does make sense, even if the pain it has caused means we probably
  shouldn't have done it to begin with. ]

Reported-and-tested-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-04 11:00:45 -07:00
Jan Schmidt 4d5a0565ce Btrfs: remove call to btrfs_header_nritems with no effect
This is a leftover from cleanup patch 559af821. Before the cleanup,
btrfs_header_nritems was called inside an if condition. As it has no side
effects we need to preserve here, it should simply be dropped.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-04 14:35:29 +02:00
Linus Torvalds 829f51dbd8 Merge branch 'akpm' (Fixups for Andrew's patchbomb)
Merge fixups for the mac NLS tables from Andrew.

* emailed from Andrew Morton, and one cleanup by me:
  nls: fix (and rename) mac NLS table files and config options
  fs/nls/Makefile: remove bogus CONFIG_ assignments
2012-06-01 19:56:23 -07:00
Linus Torvalds 8b8c0daa2c nls: fix (and rename) mac NLS table files and config options
The config options in the Kconfig file (with _CODEPAGE_ in the name)
didn't match the config option name in the Makefile (no _CODEPAGE_).

And both of them were of the hard-to-read MACXYZZY variety, which made
them hard to parse for normal humans: MACROMAN easily reads as "macro
man", not as "Mac Roman".

So rename the options to be consistent, and be NLS_MAC_xyzzy.  Rename
the files to be mac-xyzzy.c too, and drop the "nls" part entirely (it's
already in the directory name).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-01 19:51:22 -07:00
Andrew Morton 92a8956364 fs/nls/Makefile: remove bogus CONFIG_ assignments
These were debug things which snuck through.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Cc: Vladimir Serbinenko <phcoder@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-01 19:47:26 -07:00
Linus Torvalds f5e7e844a5 - More robust parsing especially of xattr data in JFFS2
- Updates to mxc_nand and gpmi drivers to support new boards and device tree
  - Improve consistency of information about ECC strength in NAND devices
  - Clean up partition handling of plat_nand
  - Support NAND drivers without dedicated access to OOB area
  - BCH hardware ECC support for OMAP
  - Other fixes and cleanups, and a few new device IDs
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iEYEABECAAYFAk/JG1wACgkQdwG7hYl686M80wCglN4kutx20j+KJWuZofkr9Hog
 weEAoI4jrqEWEdW9EcT2CIWQw7eG+1v+
 =7tdo
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-3.5-20120601' of git://git.infradead.org/linux-mtd

Pull mtd update from David Woodhouse:
 - More robust parsing especially of xattr data in JFFS2
 - Updates to mxc_nand and gpmi drivers to support new boards and device tree
 - Improve consistency of information about ECC strength in NAND devices
 - Clean up partition handling of plat_nand
 - Support NAND drivers without dedicated access to OOB area
 - BCH hardware ECC support for OMAP
 - Other fixes and cleanups, and a few new device IDs

Fixed trivial conflict in drivers/mtd/nand/gpmi-nand/gpmi-nand.c due to
added include files next to each other.

* tag 'for-linus-3.5-20120601' of git://git.infradead.org/linux-mtd: (75 commits)
  mtd: mxc_nand: move ecc strengh setup before nand_scan_tail
  mtd: block2mtd: fix recursive call of mtd_writev
  mtd: gpmi-nand: define ecc.strength
  mtd: of_parts: fix breakage in Kconfig
  mtd: nand: fix scan_read_raw_oob
  mtd: docg3 fix in-middle of blocks reads
  mtd: cfi_cmdset_0002: Slight cleanup of fixup messages
  mtd: add fixup for S29NS512P NOR flash.
  jffs2: allow to complete xattr integrity check on first GC scan
  jffs2: allow to discriminate between recoverable and non-recoverable errors
  mtd: nand: omap: add support for hardware BCH ecc
  ARM: OMAP3: gpmc: add BCH ecc api and modes
  mtd: nand: check the return code of 'read_oob/read_oob_raw'
  mtd: nand: remove 'sndcmd' parameter of 'read_oob/read_oob_raw'
  mtd: m25p80: Add support for Winbond W25Q80BW
  jffs2: get rid of jffs2_sync_super
  jffs2: remove unnecessary GC pass on sync
  jffs2: remove unnecessary GC pass on umount
  jffs2: remove lock_super
  mtd: gpmi: add gpmi support for mx6q
  ...
2012-06-01 16:55:42 -07:00
Linus Torvalds 86c47b70f6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull third pile of signal handling patches from Al Viro:
 "This time it's mostly helpers and conversions to them; there's a lot
  of stuff remaining in the tree, but that'll either go in -rc2
  (isolated bug fixes, ideally via arch maintainers' trees) or will sit
  there until the next cycle."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  x86: get rid of calling do_notify_resume() when returning to kernel mode
  blackfin: check __get_user() return value
  whack-a-mole with TIF_FREEZE
  FRV: Optimise the system call exit path in entry.S [ver #2]
  FRV: Shrink TIF_WORK_MASK [ver #2]
  FRV: Prevent syscall exit tracing and notify_resume at end of kernel exceptions
  new helper: signal_delivered()
  powerpc: get rid of restore_sigmask()
  most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from set
  set_restore_sigmask() is never called without SIGPENDING (and never should be)
  TIF_RESTORE_SIGMASK can be set only when TIF_SIGPENDING is set
  don't call try_to_freeze() from do_signal()
  pull clearing RESTORE_SIGMASK into block_sigmask()
  sh64: failure to build sigframe != signal without handler
  openrisc: tracehook_signal_handler() is supposed to be called on success
  new helper: sigmask_to_save()
  new helper: restore_saved_sigmask()
  new helpers: {clear,test,test_and_clear}_restore_sigmask()
  HAVE_RESTORE_SIGMASK is defined on all architectures now
2012-06-01 11:53:44 -07:00
Pavel Shilovsky 8825736060 CIFS: Move get_next_mid to ops struct
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-06-01 12:35:19 -05:00
Pavel Shilovsky 7f0adb53bc CIFS: Make accessing is_valid_oplock/dump_detail ops struct field safe
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-06-01 12:35:16 -05:00
Pavel Shilovsky ea319d57d3 CIFS: Improve identation in cifs_unlock_range
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-06-01 12:35:12 -05:00
Pavel Shilovsky 0013fb4ca3 CIFS: Fix possible wrong memory allocation
when cifs_reconnect sets maxBuf to 0 and we try to calculate a size
of memory we need to store locks.

Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-06-01 12:35:08 -05:00
Linus Torvalds 1193755ac6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs changes from Al Viro.
 "A lot of misc stuff.  The obvious groups:
   * Miklos' atomic_open series; kills the damn abuse of
     ->d_revalidate() by NFS, which was the major stumbling block for
     all work in that area.
   * ripping security_file_mmap() and dealing with deadlocks in the
     area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in
     general.
   * ->encode_fh() switched to saner API; insane fake dentry in
     mm/cleancache.c gone.
   * assorted annotations in fs (endianness, __user)
   * parts of Artem's ->s_dirty work (jff2 and reiserfs parts)
   * ->update_time() work from Josef.
   * other bits and pieces all over the place.

  Normally it would've been in two or three pull requests, but
  signal.git stuff had eaten a lot of time during this cycle ;-/"

Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the
'truncate_range' inode method was removed by the VM changes, the VFS
update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due
to sparse fix added twice, with other changes nearby).

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits)
  nfs: don't open in ->d_revalidate
  vfs: retry last component if opening stale dentry
  vfs: nameidata_to_filp(): don't throw away file on error
  vfs: nameidata_to_filp(): inline __dentry_open()
  vfs: do_dentry_open(): don't put filp
  vfs: split __dentry_open()
  vfs: do_last() common post lookup
  vfs: do_last(): add audit_inode before open
  vfs: do_last(): only return EISDIR for O_CREAT
  vfs: do_last(): check LOOKUP_DIRECTORY
  vfs: do_last(): make ENOENT exit RCU safe
  vfs: make follow_link check RCU safe
  vfs: do_last(): use inode variable
  vfs: do_last(): inline walk_component()
  vfs: do_last(): make exit RCU safe
  vfs: split do_lookup()
  Btrfs: move over to use ->update_time
  fs: introduce inode operation ->update_time
  reiserfs: get rid of resierfs_sync_super
  reiserfs: mark the superblock as dirty a bit later
  ...
2012-06-01 10:34:35 -07:00
Linus Torvalds 4edebed866 Ext4 updates for 3.5
The major new feature added in this update is Darrick J. Wong's
 metadata checksum feature, which adds crc32 checksums to ext4's
 metadata fields.  There is also the usual set of cleanups and bug
 fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABCAAGBQJPyNleAAoJENNvdpvBGATwtLMP/i3WsPyTvxmYP6HttHXQb8Jk
 GYCoTQ5bZMuTbOwOGg3w137cXWBv5uuPpxIk79YVLHSWx6HuanlGIa7/VnPKIaLu
 2ihuvVfnrDqpwQ4MJaSq4R1Eka9JCwZ7HbYYo+fYOVobxgw588JVV9VVI9EdKRGz
 z11UkW8iHE0f6Xa5gOhdAMkR0uaPnxwJX/qHZYiHuognRivuwMglqWJSiMr8nQmo
 A2GmeoLehhW+k65IqgTCmSW6ZgFTvZdk6bskQIij3fOYHW3hHn/gcLFtmLTIZ/B5
 LZdg/lngPYve+R/UyypliGKi+pv1qNEiTiBm0rrBgsdZFkBdGj0soSvGZzeK+Mp4
 Q1vAmOBPYPFzs6nVzPst2n/osryyykFCK6TgSGZ50dosJ0NO8cBeDdX/gh9JKD2R
 yQUMUltOCCSj/eWU4iwqZ0T3FXRiH/+S3XMHznoKJiwUyGDBNQy4+Yg2k2WzUXrz
 Cu5t5BwNG2WNP7y5Et/wmUIzpC7VPId4qYmGyHe7OwTxSJgW+6f7GVkHfjWcDMuv
 pGgEUiInbMmLajP3v2/LKfVU4hXLZy4uJbhoBgDdeIpZrnPifJG/MwDOS4W+dLVT
 tDzgO1SAh3/E4jATreZ5bjzD/HGsfe1OX09UH3Pbc1EcgkrLnyrQXFwdHshdVu4A
 cxMoKNPVCQJySb1UrLkO
 =SdJJ
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull Ext4 updates from Theodore Ts'o:
 "The major new feature added in this update is Darrick J Wong's
  metadata checksum feature, which adds crc32 checksums to ext4's
  metadata fields.

  There is also the usual set of cleanups and bug fixes."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (44 commits)
  ext4: hole-punch use truncate_pagecache_range
  jbd2: use kmem_cache_zalloc wrapper instead of flag
  ext4: remove mb_groups before tearing down the buddy_cache
  ext4: add ext4_mb_unload_buddy in the error path
  ext4: don't trash state flags in EXT4_IOC_SETFLAGS
  ext4: let getattr report the right blocks in delalloc+bigalloc
  ext4: add missing save_error_info() to ext4_error()
  ext4: add debugging trigger for ext4_error()
  ext4: protect group inode free counting with group lock
  ext4: use consistent ssize_t type in ext4_file_write()
  ext4: fix format flag in ext4_ext_binsearch_idx()
  ext4: cleanup in ext4_discard_allocated_blocks()
  ext4: return ENOMEM when mounts fail due to lack of memory
  ext4: remove redundundant "(char *) bh->b_data" casts
  ext4: disallow hard-linked directory in ext4_lookup
  ext4: fix potential integer overflow in alloc_flex_gd()
  ext4: remove needs_recovery in ext4_mb_init()
  ext4: force ro mount if ext4_setup_super() fails
  ext4: fix potential NULL dereference in ext4_free_inodes_counts()
  ext4/jbd2: add metadata checksumming to the list of supported features
  ...
2012-06-01 10:12:15 -07:00
Al Viro 754421c8ca HAVE_RESTORE_SIGMASK is defined on all architectures now
Everyone either defines it in arch thread_info.h or has TIF_RESTORE_SIGMASK
and picks default set_restore_sigmask() in linux/thread_info.h.  Kill the
ifdefs, slap #error in linux/thread_info.h to catch breakage when new ones
get merged.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:58:46 -04:00
Miklos Szeredi 0ef97dcfce nfs: don't open in ->d_revalidate
NFSv4 can't do reliable opens in d_revalidate, since it cannot know whether a
mount needs to be followed or not.  It does check d_mountpoint() on the dentry,
which can result in a weird error if the VFS found that the mount does not in
fact need to be followed, e.g.:

  # mount --bind /mnt/nfs /mnt/nfs-clone
  # echo something > /mnt/nfs/tmp/bar
  # echo x > /tmp/file
  # mount --bind /tmp/file /mnt/nfs-clone/tmp/bar
  # cat  /mnt/nfs/tmp/bar
  cat: /mnt/nfs/tmp/bar: Not a directory

Which should, by any sane filesystem, result in "something" being printed.

So instead do the open in f_op->open() and in the unlikely case that the cached
dentry turned out to be invalid, drop the dentry and return EOPENSTALE to let
the VFS retry.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:12:02 -04:00
Miklos Szeredi 16b1c1cd71 vfs: retry last component if opening stale dentry
NFS optimizes away d_revalidates for last component of open.  This means that
open itself can find the dentry stale.

This patch allows the filesystem to return EOPENSTALE and the VFS will retry the
lookup on just the last component if possible.

If the lookup was done using RCU mode, including the last component, then this
is not possible since the parent dentry is lost.  In this case fall back to
non-RCU lookup.  Currently this is not used since NFS will always leave RCU
mode.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:12:01 -04:00
Miklos Szeredi 50ee93afca vfs: nameidata_to_filp(): don't throw away file on error
If open fails, don't put the file.  This allows it to be reused if open needs to
be retried.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:12:01 -04:00
Miklos Szeredi 91daee988d vfs: nameidata_to_filp(): inline __dentry_open()
Copy __dentry_open() into nameidata_to_filp().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:12:01 -04:00
Miklos Szeredi 78f71eff3c vfs: do_dentry_open(): don't put filp
Move put_filp() out to __dentry_open(), the only caller now.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:12:00 -04:00
Miklos Szeredi 90ad1a8ecb vfs: split __dentry_open()
Split __dentry_open() into two functions:

  do_dentry_open() - does most of the actual work, doesn't put file on failure
  open_check_o_direct() - after a successful open, checks direct_IO method

This will allow i_op->atomic_open to do just the file initialization and leave
the direct_IO checking to the VFS.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:12:00 -04:00
Miklos Szeredi 5f5daac12a vfs: do_last() common post lookup
Now the post lookup code can be shared between O_CREAT and plain opens since
they are essentially the same.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:12:00 -04:00
Miklos Szeredi d7fdd7f6e1 vfs: do_last(): add audit_inode before open
This allows this code to be shared between O_CREAT and plain opens.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:11:59 -04:00
Miklos Szeredi 050ac841ea vfs: do_last(): only return EISDIR for O_CREAT
This allows this code to be shared between O_CREAT and plain opens.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:11:59 -04:00
Miklos Szeredi af2f55426d vfs: do_last(): check LOOKUP_DIRECTORY
Check for ENOTDIR before finishing open.  This allows this code to be shared
between O_CREAT and plain opens.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:11:58 -04:00
Miklos Szeredi 54c33e7f95 vfs: do_last(): make ENOENT exit RCU safe
This will allow this code to be used in RCU mode.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:11:58 -04:00
Miklos Szeredi d45ea86792 vfs: make follow_link check RCU safe
This will allow this code to be used in RCU mode.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:11:58 -04:00
Miklos Szeredi decf340087 vfs: do_last(): use inode variable
Use helper variable instead of path->dentry->d_inode before complete_walk().
This will allow this code to be used in RCU mode.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:11:57 -04:00
Miklos Szeredi a1eb331530 vfs: do_last(): inline walk_component()
Copy walk_component() into do_lookup().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:11:57 -04:00
Miklos Szeredi e276ae672f vfs: do_last(): make exit RCU safe
Allow returning from do_last() with LOOKUP_RCU still set on the "out:" and
"exit:" labels.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:11:57 -04:00
Miklos Szeredi 697f514df1 vfs: split do_lookup()
Split do_lookup() into two functions:

  lookup_fast() - does cached lookup without i_mutex
  lookup_slow() - does lookup with i_mutex

Both follow managed dentries.

The new functions are needed by atomic_open.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:11:56 -04:00
Josef Bacik e41f941a23 Btrfs: move over to use ->update_time
Btrfs had been doing it's own file_update_time so we could catch ENOSPC
properly, so just update our btrfs_update_time to work with the new stuff and
then we'll be fancy later.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-01 12:07:52 -04:00
Josef Bacik c3b2da3148 fs: introduce inode operation ->update_time
Btrfs has to make sure we have space to allocate new blocks in order to modify
the inode, so updating time can fail.  We've gotten around this by having our
own file_update_time but this is kind of a pain, and Christoph has indicated he
would like to make xfs do something different with atime updates.  So introduce
->update_time, where we will deal with i_version an a/m/c time updates and
indicate which changes need to be made.  The normal version just does what it
has always done, updates the time and marks the inode dirty, and then
filesystems can choose to do something different.

I've gone through all of the users of file_update_time and made them check for
errors with the exception of the fault code since it's complicated and I wasn't
quite sure what to do there, also Jan is going to be pushing the file time
updates into page_mkwrite for those who have it so that should satisfy btrfs and
make it not a big deal to check the file_update_time() return code in the
generic fault path. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-01 12:07:25 -04:00
Linus Torvalds 51eab603f5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "This includes a fairly large change from Josef around data writeback
  completion.  Before, the writeback wasn't completed until the metadata
  insertions for the extent were done, and this made for fairly large
  latency spikes on the last page of each ordered extent.

  We already had a separate mechanism for tracking pending metadata
  insertions, so Josef just needed to tweak things a little to end
  writeback earlier on the page.  Overall it makes us much friendly to
  memory reclaim and lowers latencies quite a lot for synchronous IO.

  Jan Schmidt has finished some background work required to track btree
  blocks as they go through changes in ownership.  It's the missing
  piece he needed for both btrfs send/receive and subvolume quotas.
  Neither of those are ready yet, but the new tracking code is included
  here.  Most of the time, the new code is off.  It is only used by
  scrub and other backref walkers.

  Stefan Behrens has added io failure tracking.  This includes counters
  for which drives are causing the most trouble so the admin (or an
  automated tool) can choose to kick them out.  We're tracking IO
  errors, crc errors, and generation checks we do on each metadata
  block.

  RAID5/6 did miss the cut this time because I'm having trouble with
  corruptions.  I'll nail it down next week and post as a beta testing
  before 3.6"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (58 commits)
  Btrfs: fix tree mod log rewinded level and rewinding of moved keys
  Btrfs: fix tree mod log del_ptr
  Btrfs: add tree_mod_dont_log helper
  Btrfs: add missing spin_lock for insertion into tree mod log
  Btrfs: add inodes before dropping the extent lock in find_all_leafs
  Btrfs: use delayed ref sequence numbers for all fs-tree updates
  Btrfs: fix false positive in check-integrity on unmount
  Btrfs: fix runtime warning in check-integrity check data mode
  Btrfs: set ioprio of scrub readahead to idle
  Btrfs: fix return code in drop_objectid_items
  Btrfs: check to see if the inode is in the log before fsyncing
  Btrfs: return value of btrfs_read_buffer is checked correctly
  Btrfs: read device stats on mount, write modified ones during commit
  Btrfs: add ioctl to get and reset the device stats
  Btrfs: add device counters for detected IO and checksum errors
  btrfs: Drop unused function btrfs_abort_devices()
  Btrfs: fix the same inode id problem when doing auto defragment
  Btrfs: fall back to non-inline if we don't have enough space
  Btrfs: fix how we deal with the orphan block rsv
  Btrfs: convert the inode bit field to use the actual bit operations
  ...
2012-06-01 08:37:31 -07:00
Linus Torvalds 419f431949 Merge branch 'for-3.5' of git://linux-nfs.org/~bfields/linux
Pull the rest of the nfsd commits from Bruce Fields:
 "... and then I cherry-picked the remainder of the patches from the
  head of my previous branch"

This is the rest of the original nfsd branch, rebased without the
delegation stuff that I thought really needed to be redone.

I don't like rebasing things like this in general, but in this situation
this was the lesser of two evils.

* 'for-3.5' of git://linux-nfs.org/~bfields/linux: (50 commits)
  nfsd4: fix, consolidate client_has_state
  nfsd4: don't remove rebooted client record until confirmation
  nfsd4: remove some dprintk's and a comment
  nfsd4: return "real" sequence id in confirmed case
  nfsd4: fix exchange_id to return confirm flag
  nfsd4: clarify that renewing expired client is a bug
  nfsd4: simpler ordering of setclientid_confirm checks
  nfsd4: setclientid: remove pointless assignment
  nfsd4: fix error return in non-matching-creds case
  nfsd4: fix setclientid_confirm same_cred check
  nfsd4: merge 3 setclientid cases to 2
  nfsd4: pull out common code from setclientid cases
  nfsd4: merge last two setclientid cases
  nfsd4: setclientid/confirm comment cleanup
  nfsd4: setclientid remove unnecessary terms from a logical expression
  nfsd4: move rq_flavor into svc_cred
  nfsd4: stricter cred comparison for setclientid/exchange_id
  nfsd4: move principal name into svc_cred
  nfsd4: allow removing clients not holding state
  nfsd4: rearrange exchange_id logic to simplify
  ...
2012-06-01 08:32:58 -07:00
Artem Bityutskiy 033369d1af reiserfs: get rid of resierfs_sync_super
This patch stops reiserfs using the VFS 'write_super()' method along with the
s_dirt flag, because they are on their way out.

The whole "superblock write-out" VFS infrastructure is served by the
'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
writes out all dirty superblock using the '->write_super()' call-back.  But the
problem with this thread is that it wastes power by waking up the system every
5 seconds, even if there are no diry superblocks, or there are no client
file-systems which would need this (e.g., btrfs does not use
'->write_super()'). So we want to kill it completely and thus, we need to make
file-systems to stop using the '->write_super()' VFS service, and then remove
it together with the kernel thread.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 10:37:36 -04:00
Artem Bityutskiy 5c5fd81962 reiserfs: mark the superblock as dirty a bit later
The 'journal_mark_dirty()' function currently first marks the superblock as
dirty by setting 's_dirt' to 1, then does various sanity checks and returns,
then actuall does all the magic with the journal.

This is not an ideal order, though. It makes more sense to first do all the
checks, then do all the internal stuff, and at the end notify the VFS that the
superblock is now dirty.

This patch moves the 's_dirt = 1' assignment from the very beginning of this
function to the very end.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 10:37:36 -04:00
Artem Bityutskiy 717f03c4d7 reiserfs: remove useless superblock dirtying
The 'reiserfs_resize()' function marks the superblock as dirty by assigning 1
to 's_dirt' and then calls 'journal_mark_dirty()' which does the same. Thus,
we can remove the assignment from 'reiserfs_resize()'.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 10:37:36 -04:00
Artem Bityutskiy 25729b0e94 reiserfs: clean-up function return type
Turn 'reiserfs_flush_old_commits()' into a void function because the callers
do not cares about what it returns anyway.

We are going to remove the 'sb->s_dirt' field completely and this patch is a
small step towards this direction.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 10:37:36 -04:00
Artem Bityutskiy efaa33eb13 reiserfs: cleanup reiserfs_fill_super a bit
We have the reiserfs superblock pointer in the 'sbi' variable in this
function, no need to use the 'REISERFS_SB(s)' macro which is the same.
This is jut a small clean-up.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 10:37:36 -04:00
Al Viro e3fc629d7b switch aio and shm to do_mmap_pgoff(), make do_mmap() static
after all, 0 bytes and 0 pages is the same thing...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 10:37:17 -04:00
Hugh Dickins 5e44f8c374 ext4: hole-punch use truncate_pagecache_range
When truncating a file, we unmap pages from userspace first, as that's
usually more efficient than relying, page by page, on the fallback in
truncate_inode_page() - particularly if the file is mapped many times.

Do the same when punching a hole: 3.4 added truncate_pagecache_range()
to do the unmap and trunc, so use it in ext4_ext_punch_hole(), instead
of calling truncate_inode_pages_range() directly.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-06-01 00:15:28 -04:00
Wanlong Gao b2f4edb335 jbd2: use kmem_cache_zalloc wrapper instead of flag
Use kmem_cache_zalloc wrapper instead of flag __GFP_ZERO.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-06-01 00:10:32 -04:00
Salman Qazi 95599968d1 ext4: remove mb_groups before tearing down the buddy_cache
We can't have references held on pages in the s_buddy_cache while we are
trying to truncate its pages and put the inode.  All the pages must be
gone before we reach clear_inode.  This can only be gauranteed if we
can prevent new users from grabbing references to s_buddy_cache's pages.

The original bug can be reproduced and the bug fix can be verified by:

while true; do mount -t ext4 /dev/ram0 /export/hda3/ram0; \
	umount /export/hda3/ram0; done &

while true; do cat /proc/fs/ext4/ram0/mb_groups; done

Signed-off-by: Salman Qazi <sqazi@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2012-05-31 23:52:14 -04:00
Salman Qazi 02b7831019 ext4: add ext4_mb_unload_buddy in the error path
ext4_free_blocks fails to pair an ext4_mb_load_buddy with a matching
ext4_mb_unload_buddy when it fails a memory allocation.

Signed-off-by: Salman Qazi <sqazi@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2012-05-31 23:51:27 -04:00
Theodore Ts'o 79906964a1 ext4: don't trash state flags in EXT4_IOC_SETFLAGS
In commit 353eb83c we removed i_state_flags with 64-bit longs, But
when handling the EXT4_IOC_SETFLAGS ioctl, we replace i_flags
directly, which trashes the state flags which are stored in the high
32-bits of i_flags on 64-bit platforms.  So use the the
ext4_{set,clear}_inode_flags() functions which use atomic bit
manipulation functions instead.

Reported-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2012-05-31 23:46:01 -04:00
Tao Ma 9660755100 ext4: let getattr report the right blocks in delalloc+bigalloc
In delayed allocation, i_reserved_data_blocks now indicates
clusters, not blocks. So report it in the right number.

This can be easily exposed by the following command:
echo foo > blah; du -hc blah; sync; du -hc blah

Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-31 22:54:16 -04:00
Linus Torvalds a00b6151a2 Merge branch 'for-3.5-take-2' of git://linux-nfs.org/~bfields/linux
Pull nfsd update from Bruce Fields.

* 'for-3.5-take-2' of git://linux-nfs.org/~bfields/linux: (23 commits)
  nfsd: trivial: use SEEK_SET instead of 0 in vfs_llseek
  SUNRPC: split upcall function to extract reusable parts
  nfsd: allocate id-to-name and name-to-id caches in per-net operations.
  nfsd: make name-to-id cache allocated per network namespace context
  nfsd: make id-to-name cache allocated per network namespace context
  nfsd: pass network context to idmap init/exit functions
  nfsd: allocate export and expkey caches in per-net operations.
  nfsd: make expkey cache allocated per network namespace context
  nfsd: make export cache allocated per network namespace context
  nfsd: pass pointer to export cache down to stack wherever possible.
  nfsd: pass network context to export caches init/shutdown routines
  Lockd: pass network namespace to creation and destruction routines
  NFSd: remove hard-coded dereferences to name-to-id and id-to-name caches
  nfsd: pass pointer to expkey cache down to stack wherever possible.
  nfsd: use hash table from cache detail in nfsd export seq ops
  nfsd: pass svc_export_cache pointer as private data to "exports" seq file ops
  nfsd: use exp_put() for svc_export_cache put
  nfsd: use cache detail pointer from svc_export structure on cache put
  nfsd: add link to owner cache detail to svc_export structure
  nfsd: use passed cache_detail pointer expkey_parse()
  ...
2012-05-31 18:18:11 -07:00
Linus Torvalds 08615d7d85 Merge branch 'akpm' (Andrew's patch-bomb)
Merge misc patches from Andrew Morton:

 - the "misc" tree - stuff from all over the map

 - checkpatch updates

 - fatfs

 - kmod changes

 - procfs

 - cpumask

 - UML

 - kexec

 - mqueue

 - rapidio

 - pidns

 - some checkpoint-restore feature work.  Reluctantly.  Most of it
   delayed a release.  I'm still rather worried that we don't have a
   clear roadmap to completion for this work.

* emailed from Andrew Morton <akpm@linux-foundation.org>: (78 patches)
  kconfig: update compression algorithm info
  c/r: prctl: add ability to set new mm_struct::exe_file
  c/r: prctl: extend PR_SET_MM to set up more mm_struct entries
  c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat
  syscalls, x86: add __NR_kcmp syscall
  fs, proc: introduce /proc/<pid>/task/<tid>/children entry
  sysctl: make kernel.ns_last_pid control dependent on CHECKPOINT_RESTORE
  aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector()
  eventfd: change int to __u64 in eventfd_signal()
  fs/nls: add Apple NLS
  pidns: make killed children autoreap
  pidns: use task_active_pid_ns in do_notify_parent
  rapidio/tsi721: add DMA engine support
  rapidio: add DMA engine support for RIO data transfers
  ipc/mqueue: add rbtree node caching support
  tools/selftests: add mq_perf_tests
  ipc/mqueue: strengthen checks on mqueue creation
  ipc/mqueue: correct mq_attr_ok test
  ipc/mqueue: improve performance of send/recv
  selftests: add mq_open_tests
  ...
2012-05-31 18:10:18 -07:00
Cyrill Gorcunov 5b172087f9 c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat
We would like to have an ability to restore command line arguments and
program environment pointers but first we need to obtain them somehow.
Thus we put these values into /proc/$pid/stat.  The exit_code is needed to
restore zombie tasks.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:32 -07:00
Cyrill Gorcunov 818411616b fs, proc: introduce /proc/<pid>/task/<tid>/children entry
When we do checkpoint of a task we need to know the list of children the
task, has but there is no easy and fast way to generate reverse
parent->children chain from arbitrary <pid> (while a parent pid is
provided in "PPid" field of /proc/<pid>/status).

So instead of walking over all pids in the system (creating one big
process tree in memory, just to figure out which children a task has) --
we add explicit /proc/<pid>/task/<tid>/children entry, because the kernel
already has this kind of information but it is not yet exported.

This is a first level children, not the whole process tree.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:32 -07:00
Christopher Yeoh ac34ebb3a6 aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector()
A cleanup of rw_copy_check_uvector and compat_rw_copy_check_uvector after
changes made to support CMA in an earlier patch.

Rather than having an additional check_access parameter to these
functions, the first paramater type is overloaded to allow the caller to
specify CHECK_IOVEC_ONLY which means check that the contents of the iovec
are valid, but do not check the memory that they point to.  This is used
by process_vm_readv/writev where we need to validate that a iovec passed
to the syscall is valid but do not want to check the memory that it points
to at this point because it refers to an address space in another process.

Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:32 -07:00
Sha Zhengju ee62c6b2dc eventfd: change int to __u64 in eventfd_signal()
eventfd_ctx->count is an __u64 counter which is allowed to reach
ULLONG_MAX.  eventfd_write() adds a __u64 value to "count", but the kernel
side eventfd_signal() only adds an int value to it.  Make them consistent.

[akpm@linux-foundation.org: update interface documentation]
Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:32 -07:00
Vladimir Serbinenko 71ca97da9d fs/nls: add Apple NLS
HFS has support for NLS.  However the relevant NLS tables are missing.
Here they are automatically transformed from the tables at unicode.org.
Codepages requiring special handling like CJK, RTL or Brahmic ones are not
included in this patch.

[akpm@linux-foundation.org: add unicode.org copyright and permission notices]
Signed-off-by: Vladimir Serbinenko <phcoder@gmail.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:32 -07:00
Konstantin Khlebnikov bca1554373 proc/smaps: show amount of nonlinear ptes in vma
Currently, nonlinear mappings can not be distinguished from ordinary
mappings.  This patch adds into /proc/pid/smaps line "Nonlinear: <size>
kB", where size is amount of nonlinear ptes in vma, this line appears only
if VM_NONLINEAR is set.  This information may be useful not only for
checkpoint/restore project.

Requested by Pavel Emelyanov.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:29 -07:00
Konstantin Khlebnikov b1d4d9e0cb proc/smaps: carefully handle migration entries
Currently smaps reports migration entries as "swap", as result "swap" can
appears in shared mapping.

This patch converts migration entries into pages and handles them as usual.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:29 -07:00
Konstantin Khlebnikov 052fb0d635 proc: report file/anon bit in /proc/pid/pagemap
This is an implementation of Andrew's proposal to extend the pagemap file
bits to report what is missing about tasks' working set.

The problem with the working set detection is multilateral.  In the criu
(checkpoint/restore) project we dump the tasks' memory into image files
and to do it properly we need to detect which pages inside mappings are
really in use.  The mincore syscall I though could help with this did not.
 First, it doesn't report swapped pages, thus we cannot find out which
parts of anonymous mappings to dump.  Next, it does report pages from page
cache as present even if they are not mapped, and it doesn't make that has
not been cow-ed.

Note, that issue with swap pages is critical -- we must dump swap pages to
image file.  But the issues with file pages are optimization -- we can
take all file pages to image, this would be correct, but if we know that a
page is not mapped or not cow-ed, we can remove them from dump file.  The
dump would still be self-consistent, though significantly smaller in size
(up to 10 times smaller on real apps).

Andrew noticed, that the proc pagemap file solved 2 of 3 above issues --
it reports whether a page is present or swapped and it doesn't report not
mapped page cache pages.  But, it doesn't distinguish cow-ed file pages
from not cow-ed.

I would like to make the last unused bit in this file to report whether the
page mapped into respective pte is PageAnon or not.

[comment stolen from Pavel Emelyanov's v1 patch]

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:29 -07:00
Jan Engelhardt 715be1fce0 procfs: use more apprioriate types when dumping /proc/N/stat
- use int fpr priority and nice, since task_nice()/task_prio() return that

- field 24: get_mm_rss() returns unsigned long

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:29 -07:00
Alexey Dobriyan af5e617143 proc: pass "fd" by value in /proc/*/{fd,fdinfo} code
Pass "fd" directly, not via pointer -- one less memory read.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:29 -07:00
Alexey Dobriyan f05ed3f1ab proc: don't do dummy rcu_read_lock/rcu_read_unlock on error path
rcu_read_lock()/rcu_read_unlock() is nop for TINY_RCU, but is not a nop
for, say, PREEMPT_RCU.

proc_fill_cache() is called without RCU lock, there is no need to
lock/unlock on error path, simply jump out of the loop.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:29 -07:00
Cong Wang 2344bec788 proc: use mm_access() instead of ptrace_may_access()
mm_access() handles this much better, and avoids some race conditions.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:29 -07:00
Cong Wang e7dcd9990e proc: remove mm_for_maps()
mm_for_maps() is a simple wrapper for mm_access(), and the name is
misleading, so just remove it and use mm_access() directly.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:28 -07:00
Cong Wang b409e578d9 proc: clean up /proc/<pid>/environ handling
Similar to e268337dfe ("proc: clean up and fix /proc/<pid>/mem
handling"), move the check of permission to open(), this will simplify
read() code.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:28 -07:00
Namjae Jeon f0aac6162e fat: use fat_msg_ratelimit() in fat__get_entry()
If an application tries to lookup (opendir/readdir/stat) 5000 files on a
fatfs USB device and the device is unplugged, many message occur, shown
below.  This makes the application slow.  So use the new
fat_msg_ratelimit() decrease the messaging rate.

  #> ./file_lookup_testcase ./files_directory/
  usb 2-1.4: USB disconnect, device number 4
  FAT-fs (sda1): FAT read failed (blocknr 2631)
  FAT-fs (sda1): Directory bread(block 396816) failed
  FAT-fs (sda1): Directory bread(block 396817) failed
  FAT-fs (sda1): Directory bread(block 396818) failed
  FAT-fs (sda1): Directory bread(block 396819) failed
  FAT-fs (sda1): Directory bread(block 396820) failed
  FAT-fs (sda1): Directory bread(block 396821) failed
  FAT-fs (sda1): Directory bread(block 396822) failed
  FAT-fs (sda1): Directory bread(block 396823) failed
  FAT-fs (sda1): Directory bread(block 406824) failed
  FAT-fs (sda1): Directory bread(block 406825) failed
  FAT-fs (sda1): Directory bread(block 406826) failed
  FAT-fs (sda1): Directory bread(block 406827) failed
  FAT-fs (sda1): Directory bread(block 406828) failed
  FAT-fs (sda1): Directory bread(block 406829) failed
  FAT-fs (sda1): Directory bread(block 406830) failed
  FAT-fs (sda1): Directory bread(block 406831) failed
  FAT-fs (sda1): Directory bread(block 417696) failed
  FAT-fs (sda1): Directory bread(block 417697) failed
  FAT-fs (sda1): Directory bread(block 417698) failed
  FAT-fs (sda1): Directory bread(block 417699) failed
  FAT-fs (sda1): Directory bread(block 417700) failed
  FAT-fs (sda1): Directory bread(block 417701) failed
  FAT-fs (sda1): Directory bread(block 417702) failed
  FAT-fs (sda1): Directory bread(block 417703) failed
  FAT-fs (sda1): FAT read failed (blocknr 2631)
  FAT-fs (sda1): Directory bread(block 396816) failed
  FAT-fs (sda1): Directory bread(block 396817) failed
  FAT-fs (sda1): Directory bread(block 396818) failed
  FAT-fs (sda1): Directory bread(block 396819) failed
  FAT-fs (sda1): Directory bread(block 396820) failed
  FAT-fs (sda1): Directory bread(block 396821) failed

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:28 -07:00
Namjae Jeon b742c34153 fat: add fat_msg_ratelimit()
Add a fat_msg_ratelimit() to limit the message generation rate.

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:28 -07:00
Artem Bityutskiy 78491189dd fat: switch to fsinfo_inode
Currently FAT file-system maps the VFS "superblock" abstraction to the
FSINFO block.  The FSINFO block contains non-essential data about the
amount of free clusters and the next free cluster.  FAT file-system can
always find out this information by scanning the FAT table, but having it
in the FSINFO block may speed things up sometimes.  So FAT file-system
relies on the VFS superblock write-out services to make sure the FSINFO
block is written out to the media from time to time.

The whole "superblock write-out" VFS infrastructure is served by the
'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds
and writes out all dirty superblock using the '->write_super()' call-back.
 But the problem with this thread is that it wastes power by waking up the
system every 5 seconds no matter what.  So we want to kill it completely
and thus, we need to make file-systems to stop using the '->write_super'
VFS service, and then remove it together with the kernel thread.

This patch switches the FAT FSINFO block management from
'->write_super()'/'->s_dirt' to 'fsinfo_inode'/'->write_inode'.  Now,
instead of setting the 's_dirt' flag, we just mark the special
'fsinfo_inode' inode as dirty and let VFS invoke the '->write_inode'
call-back when needed, where we write-out the FSINFO block.

This patch also makes sure we do not mark the 'fsinfo_inode' inode as
dirty if we are not FAT32 (FAT16 and FAT12 do not have the FSINFO block)
or if we are in R/O mode.

As a bonus, we can also remove the '->sync_fs()' and '->write_super()' FAT
call-back function because they become unneeded.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:28 -07:00
Artem Bityutskiy 330fe3c4c6 fat: mark superblock as dirty less often
Preparation for further changes.  It touches few functions in fatent.c and
prevents them from marking the superblock as dirty unnecessarily often.
Namely, instead of marking it as dirty in the internal tight loops - do it
only once at the end of the functions.  And instead of marking it as dirty
while holding the FAT table lock, do it outside the lock.

The reason for this patch is that marking the superblock as dirty will
soon become a little bit heavier operation, so it is cleaner to do this
only when it is necessary.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:28 -07:00
Artem Bityutskiy 90b436657e fat: introduce mark_fsinfo_dirty helper
A preparation patch which introduces a 'mark_fsinfo_dirty()' helper
function which just sets the 's_dirt' flag to 1 so far.  I'll add more
code to this helper later, so I do not mark it as 'inline'.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:27 -07:00
Artem Bityutskiy 020ac5b6be fat: introduce special inode for managing the FSINFO block
This is patchset makes fatfs stop using the VFS '->write_super()' method
for writing out the FSINFO block.

The final goal is to get rid of the 'sync_supers()' kernel thread.  This
kernel thread wakes up every 5 seconds (by default) and calls
'->write_super()' for all mounted file-systems.  And the bad thing is that
this is done even if all the superblocks are clean.  Moreover, some
file-systems do not even need this end they do not register the
'->write_super()' method at all (e.g., btrfs).

So 'sync_supers()' most often just generates useless wake-ups and wastes
power.  I am trying to make all file-systems independent of
'->write_super()' and plan to remove 'sync_supers()' and '->write_super'
completely once there are no more users.

The '->write_supers()' method is mostly used by baroque file-systems like
hfs, udf, etc.  Modern file-systems like btrfs and xfs do not use it.
This justifies removing this stuff from VFS completely and make every FS
self-manage own superblock.

Tested with xfstests.

This patch:

Preparation for further changes.  It introduces a special inode
('fsinfo_inode') in FAT file-system which we'll later use for managing the
FSINFO block.  Note, this there is already one special inode ('fat_inode')
which is used for managing the FAT tables.

Introduce new 'MSDOS_FSINFO_INO' constant for this special inode.  It is
safe to do because FAT file-system does not store inode numbers on the
media but generates them run-time.

I've also cleaned up the comment to existing 'MSDOS_ROOT_INO' constant,
while on it.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:27 -07:00
Dan Carpenter 7bc1bac77a HPFS: remove PRINTK() macro
The PRINTK() macro isn't really used.  Let's just remove it because it
is ugly and out of date.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:27 -07:00
Ryusuke Konishi 11475975dd nilfs2: flush disk caches in syncing
There are two cases that the cache flush is needed to avoid data loss
against unexpected hang or power failure.  One is sync file function (i.e.
 nilfs_sync_file) and another is checkpointing ioctl.

This issues a cache flush request to device for such cases if barrier
mount option is enabled, and makes sure data really is on persistent
storage on their completion.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:27 -07:00
Will Deacon a1d494495c pipe: return -ENOIOCTLCMD instead of -EINVAL on unknown ioctl command
As described in commit 07d106d0a3 ("vfs: fix up ENOIOCTLCMD error
handling"), drivers should return -ENOIOCTLCMD if they receive an ioctl
command which they don't understand.  Doing so will result in -ENOTTY
being returned to userspace, which matches the behaviour of the compat
layer if it fails to translate an ioctl command.

This patch fixes the pipe ioctl to return -ENOIOCTLCMD instead of -EINVAL
when passed an unknown ioctl command.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Alan Cox <alan@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:27 -07:00
Xi Wang a3860c1c5d introduce SIZE_MAX
ULONG_MAX is often used to check for integer overflow when calculating
allocation size.  While ULONG_MAX happens to work on most systems, there
is no guarantee that `size_t' must be the same size as `long'.

This patch introduces SIZE_MAX, the maximum value of `size_t', to improve
portability and readability for allocation size validation.

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Acked-by: Alex Elder <elder@dreamhost.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:26 -07:00
J. Bruce Fields 6eccece90b nfsd4: fix, consolidate client_has_state
Whoops: first, I reimplemented the already-existing has_resources
without noticing; second, I got the test backwards.  I did pick a better
name, though.  Combine the two....

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:39 -04:00
J. Bruce Fields b9831b59f3 nfsd4: don't remove rebooted client record until confirmation
In the NFSv4.1 client-reboot case we're currently removing the client's
previous state in exchange_id.  That's wrong--we should be waiting till
the confirming create_session.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:34 -04:00
J. Bruce Fields 32f16b3823 nfsd4: remove some dprintk's and a comment
The comment is redundant, and if we really want dprintk's here they'd
probably be better in the common (check-slot_seqid) code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:31 -04:00
J. Bruce Fields 778df3f0fe nfsd4: return "real" sequence id in confirmed case
The client should ignore the returned sequence_id in the case where the
CONFIRMED flag is set on an exchange_id reply--and in the unconfirmed
case "1" is always the right response.  So it shouldn't actually matter
what we return here.

We could continue returning 1 just to catch clients ignoring the spec
here, but I'd rather be generous.  Other things equal, returning the
existing sequence_id seems more informative.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:27 -04:00
J. Bruce Fields 0f1ba0ef21 nfsd4: fix exchange_id to return confirm flag
Otherwise nfsd4_set_ex_flags writes over the return flags.

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:21 -04:00
J. Bruce Fields 7447758be7 nfsd4: clarify that renewing expired client is a bug
This can't happen:
	- cl_time is zeroed only by unhash_client_locked, which is only
	  ever called under both the state lock and the client lock.
	- every caller of renew_client() should have looked up a
	  (non-expired) client and then called renew_client() all
	  without dropping the state lock.
	- the only other caller of renew_client_locked() is
	  release_session_client(), which first checks under the
	  client_lock that the cl_time is nonzero.

So make it clear that this is a bug, not something we handle.  I can't
quite bring myself to make this a BUG(), though, as there are a lot of
renew_client() callers, and returning here is probably safer than a
BUG().

We'll consider making it a BUG() after some more cleanup.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:14 -04:00
J. Bruce Fields 90d700b779 nfsd4: simpler ordering of setclientid_confirm checks
The cases here divide into two main categories:

	- if there's an uncomfirmed record with a matching verifier,
	  then this is a "normal", succesful case: we're either creating
	  a new client, or updating an existing one.
	- otherwise, this is a weird case: a replay, or a server reboot.

Reordering to reflect that makes the code a bit more concise and the
logic a lot easier to understand.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:07 -04:00
J. Bruce Fields f3d03b9202 nfsd4: setclientid: remove pointless assignment
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:06 -04:00
J. Bruce Fields 8695b90ac3 nfsd4: fix error return in non-matching-creds case
Note CLID_INUSE is for the case where two clients are trying to use the
same client-provided long-form client identifiers.  But what we're
looking at here is the server-returned shorthand client id--if those
clash there's a bug somewhere.

Fix the error return, pull the check out into common code, and do the
check unconditionally in all cases.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:04 -04:00
J. Bruce Fields 788c1eba50 nfsd4: fix setclientid_confirm same_cred check
New clients are created only by nfsd4_setclientid(), which always gives
any new client a unique clientid.  The only exception is in the
"callback update" case, in which case it may create an unconfirmed
client with the same clientid as a confirmed client.  In that case it
also checks that the confirmed client has the same credential.

Therefore, it is pointless for setclientid_confirm to check whether a
confirmed and unconfirmed client with the same clientid have matching
credentials--they're guaranteed to.

Instead, it should be checking whether the credential on the
setclientid_confirm matches either of those.  Otherwise, it could be
anyone sending the setclientid_confirm.  Granted, I can't see why anyone
would, but still it's probalby safer to check.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:03 -04:00
J. Bruce Fields 34b232bb37 nfsd4: merge 3 setclientid cases to 2
Boy, is this simpler.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:02 -04:00
J. Bruce Fields 8f9307119d nfsd4: pull out common code from setclientid cases
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:01 -04:00
J. Bruce Fields ad72aae5ad nfsd4: merge last two setclientid cases
The code here is mostly the same.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:00 -04:00
J. Bruce Fields 63db46328a nfsd4: setclientid/confirm comment cleanup
Be a little more concise.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:59 -04:00
J. Bruce Fields e98479b8d6 nfsd4: setclientid remove unnecessary terms from a logical expression
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:58 -04:00
J. Bruce Fields d5497fc693 nfsd4: move rq_flavor into svc_cred
Move the rq_flavor into struct svc_cred, and use it in setclientid and
exchange_id comparisons as well.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:58 -04:00
J. Bruce Fields 8fbba96e5b nfsd4: stricter cred comparison for setclientid/exchange_id
The typical setclientid or exchange_id will probably be performed with a
credential that maps to either root or nobody, so comparing just uid's
is unlikely to be useful.  So, use everything else we can get our hands
on.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:57 -04:00
J. Bruce Fields 03a4e1f6dd nfsd4: move principal name into svc_cred
Instead of keeping the principal name associated with a request in a
structure that's private to auth_gss and using an accessor function,
move it to svc_cred.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:55 -04:00
J. Bruce Fields 631fc9ea05 nfsd4: allow removing clients not holding state
RFC 5661 actually says we should allow an exchange_id to remove a
matching client, even if the exchange_id comes from a different
principal, *if* the victim client lacks any state.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:55 -04:00
J. Bruce Fields 136e658d62 nfsd4: rearrange exchange_id logic to simplify
Minor cleanup: it's simpler to have separate code paths for the update
and non-update cases.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:54 -04:00
J. Bruce Fields 2dbb269dfe nfsd4: exchange_id cleanup: comments
Make these comments a bit more concise and uniform.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:53 -04:00
J. Bruce Fields 83e08fd46c nfsd4: exchange_id cleanup: local shorthands for repeated tests
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:52 -04:00
J. Bruce Fields 1a308118c2 nfsd4: allow an EXCHANGE_ID to kill a 4.0 client
Following rfc 5661 section 2.4.1, we can permit a 4.1 client to remove
an established 4.0 client's state.

(But we don't allow updates.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:52 -04:00
J. Bruce Fields ea236d0704 nfsd4: exchange_id: check creds before killing confirmed client
We mustn't allow a client to destroy another client with established
state unless it has the right credential.

And some minor cleanup.

(Note: our comparison of credentials is actually pretty bogus currently;
that will need to be fixed in another patch.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:51 -04:00
J. Bruce Fields 2786cc3a05 nfsd4: exchange_id error cleanup
There's no point to the dprintk here as the main proc_compound loop
already does this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:50 -04:00
J. Bruce Fields 11ae681052 nfsd4: exchange_id has a pointless copy
We just verified above that these two verifiers are already the same.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:50 -04:00
Weston Andros Adamson 5fb35a3a9b nfsd: return 0 on reads of fault injection files
debugfs read operations were returning the contents of an uninitialized u64.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:48 -04:00
Jeff Layton ce0fc43c5a nfsd: wrap all accesses to st_deny_bmap
Handle the st_deny_bmap in a similar fashion to the st_access_bmap. Add
accessor functions and use those instead of bare bitops.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:48 -04:00
Jeff Layton 82c5ff1b14 nfsd: wrap accesses to st_access_bmap
Currently, we do this for the most part with "bare" bitops, but
eventually we'll need to expand the share mode code to handle access
and deny modes on other nodes.

In order to facilitate that code in the future, move to some generic
accessor functions. For now, these are mostly static inlines, but
eventually we'll want to move these to "real" functions that are
able to handle multi-node configurations or have a way to "swap in"
new operations to be done in lieu of or in conjunction with these
atomic bitops.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:47 -04:00
Jeff Layton 3a3286147f nfsd: make test_share a bool return
All of the callers treat the return that way already.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:46 -04:00
Jeff Layton 5ae037e599 nfsd: consolidate set_access and set_deny
These functions are identical. Also, rename them to bmap_to_share_mode
to better reflect what they do, and have them just return the result
instead of passing in a pointer to the storage location.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:46 -04:00
Chuck Lever f07ea10dc8 NFSD: SETCLIENTID_CONFIRM returns NFS4ERR_CLID_INUSE too often
According to RFC 3530bis, the only items SETCLIENTID_CONFIRM processing
should be concerned with is the clientid, clientid verifier, and
principal.  The client's IP address is not supposed to be interesting.

And, NFS4ERR_CLID_INUSE is meant only for principal mismatches.

I triggered this logic with a prototype UCS client -- one that
uses the same nfs_client_id4 string for all servers.  The client
mounted our server via its IPv4, then via its IPv6 address.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:45 -04:00
Stanislav Kinsbursky 8dbf28e495 LockD: add debug message to start and stop functions
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:44 -04:00
Stanislav Kinsbursky 3d1221dfa9 LockD: service start function introduced
This is just a code move, which from my POV makes the code look better.
I.e. now on start we have 3 different stages:
1) Service creation.
2) Service per-net data allocation.
3) Service start.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:44 -04:00
Stanislav Kinsbursky 7d13ec761a LockD: move global usage counter manipulation from error path
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:43 -04:00
Stanislav Kinsbursky 2445223909 LockD: service creation function introduced
This function creates service if it doesn't exist, or increases usage
counter if it does, and returns a pointer to it.  The usage counter will
be droppepd by svc_destroy() later in lockd_up().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:42 -04:00
Stanislav Kinsbursky dbf9b5d74c LockD: use existing per-net data function on service creation
This patch also replaces svc_rpcb_setup() with svc_bind().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:42 -04:00
Stanislav Kinsbursky 4db77695bf LockD: pass service to per-net up and down functions
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:41 -04:00
Stanislav Kinsbursky 786185b5f8 SUNRPC: move per-net operations from svc_destroy()
The idea is to separate service destruction and per-net operations,
because these are two different things and the mix looks ugly.

Notes:

1) For NFS server this patch looks ugly (sorry for that). But these
place will be rewritten soon during NFSd containerization.

2) LockD per-net counter increase int lockd_up() was moved prior to
make_socks() to make lockd_down_net() call safe in case of error.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:40 -04:00
Stanislav Kinsbursky 9793f7c889 SUNRPC: new svc_bind() routine introduced
This new routine is responsible for service registration in a specified
network context.

The idea is to separate service creation from per-net operations.

Note also: since registering service with svc_bind() can fail, the
service will be destroyed and during destruction it will try to
unregister itself from rpcbind. In this case unregistration has to be
skipped.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:39 -04:00
Weston Andros Adamson e7a0444aef nfsd: add IPv6 addr escaping to fs_location hosts
The fs_location->hosts list is split on colons, but this doesn't work when
IPv6 addresses are used (they contain colons).
This patch adds the function nfsd4_encode_components_esc() to
allow the caller to specify escape characters when splitting on 'sep'.
In order to fix referrals, this patch must be used with the mountd patch
that similarly fixes IPv6 [] escaping.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:38 -04:00
J. Bruce Fields 45eaa1c1a1 nfsd4: fix change attribute endianness
Though actually this doesn't matter much, as NFSv4.0 clients are
required to treat the change attribute as opaque.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:38 -04:00
J. Bruce Fields d1829b3824 nfsd4: fix free_stateid return endianness
Cc: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:37 -04:00
J. Bruce Fields 57b7b43b40 nfsd4: int/__be32 fixes
In each of these cases there's a simple unambiguous correct choice, and
no actual bug.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:37 -04:00
J. Bruce Fields bc1b542be9 nfsd4: preserve __user annotation on cld downcall msg
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:36 -04:00
J. Bruce Fields 2355c59644 nfsd4: fix missing "static"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:35 -04:00
J. Bruce Fields bfa4b36525 nfsd: state.c should include current_stateid.h
OK, admittedly I'm mainly just trying to shut sparse up.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:35 -04:00
Chris Mason 1e20932a23 Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into for-linus
Conflicts:
	fs/btrfs/ulist.h

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-31 16:49:53 -04:00
Trond Myklebust 1d59d61f60 NFS: Ensure that setattr and getattr wait for O_DIRECT write completion
Use the same mechanism as the block devices are using, but move the
helper functions from fs/direct-io.c into fs/inode.c to remove the
dependency on CONFIG_BLOCK.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 11:41:36 -07:00
Jan Schmidt c31931088f Btrfs: fix tree mod log rewinded level and rewinding of moved keys
When we rewind REMOVE_WHILE_FREEING operations, there's code that allocates
a fresh buffer instead of cloning the old one. Setting that buffer's level
correctly was missing in this case.

When rewinding a MOVE_KEYS operation, btrfs_node_key_ptr_offset(slot) was
missing for memmove_extent_buffer()'s arguments.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-31 19:56:19 +02:00
Jan Schmidt f395694c2c Btrfs: fix tree mod log del_ptr
Logging for del_ptr when we're not deleting the last pointer was wrong. This
fixes both, duplicate log entries and log sequence.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-31 19:56:19 +02:00
Jan Schmidt e9b7fd4d8b Btrfs: add tree_mod_dont_log helper
Replace duplicate code by small inline helper function.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-31 19:56:18 +02:00
Jan Schmidt 926dd8a640 Btrfs: add missing spin_lock for insertion into tree mod log
tree_mod_alloc calls __get_tree_mod_seq and must acquire a spinlock before
doing so.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-31 19:56:18 +02:00
Jan Schmidt 3301958b7c Btrfs: add inodes before dropping the extent lock in find_all_leafs
We must build up the inode list with the extent lock held after following
indirect refs.

This also requires an extension to ulists, which allows to modify the stored
aux value in case a key already exists in the list.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-31 19:53:08 +02:00
Al Viro e5467859f7 split ->file_mmap() into ->mmap_addr()/->mmap_file()
... i.e. file-dependent and address-dependent checks.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-31 13:11:54 -04:00
Theodore Ts'o f3fc0210c0 ext4: add missing save_error_info() to ext4_error()
The ext4_error() function is missing a call to save_error_info().
Since this is the function which marks the file system as containing
an error, this oversight (which was introduced in 2.6.36) is quite
significant, and should be backported to older stable kernels with
high urgency.

Reported-by: Ken Sumrall <ksumrall@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: ksumrall@google.com
Cc: stable@kernel.org
2012-05-30 23:00:16 -04:00
Theodore Ts'o 2c0544b235 ext4: add debugging trigger for ext4_error()
Make it easy to test whether or not the error handling subsystem in
ext4 is working correctly.  This allows us to simulate an ext4_error()
by echoing a string to /sys/fs/ext4/<dev>/trigger_fs_error.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: ksumrall@google.com
2012-05-30 22:56:46 -04:00
Al Viro 7696e0c37f binfmt_flat: use vm_munmap, we are missing ->mmap_sem there
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:56 -04:00
Al Viro 5a5e4c2eca binfmt_elf: switch elf_map() to vm_mmap/vm_munmap
No reason to hold ->mmap_sem over the sequence

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:55 -04:00
Al Viro 63d37a84ab vfs: umount_tree() might be called on subtree that had never made it
__mnt_make_shortterm() in there undoes the effect of __mnt_make_longterm()
we'd done back when we set ->mnt_ns non-NULL; it should not be done to
vfsmounts that had never gone through commit_tree() and friends.  Kudos to
lczerner for catching that one...

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:55 -04:00
Will Deacon 46ce341b2f pipe: return -ENOIOCTLCMD instead of -EINVAL on unknown ioctl command
As described in commit 07d106d0a ("vfs: fix up ENOIOCTLCMD error
handling"), drivers should return -ENOIOCTLCMD if they receive an ioctl
command which they don't understand. Doing so will result in -ENOTTY
being returned to userspace, which matches the behaviour of the compat
layer if it fails to translate an ioctl command.

This patch fixes the pipe ioctl to return -ENOIOCTLCMD instead of
-EINVAL when passed an unknown ioctl command.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:55 -04:00
J. Bruce Fields 3f50fff4da vfs: remove unused __d_splice_alias argument
Nobody sets want_disconn any more.

Reported-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:54 -04:00
J. Bruce Fields 7732a557b1 vfs: stop d_splice_alias creating directory aliases
A directory should never have more than one dentry pointing to it.

But d_splice_alias() will add one if it finds a directory with an
already-existing non-DISCONNECTED dentry.

I can't find an obvious reproducer, but I also can't see what prevents
d_splice_alias() from encountering such a case.

It therefore seems safest to allow d_splice_alias to use any dentry it
finds.

(Prior to the removal of dentry_unhash() from vfs_rmdir(), around v3.0,
this could cause an nfsd deadlock like this:

	- Somebody attempts to remove a non-empty directory.
	- The dentry_unhash() in vfs_rmdir() unhashes the dentry
	  pointing to the non-empty directory.
	- ->rmdir() then fails with -ENOTEMPTY
	- Before the vfs_rmdir() caller reaches dput(), an nfsd process
	  in rename looks up the directory by filehandle; at the end of
	  that lookup, this dentry is found by d_alloc_anon(), and a
	  reference is taken on it, preventing dput() from removing it.
	- A regular lookup of the directory calls d_splice_alias(),
	  finds only an unhashed (not a DISCONNECTED) dentry, and
	  insteads adds a new one, so the directory now has two
	  dentries.
	- The nfsd process in rename, which was previously looking up
	  the source directory of the rename, now looks up the target
	  directory (which is the same), and gets the dentry newly
	  created by the previous lookup.
	- The rename, seeing two different dentries, assumes this is a
	  cross-directory rename and attempts to take the i_mutex on the
	  directory twice.

That reproducer no longer exists, but I don't think there was anything
fundamentally incorrect about the vfs_rmdir() behavior there, so I think
the real fault was here in d_splice_alias().)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:54 -04:00
Dan Carpenter fd657170c0 fsnotify: remove unused parameter from send_to_group()
We don't use "mnt" anymore in send_to_group() after 1968f5eed5 ("fanotify:
use both marks when possible") was applied.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:53 -04:00
Dmitry Kasatkin 799243a389 vfs: increment iversion when a file is truncated
When a file is truncated with truncate()/ftruncate() and then closed,
iversion is not updated.  This patch uses ATTR_SIZE flag as an indication
to increment iversion.

Mimi said:

On fput(), i_version is used to detect and flag files that have changed
and need to be re-measured in the IMA measurement policy.  When a file
is truncated with truncate()/ftruncate() and then closed, i_version is
not updated.  As a result, although the file has changed, it will not be
re-measured and added to the IMA measurement list on subsequent access.

Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:53 -04:00
Shai Fultheim a0a9b04337 fs: Move bh_cachep to the __read_mostly section
bh_cachep is only written to once on initialization, so move it to the
__read_mostly section.

Signed-off-by: Shai Fultheim <shai@scalemp.com>
Signed-off-by: Vlad Zolotarov <vlad@scalemp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:52 -04:00
Cong Wang 3ed37648e1 fs: move file_remove_suid() to fs/inode.c
file_remove_suid() is a generic function operates on struct file,
it almost has no relations with file mapping, so move it to fs/inode.c.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:52 -04:00
Artem Bityutskiy 8bdc81c506 jffs2: get rid of jffs2_sync_super
Currently JFFS2 file-system maps the VFS "superblock" abstraction to the
write-buffer. Namely, it uses VFS services to synchronize the write-buffer
periodically.

The whole "superblock write-out" VFS infrastructure is served by the
'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
writes out all dirty superblock using the '->write_super()' call-back. But the
problem with this thread is that it wastes power by waking up the system every
5 seconds no matter what. So we want to kill it completely and thus, we need to
make file-systems to stop using the '->write_super' VFS service, and then
remove it together with the kernel thread.

This patch switches the JFFS2 write-buffer management from
'->write_super()'/'->s_dirt' to a delayed work. Instead of setting the 's_dirt'
flag we just schedule a delayed work for synchronizing the write-buffer.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:52 -04:00
Artem Bityutskiy 06688905cc jffs2: remove unnecessary GC pass on sync
We do not need to call 'jffs2_write_super()' on sync. This function
causes a GC pass to make sure the current contents is pushed out with
the data which we already have on the media.

But this is not needed on unmount and only slows sync down unnecessarily.
It is enough to just sync the write-buffer.

This call was added by one of the generic VFS rework patch-sets,
see d579ed00aa.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:51 -04:00
Artem Bityutskiy d0490eea14 jffs2: remove unnecessary GC pass on umount
We do not need to call 'jffs2_write_super()' on unmount. This function
causes a GC pass to make sure the current contents is pushed out with
the data which we already have on the media.

But this is not needed on unmount and only slows unmount down unnecessarily.
It is enough to just sync the write-buffer.

This call was added by one of the generic VFS rework patch-sets,
see 8c85e12512.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:51 -04:00
Artem Bityutskiy 3a0c0e26b6 jffs2: remove lock_super
We do not need 'lock_super()'/'unlock_super()' in JFFS2 - kill them.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:51 -04:00
Linus Torvalds af56e0aa35 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph updates from Sage Weil:
 "There are some updates and cleanups to the CRUSH placement code, a bug
  fix with incremental maps, several cleanups and fixes from Josh Durgin
  in the RBD block device code, a series of cleanups and bug fixes from
  Alex Elder in the messenger code, and some miscellaneous bounds
  checking and gfp cleanups/fixes."

Fix up trivial conflicts in net/ceph/{messenger.c,osdmap.c} due to the
networking people preferring "unsigned int" over just "unsigned".

* git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (45 commits)
  libceph: fix pg_temp updates
  libceph: avoid unregistering osd request when not registered
  ceph: add auth buf in prepare_write_connect()
  ceph: rename prepare_connect_authorizer()
  ceph: return pointer from prepare_connect_authorizer()
  ceph: use info returned by get_authorizer
  ceph: have get_authorizer methods return pointers
  ceph: ensure auth ops are defined before use
  ceph: messenger: reduce args to create_authorizer
  ceph: define ceph_auth_handshake type
  ceph: messenger: check return from get_authorizer
  ceph: messenger: rework prepare_connect_authorizer()
  ceph: messenger: check prepare_write_connect() result
  ceph: don't set WRITE_PENDING too early
  ceph: drop msgr argument from prepare_write_connect()
  ceph: messenger: send banner in process_connect()
  ceph: messenger: reset connection kvec caller
  libceph: don't reset kvec in prepare_write_banner()
  ceph: ignore preferred_osd field
  ceph: fully initialize new layout
  ...
2012-05-30 11:17:19 -07:00
Jan Schmidt 95a06077f7 Btrfs: use delayed ref sequence numbers for all fs-tree updates
The sequence number for delayed refs is needed to postpone certain delayed
refs for a very short period while walking backrefs. Before the tree
modification log, we thought we'd only have to hold back those references
that don't have a counter operation.

While now we've the tree mod log, we're rewinding fs tree blocks to a
defined consistent state. We cannot know in advance for which tree block
we'll be doing rewind operations later. Therefore, we must postpone all the
delayed refs for fs-tree blocks, even those having a counter operation.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 18:18:21 +02:00
Chris Mason cfc442b696 Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into HEAD 2012-05-30 11:55:38 -04:00
Linus Torvalds 0d167518e0 Merge branch 'for-3.5/core' of git://git.kernel.dk/linux-block
Merge block/IO core bits from Jens Axboe:
 "This is a bit bigger on the core side than usual, but that is purely
  because we decided to hold off on parts of Tejun's submission on 3.4
  to give it a bit more time to simmer.  As a consequence, it's seen a
  long cycle in for-next.

  It contains:

   - Bug fix from Dan, wrong locking type.
   - Relax splice gifting restriction from Eric.
   - A ton of updates from Tejun, primarily for blkcg.  This improves
     the code a lot, making the API nicer and cleaner, and also includes
     fixes for how we handle and tie policies and re-activate on
     switches.  The changes also include generic bug fixes.
   - A simple fix from Vivek, along with a fix for doing proper delayed
     allocation of the blkcg stats."

Fix up annoying conflict just due to different merge resolution in
Documentation/feature-removal-schedule.txt

* 'for-3.5/core' of git://git.kernel.dk/linux-block: (92 commits)
  blkcg: tg_stats_alloc_lock is an irq lock
  vmsplice: relax alignement requirements for SPLICE_F_GIFT
  blkcg: use radix tree to index blkgs from blkcg
  blkcg: fix blkcg->css ref leak in __blkg_lookup_create()
  block: fix elvpriv allocation failure handling
  block: collapse blk_alloc_request() into get_request()
  blkcg: collapse blkcg_policy_ops into blkcg_policy
  blkcg: embed struct blkg_policy_data in policy specific data
  blkcg: mass rename of blkcg API
  blkcg: style cleanups for blk-cgroup.h
  blkcg: remove blkio_group->path[]
  blkcg: blkg_rwstat_read() was missing inline
  blkcg: shoot down blkgs if all policies are deactivated
  blkcg: drop stuff unused after per-queue policy activation update
  blkcg: implement per-queue policy activation
  blkcg: add request_queue->root_blkg
  blkcg: make request_queue bypassing on allocation
  blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing
  blkcg: make blkg_conf_prep() take @pol and return with queue lock held
  blkcg: remove static policy ID enums
  ...
2012-05-30 08:52:42 -07:00
Stefan Behrens 48235a68a3 Btrfs: fix false positive in check-integrity on unmount
During unmount, it could happen that the integrity checker printed a
warning message "attempt to free ... on umount which is not yet iodone"
which turned out to be a false positive.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-05-30 10:23:44 -04:00
Stefan Behrens 86ff7ffce0 Btrfs: fix runtime warning in check-integrity check data mode
If a file_extent_item was located at the very end of a leaf and there was
not enough space to hold a full item, but there was enough space to hold
one of type BTRFS_FILE_EXTENT_INLINE or PREALLOC, and it was only such a
short item, a warning was printed anyway. This check is now fixed.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-05-30 10:23:43 -04:00
Stefan Behrens 3d136a1131 Btrfs: set ioprio of scrub readahead to idle
Reduce ioprio class of scrub readahead threads to idle priority.
This setting is fixed. This priority has shown the best performance
during all measurements.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-05-30 10:23:43 -04:00
Josef Bacik 5bdbeb2187 Btrfs: fix return code in drop_objectid_items
So dpkg fsync()'s the file and the directory containing the file whenever it
writes to a file which is really slow in btrfs.  This is partly because
fsync()'ing a directory _always_ committed the transaction instead of just
going to the tree log.  This is because drop_objectid_items() would return 1
since it does a btrfs_search_slot() which returns 1.  In tree-log jargon
this means that we have to commit the transaction to be safe.  So just check
if ret is greater than 0 and set it to 0 if it does.  With this patch we now
use the tree-log instead of committing the entire transaction, which is
twice as fast on my box.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:42 -04:00
Josef Bacik 22ee6985de Btrfs: check to see if the inode is in the log before fsyncing
We have this check down in the actual logging code, but this is after we
start a transaction and all that good stuff.  So move the helper
inode_in_log() out so we can call it in fsync() and avoid starting a
transaction altogether and just exit if we've already fsync()'ed this file
recently.  You would notice this issue if you fsync()'ed a file over and
over again until the transaction committed.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:42 -04:00
Tsutomu Itoh 018642a1f1 Btrfs: return value of btrfs_read_buffer is checked correctly
btrfs_read_buffer() has the possibility of returning the error.
Therefore, I add the code in which the return value of btrfs_read_buffer()
is checked.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
2012-05-30 10:23:41 -04:00
Stefan Behrens 733f4fbbc1 Btrfs: read device stats on mount, write modified ones during commit
The device statistics are written into the device tree with each
transaction commit. Only modified statistics are written.
When a filesystem is mounted, the device statistics for each involved
device are read from the device tree and used to initialize the
counters.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-05-30 10:23:41 -04:00
Stefan Behrens c11d2c236c Btrfs: add ioctl to get and reset the device stats
An ioctl interface is added to get the device statistic counters.
A second ioctl is added to atomically get and reset these counters.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-05-30 10:23:40 -04:00
Stefan Behrens 442a4f6308 Btrfs: add device counters for detected IO and checksum errors
The goal is to detect when drives start to get an increased error rate,
when drives should be replaced soon. Therefore statistic counters are
added that count IO errors (read, write and flush). Additionally, the
software detected errors like checksum errors and corrupted blocks are
counted.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-05-30 10:23:39 -04:00
Asias He d07eb91170 btrfs: Drop unused function btrfs_abort_devices()
1) This function is not used anywhere.

2) Using the blk_abort_queue() to abort the queue seems not correct.
blk_abort_queue() is used for timeout handling (block/blk-timeout.c).

Cc: Chris Mason <chris.mason@oracle.com>
Cc: linux-btrfs@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Asias He <asias@redhat.com>
2012-05-30 10:23:39 -04:00
Miao Xie 762f226326 Btrfs: fix the same inode id problem when doing auto defragment
Two files in the different subvolumes may have the same inode id, so
The rb-tree which is used to manage the defragment object must take it
into account. This patch fix this problem.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2012-05-30 10:23:38 -04:00
Josef Bacik 2adcac1a73 Btrfs: fall back to non-inline if we don't have enough space
If cow_file_range_inline fails with ENOSPC we abort the transaction which
isn't very nice.  This really shouldn't be happening anyways but there's no
sense in making it a horrible error when we can easily just go allocate
normal data space for this stuff.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:38 -04:00
Josef Bacik 8a35d95ff4 Btrfs: fix how we deal with the orphan block rsv
Ceph was hitting this race where we would remove an inode from the per-root
orphan list before we would release the space we had reserved for the inode.
We actually don't need a list or anything, we just need to make sure the
root doesn't try to free up the orphan reserve until after the inodes have
released their reservations.  So use an atomic counter instead of a list on
the root and only decrement the counter after we've released our
reservation.  I've tested this as well as several others and we no longer
see the warnings that you would see while running ceph.  Thanks,
Btrfs: fix how we deal with the orphan block rsv

Ceph was hitting this race where we would remove an inode from the per-root
orphan list before we would release the space we had reserved for the inode.
We actually don't need a list or anything, we just need to make sure the
root doesn't try to free up the orphan reserve until after the inodes have
released their reservations.  So use an atomic counter instead of a list on
the root and only decrement the counter after we've released our
reservation.  I've tested this as well as several others and we no longer
see the warnings that you would see while running ceph.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:37 -04:00
Josef Bacik 72ac3c0d79 Btrfs: convert the inode bit field to use the actual bit operations
Miao pointed this out while I was working on an orphan problem that messing
with a bitfield where different ranges are protected by different locks
doesn't work out right.  Turns out we've been doing this forever where we
have different parts of the bit field protected by either no lock at all or
different locks which could cause all sorts of weird problems including the
issue I was hitting.  So instead make a runtime_flags thing that we use the
normal bit operations on that are all atomic so we can keep having our
no/different locking for the different flags and then make force_compress
it's own thing so it can be treated normally.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:36 -04:00
Josef Bacik cd023e7b17 Btrfs: merge contigous regions when loading free space cache
When we write out the free space cache we will write out everything that is
in our in memory tree, and then we will just walk the pinned extents tree
and write anything we see there.  The problem with this is that during
normal operations the pinned extents will be merged back into the free space
tree normally, and then we can allocate space from the merged areas and
commit them to the tree log.  If we crash and replay the tree log we will
crash again because the tree log will try to free up space from what looks
like 2 seperate but contiguous entries, since one entry is from the original
free space cache and the other was a pinned extent that was merged back.  To
fix this we just need to walk the free space tree after we load it and merge
contiguous entries back together.  This will keep the tree log stuff from
breaking and it will make the allocator behave more nicely.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:36 -04:00
Liu Bo 9ba1f6e44e Btrfs: do not do balance in readonly mode
In normal cases, we would not be allowed to do balance in RO mode.
However, when we're using a seeding device and adding another device to sprout,
things will change:

$ mkfs.btrfs /dev/sdb7
$ btrfstune -S 1 /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs -o ro
$ btrfs fi bal /mnt/btrfs   -----------------------> fail.
$ btrfs dev add /dev/sdb8 /mnt/btrfs
$ btrfs fi bal /mnt/btrfs   -----------------------> works!

It should not be designed as an exception, and we'd better add another check for
mnt flags.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:35 -04:00
Liu Bo d1ac6e41d5 Btrfs: use fastpath in extent state ops as much as possible
Fully utilize our extent state's new helper functions to use
fastpath as much as possible.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:34 -04:00
Liu Bo f8c5d0b443 Btrfs: fix wrong error returned by adding a device
Reproduce:
$ mkfs.btrfs /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs -o ro
$ btrfs dev add /dev/sdb8 /mnt/btrfs
ERROR: error adding the device '/dev/sdb8' - Invalid argument

Since we mount with readonly options, and /dev/sdb7 is not a seeding one,
a readonly notification is preferred.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:34 -04:00
Josef Bacik 5fd0204355 Btrfs: finish ordered extents in their own thread
We noticed that the ordered extent completion doesn't really rely on having
a page and that it could be done independantly of ending the writeback on a
page.  This patch makes us not do the threaded endio stuff for normal
buffered writes and direct writes so we can end page writeback as soon as
possible (in irq context) and only start threads to do the ordered work when
it is actually done.  Compression needs to be reworked some to take
advantage of this as well, but atm it has to do a find_get_page in its endio
handler so it must be done in its own thread.  This makes direct writes
quite a bit faster.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:33 -04:00
Josef Bacik 4e89915220 Btrfs: do not check delalloc when updating disk_i_size
We are checking delalloc to see if it is ok to update the i_size.  There are
2 cases it stops us from updating

1) If there is delalloc between our current disk_i_size and this ordered
extent

2) If there is delalloc between our current ordered extent and the next
ordered extent

These tests are racy however since we can set delalloc for these ranges at
any time.  Also for the first case if we notice there is delalloc between
disk_i_size and our ordered extent we will not update disk_i_size and assume
that when that delalloc bit gets written out it will update everything
properly.  However if we crash before that we will have file extents outside
of our i_size, which is not good, so this test is dangerous as well as racy.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:33 -04:00
Jim Meyering f60d16a892 Btrfs: avoid buffer overrun in mount option handling
There is an off-by-one error: allocating room for a maximal result
string but without room for a trailing NUL.  That, can lead to
returning a transformed string that is not NUL-terminated, and
then to a caller reading beyond end of the malloc'd buffer.

Rewrite to s/kzalloc/kmalloc/, remove unwarranted use of strncpy
(the result is guaranteed to fit), remove dead strlen at end, and
change a few variable names and comments.

Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Jim Meyering <meyering@redhat.com>
2012-05-30 10:23:32 -04:00
Jim Meyering a27202fbe9 Btrfs: NUL-terminate path buffer in DEV_INFO ioctl result
A device with name of length BTRFS_DEVICE_PATH_NAME_MAX or longer
would not be NUL-terminated in the DEV_INFO ioctl result buffer.

Signed-off-by: Jim Meyering <meyering@redhat.com>
2012-05-30 10:23:31 -04:00
Jim Meyering f07c9a79f0 Btrfs: avoid buffer overrun in btrfs_printk
The buffer read-overrun would be triggered by a printk format
starting with <N>, where N is a single digit.  NUL-terminate
after strncpy.  Use memcpy, not strncpy, since we know the
string we're copying fits in the destination buffer and
contains no NUL byte.

Signed-off-by: Jim Meyering <meyering@redhat.com>
2012-05-30 10:23:31 -04:00
Daniel J Blueman 2eec6c8102 Fix minor type issues
Address some minor type issues identified by sparse checker.

Signed-off-by: Daniel J Blueman <daniel@quora.org>
2012-05-30 10:23:30 -04:00
Sergei Trofimovich 0d2450abfa btrfs: allow changing 'thread_pool' size at remount time
Changing 'mount -oremount,thread_pool=2 /' didn't make any effect:

maximum amount of worker threads is specified in 2 places:
- in 'strict btrfs_fs_info::thread_pool_size'
- in each worker struct: 'struct btrfs_workers::max_workers'

'mount -oremount' updated only 'btrfs_fs_info::thread_pool_size'.

Fix it by pushing new maximum value to all created worker structures
as well.

Cc: Josef Bacik <josef@redhat.com>
Cc: Chris Mason <chris.mason@oracle.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
2012-05-30 10:23:30 -04:00
Josef Bacik 0885ef5b56 Btrfs: do not do filemap_write_and_wait_range in fsync
We already do the btrfs_wait_ordered_range which will do this for us, so
just remove this call so we don't call it twice.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:29 -04:00
Josef Bacik 551ebb2d34 Btrfs: remove useless waiting and extra filemap work
In btrfs_wait_ordered_range we have been calling filemap_fdata_write() twice
because compression does strange things and then waiting.  Then we look up
ordered extents and if we find any we will always schedule_timeout(); once
and then loop back around and do it all again.  We will even check to see if
there is delalloc pages on this range and loop again.  So this patch gets
rid of the multipe fdata_write() calls and just does
filemap_write_and_wait().  In the case of compression we will still find the
ordered extents and start those individually if we need to so that is ok,
but in the normal buffered case we avoid all this weird overhead.

Then in the case of the schedule_timeout(1), we don't need it.  All callers
either 1) don't care, they just want to make sure what they just wrote maeks
it to disk or 2) are doing the lock()->lookup ordered->unlock->flush thing
in which case it will lock and check for ordered extents _anyway_ so get
back to them as quickly as possible.  The delaloc check is simply not
needed, this only catches the case where we write to the file again since
doing the filemap_write_and_wait() and if the caller truly cares about that
it will take care of everything itself.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:28 -04:00
Josef Bacik d7dbe9e7f6 Btrfs: fix compile warnings in extent_io.c
These warnings are bogus since we will always have at least one page in an
eb, but to make the compiler happy just set ret = 0 in these two cases.
Thanks,
Btrfs: fix compile warnings in extent_io.c

These warnings are bogus since we will always have at least one page in an
eb, but to make the compiler happy just set ret = 0 in these two cases.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:28 -04:00
Josef Bacik 30f8fe3e47 Btrfs: cache no acl on new inodes
When running compilebench I noticed we were spending some time looking up
acls on new inodes, which shouldn't be happening since there were no acls.
This is because when we init acls on the inode after creating them we don't
cache the fact there are no acls if there aren't any.  Doing this adds a
little bit of a bump to my compilebench runs.  Thanks,
Btrfs: cache no acl on new inodes

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:27 -04:00
Josef Bacik 0c4d2d95d0 Btrfs: use i_version instead of our own sequence
We've been keeping around the inode sequence number in hopes that somebody
would use it, but nobody uses it and people actually use i_version which
serves the same purpose, so use i_version where we used the incore inode's
sequence number and that way the sequence is updated properly across the
board, and not just in file write.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:27 -04:00