Commit graph

23651 commits

Author SHA1 Message Date
Linus Torvalds
e08dc1325f p9: avoid unused variable warning
Commit 4e34e719e4 ("fs: take the ACL checks to common code") removed
the use of the 'acl' variable in v9fs_iop_get_acl(), but left the
variable definition around.  Remove it to get rid of the warning:

  fs/9p/acl.c: In function ‘v9fs_iop_get_acl’:
  fs/9p/acl.c:101:20: warning: unused variable ‘acl’

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-25 23:43:53 -07:00
Linus Torvalds
91d44d9999 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
  Squashfs: Make ZLIB compression support optional
  Squashfs: Update documentation for XZ and add squashfs-tools devel tree
2011-07-25 22:50:35 -07:00
Linus Torvalds
2dad3206db Merge branch 'for-3.1' of git://linux-nfs.org/~bfields/linux
* 'for-3.1' of git://linux-nfs.org/~bfields/linux:
  nfsd: don't break lease on CLAIM_DELEGATE_CUR
  locks: rename lock-manager ops
  nfsd4: update nfsv4.1 implementation notes
  nfsd: turn on reply cache for NFSv4
  nfsd4: call nfsd4_release_compoundargs from pc_release
  nfsd41: Deny new lock before RECLAIM_COMPLETE done
  fs: locks: remove init_once
  nfsd41: check the size of request
  nfsd41: error out when client sets maxreq_sz or maxresp_sz too small
  nfsd4: fix file leak on open_downgrade
  nfsd4: remember to put RW access on stateid destruction
  NFSD: Added TEST_STATEID operation
  NFSD: added FREE_STATEID operation
  svcrpc: fix list-corrupting race on nfsd shutdown
  rpc: allow autoloading of gss mechanisms
  svcauth_unix.c: quiet sparse noise
  svcsock.c: include sunrpc.h to quiet sparse noise
  nfsd: Remove deprecated nfsctl system call and related code.
  NFSD: allow OP_DESTROY_CLIENTID to be only op in COMPOUND

Fix up trivial conflicts in Documentation/feature-removal-schedule.txt
2011-07-25 22:49:19 -07:00
Linus Torvalds
84635d68be vfs: fix check_acl compile error when CONFIG_FS_POSIX_ACL is not set
Commit e77819e57f ("vfs: move ACL cache lookup into generic code")
didn't take the FS_POSIX_ACL config variable into account - when that is
not set, ACL's go away, and the cache helper functions do not exist,
causing compile errors like

  fs/namei.c: In function 'check_acl':
  fs/namei.c:191:10: error: implicit declaration of function 'negative_cached_acl'
  fs/namei.c:196:2: error: implicit declaration of function 'get_cached_acl'
  fs/namei.c:196:6: warning: assignment makes pointer from integer without a cast
  fs/namei.c:212:11: error: implicit declaration of function 'set_cached_acl'

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-25 22:47:03 -07:00
Linus Torvalds
45b583b10a Merge 'akpm' patch series
* Merge akpm patch series: (122 commits)
  drivers/connector/cn_proc.c: remove unused local
  Documentation/SubmitChecklist: add RCU debug config options
  reiserfs: use hweight_long()
  reiserfs: use proper little-endian bitops
  pnpacpi: register disabled resources
  drivers/rtc/rtc-tegra.c: properly initialize spinlock
  drivers/rtc/rtc-twl.c: check return value of twl_rtc_write_u8() in twl_rtc_set_time()
  drivers/rtc: add support for Qualcomm PMIC8xxx RTC
  drivers/rtc/rtc-s3c.c: support clock gating
  drivers/rtc/rtc-mpc5121.c: add support for RTC on MPC5200
  init: skip calibration delay if previously done
  misc/eeprom: add eeprom access driver for digsy_mtc board
  misc/eeprom: add driver for microwire 93xx46 EEPROMs
  checkpatch.pl: update $logFunctions
  checkpatch: make utf-8 test --strict
  checkpatch.pl: add ability to ignore various messages
  checkpatch: add a "prefer __aligned" check
  checkpatch: validate signature styles and To: and Cc: lines
  checkpatch: add __rcu as a sparse modifier
  checkpatch: suggest using min_t or max_t
  ...

Did this as a merge because of (trivial) conflicts in
 - Documentation/feature-removal-schedule.txt
 - arch/xtensa/include/asm/uaccess.h
that were just easier to fix up in the merge than in the patch series.
2011-07-25 21:00:19 -07:00
Akinobu Mita
9d6bf5aa17 reiserfs: use hweight_long()
Use hweight_long() to count free bits in the bitmap.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-25 20:57:17 -07:00
Akinobu Mita
0c2fd1bfb1 reiserfs: use proper little-endian bitops
Using __test_and_{set,clear}_bit_le() with ignoring its return value can
be replaced with __{set,clear}_bit_le().

This introduces reiserfs_{set,clear}_le_bit for __{set,clear}_bit_le and
does the above change with them.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-25 20:57:17 -07:00
Hugh Dickins
708e3508c2 tmpfs: clone shmem_file_splice_read()
Copy __generic_file_splice_read() and generic_file_splice_read() from
fs/splice.c to shmem_file_splice_read() in mm/shmem.c.  Make
page_cache_pipe_buf_ops and spd_release_page() accessible to it.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-25 20:57:11 -07:00
David Rientjes
be8f684d73 oom: make deprecated use of oom_adj more verbose
/proc/pid/oom_adj is deprecated and scheduled for removal in August 2012
according to Documentation/feature-removal-schedule.txt.

This patch makes the warning more verbose by making it appear as a more
serious problem (the presence of a stack trace and being multiline should
attract more attention) so that applications still using the old interface
can get fixed.

Very popular users of the old interface have been converted since the oom
killer rewrite has been introduced.  udevd switched to the
/proc/pid/oom_score_adj interface for v162, kde switched in 4.6.1, and
opensshd switched in 5.7p1.

At the start of 2012, this should be changed into a WARN() to emit all
such incidents and then finally remove the tunable in August 2012 as
scheduled.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-25 20:57:09 -07:00
Becky Bruce
2b37c35e65 fs/hugetlbfs/inode.c: fix pgoff alignment checking on 32-bit
This:

    vma->vm_pgoff & ~(huge_page_mask(h) >> PAGE_SHIFT)

is incorrect on 32-bit.  It causes us to & the pgoff with something that
looks like this (for a 4m hugepage): 0xfff003ff.  The mask should be
flipped and *then* shifted, to give you 0x0000_03fff.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-25 20:57:07 -07:00
Linus Torvalds
14067ff536 vfs: make gcc generate more obvious code for acl permission checking
The "fsuid is the inode owner" case is not necessarily always the likely
case, but it's the case that doesn't do anything odd and that we want in
straight-line code.  Make gcc not generate random "jump around for the
fun of it" code.

This just helps me read profiles.  That thing is one of the hottest
parts of the whole pathname lookup.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-25 19:55:52 -07:00
Shirish Pargaonkar
14cae3243b cifs: Cleanup: check return codes of crypto api calls
Check return codes of crypto api calls and either log an error or log
an error and return from the calling function with error.

Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-07-25 22:12:10 +00:00
Pavel Shilovsky
f5bc1e755d CIFS: Fix oops while mounting with prefixpath
commit fec11dd9a0 caused
a regression when we have already mounted //server/share/a
and want to mount //server/share/a/b.

The problem is that lookup_one_len calls __lookup_hash
with nd pointer as NULL. Then __lookup_hash calls
do_revalidate in the case when dentry exists and we end
up with NULL pointer deference in cifs_d_revalidate:

if (nd->flags & LOOKUP_RCU)
	return -ECHILD;

Fix this by checking nd for NULL.

Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com>
CC: Stable <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-07-25 22:06:40 +00:00
Steve French
e010a5ef95 [CIFS] Redundant null check after dereference
Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-07-25 22:04:32 +00:00
Christoph Hellwig
eaf35b1ea8 cifs: use cifs_dirent in cifs_save_resume_key
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-07-25 21:43:14 +00:00
Christoph Hellwig
f16d59b417 cifs: use cifs_dirent to replace cifs_get_name_from_search_buf
This allows us to parse the on the wire structures only once in
cifs_filldir.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-07-25 21:40:53 +00:00
Christoph Hellwig
cda0ec6a86 cifs: introduce cifs_dirent
Introduce a generic directory entry structure, and factor the parsing
of the various on the wire structures that can represent one into
a common helper.  Switch cifs_entry_is_dot over to use it as a start.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-07-25 21:36:44 +00:00
Christoph Hellwig
9feed6f8fb cifs: cleanup cifs_filldir
Use sensible variable names and formatting and remove some superflous
checks on entry.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-07-25 21:05:10 +00:00
Linus Torvalds
d3ec4844d4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
  fs: Merge split strings
  treewide: fix potentially dangerous trailing ';' in #defined values/expressions
  uwb: Fix misspelling of neighbourhood in comment
  net, netfilter: Remove redundant goto in ebt_ulog_packet
  trivial: don't touch files that are removed in the staging tree
  lib/vsprintf: replace link to Draft by final RFC number
  doc: Kconfig: `to be' -> `be'
  doc: Kconfig: Typo: square -> squared
  doc: Konfig: Documentation/power/{pm => apm-acpi}.txt
  drivers/net: static should be at beginning of declaration
  drivers/media: static should be at beginning of declaration
  drivers/i2c: static should be at beginning of declaration
  XTENSA: static should be at beginning of declaration
  SH: static should be at beginning of declaration
  MIPS: static should be at beginning of declaration
  ARM: static should be at beginning of declaration
  rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check
  Update my e-mail address
  PCIe ASPM: forcedly -> forcibly
  gma500: push through device driver tree
  ...

Fix up trivial conflicts:
 - arch/arm/mach-ep93xx/dma-m2p.c (deleted)
 - drivers/gpio/gpio-ep93xx.c (renamed and context nearby)
 - drivers/net/r8169.c (just context changes)
2011-07-25 13:56:39 -07:00
Linus Torvalds
0003230e82 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  fs: take the ACL checks to common code
  bury posix_acl_..._masq() variants
  kill boilerplates around posix_acl_create_masq()
  generic_acl: no need to clone acl just to push it to set_cached_acl()
  kill boilerplate around posix_acl_chmod_masq()
  reiserfs: cache negative ACLs for v1 stat format
  xfs: cache negative ACLs if there is no attribute fork
  9p: do no return 0 from ->check_acl without actually checking
  vfs: move ACL cache lookup into generic code
  CIFS: Fix oops while mounting with prefixpath
  xfs: Fix wrong return value of xfs_file_aio_write
  fix devtmpfs race
  caam: don't pass bogus S_IFCHR to debugfs_create_...()
  get rid of create_proc_entry() abuses - proc_mkdir() is there for purpose
  asus-wmi: ->is_visible() can't return negative
  fix jffs2 ACLs on big-endian with 16bit mode_t
  9p: close ACL leaks
  ocfs2_init_acl(): fix a leak
  VFS : mount lock scalability for internal mounts
2011-07-25 12:53:15 -07:00
Christoph Hellwig
4e34e719e4 fs: take the ACL checks to common code
Replace the ->check_acl method with a ->get_acl method that simply reads an
ACL from disk after having a cache miss.  This means we can replace the ACL
checking boilerplate code with a single implementation in namei.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-25 14:30:23 -04:00
Al Viro
edde854e8b bury posix_acl_..._masq() variants
made static; no callers left outside of posix_acl.c.  posix_acl_clone() also
has lost all external callers and became static...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-25 14:27:32 -04:00
Al Viro
826cae2f2b kill boilerplates around posix_acl_create_masq()
new helper: posix_acl_create(&acl, gfp, mode_p).  Replaces acl with
modified clone, on failure releases acl and replaces with NULL.
Returns 0 or -ve on error.  All callers of posix_acl_create_masq()
switched.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-25 14:27:32 -04:00
Al Viro
95203befa8 generic_acl: no need to clone acl just to push it to set_cached_acl()
In-core acls are copy-on-write, so the reference taken by set_cached_acl() will
do just fine.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-25 14:27:31 -04:00
Al Viro
bc26ab5f65 kill boilerplate around posix_acl_chmod_masq()
new helper: posix_acl_chmod(&acl, gfp, mode).  Replaces acl with modified
clone or with NULL if that has failed; returns 0 or -ve on error.  All
callers of posix_acl_chmod_masq() switched to that - they'd been doing
exactly the same thing.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-25 14:27:30 -04:00
Christoph Hellwig
4482a087d4 reiserfs: cache negative ACLs for v1 stat format
Always set up a negative ACL cache entry if the inode can't have ACLs.
That behaves much better than doing this check inside ->check_acl.

Also remove the left over MAY_NOT_BLOCK check.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-25 14:25:38 -04:00
Christoph Hellwig
6311b10800 xfs: cache negative ACLs if there is no attribute fork
Always set up a negative ACL cache entry if the inode doesn't have an
attribute fork.  That behaves much better than doing this check inside
->check_acl.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-25 14:25:38 -04:00
Christoph Hellwig
ebbb0ef287 9p: do no return 0 from ->check_acl without actually checking
If we do not want to use ACLs we at least need to perform normal Unix
permission checks.  From the comment I'm not quite sure that's what
is intended, but if 0p wants to do permission checks entirely on the
server it needs to do so in ->permission, not in ->check_acl.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-25 14:25:38 -04:00
Linus Torvalds
e77819e57f vfs: move ACL cache lookup into generic code
This moves logic for checking the cached ACL values from low-level
filesystems into generic code.  The end result is a streamlined ACL
check that doesn't need to load the inode->i_op->check_acl pointer at
all for the common cached case.

The filesystems also don't need to check for a non-blocking RCU walk
case in their acl_check() functions, because that is all handled at a
VFS layer.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-25 14:23:39 -04:00
Pavel Shilovsky
3ca30d40a9 CIFS: Fix oops while mounting with prefixpath
commit fec11dd9a0 caused
a regression when we have already mounted //server/share/a
and want to mount //server/share/a/b.

The problem is that lookup_one_len calls __lookup_hash
with nd pointer as NULL. Then __lookup_hash calls
do_revalidate in the case when dentry exists and we end
up with NULL pointer deference in cifs_d_revalidate:

if (nd->flags & LOOKUP_RCU)
	return -ECHILD;

Fix this by checking nd for NULL.

Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-25 14:23:21 -04:00
Markus Trippelsdorf
340a0a01b9 xfs: Fix wrong return value of xfs_file_aio_write
The fsync prototype change commit 02c24a8218 accidentally overwrote
the ssize_t return value of xfs_file_aio_write with 0 for SYNC type
writes. Fix this by checking if an error occured when calling
xfs_file_fsync and only change the return value in this case.
In addition xfs_file_fsync actually returns a normal negative error, so
fix this, too.

Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-25 14:23:21 -04:00
Linus Torvalds
096a705bbc Merge branch 'for-3.1/core' of git://git.kernel.dk/linux-block
* 'for-3.1/core' of git://git.kernel.dk/linux-block: (24 commits)
  block: strict rq_affinity
  backing-dev: use synchronize_rcu_expedited instead of synchronize_rcu
  block: fix patch import error in max_discard_sectors check
  block: reorder request_queue to remove 64 bit alignment padding
  CFQ: add think time check for group
  CFQ: add think time check for service tree
  CFQ: move think time check variables to a separate struct
  fixlet: Remove fs_excl from struct task.
  cfq: Remove special treatment for metadata rqs.
  block: document blk_plug list access
  block: avoid building too big plug list
  compat_ioctl: fix make headers_check regression
  block: eliminate potential for infinite loop in blkdev_issue_discard
  compat_ioctl: fix warning caused by qemu
  block: flush MEDIA_CHANGE from drivers on close(2)
  blk-throttle: Make total_nr_queued unsigned
  block: Add __attribute__((format(printf...) and fix fallout
  fs/partitions/check.c: make local symbols static
  block:remove some spare spaces in genhd.c
  block:fix the comment error in blkdev.h
  ...
2011-07-25 10:33:36 -07:00
Al Viro
963945bf93 fix jffs2 ACLs on big-endian with 16bit mode_t
casting int * to mode_t * is not a good thing - on a *lot* of big-endian
architectures mode_t happens to be smaller than int and there it breaks
quite spectaculary...

Fucked-up-by: commit cfc8dc6f6f
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-24 10:12:01 -04:00
Al Viro
1ec95bf34d 9p: close ACL leaks
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-24 10:10:18 -04:00
Al Viro
c0d960f038 ocfs2_init_acl(): fix a leak
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-24 10:10:09 -04:00
Tim Chen
423e0ab086 VFS : mount lock scalability for internal mounts
For a number of file systems that don't have a mount point (e.g. sockfs
and pipefs), they are not marked as long term. Therefore in
mntput_no_expire, all locks in vfs_mount lock are taken instead of just
local cpu's lock to aggregate reference counts when we release
reference to file objects.  In fact, only local lock need to have been
taken to update ref counts as these file systems are in no danger of
going away until we are ready to unregister them.

The attached patch marks file systems using kern_mount without
mount point as long term.  The contentions of vfs_mount lock
is now eliminated.  Before un-registering such file system,
kern_unmount should be called to remove the long term flag and
make the mount point ready to be freed.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-24 10:08:32 -04:00
Wu Fengguang
fcc5c22218 writeback: don't busy retry writeback on new/freeing inodes
Fix a system hang bug introduced by commit b7a2441f99 ("writeback:
remove writeback_control.more_io") and e8dfc3058 ("writeback: elevate
queue_io() into wb_writeback()") easily reproducible with high memory
pressure and lots of file creation/deletions, for example, a kernel
build in limited memory.

It hangs when some inode is in the I_NEW, I_FREEING or I_WILL_FREE 
state, the flusher will get stuck busy retrying that inode, never
releasing wb->list_lock. The lock in turn blocks all kinds of other
tasks when they are trying to grab it.

As put by Jan, it's a safe change regarding data integrity. I_FREEING or
I_WILL_FREE inodes are written back by iput_final() and it is reclaim
code that is responsible for eventually removing them. So writeback code
can safely ignore them. I_NEW inodes should move out of this state when
they are fully set up and in the writeback round following that, we will
consider them for writeback. So the change makes sense.                                                         

CC: Jan Kara <jack@suse.cz> 
Reported-by: Hugh Dickins <hughd@google.com>
Tested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-07-24 10:46:51 +08:00
Casey Bodley
0c12eaffdf nfsd: don't break lease on CLAIM_DELEGATE_CUR
CLAIM_DELEGATE_CUR is used in response to a broken lease; allowing it
to break the lease and return EAGAIN leaves the client unable to make
progress in returning the delegation

nfs4_get_vfs_file() now takes struct nfsd4_open for access to the
claim type, and calls nfsd_open() with NFSD_MAY_NOT_BREAK_LEASE when
claim type is CLAIM_DELEGATE_CUR

Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-07-23 14:58:17 -04:00
Aneesh Kumar K.V
48e370ff93 fs/9p: add 9P2000.L unlinkat operation
unlinkat - Remove a directory entry

size[4] Tunlinkat tag[2] dirfid[4] name[s] flag[4]
size[4] Runlinkat tag[2]

older Tremove have the below request format

size[4] Tremove tag[2] fid[4]

The remove message is used to remove a directory entry either file or directory
The remove opreation is actually a directory opertation and should ideally have
dirfid, if not we cannot represent the fid on server with anything other than
name. We will have to derive the directory name from fid in the Tremove request.

NOTE: The operation doesn't clunk the unlink fid.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-07-23 09:32:52 -05:00
Aneesh Kumar K.V
9e8fb38e7d fs/9p: add 9P2000.L renameat operation
renameat - change name of file or directory

size[4] Trenameat tag[2] olddirfid[4] oldname[s] newdirfid[4] newname[s]
size[4] Rrenameat tag[2]

older Trename have the below request format

size[4] Trename tag[2] fid[4] newdirfid[4] name[s]

The rename message is used to change the name of a file, possibly moving it
to a new directory. The rename opreation is actually a directory opertation
and should ideally have olddirfid, if not we cannot represent the fid on server
with anything other than name. We will have to derive the old directory name
from fid in the Trename request.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-07-23 09:32:51 -05:00
Aneesh Kumar K.V
ed80fcfac2 fs/9p: Always ask new inode in create
This make sure we don't end up reusing the unlinked inode object.
The ideal way is to use inode i_generation. But i_generation is
not available in userspace always.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-07-23 09:32:50 -05:00
Prem Karat
a2dd43bb0d fs/9p: Fix invalid mount options/args
Without this fix, if any invalid mount options/args are passed while mouting
the 9p fs, no error (-EINVAL) is returned and default arg value is assigned.

This fix returns -EINVAL when an invalid arguement is found while parsing
mount options.

Signed-off-by: Prem Karat <prem.karat@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-07-23 09:32:48 -05:00
Aneesh Kumar K.V
fd2421f544 fs/9p: When doing inode lookup compare qid details and inode mode bits.
This make sure we don't use wrong inode from the inode hash. The inode number
of the file deleted is reused by the next file system object created
and if we only use inode number for inode hash lookup we could end up
with wrong struct inode.

Also compare inode generation number. Not all Linux file system provide
st_gen in userspace. So it could be 0;

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-07-23 09:32:48 -05:00
Aneesh Kumar K.V
2053d67c54 fs/9p: remove rename work around in 9p
Now that VFS does the right thing remove the work around.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-07-23 09:32:47 -05:00
Linus Torvalds
bbd9d6f7fb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (107 commits)
  vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp
  isofs: Remove global fs lock
  jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory
  fix IN_DELETE_SELF on overwriting rename() on ramfs et.al.
  mm/truncate.c: fix build for CONFIG_BLOCK not enabled
  fs:update the NOTE of the file_operations structure
  Remove dead code in dget_parent()
  AFS: Fix silly characters in a comment
  switch d_add_ci() to d_splice_alias() in "found negative" case as well
  simplify gfs2_lookup()
  jfs_lookup(): don't bother with . or ..
  get rid of useless dget_parent() in btrfs rename() and link()
  get rid of useless dget_parent() in fs/btrfs/ioctl.c
  fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers
  drivers: fix up various ->llseek() implementations
  fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek
  Ext4: handle SEEK_HOLE/SEEK_DATA generically
  Btrfs: implement our own ->llseek
  fs: add SEEK_HOLE and SEEK_DATA flags
  reiserfs: make reiserfs default to barrier=flush
  ...

Fix up trivial conflicts in fs/xfs/linux-2.6/xfs_super.c due to the new
shrinker callout for the inode cache, that clashed with the xfs code to
start the periodic workers later.
2011-07-22 19:02:39 -07:00
Jan Kara
b22570d9ab ext3: Fix data corruption in inodes with journalled data
When journalling data for an inode (either because it is a symlink or
because the filesystem is mounted in data=journal mode), ext3_evict_inode()
can discard unwritten data by calling truncate_inode_pages(). This is
because we don't mark the buffer / page dirty when journalling data but only
add the buffer to the running transaction and thus mm does not know there
are still unwritten data.

Fix the problem by carefully tracking transaction containing inode's data,
committing this transaction, and writing uncheckpointed buffers when inode
should be reaped.

Signed-off-by: Jan Kara <jack@suse.cz>
2011-07-23 01:49:00 +02:00
Konstantin Khlebnikov
5a9a43646c vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp
Replace unclear (struct dentry *) to (struct file *) typecast with ERR_CAST() macro.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-22 19:42:13 -04:00
Jan Kara
d769b3c2ab isofs: Remove global fs lock
sbi->s_mutex isn't needed for isofs at all so we can just remove it. Generally,
since isofs is always mounted read-only, filesystem structure cannot change
under us.  So buffer_head contents stays constant after it's filled in. That
leaves us with possible changes of global data structures. Superblock changes
only during filesystem mount (even remount does not change it), inodes are only
filled in during reading from disk. So there are no changes of these structures
to bother about.

Arguments why sbi->s_mutex can be removed at each place:
isofs_readdir: Accesses sb, inode, filp, local variables => s_mutex not needed
isofs_lookup: Protected by directory's i_mutex. Accesses sb, inode, dentry,
  local variables => s_mutex not needed
rock_ridge_symlink_readpage: Protected by page lock. Accesses sb, inode,
  local variables => s_mutex not needed.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-22 19:42:12 -04:00
Al Viro
22ba747f66 jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory
We don't generate IN_DELETE_SELF on victim of overwriting rename() if
it happens to be a directory.  Trivially fixed by doing to ->i_nlink
what we do ->pino_nlink a couple of lines later in jffs2_rename().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-22 19:42:11 -04:00
Al Viro
841590ce16 fix IN_DELETE_SELF on overwriting rename() on ramfs et.al.
On ramfs and other simple_rename() users IN_DELETE_SELF is not generated
for victim of overwriting rename() if it's is a directory.  Works on
most of the local filesystems and really trivial to fix...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-22 19:42:11 -04:00
Linus Torvalds
8209f53d79 Merge branch 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc
* 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc: (39 commits)
  ptrace: do_wait(traced_leader_killed_by_mt_exec) can block forever
  ptrace: fix ptrace_signal() && STOP_DEQUEUED interaction
  connector: add an event for monitoring process tracers
  ptrace: dont send SIGSTOP on auto-attach if PT_SEIZED
  ptrace: mv send-SIGSTOP from do_fork() to ptrace_init_task()
  ptrace_init_task: initialize child->jobctl explicitly
  has_stopped_jobs: s/task_is_stopped/SIGNAL_STOP_STOPPED/
  ptrace: make former thread ID available via PTRACE_GETEVENTMSG after PTRACE_EVENT_EXEC stop
  ptrace: wait_consider_task: s/same_thread_group/ptrace_reparented/
  ptrace: kill real_parent_is_ptracer() in in favor of ptrace_reparented()
  ptrace: ptrace_reparented() should check same_thread_group()
  redefine thread_group_leader() as exit_signal >= 0
  do not change dead_task->exit_signal
  kill task_detached()
  reparent_leader: check EXIT_DEAD instead of task_detached()
  make do_notify_parent() __must_check, update the callers
  __ptrace_detach: avoid task_detached(), check do_notify_parent()
  kill tracehook_notify_death()
  make do_notify_parent() return bool
  ptrace: s/tracehook_tracer_task()/ptrace_parent()/
  ...
2011-07-22 15:06:50 -07:00
Linus Torvalds
c1f792a5bf Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (49 commits)
  xfs: add size update tracepoint to IO completion
  xfs: convert AIL cursors to use struct list_head
  xfs: remove confusing ail cursor wrapper
  xfs: use a cursor for bulk AIL insertion
  xfs: failure mapping nfs fh to inode should return ESTALE
  xfs: Remove the second parameter to xfs_sb_count()
  xfs: remove the dead XFS_DABUF_DEBUG code
  xfs: remove leftovers of the old btree tracing code
  xfs: remove the dead QUOTADEBUG code
  xfs: remove the unused xfs_buf_delwri_sort function
  xfs: remove wrappers around b_iodone
  xfs: remove wrappers around b_fspriv
  xfs: add a proper transaction pointer to struct xfs_buf
  xfs: factor out xfs_da_grow_inode_int
  xfs: factor out xfs_dir2_leaf_find_stale
  xfs: cleanup struct xfs_dir2_free
  xfs: reshuffle dir2 headers
  xfs: start periodic workers later
  Revert "xfs: fix filesystsem freeze race in xfs_trans_alloc"
  xfs: remove variables that serve no purpose in xfs_alloc_ag_vextent_exact()
  ...
2011-07-22 13:16:33 -07:00
Linus Torvalds
6aaf4404ab Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
  dlm: don't limit active work items
  dlm: use workqueue for callbacks
  dlm: remove deadlock debug print
  dlm: improve rsb searches
  dlm: keep lkbs in idr
  dlm: fix kmalloc args
  dlm: don't do pointless NULL check, use kzalloc and fix order of arguments
  dlm: dump address of unknown node
  dlm: use vmalloc for hash tables
  dlm: show addresses in configfs
2011-07-22 13:16:07 -07:00
Linus Torvalds
ba1f9db908 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus:
  hfsplus: ensure bio requests are not smaller than the hardware sectors
  hfsplus: Add additional range check to handle on-disk corruptions
  hfsplus: Add error propagation for hfsplus_ext_write_extent_locked
  hfsplus: add error checking for hfs_find_init()
  hfsplus: lift the 2TB size limit
  hfsplus: fix overflow in hfsplus_read_wrapper
  hfsplus: fix overflow in hfsplus_get_block
  hfsplus: assignments inside `if' condition clean-up
2011-07-22 13:12:17 -07:00
Linus Torvalds
49302baa64 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
  GFS2: combine duplicated block freeing routines
  GFS2: Add S_NOSEC support
  GFS2: Automatically adjust glock min hold time
  GFS2: Cache dir hash table in a contiguous buffer
2011-07-22 13:10:41 -07:00
Linus Torvalds
59a7ac1211 Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6: (32 commits)
  MAINTAINERS: change e-mail of Adrian Hunter
  UBIFS: fix master node recovery
  UBIFS: improve power cut emulation testing
  UBIFS: rename recovery testing variables
  UBIFS: remove custom list of superblocks
  UBIFS: stop re-defining UBI operations
  UBIFS: switch to I/O helpers
  UBIFS: switch to ubifs_leb_write
  UBIFS: switch to ubifs_leb_read
  UBIFS: introduce more I/O helpers
  UBIFS: always print stacktrace when switching to R/O mode
  UBIFS: remove unused and unneeded debugging function
  UBIFS: add global debugfs knobs
  UBIFS: introduce debugfs helpers
  UBIFS: re-arrange debugging code a bit
  UBIFS: be more informative in failure mode
  UBIFS: switch self-check knobs to debugfs
  UBIFS: lessen amount of debugging check types
  UBIFS: introduce helper functions for debugging checks and tests
  UBIFS: amend debugging inode size check function prototype
  ...
2011-07-22 13:09:35 -07:00
Wang Sheng-Hui
03b5bb3429 ext2: check xattr name_len before acquiring xattr_sem in ext2_xattr_get
In ext2_xattr_get(), the code will acquire xattr_sem first, later checks
the length of xattr name_len > 255. It's unnecessarily time consuming and
also ext2_xattr_set() checks the length before other checks. So move the
check before acquiring xattr_sem to make these two functions consistent.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-07-22 19:41:16 +02:00
Jean Delvare
df2e301fee fs: Merge split strings
No idea why these were split in the first place...

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-07-22 16:47:15 +02:00
Seth Forshee
6596528e39 hfsplus: ensure bio requests are not smaller than the hardware sectors
Currently all bio requests are 512 bytes, which may fail for media
whose physical sector size is larger than this. Ensure these
requests are not smaller than the block device logical block size.

BugLink: http://bugs.launchpad.net/bugs/734883
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-07-22 16:37:44 +02:00
Naohiro Aota
aac4e4198e hfsplus: Add additional range check to handle on-disk corruptions
'recoff' is read from disk and used for an argument to memcpy, so if
the value read from disk is larger than the page size, it result to
"general protection fault". This patch add additional range check for
the value, so that disk fuzz won't cause such fault.

Signed-off-by: Naohiro Aota <naota@elisp.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-07-22 16:36:56 +02:00
Oleg Nesterov
eac1b5e57d ptrace: do_wait(traced_leader_killed_by_mt_exec) can block forever
Test-case:

	void *tfunc(void *arg)
	{
		execvp("true", NULL);
		return NULL;
	}

	int main(void)
	{
		int pid;

		if (fork()) {
			pthread_t t;

			kill(getpid(), SIGSTOP);

			pthread_create(&t, NULL, tfunc, NULL);

			for (;;)
				pause();
		}

		pid = getppid();
		assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);

		while (wait(NULL) > 0)
			ptrace(PTRACE_CONT, pid, 0,0);

		return 0;
	}

It is racy, exit_notify() does __wake_up_parent() too. But in the
likely case it triggers the problem: de_thread() does release_task()
and the old leader goes away without the notification, the tracer
sleeps in do_wait() without children/tracees.

Change de_thread() to do __wake_up_parent(traced_leader->parent).
Since it is already EXIT_DEAD we can do this without ptrace_unlink(),
EXIT_DEAD threads do not exist from do_wait's pov.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
2011-07-22 15:10:49 +02:00
Phillip Lougher
cc6d349714 Squashfs: Make ZLIB compression support optional
Squashfs now supports XZ and LZO compression in addition to ZLIB.
As such it no longer makes sense to always include ZLIB support.
In particular embedded systems may only use LZO or XZ compression, and
the ability to exclude ZLIB support will reduce kernel size.

Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
2011-07-22 03:01:28 +01:00
Linus Torvalds
2bafc7a275 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  CIFS: Fix wrong length in cifs_iovec_read
2011-07-21 14:28:01 -07:00
Linus Torvalds
b91da88fed vfs: drop conditional inode prefetch in __do_lookup_rcu
It seems to hurt performance in real life.  Yes, the inode will be used
later, but the conditional doesn't seem to predict all that well
(negative dentries are not uncommon) and it looks like the cost of
prefetching is simply higher than depending on the cache doing the right
thing.

As usual.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-21 11:01:42 -07:00
Jan Beulich
b307d4655a FS-Cache: Fix __fscache_uncache_all_inode_pages()'s outer loop
The compiler, at least for ix86 and m68k, validly warns that the
comparison:

	next <= (loff_t)-1

is always true (and it's always true also for x86-64 and probably all
other arches - as long as pgoff_t isn't wider than loff_t).  The
intention appears to be to avoid wrapping of "next", so rather than
eliminating the pointless comparison, fix the loop to indeed get exited
when "next" would otherwise wrap.

On m68k the following warning is observed:

  fs/fscache/page.c: In function '__fscache_uncache_all_inode_pages':
  fs/fscache/page.c:979: warning: comparison is always false due to limited range of data type

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Suresh Jayaraman <sjayaraman@suse.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-21 10:59:16 -07:00
Pavel Shilovsky
2cebaa58b7 CIFS: Fix wrong length in cifs_iovec_read
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-07-21 00:48:05 +00:00
Al Viro
86c98e8cdb Remove dead code in dget_parent()
->d_parent is never NULL...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:48:04 -04:00
David Howells
e4b9f00581 AFS: Fix silly characters in a comment
Fix silly characters in a comment in AFS code (some weird characters replaced
the word 'flag' some point way back).

Reported-by: viro@ZenIV.linux.org.uk
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:48:03 -04:00
Al Viro
4513d899c4 switch d_add_ci() to d_splice_alias() in "found negative" case as well
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:48:02 -04:00
Al Viro
6c673ab393 simplify gfs2_lookup()
d_splice_alias() will DTRT when given NULL or ERR_PTR

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:48:02 -04:00
Al Viro
79ac5a46c5 jfs_lookup(): don't bother with . or ..
they'll never be passed to ->lookup()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:48:01 -04:00
Al Viro
10d9f309d8 get rid of useless dget_parent() in btrfs rename() and link()
->d_parent is locked and stable there...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:48:00 -04:00
Al Viro
2fbe8c8ad1 get rid of useless dget_parent() in fs/btrfs/ioctl.c
both callers there have dentry->d_parent stabilized by the fact that
their caller had obtained dentry from lookup_one_len() and had not
dropped ->i_mutex on parent since then.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:48:00 -04:00
Josef Bacik
02c24a8218 fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers
Btrfs needs to be able to control how filemap_write_and_wait_range() is called
in fsync to make it less of a painful operation, so push down taking i_mutex and
the calling of filemap_write_and_wait() down into the ->fsync() handlers.  Some
file systems can drop taking the i_mutex altogether it seems, like ext3 and
ocfs2.  For correctness sake I just pushed everything down in all cases to make
sure that we keep the current behavior the same for everybody, and then each
individual fs maintainer can make up their mind about what to do from there.
Thanks,

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:59 -04:00
Josef Bacik
06222e491e fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek
This converts everybody to handle SEEK_HOLE/SEEK_DATA properly.  In some cases
we just return -EINVAL, in others we do the normal generic thing, and in others
we're simply making sure that the properly due-dilligence is done.  For example
in NFS/CIFS we need to make sure the file size is update properly for the
SEEK_HOLE and SEEK_DATA case, but since it calls the generic llseek stuff itself
that is all we have to do.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:58 -04:00
Josef Bacik
c334b1138b Ext4: handle SEEK_HOLE/SEEK_DATA generically
Since Ext4 has its own lseek we need to make sure it handles
SEEK_HOLE/SEEK_DATA.  For now just do the same thing that is done in the generic
case, somebody else can come along and make it do fancy things later.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:57 -04:00
Josef Bacik
b26751575a Btrfs: implement our own ->llseek
In order to handle SEEK_HOLE/SEEK_DATA we need to implement our own llseek.
Basically for the normal SEEK_*'s we will just defer to the generic helper, and
for SEEK_HOLE/SEEK_DATA we will use our fiemap helper to figure out the nearest
hole or data.  Currently this helper doesn't check for delalloc bytes for
prealloc space, so for now treat prealloc as data until that is fixed.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:56 -04:00
Josef Bacik
982d816581 fs: add SEEK_HOLE and SEEK_DATA flags
This just gets us ready to support the SEEK_HOLE and SEEK_DATA flags.  Turns out
using fiemap in things like cp cause more problems than it solves, so lets try
and give userspace an interface that doesn't suck.  We need to match solaris
here, and the definitions are

*o* If /whence/ is SEEK_HOLE, the offset of the start of the
next hole greater than or equal to the supplied offset
is returned. The definition of a hole is provided near
the end of the DESCRIPTION.

*o* If /whence/ is SEEK_DATA, the file pointer is set to the
start of the next non-hole file region greater than or
equal to the supplied offset.

So in the generic case the entire file is data and there is a virtual hole at
the end.  That means we will just return i_size for SEEK_HOLE and will return
the same offset for SEEK_DATA.  This is how Solaris does it so we have to do it
the same way.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:56 -04:00
Christoph Hellwig
b4d5b10fb2 reiserfs: make reiserfs default to barrier=flush
Change the default reiserfs mount option to barrier=flush.  Based on a patch
from Jeff Mahoney in the SuSE tree.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:55 -04:00
Christoph Hellwig
00eacd66cd ext3: make ext3 mount default to barrier=1
This patch turns on barriers by default for ext3.  mount -o barrier=0
will turn them off.  Based on a patch from Chris Mason in the SuSE tree.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Ted Ts'o <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:54 -04:00
Al Viro
b85fd6bdc9 don't open-code parent_ino() in assorted ->readdir()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:54 -04:00
Al Viro
2def9e4ec7 minix_getattr(): don't bother with ->d_parent
we can find superblock easier, TYVM...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:53 -04:00
Al Viro
ee60498f3e coda_venus_readdir(): use offsetof()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:52 -04:00
Kay Sievers
f15146380d fs: seq_file - add event counter to simplify poll() support
Moving the event counter into the dynamically allocated 'struc seq_file'
allows poll() support without the need to allocate its own tracking
structure.

All current users are switched over to use the new counter.

Requested-by: Andrew Morton akpm@linux-foundation.org
Acked-by: NeilBrown <neilb@suse.de>
Tested-by: Lucas De Marchi lucas.demarchi@profusion.mobi
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:50 -04:00
Christoph Hellwig
72c5052ddc fs: move inode_dio_done to the end_io handler
For filesystems that delay their end_io processing we should keep our
i_dio_count until the the processing is done.  Enable this by moving
the inode_dio_done call to the end_io handler if one exist.  Note that
the actual move to the workqueue for ext4 and XFS is not done in
this patch yet, but left to the filesystem maintainers.  At least
for XFS it's not needed yet either as XFS has an internal equivalent
to i_dio_count.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:50 -04:00
Christoph Hellwig
aacfc19c62 fs: simplify the blockdev_direct_IO prototype
Simple filesystems always pass inode->i_sb_bdev as the block device
argument, and never need a end_io handler.  Let's simply things for
them and for my grepping activity by dropping these arguments.  The
only thing not falling into that scheme is ext4, which passes and
end_io handler without needing special flags (yet), but given how
messy the direct I/O code there is use of __blockdev_direct_IO
in one instead of two out of three cases isn't going to make a large
difference anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:49 -04:00
Christoph Hellwig
df2d6f2658 fs: always maintain i_dio_count
Maintain i_dio_count for all filesystems, not just those using DIO_LOCKING.
This these filesystems to also protect truncate against direct I/O requests
by using common code.  Right now the only non-DIO_LOCKING filesystem that
appears to do so is XFS, which uses an opencoded variant of the i_dio_count
scheme.

Behaviour doesn't change for filesystems never calling inode_dio_wait.
For ext4 behaviour changes when using the dioread_nonlock option, which
previously was missing any protection between truncate and direct I/O reads.
For ocfs2 that handcrafted i_dio_count manipulations are replaced with
the common code now enable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:48 -04:00
Christoph Hellwig
562c72aa57 fs: move inode_dio_wait calls into ->setattr
Let filesystems handle waiting for direct I/O requests themselves instead
of doing it beforehand.  This means filesystem-specific locks to prevent
new dio referenes from appearing can be held.  This is important to allow
generalizing i_dio_count to non-DIO_LOCKING filesystems.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:47 -04:00
Christoph Hellwig
bd5fe6c5eb fs: kill i_alloc_sem
i_alloc_sem is a rather special rw_semaphore.  It's the last one that may
be released by a non-owner, and it's write side is always mirrored by
real exclusion.  It's intended use it to wait for all pending direct I/O
requests to finish before starting a truncate.

Replace it with a hand-grown construct:

 - exclusion for truncates is already guaranteed by i_mutex, so it can
   simply fall way
 - the reader side is replaced by an i_dio_count member in struct inode
   that counts the number of pending direct I/O requests.  Truncate can't
   proceed as long as it's non-zero
 - when i_dio_count reaches non-zero we wake up a pending truncate using
   wake_up_bit on a new bit in i_flags
 - new references to i_dio_count can't appear while we are waiting for
   it to read zero because the direct I/O count always needs i_mutex
   (or an equivalent like XFS's i_iolock) for starting a new operation.

This scheme is much simpler, and saves the space of a spinlock_t and a
struct list_head in struct inode (typically 160 bits on a non-debug 64-bit
system).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:46 -04:00
Christoph Hellwig
f9b5570d7f fs: simplify handling of zero sized reads in __blockdev_direct_IO
Reject zero sized reads as soon as we know our I/O length, and don't
borther with locks or allocations that might have to be cleaned up
otherwise.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:45 -04:00
Jan Kara
9ea7df534e ext4: Rewrite ext4_page_mkwrite() to use generic helpers
Rewrite ext4_page_mkwrite() to use __block_page_mkwrite() helper. This
removes the need of using i_alloc_sem to avoid races with truncate which
seems to be the wrong locking order according to lock ordering documented in
mm/rmap.c. Also calling ext4_da_write_begin() as used by the old code seems to
be problematic because we can decide to flush delay-allocated blocks which
will acquire s_umount semaphore - again creating unpleasant lock dependency
if not directly a deadlock.

Also add a check for frozen filesystem so that we don't busyloop in page fault
when the filesystem is frozen.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:45 -04:00
Christoph Hellwig
5826869158 fat: remove i_alloc_sem abuse
Add a new rw_semaphore to protect bmap against truncate.  Previous
i_alloc_sem was abused for this, but it's going away in this series.

Note that we can't simply use i_mutex, given that the swapon code
calls ->bmap under it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:44 -04:00
Tobias Klauser
8c5dc70aae VFS: Fixup kerneldoc for generic_permission()
The flags parameter went away in
d749519b444db985e40b897f73ce1898b11f997e

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:43 -04:00
Dave Chinner
8daaa83145 xfs: make use of new shrinker callout for the inode cache
Convert the inode reclaim shrinker to use the new per-sb shrinker
operations. This allows much bigger reclaim batches to be used, and
allows the XFS inode cache to be shrunk in proportion with the VFS
dentry and inode caches. This avoids the problem of the VFS caches
being shrunk significantly before the XFS inode cache is shrunk
resulting in imbalances in the caches during reclaim.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:42 -04:00
Dave Chinner
8ab47664d5 vfs: increase shrinker batch size
Now that the per-sb shrinker is responsible for shrinking 2 or more
caches, increase the batch size to keep econmies of scale for
shrinking each cache.  Increase the shrinker batch size to 1024
objects.

To allow for a large increase in batch size, add a conditional
reschedule to prune_icache_sb() so that we don't hold the LRU spin
lock for too long. This mirrors the behaviour of the
__shrink_dcache_sb(), and allows us to increase the batch size
without needing to worry about problems caused by long lock hold
times.

To ensure that filesystems using the per-sb shrinker callouts don't
cause problems, document that the object freeing method must
reschedule appropriately inside loops.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:41 -04:00
Dave Chinner
0e1fdafd93 superblock: add filesystem shrinker operations
Now we have a per-superblock shrinker implementation, we can add a
filesystem specific callout to it to allow filesystem internal
caches to be shrunk by the superblock shrinker.

Rather than perpetuate the multipurpose shrinker callback API (i.e.
nr_to_scan == 0 meaning "tell me how many objects freeable in the
cache), two operations will be added. The first will return the
number of objects that are freeable, the second is the actual
shrinker call.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:41 -04:00
Dave Chinner
4f8c19fdf3 inode: remove iprune_sem
Now that we have per-sb shrinkers with a lifecycle that is a subset
of the superblock lifecycle and can reliably detect a filesystem
being unmounted, there is not longer any race condition for the
iprune_sem to protect against. Hence we can remove it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:40 -04:00
Dave Chinner
b0d40c92ad superblock: introduce per-sb cache shrinker infrastructure
With context based shrinkers, we can implement a per-superblock
shrinker that shrinks the caches attached to the superblock. We
currently have global shrinkers for the inode and dentry caches that
split up into per-superblock operations via a coarse proportioning
method that does not batch very well.  The global shrinkers also
have a dependency - dentries pin inodes - so we have to be very
careful about how we register the global shrinkers so that the
implicit call order is always correct.

With a per-sb shrinker callout, we can encode this dependency
directly into the per-sb shrinker, hence avoiding the need for
strictly ordering shrinker registrations. We also have no need for
any proportioning code for the shrinker subsystem already provides
this functionality across all shrinkers. Allowing the shrinker to
operate on a single superblock at a time means that we do less
superblock list traversals and locking and reclaim should batch more
effectively. This should result in less CPU overhead for reclaim and
potentially faster reclaim of items from each filesystem.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:10 -04:00
J. Bruce Fields
8fb47a4fbf locks: rename lock-manager ops
Both the filesystem and the lock manager can associate operations with a
lock.  Confusingly, one of them (fl_release_private) actually has the
same name in both operation structures.

It would save some confusion to give the lock-manager ops different
names.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-07-20 20:23:19 -04:00
Dave Chinner
55fb25d5b3 xfs: add size update tracepoint to IO completion
For improving insight into IO completion behaviour.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-07-20 18:38:04 -05:00