linux

Commit Graph

Author	SHA1	Message	Date
Christoph Hellwig	29d104af0a	xfs: kill the unused struct xfs_sync_work Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:51 +02:00
Christoph Hellwig	f3ca87389d	xfs: remove i_transp Remove the transaction pointer in the inode. It's only used to avoid passing down an argument in the bmap code, and for a few asserts in the transaction code right now. Also use the local variable ip in a few more places in xfs_inode_item_unlock, so that it isn't only used for debug builds after the above change. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:47 +02:00
Christoph Hellwig	7a249cf83d	xfs: fix filesystsem freeze race in xfs_trans_alloc As pointed out by Jan xfs_trans_alloc can race with a concurrent filesystem freeze when it sleeps during the memory allocation. Fix this by moving the wait_for_freeze call after the memory allocation. This means moving the freeze into the low-level _xfs_trans_alloc helper, which thus grows a new argument. Also fix up some comments in that area while at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <david@fromorbit.com>	2011-07-08 14:34:42 +02:00
Christoph Hellwig	33b8f7c247	xfs: improve sync behaviour in the face of aggressive dirtying The following script from Wu Fengguang shows very bad behaviour in XFS when aggressively dirtying data during a sync on XFS, with sync times up to almost 10 times as long as ext4. A large part of the issue is that XFS writes data out itself two times in the ->sync_fs method, overriding the livelock protection in the core writeback code, and another issue is the lock-less xfs_ioend_wait call, which doesn't prevent new ioend from being queue up while waiting for the count to reach zero. This patch removes the XFS-internal sync calls and relies on the VFS to do it's work just like all other filesystems do. Note that the i_iocount wait which is rather suboptimal is simply removed here. We already do it in ->write_inode, which keeps the current supoptimal behaviour. We'll eventually need to remove that as well, but that's material for a separate commit. ------------------------------ snip ------------------------------ #!/bin/sh umount /dev/sda7 mkfs.xfs -f /dev/sda7 # mkfs.ext4 /dev/sda7 # mkfs.btrfs /dev/sda7 mount /dev/sda7 /fs echo $((50<<20)) > /proc/sys/vm/dirty_bytes pid= for i in `seq 10` do dd if=/dev/zero of=/fs/zero-$i bs=1M count=1000 & pid="$pid $!" done sleep 1 tic=$(date +'%s') sync tac=$(date +'%s') echo echo sync time: $((tac-tic)) egrep '(Dirty\|Writeback\|NFS_Unstable)' /proc/meminfo pidof dd > /dev/null && { kill -9 $pid; echo sync NOT livelocked; } ------------------------------ snip ------------------------------ Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:39 +02:00
Christoph Hellwig	8f04c47aa9	xfs: split xfs_itruncate_finish Split the guts of xfs_itruncate_finish that loop over the existing extents and calls xfs_bunmapi on them into a new helper, xfs_itruncate_externs. Make xfs_attr_inactive call it directly instead of xfs_itruncate_finish, which allows to simplify the latter a lot, by only letting it deal with the data fork. As a result xfs_itruncate_finish is renamed to xfs_itruncate_data to make its use case more obvious. Also remove the sync parameter from xfs_itruncate_data, which has been unessecary since the introduction of the busy extent list in 2002, and completely dead code since 2003 when the XFS_BMAPI_ASYNC parameter was made a no-op. I can't actually see why the xfs_attr_inactive needs to set the transaction sync, but let's keep this patch simple and without changes in behaviour. Also avoid passing a useless argument to xfs_isize_check, and make it private to xfs_inode.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:34 +02:00
Christoph Hellwig	857b9778d8	xfs: kill xfs_itruncate_start xfs_itruncate_start is a rather length wrapper that evaluates to a call to xfs_ioend_wait and xfs_tosspages, and only has two callers. Instead of using the complicated checks left over from IRIX where we can to truncate the pagecache just call xfs_tosspages (aka truncate_inode_pages) directly as we want to get rid of all data after i_size, and truncate_inode_pages handles incorrect alignments and too large offsets just fine. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:30 +02:00
Christoph Hellwig	681b120018	xfs: always log timestamp updates in xfs_setattr_size Get rid of the special case where we use unlogged timestamp updates for a truncate to the current inode size, and just call xfs_setattr_nonsize for it to treat it like a utimes calls. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:26 +02:00
Christoph Hellwig	c4ed4243c4	xfs: split xfs_setattr Split up xfs_setattr into two functions, one for the complex truncate handling, and one for the trivial attribute updates. Also move both new routines to xfs_iops.c as they are fairly Linux-specific. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:23 +02:00
Christoph Hellwig	dec58f1dfd	xfs: work around bogus gcc warning in xfs_allocbt_init_cursor GCC 4.6 complains about an array subscript is above array bounds when using the btree index to index into the agf_levels array. The only two indices passed in are 0 and 1, and we have an assert insuring that. Replace the trick of using the array index directly with using constants in the already existing branch for assigning the XFS_BTREE_LASTREC_UPDATE flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:18 +02:00
Christoph Hellwig	dbcdde3e76	xfs: re-enable non-blocking behaviour in xfs_map_blocks The non-blockig behaviour in xfs_vm_writepage currently is conditional on having both the WB_SYNC_NONE sync_mode and the nonblocking flag set. The latter used to be used by both pdflush, kswapd and a few other places in older kernels, but has been fading out starting with the introduction of the per-bdi flusher threads. Enable the non-blocking behaviour for all WB_SYNC_NONE calls to get back the behaviour we want. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:14 +02:00
Christoph Hellwig	680a647b49	xfs: PF_FSTRANS should never be set in ->writepage Now that we reject direct reclaim in addition to always using GFP_NOFS allocation there's no chance we'll ever end up in ->writepage with PF_FSTRANS set. Add a WARN_ON if we hit this case, and stop checking if we'd actually need to start a transaction. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:05 +02:00
Anatolij Gustschin	19495f70d1	UBIFS: fix master node recovery When the 1st LEB was unmapped and written but 2nd LEB not, the master node recovery doesn't succeed after power cut. We see following error when mounting UBIFS partition on NOR flash: UBIFS error (pid 1137): ubifs_recover_master_node: failed to recover master node Correct 2nd master node offset check is needed to fix the problem. If the 2nd master node is at the end in the 2nd LEB, first master node is used for recovery. When checking for this condition we should check whether the master node is exactly at the end of the LEB (without remaining empty space) or whether it is followed by an empty space less than the master node size. Artem: when the error happened, offs2 = 261120, sz = 512, c->leb_size = 262016. Signed-off-by: Anatolij Gustschin <agust@denx.de> Signed-off-by: Artem Bityutskiy <dedekind1@gmail.com>	2011-07-08 06:53:18 +03:00
Jeff Layton	04db79b015	cifs: factor smb_vol allocation out of cifs_setup_volume_info Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-08 03:51:23 +00:00
David Howells	c902ce1bfb	FS-Cache: Add a helper to bulk uncache pages on an inode Add an FS-Cache helper to bulk uncache pages on an inode. This will only work for the circumstance where the pages in the cache correspond 1:1 with the pages attached to an inode's page cache. This is required for CIFS and NFS: When disabling inode cookie, we were returning the cookie and setting cifsi->fscache to NULL but failed to invalidate any previously mapped pages. This resulted in "Bad page state" errors and manifested in other kind of errors when running fsstress. Fix it by uncaching mapped pages when we disable the inode cookie. This patch should fix the following oops and "Bad page state" errors seen during fsstress testing. ------------[ cut here ]------------ kernel BUG at fs/cachefiles/namei.c:201! invalid opcode: 0000 [#1] SMP Pid: 5, comm: kworker/u:0 Not tainted 2.6.38.7-30.fc15.x86_64 #1 Bochs Bochs RIP: 0010: cachefiles_walk_to_object+0x436/0x745 [cachefiles] RSP: 0018:ffff88002ce6dd00 EFLAGS: 00010282 RAX: ffff88002ef165f0 RBX: ffff88001811f500 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000282 RBP: ffff88002ce6dda0 R08: 0000000000000100 R09: ffffffff81b3a300 R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff88002ae54840 R13: ffff88002ae54840 R14: ffff880029c29c00 R15: ffff88001811f4b0 FS: 00007f394dd32720(0000) GS:ffff88002ef00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fffcb62ddf8 CR3: 000000001825f000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/u:0 (pid: 5, threadinfo ffff88002ce6c000, task ffff88002ce55cc0) Stack: 0000000000000246 ffff88002ce55cc0 ffff88002ce6dd58 ffff88001815dc00 ffff8800185246c0 ffff88001811f618 ffff880029c29d18 ffff88001811f380 ffff88002ce6dd50 ffffffff814757e4 ffff88002ce6dda0 ffffffff8106ac56 Call Trace: cachefiles_lookup_object+0x78/0xd4 [cachefiles] fscache_lookup_object+0x131/0x16d [fscache] fscache_object_work_func+0x1bc/0x669 [fscache] process_one_work+0x186/0x298 worker_thread+0xda/0x15d kthread+0x84/0x8c kernel_thread_helper+0x4/0x10 RIP cachefiles_walk_to_object+0x436/0x745 [cachefiles] ---[ end trace 1d481c9af1804caa ]--- I tested the uncaching by the following means: (1) Create a big file on my NFS server (104857600 bytes). (2) Read the file into the cache with md5sum on the NFS client. Look in /proc/fs/fscache/stats: Pages : mrk=25601 unc=0 (3) Open the file for read/write ("bash 5<>/warthog/bigfile"). Look in proc again: Pages : mrk=25601 unc=25601 Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de> cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-07 13:21:56 -07:00
Alexey Khoroshilov	dd7f3d5458	hfsplus: Add error propagation for hfsplus_ext_write_extent_locked Implement error propagation through the callers of hfsplus_ext_write_extent_locked(). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Christoph Hellwig <hch@lst.de>	2011-07-07 17:45:46 +02:00
Alexey Khoroshilov	5bd9d99d10	hfsplus: add error checking for hfs_find_init() hfs_find_init() may fail with ENOMEM, but there are places, where the returned value is not checked. The consequences can be very unpleasant, e.g. kfree uninitialized pointer and inappropriate mutex unlocking. The patch adds checks for errors in hfs_find_init(). Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Christoph Hellwig <hch@lst.de>	2011-07-07 17:45:46 +02:00
Miao Xie	149e2d76b4	btrfs: fix oops when doing space balance We need to make sure the data relocation inode doesn't go through the delayed metadata updates, otherwise we get an oops during balance: kernel BUG at fs/btrfs/relocation.c:4303! [SNIP] Call Trace: [<ffffffffa03143fd>] ? update_ref_for_cow+0x22d/0x330 [btrfs] [<ffffffffa0314951>] __btrfs_cow_block+0x451/0x5e0 [btrfs] [<ffffffffa031355d>] ? read_block_for_search+0x14d/0x4d0 [btrfs] [<ffffffffa0314beb>] btrfs_cow_block+0x10b/0x240 [btrfs] [<ffffffffa031acae>] btrfs_search_slot+0x49e/0x7a0 [btrfs] [<ffffffffa032d8af>] btrfs_lookup_inode+0x2f/0xa0 [btrfs] [<ffffffff8147bf0e>] ? mutex_lock+0x1e/0x50 [<ffffffffa0380cf1>] btrfs_update_delayed_inode+0x71/0x160 [btrfs] [<ffffffffa037ff27>] ? __btrfs_release_delayed_node+0x67/0x190 [btrfs] [<ffffffffa0381cf8>] btrfs_run_delayed_items+0xe8/0x120 [btrfs] [<ffffffffa03365e0>] btrfs_commit_transaction+0x250/0x850 [btrfs] [<ffffffff810f91d9>] ? find_get_pages+0x39/0x130 [<ffffffffa0336cd5>] ? join_transaction+0x25/0x250 [btrfs] [<ffffffff81081de0>] ? wake_up_bit+0x40/0x40 [<ffffffffa03785fa>] prepare_to_relocate+0xda/0xf0 [btrfs] [<ffffffffa037f2bb>] relocate_block_group+0x4b/0x620 [btrfs] [<ffffffffa0334cf5>] ? btrfs_clean_old_snapshots+0x35/0x150 [btrfs] [<ffffffffa037fa43>] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs] [<ffffffffa0368ec0>] ? btrfs_tree_unlock+0x50/0x50 [btrfs] [<ffffffffa035e39b>] btrfs_relocate_chunk+0x8b/0x670 [btrfs] [<ffffffffa031303d>] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs] [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs] [<ffffffffa031bea1>] ? btrfs_previous_item+0xb1/0x150 [btrfs] [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs] [<ffffffffa035f5aa>] btrfs_balance+0x21a/0x2b0 [btrfs] [<ffffffffa0368898>] btrfs_ioctl+0x798/0xd20 [btrfs] [<ffffffff8111e358>] ? handle_mm_fault+0x148/0x270 [<ffffffff814809e8>] ? do_page_fault+0x1d8/0x4b0 [<ffffffff81160d6a>] do_vfs_ioctl+0x9a/0x540 [<ffffffff811612b1>] sys_ioctl+0xa1/0xb0 [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b [SNIP] RIP [<ffffffffa037c1cc>] btrfs_reloc_cow_block+0x22c/0x270 [btrfs] Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-06 18:51:53 -04:00
Josef Bacik	508794eb5e	Btrfs: don't panic if we get an error while balancing V2 A user reported an error where if we try to balance an fs after a device has been removed it will blow up. This is because we get an EIO back and this is where BUG_ON(ret) bites us in the ass. To fix we just exit. Thanks, Reported-by: Anand Jain <Anand.Jain@oracle.com> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-06 18:46:43 -04:00
David Sterba	0942caa373	btrfs: add missing options displayed in mount output There are three missed mount options settable by user which are not currently displayed in mount output. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-07-06 18:46:43 -04:00
Masatake YAMATO	bcaadf5c1a	dlm: dump address of unknown node When the dlm fails to make a network connection to another node, include the address of the node in the error message. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2011-07-06 16:37:23 -05:00
Dave Chinner	1316d4da3f	xfs: unpin stale inodes directly in IOP_COMMITTED When inodes are marked stale in a transaction, they are treated specially when the inode log item is being inserted into the AIL. It tries to avoid moving the log item forward in the AIL due to a race condition with the writing the underlying buffer back to disk. The was "fixed" in commit `de25c18` ("xfs: avoid moving stale inodes in the AIL"). To avoid moving the item forward, we return a LSN smaller than the commit_lsn of the completing transaction, thereby trying to trick the commit code into not moving the inode forward at all. I'm not sure this ever worked as intended - it assumes the inode is already in the AIL, but I don't think the returned LSN would have been small enough to prevent moving the inode. It appears that the reason it worked is that the lower LSN of the inodes meant they were inserted into the AIL and flushed before the inode buffer (which was moved to the commit_lsn of the transaction). The big problem is that with delayed logging, the returning of the different LSN means insertion takes the slow, non-bulk path. Worse yet is that insertion is to a position -before- the commit_lsn so it is doing a AIL traversal on every insertion, and has to walk over all the items that have already been inserted into the AIL. It's expensive. To compound the matter further, with delayed logging inodes are likely to go from clean to stale in a single checkpoint, which means they aren't even in the AIL at all when we come across them at AIL insertion time. Hence these were all getting inserted into the AIL when they simply do not need to be as inodes marked XFS_ISTALE are never written back. Transactional/recovery integrity is maintained in this case by the other items in the unlink transaction that were modified (e.g. the AGI btree blocks) and committed in the same checkpoint. So to fix this, simply unpin the stale inodes directly in xfs_inode_item_committed() and return -1 to indicate that the AIL insertion code does not need to do any further processing of these inodes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-07-06 15:44:40 -05:00
Jeff Layton	f9e59bcba2	cifs: have cifs_cleanup_volume_info not take a double pointer ...as that makes for a cumbersome interface. Make it take a regular smb_vol pointer and rely on the caller to zero it out if needed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-06 20:03:05 +00:00
Jeff Layton	b2a0fa1520	cifs: fix build_unc_path_to_root to account for a prefixpath Regression introduced by commit `f87d39d951`. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-06 20:03:05 +00:00
Jeff Layton	677d8537d8	cifs: remove bogus call to cifs_cleanup_volume_info This call to cifs_cleanup_volume_info is clearly wrong. As soon as it's called the following call to cifs_get_tcp_session will oops as the volume_info pointer will then be NULL. The caller of cifs_mount should clean up this data since it passed it in. There's no need for us to call this here. Regression introduced by commit `724d9f1cfb`. Reported-by: Adam Williamson <awilliam@redhat.com> Cc: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-06 20:03:04 +00:00
Davidlohr Bueso	bcb65a797e	FDPIC: Fix memory leak The shdr4extnum variable isn't being freed in the cleanup process of elf_fdpic_core_dump(). Signed-off-by: Davidlohr Bueso <dave@gnu.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-06 12:15:16 -07:00
Miklos Szeredi	a51cb91d81	fs: fix lock initialization locks_alloc_lock() assumed that the allocated struct file_lock is already initialized to zero members. This is only true for the first allocation of the structure, after reuse some of the members will have random values. This will for example result in passing random fl_start values to userspace in fuse for FL_FLOCK locks, which is an information leak at best. Fix by reinitializing those members which may be non-zero after freeing. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-06 10:41:13 -07:00
Linus Torvalds	121782a248	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: fix sync and dio writes across stripe boundaries libceph: fix page calculation for non-page-aligned io ceph: fix page alignment corrections	2011-07-05 13:15:57 -07:00
Linus Torvalds	a8728d3554	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus: hfsplus: Fix double iput of the same inode in hfsplus_fill_super() hfsplus: add missing call to bio_put()	2011-07-05 10:04:27 -07:00
Artem Bityutskiy	a7fa94a9fe	UBIFS: improve power cut emulation testing This patch cleans-up and improves the power cut testing: 1. Kill custom 'simple_random()' function and use 'random32()' instead. 2. Make timeout larger 3. When cutting the buffer - fill the end with random data sometimes, not only with 0xFFs. 4. Some times cut in the middle of the buffer, not always at the end. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:34 +03:00
Artem Bityutskiy	d27462a518	UBIFS: rename recovery testing variables Since the recovery testing is effectively about emulating power cuts by UBIFS, use "power cut" as the base term for all the related variables and name them correspondingly. This is just a minor clean-up for the sake of readability. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:34 +03:00
Artem Bityutskiy	f57cb188cc	UBIFS: remove custom list of superblocks This is a clean-up of the power-cut emulation code - remove the custom list of superblocks which we maintained to find the superblock by the UBI volume descriptor. We do not need that crud any longer, because now we can get the superblock as a function argument. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:33 +03:00
Artem Bityutskiy	0a541b14e8	UBIFS: stop re-defining UBI operations Now when we use UBIFS helpers for all the I/O, we can remove the horrible hack of re-defining UBI I/O functions. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:33 +03:00
Artem Bityutskiy	d3b2578f56	UBIFS: switch to I/O helpers Switch the rest of direct UBI calls to UBIFS helper functions. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:33 +03:00
Artem Bityutskiy	987226a5d3	UBIFS: switch to ubifs_leb_write Stop using 'ubi_leb_write()' directly and switch to the 'ubifs_leb_write()' helper. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:33 +03:00
Artem Bityutskiy	d304820a1f	UBIFS: switch to ubifs_leb_read Instead of using 'ubi_read()' function directly, used the 'ubifs_leb_read()' helper function instead. This allows to get rid of several redundant error messages and make sure that we always have a stack dump on read errors. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:33 +03:00
Artem Bityutskiy	83cef708c6	UBIFS: introduce more I/O helpers Introduce the following I/O helper functions: 'ubifs_leb_read()', 'ubifs_leb_write()', 'ubifs_leb_change()', 'ubifs_leb_unmap()', 'ubifs_leb_map()', 'ubifs_is_mapped(). The idea is to wrap all UBI I/O functions in order to encapsulate various assertions and error path handling (error message, stack dump, switching to R/O mode). And there are some other benefits of this which will be used in the following patches. This patch does not switch whole UBIFS to use these functions yet. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:33 +03:00
Artem Bityutskiy	d033c98b17	UBIFS: always print stacktrace when switching to R/O mode When switching to R/O mode due to an I/O error, always dump the stack, not only when debugging is enabled. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:33 +03:00
Artem Bityutskiy	891a54a153	UBIFS: remove unused and unneeded debugging function This patch contains several minor clean-up and preparational cahnges. 1. Remove 'dbg_read()', 'dbg_write()', 'dbg_change()', and 'dbg_leb_erase()' functions as they are not used. 2. Remove 'dbg_leb_read()' and 'dbg_is_mapped()' as they are not really needed, it is fine to let reads go through in failure mode. 3. Rename 'offset' argument to 'offs' to be consistent with the rest of UBIFS code. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:32 +03:00
Artem Bityutskiy	e7717060dd	UBIFS: add global debugfs knobs Now we have per-FS (superblock) debugfs knobs, but they have one drawback - you have to first mount the FS and only after this you can switch self-checks on/off. But often we want to have the checks enabled during the mount. Introduce global debugging knobs for this purpose. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:32 +03:00
Artem Bityutskiy	28488fc28a	UBIFS: introduce debugfs helpers Separate out pieces of code from the debugfs file read/write functions and create separate 'interpret_user_input()'/'provide_user_output()' helpers. These helpers will be needed in one of the following patches, so this is just a preparational change. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:32 +03:00
Artem Bityutskiy	7dae997de6	UBIFS: re-arrange debugging code a bit Move 'dbg_debugfs_init()' and 'dbg_debugfs_exit()' functions which initialize debugfs for whole UBIFS subsystem below the code which initializes debugfs for a particular UBIFS instance. And do the same for 'ubifs_debugging_init()' and 'ubifs_debugging_exit()' functions. This layout is a bit better for the next patches, so this is just a preparation. Also, rename 'open_debugfs_file()' into 'dfs_file_open()' for consistency. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:32 +03:00
Artem Bityutskiy	24a4f8009e	UBIFS: be more informative in failure mode When we are testing UBIFS recovery, it is better to print in which eraseblock we are going to fail. Currently UBIFS prints it only if recovery debugging messages are enabled, but this is not very practical. So change 'dbg_rcvry()' messages to 'ubifs_warn()' messages. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:30 +03:00
Artem Bityutskiy	81e79d38df	UBIFS: switch self-check knobs to debugfs UBIFS has many built-in self-check functions which can be enabled using the debug_chks module parameter or the corresponding sysfs file (/sys/module/ubifs/parameters/debug_chks). However, this is not flexible enough because it is not per-filesystem. This patch moves this to debugfs interfaces. We already have debugfs support, so this patch just adds more debugfs files. While looking at debugfs support I've noticed that it is racy WRT file-system unmount, and added a TODO entry for that. This problem has been there for long time and it is quite standard debugfs PITA. The plan is to fix this later. This patch is simple, but it is large because it changes many places where we check if a particular type of checks is enabled or disabled. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:28 +03:00
Artem Bityutskiy	8d7819b4af	UBIFS: lessen amount of debugging check types We have too many different debugging checks - lessen the amount by merging all index-related checks into one. At the same time, move the "force in-the-gap" test to the "index checks" class, because it is too heavy for the "general" class. This patch merges TNC, Old index, and Index size check and calles this just "index checks". Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:28 +03:00
Artem Bityutskiy	2b1844a8c9	UBIFS: introduce helper functions for debugging checks and tests This patch introduces helper functions for all debugging checks, so instead of doing if (!(ubifs_chk_flags & UBIFS_CHK_GEN)) we now do if (!dbg_is_chk_gen(c)) This is a preparation to further changes where the flags will go away, and we'll need to only change the helper functions, but the code which utilizes them won't be touched. At the same time this patch removes 'dbg_force_in_the_gaps()', 'dbg_force_in_the_gaps_enabled()', and dbg_failure_mode helpers for consistency. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:28 +03:00
Artem Bityutskiy	d808efb407	UBIFS: amend debugging inode size check function prototype Add 'const struct ubifs_info *c' parameter to 'dbg_check_synced_i_size()' function because we'll need it in the next patch when we switch to debugfs. So this patch is just a preparation. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:27 +03:00
Artem Bityutskiy	bb2615d4d1	UBIFS: amend debugging name check function prototype Add 'struct ubifs_info *c' parameter to the 'dbg_check_name()' debugging function - it will be needed in one of the following commits where we switch to debugfs. So this is just a preparation. Mark parameters as 'const' while on it. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:27 +03:00
Artem Bityutskiy	06b282a4cc	UBIFS: add few commentaries about TNC Add a couple of comments - while looking into TNC I could not easily figure out few facts, so it is a good idea to document them in the code. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:27 +03:00
Artem Bityutskiy	3766244769	UBIFS: use correct flags in lprops The UBIFS lpt tree is in many aspects similar to the TNC tree, and we have similar flags for these trees. And by mistake we use the COW_ZNODE flag for LPT in some places, instead of the right flag COW_CNODE. And this works only because these two constants have the same value. This patch makes all the LPT code to use COW_CNODE and also changes COW_CNODE constant value to make sure we do not misuse the flags any more. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:27 +03:00
Artem Bityutskiy	f42eed7cba	UBIFS: harmonize znode flag helpers We have 3 znode flags: cow, obsolete, dirty. For the last flag we have a 'ubifs_zn_dirty()' helper function, but for the other 2 flags we use 'test_bit()' directly. This patch makes the situation more consistent and introduces helpers for the other 2 flags: 'ubifs_zn_cow()' and 'ubifs_zn_obsolete()'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:27 +03:00
Artem Bityutskiy	1f42596ec0	UBIFS: remove dead code Remove dead pieces of code under "if (c->min_io_size == 1)" statement - we never execute it because in UBIFS 'c->min_io_size' is always at least 8. This are leftovers from old pre-mainline prototype. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:27 +03:00
Artem Bityutskiy	12e776a088	UBIFS: remove unnecessary brackets Remove unnecessary brackets in "inode->i_flags \|= (S_NOCMTIME)" statement to make the code not look silly. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:27 +03:00
Artem Bityutskiy	a29fa9dfa4	UBIFS: minor cleanup: use S_ISREG helper Instead of using long "(inode->i_mode & S_IFMT) != S_IFREG" expression, use shorted "!S_ISREG(inode->i_mode)". Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:26 +03:00
Artem Bityutskiy	1b51e98365	UBIFS: rename dbg_check_dir_size function Since this function is not only about size checking, rename it to 'dbg_check_dir()'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:26 +03:00
Artem Bityutskiy	4315fb4072	UBIFS: improve inode dumping function Teach 'dbg_dump_inode()' dump directory entries for directory inodes. This requires few additional changes: 1. The 'c' argument of 'dbg_dump_inode()' cannot be const any more. 2. Users of 'dbg_dump_inode()' should not have 'tnc_mutex' locked. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:26 +03:00
Artem Bityutskiy	bfcf677dec	UBIFS: dump stack when pnode or nnode reading fails When we fail to read a pnode or nnode - print stacktrace if debugging is enabled. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:26 +03:00
Artem Bityutskiy	ae380ce047	UBIFS: lessen the size of debugging info data structure This patch lessens the 'struct ubifs_debug_info' size by 90 bytes by allocating less bytes for the debugfs root directory name. It introduces macros for the name patter an length instead of hard-coding 100 bytes. It also makes UBIFS use 'snprintf()' and teaches it to gracefully catch situations when the name array is too short. Additionally, this patch makes 2 unrelated changes - I just thought they do not deserve separate commits: simplifies 'ubifs_assert()' for non-debugging case and makes 'dbg_debugfs_init()' properly verify debugfs return code which may be an error code or NULL, so we should you 'IS_ERR_OR_NULL()' instead of 'IS_ERR()'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:26 +03:00
Artem Bityutskiy	549c999a76	UBIFS: return EROFS in case of broken commit If commit failed and it is in broken state, UBIFS switches to R/O mode. Most operations return -EROFS in this case, except of commit which returns -EINVAL. Make it return -EROFS too for consistency. This is also important for our power cut emulation testing. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-07-04 10:54:26 +03:00
Bryn M. Reeves	c282af4990	dlm: use vmalloc for hash tables Allocate dlm hash tables in the vmalloc area to allow a greater maximum size without restructuring of the hash table code. Signed-off-by: Bryn M. Reeves <bmr@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2011-07-01 15:49:23 -05:00
Denys Vlasenko	bb188d7e64	ptrace: make former thread ID available via PTRACE_GETEVENTMSG after PTRACE_EVENT_EXEC stop When multithreaded program execs under ptrace, all traced threads report WIFEXITED status, except for thread group leader and the thread which execs. Unless tracer tracks thread group relationship between tracees, which is a nontrivial task, it will not detect that execed thread no longer exists. This patch allows tracer to figure out which thread performed this exec, by requesting PTRACE_GETEVENTMSG in PTRACE_EVENT_EXEC stop. Another, samller problem which is solved by this patch is that tracer now can figure out which of the several concurrent execs in multithreaded program succeeded. Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2011-07-01 18:51:49 +02:00
Jeff Layton	ee1b3ea9e6	cifs: set socket send and receive timeouts before attempting connect Benjamin S. reported that he was unable to suspend his machine while it had a cifs share mounted. The freezer caused this to spew when he tried it: -----------------------[snip]------------------ PM: Syncing filesystems ... done. Freezing user space processes ... (elapsed 0.01 seconds) done. Freezing remaining freezable tasks ... Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0): cifsd S ffff880127f7b1b0 0 1821 2 0x00800000 ffff880127f7b1b0 0000000000000046 ffff88005fe008a8 ffff8800ffffffff ffff880127cee6b0 0000000000011100 ffff880127737fd8 0000000000004000 ffff880127737fd8 0000000000011100 ffff880127f7b1b0 ffff880127736010 Call Trace: [<ffffffff811e85dd>] ? sk_reset_timer+0xf/0x19 [<ffffffff8122cf3f>] ? tcp_connect+0x43c/0x445 [<ffffffff8123374e>] ? tcp_v4_connect+0x40d/0x47f [<ffffffff8126ce41>] ? schedule_timeout+0x21/0x1ad [<ffffffff8126e358>] ? _raw_spin_lock_bh+0x9/0x1f [<ffffffff811e81c7>] ? release_sock+0x19/0xef [<ffffffff8123e8be>] ? inet_stream_connect+0x14c/0x24a [<ffffffff8104485b>] ? autoremove_wake_function+0x0/0x2a [<ffffffffa02ccfe2>] ? ipv4_connect+0x39c/0x3b5 [cifs] [<ffffffffa02cd7b7>] ? cifs_reconnect+0x1fc/0x28a [cifs] [<ffffffffa02cdbdc>] ? cifs_demultiplex_thread+0x397/0xb9f [cifs] [<ffffffff81076afc>] ? perf_event_exit_task+0xb9/0x1bf [<ffffffffa02cd845>] ? cifs_demultiplex_thread+0x0/0xb9f [cifs] [<ffffffffa02cd845>] ? cifs_demultiplex_thread+0x0/0xb9f [cifs] [<ffffffff810444a1>] ? kthread+0x7a/0x82 [<ffffffff81002d14>] ? kernel_thread_helper+0x4/0x10 [<ffffffff81044427>] ? kthread+0x0/0x82 [<ffffffff81002d10>] ? kernel_thread_helper+0x0/0x10 Restarting tasks ... done. -----------------------[snip]------------------ We do attempt to perform a try_to_freeze in cifs_reconnect, but the connection attempt itself seems to be taking longer than 20s to time out. The connect timeout is governed by the socket send and receive timeouts, so we can shorten that period by setting those timeouts before attempting the connect instead of after. Adam Williamson tested the patch and said that it seems to have fixed suspending on his laptop when a cifs share is mounted. Reported-by: Benjamin S <da_joind@gmx.net> Tested-by: Adam Williamson <awilliam@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-07-01 16:15:30 +00:00
Masatake YAMATO	55b3286d3d	dlm: show addresses in configfs Display all addresses the dlm is using for the local node from the configfs file config/dlm/<cluster>/comms/<comm>/addr_list Also make the addr file write only. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2011-06-30 14:45:28 -05:00
Christoph Hellwig	c6d5f5fa65	hfsplus: lift the 2TB size limit Replace the hardcoded 2TB limit with a dynamic limit based on the block size now that we have fixed the few overflows preventing operation with large volumes. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2011-06-30 13:40:59 +02:00
Christoph Hellwig	4ba2d5fdcf	hfsplus: fix overflow in hfsplus_read_wrapper For partitions larger than 2TB or at such an offset the hfs wrapper code in hfsplus might overflow the range representable in a 32-bit data type. Make sure we use a sector_t for the arithmetics leading to it. I'm not sure this code can be readed at all as hfs itself never supported such large volumes. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2011-06-30 13:40:59 +02:00
Christoph Hellwig	bf1a1b31fa	hfsplus: fix overflow in hfsplus_get_block For filesystems larger than 2TB the final sector number passed to map_bh might overflow the range representable in a 32-bit data type. Make sure we use a sector_t for it and the arithmetics calculating it. Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2011-06-30 13:40:58 +02:00
Anton Salikhmetov	2b4f9ca8a5	hfsplus: assignments inside `if' condition clean-up Make assignments outside `if' conditions for unicode.c module where the checkpatch.pl script reported this coding style error. Signed-off-by: Anton Salikhmetov <alexo@tuxera.com> Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2011-06-30 13:40:58 +02:00
Alexey Khoroshilov	032016a56a	hfsplus: Fix double iput of the same inode in hfsplus_fill_super() There is a misprint in resource deallocation code on error path in hfsplus_fill_super(): the sbi->alloc_file inode is iput twice, while the root inode in not iput at all. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Christoph Hellwig <hch@lst.de>	2011-06-30 13:38:39 +02:00
Seth Forshee	50176ddefa	hfsplus: add missing call to bio_put() hfsplus leaks bio objects by failing to call bio_put() on the bios it allocates. Add the missing call to fix the leak. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Cc: <stable@kernel.org> # .38.x, .39.x Signed-off-by: Christoph Hellwig <hch@lst.de>	2011-06-30 13:28:32 +02:00
Boaz Harrosh	2bea038c52	pnfs: write: Set mds_offset in the generic layer - it is needed by all LDs In current pnfs tree, all the layouts set mds_offset in their .write_pagelist member. mds_offset is only used by generic layer and should be handled by it. This patch is for upstream. It is needed in this -rc series to fix a bug in objects layout_commit. I'll send patches for objects and blocks to be squashed into current pnfs tree. TODO: It looks like the read path needs the same patch. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-28 14:12:11 -04:00
Vasiliy Kulikov	1d1221f375	proc: restrict access to /proc/PID/io /proc/PID/io may be used for gathering private information. E.g. for openssh and vsftpd daemons wchars/rchars may be used to learn the precise password length. Restrict it to processes being able to ptrace the target process. ptrace_may_access() is needed to prevent keeping open file descriptor of "io" file, executing setuid binary and gathering io information of the setuid'ed process. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-28 09:39:11 -07:00
Jan Kara	08142579b6	mm: fix assertion mapping->nrpages == 0 in end_writeback() Under heavy memory and filesystem load, users observe the assertion mapping->nrpages == 0 in end_writeback() trigger. This can be caused by page reclaim reclaiming the last page from a mapping in the following race: CPU0 CPU1 ... shrink_page_list() __remove_mapping() __delete_from_page_cache() radix_tree_delete() evict_inode() truncate_inode_pages() truncate_inode_pages_range() pagevec_lookup() - finds nothing end_writeback() mapping->nrpages != 0 -> BUG page->mapping = NULL mapping->nrpages-- Fix the problem by doing a reliable check of mapping->nrpages under mapping->tree_lock in end_writeback(). Analyzed by Jay <jinshan.xiong@whamcloud.com>, lost in LKML, and dug out by Miklos Szeredi <mszeredi@suse.de>. Cc: Jay <jinshan.xiong@whamcloud.com> Cc: Miklos Szeredi <mszeredi@suse.de> Signed-off-by: Jan Kara <jack@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-27 18:00:13 -07:00
Bob Liu	2b4b2482e7	romfs: fix romfs_get_unmapped_area() argument check romfs_get_unmapped_area() checks argument `len' without considering PAGE_ALIGN which will cause do_mmap_pgoff() return -EINVAL error after commit `f67d9b1576` ("nommu: add page_align to mmap"). Fix the check by changing it in same way ramfs_nommu_get_unmapped_area() was changed in ramfs/file-nommu.c. Signed-off-by: Bob Liu <lliubbo@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Paul Mundt <lethal@linux-sh.org> Acked-by: Greg Ungerer <gerg@snapgear.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-27 18:00:12 -07:00
Linus Torvalds	af4087e0e6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: btrfs: fix inconsonant inode information Btrfs: make sure to update total_bitmaps when freeing cache V3 Btrfs: fix type mismatch in find_free_extent() Btrfs: make sure to record the transid in new inodes	2011-06-27 13:32:14 -07:00
Oleg Nesterov	087806b128	redefine thread_group_leader() as exit_signal >= 0 Change de_thread() to set old_leader->exit_signal = -1. This is good for the consistency, it is no longer the leader and all sub-threads have exit_signal = -1 set by copy_process(CLONE_THREAD). And this allows us to micro-optimize thread_group_leader(), it can simply check exit_signal >= 0. This also makes sense because we should move ->group_leader from task_struct to signal_struct. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Tejun Heo <tj@kernel.org>	2011-06-27 20:30:10 +02:00
Linus Torvalds	4699d4423c	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: prevent bogus assert when trying to remove non-existent attribute xfs: clear XFS_IDIRTY_RELEASE on truncate down xfs: reset inode per-lifetime state when recycling it	2011-06-27 09:01:29 -07:00
Miao Xie	2f7e33d432	btrfs: fix inconsonant inode information When iputting the inode, We may leave the delayed nodes if they have some delayed items that have not been dealt with. So when the inode is read again, we must look up the relative delayed node, and use the information in it to initialize the inode. Or we will get inconsonant inode information, it may cause that the same directory index number is allocated again, and hit the following oops: [ 5447.554187] err add delayed dir index item(name: pglog_0.965_0) into the insertion tree of the delayed node(root id: 262, inode id: 258, errno: -17) [ 5447.569766] ------------[ cut here ]------------ [ 5447.575361] kernel BUG at fs/btrfs/delayed-inode.c:1301! [SNIP] [ 5447.790721] Call Trace: [ 5447.793191] [<ffffffffa0641c4e>] btrfs_insert_dir_item+0x189/0x1bb [btrfs] [ 5447.800156] [<ffffffffa0651a45>] btrfs_add_link+0x12b/0x191 [btrfs] [ 5447.806517] [<ffffffffa0651adc>] btrfs_add_nondir+0x31/0x58 [btrfs] [ 5447.812876] [<ffffffffa0651d6a>] btrfs_create+0xf9/0x197 [btrfs] [ 5447.818961] [<ffffffff8111f840>] vfs_create+0x72/0x92 [ 5447.824090] [<ffffffff8111fa8c>] do_last+0x22c/0x40b [ 5447.829133] [<ffffffff8112076a>] path_openat+0xc0/0x2ef [ 5447.834438] [<ffffffff810c58e2>] ? __perf_event_task_sched_out+0x24/0x44 [ 5447.841216] [<ffffffff8103ecdd>] ? perf_event_task_sched_out+0x59/0x67 [ 5447.847846] [<ffffffff81121a79>] do_filp_open+0x3d/0x87 [ 5447.853156] [<ffffffff811e126c>] ? strncpy_from_user+0x43/0x4d [ 5447.859072] [<ffffffff8111f1f5>] ? getname_flags+0x2e/0x80 [ 5447.864636] [<ffffffff8111f179>] ? do_getname+0x14b/0x173 [ 5447.870112] [<ffffffff8111f1b7>] ? audit_getname+0x16/0x26 [ 5447.875682] [<ffffffff8112b1ab>] ? spin_lock+0xe/0x10 [ 5447.880882] [<ffffffff81112d39>] do_sys_open+0x69/0xae [ 5447.886153] [<ffffffff81112db1>] sys_open+0x20/0x22 [ 5447.891114] [<ffffffff813b9aab>] system_call_fastpath+0x16/0x1b Fix it by reusing the old delayed node. Reported-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Tested-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-27 11:34:27 -04:00
Linus Torvalds	258e43fdb0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: mark CONFIG_CIFS_NFSD_EXPORT as BROKEN cifs: free blkcipher in smbhash	2011-06-26 19:40:31 -07:00
Linus Torvalds	804a007f54	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: cifs: propagate errors from cifs_get_root() to mount(2) cifs: tidy cifs_do_mount() up a bit cifs: more breakage on mount failures cifs: close sget() races cifs: pull freeing mountdata/dropping nls/freeing cifs_sb into cifs_umount() cifs: move cifs_umount() call into ->kill_sb() cifs: pull cifs_mount() call up sanitize cifs_umount() prototype cifs: initialize ->tlink_tree in cifs_setup_cifs_sb() cifs: allocate mountdata earlier cifs: leak on mount if we share superblock cifs: don't pass superblock to cifs_mount() cifs: don't leak nls on mount failure cifs: double free on mount failure take bdi setup/destruction into cifs_mount/cifs_umount Acked-by: Steve French <smfrench@gmail.com>	2011-06-26 19:39:22 -07:00
Josef Bacik	9b90f51353	Btrfs: make sure to update total_bitmaps when freeing cache V3 A user reported this bug again where we have more bitmaps than we are supposed to. This is because we failed to load the free space cache, but don't update the ctl->total_bitmaps counter when we remove entries from the tree. This patch fixes this problem and we should be good to go again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-25 09:31:06 -04:00
Ilya Dryomov	e0f5406727	Btrfs: fix type mismatch in find_free_extent() data parameter should be u64 because a full-sized chunk flags field is passed instead of 0/1 for distinguishing data from metadata. All underlying functions expect u64. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-25 09:31:06 -04:00
Al Viro	9403c9c598	cifs: propagate errors from cifs_get_root() to mount(2) ... instead of just failing with -EINVAL Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:43 -04:00
Al Viro	5c4f1ad7c6	cifs: tidy cifs_do_mount() up a bit Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:42 -04:00
Al Viro	fa18f1bdce	cifs: more breakage on mount failures if cifs_get_root() fails, we end up with ->mount() returning NULL, which is not what callers expect. Moreover, in case of superblock reuse we end up leaking a superblock reference... Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:42 -04:00
Al Viro	ee01a14d9d	cifs: close sget() races have ->s_fs_info set by the set() callback passed to sget() Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:42 -04:00
Al Viro	d757d71bfc	cifs: pull freeing mountdata/dropping nls/freeing cifs_sb into cifs_umount() all callers of cifs_umount() proceed to do the same thing; pull it into cifs_umount() itself. Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:42 -04:00
Al Viro	98ab494dd1	cifs: move cifs_umount() call into ->kill_sb() instead of calling it manually in case if cifs_read_super() fails to set ->s_root, just call it from ->kill_sb(). cifs_put_super() is gone now and we have cifs_sb shutdown and destruction done after the superblock is gone from ->s_instances. Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:42 -04:00
Al Viro	97d1152ace	cifs: pull cifs_mount() call up ... to the point prior to sget(). Now we have cifs_sb set up early enough. Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:42 -04:00
Al Viro	2a9b99516c	sanitize cifs_umount() prototype a) superblock argument is unused b) it always returns 0 Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:42 -04:00
Al Viro	2ced6f6935	cifs: initialize ->tlink_tree in cifs_setup_cifs_sb() no need to wait until cifs_read_super() and we need it done by the time cifs_mount() will be called. Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:42 -04:00
Al Viro	5d3bc605ca	cifs: allocate mountdata earlier pull mountdata allocation up, so that it won't stand in the way when we lift cifs_mount() to location before sget(). Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:41 -04:00
Al Viro	d687ca380f	cifs: leak on mount if we share superblock cifs_sb and nls end up leaked... Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:41 -04:00
Al Viro	2c6292ae4b	cifs: don't pass superblock to cifs_mount() To close sget() races we'll need to be able to set cifs_sb up before we get the superblock, so we'll want to be able to do cifs_mount() earlier. Fortunately, it's easy to do - setting ->s_maxbytes can be done in cifs_read_super(), ditto for ->s_time_gran and as for putting MS_POSIXACL into ->s_flags, we can mirror it in ->mnt_cifs_flags until cifs_read_super() is called. Kill unused 'devname' argument, while we are at it... Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:41 -04:00
Al Viro	ca171baaad	cifs: don't leak nls on mount failure if cifs_sb allocation fails, we still need to drop nls we'd stashed into volume_info - the one we would've copied to cifs_sb if we could allocate the latter. Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:41 -04:00
Al Viro	6d6861757d	cifs: double free on mount failure if we get to out_super with ->s_root already set (e.g. with cifs_get_root() failure), we'll end up with cifs_put_super() called and ->mountdata freed twice. We'll also get cifs_sb freed twice and cifs_sb->local_nls dropped twice. The problem is, we can get to out_super both with and without ->s_root, which makes ->put_super() a bad place for such work. Switch to ->kill_sb(), have all that work done there after kill_anon_super(). Unlike ->put_super(), ->kill_sb() is called by deactivate_locked_super() whether we have ->s_root or not. Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:41 -04:00
Al Viro	dd85446619	take bdi setup/destruction into cifs_mount/cifs_umount Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-24 18:39:41 -04:00
Jeff Layton	9b8e072a31	cifs: mark CONFIG_CIFS_NFSD_EXPORT as BROKEN This does not work properly with CIFS as current servers do not enable support for the FILE_OPEN_BY_FILE_ID on SMB NTCreateX and not all NFS clients handle ESTALE. For now, it just plain doesn't work. Mark it BROKEN to discourage distros from enabling it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-06-24 17:33:30 +00:00
Chris Mason	1973f0faeb	Btrfs: make sure to record the transid in new inodes When we create a new inode, we aren't filling in the field that records the transaction that last changed this inode. If we then go to fsync that inode, it will be skipped because the field isn't filled in. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-24 13:13:29 -04:00
Jeff Layton	e4fb0edb7c	cifs: free blkcipher in smbhash This is currently leaked in the rc == 0 case. Reported-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-06-24 17:03:55 +00:00
Linus Torvalds	5220cc9382	Merge branch 'for-linus' of git://git.kernel.dk/linux-block * 'for-linus' of git://git.kernel.dk/linux-block: block: add REQ_SECURE to REQ_COMMON_MASK block: use the passed in @bdev when claiming if partno is zero block: Add __attribute__((format(printf...) and fix fallout block: make disk_block_events() properly wait for work cancellation block: remove non-syncing __disk_block_events() and fold it into disk_block_events() block: don't use non-syncing event blocking in disk_check_events() cfq-iosched: fix locking around ioc->ioc_data assignment	2011-06-24 08:42:35 -07:00
Linus Torvalds	143e859d05	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: fix wsize negotiation to respect max buffer size and active signing (try #4) CIFS: Fix problem with 3.0-rc1 null user mount failure	2011-06-24 08:35:04 -07:00
Jesper Juhl	46e4edbf7e	Remove unneeded version.h includes from fs/ It was pointed out by 'make versioncheck' that some includes of linux/version.h were not needed in fs/ (fs/btrfs/ctree.h and fs/omfs/file.c). This patch removes them. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Acked-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-24 08:34:22 -07:00
Dave Chinner	4a33821236	xfs: prevent bogus assert when trying to remove non-existent attribute If the attribute fork on an inode is in btree format and has multiple levels (i.e node format rather than leaf format), then a lookup failure will trigger an assert failure in xfs_da_path_shift if the flag XFS_DA_OP_OKNOENT is not set. This flag is used to indicate to the directory btree code that not finding an entry is not a fatal error. In the case of doing a lookup for a directory name removal, this is valid as a user cannot insert an arbitrary name to remove from the directory btree. However, in the case of the attribute tree, a user has direct control over the attribute name and can ask for any random name to be removed without any validation. In this case, fsstress is asking for a non-existent user.selinux attribute to be removed, and that is causing xfs_da_path_shift() to fall off the bottom of the tree where it asserts that a lookup failure is allowed. Because the flag is not set, we die a horrible death on a debug enable kernel. Prevent this assert from firing on attribute removes by adding the op_flag XFS_DA_OP_OKNOENT to atribute removal operations. Discovered when testing on a SELinux enabled system by fsstress in test 070 by trying to remove a non-existent user.selinux attribute. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-06-23 22:13:51 -05:00
Dave Chinner	df4368a146	xfs: clear XFS_IDIRTY_RELEASE on truncate down When an inode is truncated down, speculative preallocation is removed from the inode. This should also reset the state bits for controlling whether preallocation is subsequently removed when the file is next closed. The flag is not being cleared, so repeated operations on a file that first involve a truncate (e.g. multiple repeated dd invocations on a file) give different file layouts for the second and subsequent invocations. Fix this by clearing the XFS_IDIRTY_RELEASE state bit when the XFS_ITRUNCATED bit is detected in xfs_release() and hence ensure that speculative delalloc is removed on files that have been truncated down. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-06-23 22:13:46 -05:00
Dave Chinner	778e24bb6d	xfs: reset inode per-lifetime state when recycling it XFS inodes has several per-lifetime state fields that determine the behaviour of the inode. These state fields are not all reset when an inode is reused from the reclaimable state. This can lead to unexpected behaviour of the new inode such as speculative preallocation not being truncated away in the expected manner for local files until the inode is subsequently truncated, freed or cycles out of the cache. It can also lead to an inode being considered to be a filestream inode or having been truncated when that is not the case. Rework the reinitialisation of the inode when it is recycled to ensure that it is pristine before it is reused. While there, also fix the resetting of state flags in the recycling error paths so the inode does not become unreclaimable. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-06-23 22:13:31 -05:00
Jeff Layton	1190f6a067	cifs: fix wsize negotiation to respect max buffer size and active signing (try #4 ) Hopefully last version. Base signing check on CAP_UNIX instead of tcon->unix_ext, also clean up the comments a bit more. According to Hongwei Sun's blog posting here: http://blogs.msdn.com/b/openspecification/archive/2009/04/10/smb-maximum-transmit-buffer-size-and-performance-tuning.aspx CAP_LARGE_WRITEX is ignored when signing is active. Also, the maximum size for a write without CAP_LARGE_WRITEX should be the maxBuf that the server sent in the NEGOTIATE request. Fix the wsize negotiation to take this into account. While we're at it, alter the other wsize definitions to use sizeof(WRITE_REQ) to allow for slightly larger amounts of data to potentially be written per request. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-06-23 17:54:39 +00:00
Linus Torvalds	bccaeafd7c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6: jfs: agstart field must be 64 bits JFS: Don't save agno in the inode jfs: Update agstart when resizing volume jfs: old_agsize should be 64 bits in jfs_extendfs	2011-06-22 21:49:07 -07:00
Pavel Shilovsky	446b23a758	CIFS: Fix problem with 3.0-rc1 null user mount failure Figured it out: it was broken by `b946845a9d` commit - "cifs: cifs_parse_mount_options: do not tokenize mount options in-place". So, as a quick fix I suggest to apply this patch. [PATCH] CIFS: Fix kfree() with constant string in a null user case Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-06-22 21:43:56 +00:00
Tejun Heo	06d984737b	ptrace: s/tracehook_tracer_task()/ptrace_parent()/ tracehook.h is on the way out. Rename tracehook_tracer_task() to ptrace_parent() and move it from tracehook.h to ptrace.h. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: John Johansen <john.johansen@canonical.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2011-06-22 19:26:29 +02:00
Tejun Heo	4b9d33e6d8	ptrace: kill clone/exec tracehooks At this point, tracehooks aren't useful to mainline kernel and mostly just add an extra layer of obfuscation. Although they have comments, without actual in-kernel users, it is difficult to tell what are their assumptions and they're actually trying to achieve. To mainline kernel, they just aren't worth keeping around. This patch kills the following clone and exec related tracehooks. tracehook_prepare_clone() tracehook_finish_clone() tracehook_report_clone() tracehook_report_clone_complete() tracehook_unsafe_exec() The changes are mostly trivial - logic is moved to the caller and comments are merged and adjusted appropriately. The only exception is in check_unsafe_exec() where LSM_UNSAFE_PTRACE* are OR'd to bprm->unsafe instead of setting it, which produces the same result as the field is always zero on entry. It also tests p->ptrace instead of (p->ptrace & PT_PTRACED) for consistency, which also gives the same result. This doesn't introduce any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2011-06-22 19:26:29 +02:00
Tejun Heo	a288eecce5	ptrace: kill trivial tracehooks At this point, tracehooks aren't useful to mainline kernel and mostly just add an extra layer of obfuscation. Although they have comments, without actual in-kernel users, it is difficult to tell what are their assumptions and they're actually trying to achieve. To mainline kernel, they just aren't worth keeping around. This patch kills the following trivial tracehooks. * Ones testing whether task is ptraced. Replace with ->ptrace test. tracehook_expect_breakpoints() tracehook_consider_ignored_signal() tracehook_consider_fatal_signal() * ptrace_event() wrappers. Call directly. tracehook_report_exec() tracehook_report_exit() tracehook_report_vfork_done() * ptrace_release_task() wrapper. Call directly. tracehook_finish_release_task() * noop tracehook_prepare_release_task() tracehook_report_death() This doesn't introduce any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2011-06-22 19:26:28 +02:00
Linus Torvalds	2992c4bd57	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: Fix decode_secinfo_maxsz NFSv4.1: Fix an off-by-one error in pnfs_generic_pg_test NFSv4.1: Fix some issues with pnfs_generic_pg_test NFSv4.1: file layout must consider pg_bsize for coalescing pnfs-obj: No longer needed to take an extra ref at add_device SUNRPC: Ensure the RPC client only quits on fatal signals NFSv4: Fix a readdir regression nfs4.1: mark layout as bad on error path in _pnfs_return_layout nfs4.1: prevent race that allowed use of freed layout in _pnfs_return_layout NFSv4.1: need to put_layout_hdr on _pnfs_return_layout error path NFS: (d)printks should use %zd for ssize_t arguments NFSv4.1: fix break condition in pnfs_find_lseg nfs4.1: fix several problems with _pnfs_return_layout NFSv4.1: allow zero fh array in filelayout decode layout NFSv4.1: allow nfs_fhget to succeed with mounted on fileid NFSv4.1: Fix a refcounting issue in the pNFS device id cache NFSv4.1: deprecate headerpadsz in CREATE_SESSION NFS41: do not update isize if inode needs layoutcommit NLM: Don't hang forever on NLM unlock requests NFS: fix umount of pnfs filesystems	2011-06-21 18:20:55 -07:00
Linus Torvalds	890879cfa0	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: Fix oops in jbd2_journal_remove_journal_head() jbd2: Remove obsolete parameters in the comments for some jbd2 functions ext4: fixed tracepoints cleanup ext4: use FIEMAP_EXTENT_LAST flag for last extent in fiemap ext4: Fix max file size and logical block counting of extent format file ext4: correct comments for ext4_free_blocks()	2011-06-21 10:22:35 -07:00
Bryan Schumaker	1650add235	NFS: Fix decode_secinfo_maxsz I initially did the calculation in bytes, and not words Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-21 11:54:07 -04:00
Trond Myklebust	19982ba856	NFSv4.1: Fix an off-by-one error in pnfs_generic_pg_test And document what is going on there... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-21 11:54:06 -04:00
Trond Myklebust	8f7d5efbef	NFSv4.1: Fix some issues with pnfs_generic_pg_test 1. If the intention is to coalesce requests 'prev' and 'req' then we have to ensure at least that we have a layout starting at req_offset(prev). 2. If we're only requesting a minimal layout of length desc->pg_count, we need to test the length actually returned by the server before we allow the coalescing to occur. 3. We need to deal correctly with (pgio->lseg == NULL) 4. Fixup the test guarding the pnfs_update_layout. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-21 11:54:05 -04:00
Linus Torvalds	eda0841094	Merge branch 'for-2.6.40' of git://linux-nfs.org/~bfields/linux * 'for-2.6.40' of git://linux-nfs.org/~bfields/linux: nfsd4: fix break_lease flags on nfsd open nfsd: link returns nfserr_delay when breaking lease nfsd: v4 support requires CRYPTO nfsd: fix dependency of nfsd on auth_rpcgss	2011-06-20 20:10:52 -07:00
Linus Torvalds	3669820650	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: devcgroup_inode_permission: take "is it a device node" checks to inlined wrapper fix comment in generic_permission() kill obsolete comment for follow_down() proc_sys_permission() is OK in RCU mode reiserfs_permission() doesn't need to bail out in RCU mode proc_fd_permission() is doesn't need to bail out in RCU mode nilfs2_permission() doesn't need to bail out in RCU mode logfs doesn't need ->permission() at all coda_ioctl_permission() is safe in RCU mode cifs_permission() doesn't need to bail out in RCU mode bad_inode_permission() is safe from RCU mode ubifs: dereferencing an ERR_PTR in ubifs_mount()	2011-06-20 20:09:15 -07:00
Dave Kleikamp	ecc90462b4	jfs: agstart field must be 64 bits The previous patch added the agstart field to jfs_ip, but declared it a long. We need to make sure its 64 bits on every platform. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>	2011-06-20 17:53:24 -05:00
Benny Halevy	19345cb299	NFSv4.1: file layout must consider pg_bsize for coalescing Otherwise we end up overflowing the rpc buffer size on the receive end. Signed-off-by: Benny Halevy <benny@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-20 16:12:26 -04:00
Linus Torvalds	90a800de0a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: avoid delayed metadata items during commits btrfs: fix uninitialized return value btrfs: fix wrong reservation when doing delayed inode operations btrfs: Remove unused sysfs code btrfs: fix dereference of ERR_PTR value Btrfs: fix relocation races Btrfs: set no_trans_join after trying to expand the transaction Btrfs: protect the pending_snapshots list with trans_lock Btrfs: fix path leakage on subvol deletion Btrfs: drop the delalloc_bytes check in shrink_delalloc Btrfs: check the return value from set_anon_super	2011-06-20 08:58:53 -07:00
Dave Kleikamp	d31b53e3cd	JFS: Don't save agno in the inode Resizing the file system can result in an in-memory inode being remapped to a different aggregate group (AG). A cached AG number can cause problems when trying to free or allocate inodes. Instead, save the IAG's agstart address and calculate the agno when we need it. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>	2011-06-20 10:53:46 -05:00
Dave Kleikamp	28e0fa894c	jfs: Update agstart when resizing volume A comment indicates that the IAG's agstart does not need to be updated since it will always point to a block in the same aggregate group, but jfs_fsck isn't so forgiving and reports it as an error. I'm fixing this in jfsutils as well, so either a new kernel or new utilities will be sufficient to fix the problem. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>	2011-06-20 10:32:46 -05:00
Dave Kleikamp	206b6310fd	jfs: old_agsize should be 64 bits in jfs_extendfs Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>	2011-06-20 10:30:04 -05:00
Al Viro	8e833fd2e1	fix comment in generic_permission() CAP_DAC_OVERRIDE is enough for MAY_EXEC on directory, even if no exec bits are set. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:45:56 -04:00
Al Viro	6291176bcd	kill obsolete comment for follow_down() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:45:49 -04:00
Al Viro	1aec7036d0	proc_sys_permission() is OK in RCU mode nothing blocking there, since all instances of sysctl ->permissions() method are non-blocking - both of them, that is. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:45:25 -04:00
Al Viro	1d29b5a2ed	reiserfs_permission() doesn't need to bail out in RCU mode nothing blocking other than generic_permission() (and check_acl callback does bail out in RCU mode). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:45:21 -04:00
Al Viro	cf12791116	proc_fd_permission() is doesn't need to bail out in RCU mode nothing blocking except generic_permission() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:44:50 -04:00
Al Viro	730e908f35	nilfs2_permission() doesn't need to bail out in RCU mode Nothing blocking except for generic_permission(). Which will DTRT. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:44:33 -04:00
Al Viro	a63ab94d67	logfs doesn't need ->permission() at all ... and never did, what with its ->permission() being what we do by default when ->permission is NULL... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:44:26 -04:00
Al Viro	6b419951f1	coda_ioctl_permission() is safe in RCU mode return (mask & MAY_EXEC) ? -EACCES : 0; is non-blocking... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:44:19 -04:00
Al Viro	ec12781f19	cifs_permission() doesn't need to bail out in RCU mode nothing potentially blocking except generic_permission(), which will DTRT Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:44:07 -04:00
Al Viro	1712c20dae	bad_inode_permission() is safe from RCU mode return -EIO; is not a blocking operation, thank you very much. Nick, what the hell have you been smoking? Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:44:00 -04:00
Dan Carpenter	185bf87393	ubifs: dereferencing an ERR_PTR in ubifs_mount() `d251ed271d` "ubifs: fix sget races" left out the goto from this error path so the static checkers complain that we're dereferencing "sb" when it's an ERR_PTR. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-20 10:42:34 -04:00
J. Bruce Fields	105f462210	nfsd4: fix break_lease flags on nfsd open Thanks to Casey Bodley for pointing out that on a read open we pass 0, instead of O_RDONLY, to break_lease, with the result that a read open is treated like a write open for the purposes of lease breaking! Reported-by: Casey Bodley <cbodley@citi.umich.edu> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-06-20 10:38:01 -04:00
Boaz Harrosh	df18d127f4	pnfs-obj: No longer needed to take an extra ref at add_device Andy's last device_cache patches, already take an extra reference on the newly inserted device_id. So we can remove it from obj-io. Without this patch the device_ids are leaked. Andy's patches are not in Linus tree yet. So I'm not sure if they are scheduled for this Kernel or the next. This patch should be added as part of these. CC: Andy Adamson <andros@netapp.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-19 14:49:51 -04:00
Linus Torvalds	8816ead9d8	Merge branches 'perf-urgent-for-linus', 'sched-urgent-for-linus', 'timers-urgent-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tools/perf: Fix static build of perf tool tracing: Fix regression in printk_formats file * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: clocksource: Make watchdog robust vs. interruption timerfd: Fix wakeup of processes when timer is cancelled on clock change * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, MAINTAINERS: Add x86 MCE people x86, efi: Do not reserve boot services regions within reserved areas	2011-06-19 09:00:18 -07:00
Linus Torvalds	c11760c6d8	isofs: fix bh leak in isofs_fill_super() error case In isofs_fill_super(), when an iso_primary_descriptor is found, it is kept in pri_bh. The error cases don't properly release it. Fix it. Reported-and-tested-by: 김원석 <stanley.will.kim@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-18 07:25:42 -07:00
Chris Mason	e999376f09	Btrfs: avoid delayed metadata items during commits Snapshot creation has two phases. One is the initial snapshot setup, and the second is done during commit, while nobody is allowed to modify the root we are snapshotting. The delayed metadata insertion code can break that rule, it does a delayed inode update on the inode of the parent of the snapshot, and delayed directory item insertion. This makes sure to run the pending delayed operations before we record the snapshot root, which avoids corruptions. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-17 16:38:47 -04:00
David Sterba	35a30d7ce5	btrfs: fix uninitialized return value When allocation fails in btrfs_read_fs_root_no_name, ret is not set although it is returned, holding a garbage value. Signed-off-by: David Sterba <dsterba@suse.cz> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-17 14:54:18 -04:00
Miao Xie	19fd294957	btrfs: fix wrong reservation when doing delayed inode operations We have migrated the space for the delayed inode items from trans_block_rsv to global_block_rsv, but we forgot to set trans->block_rsv to global_block_rsv when we doing delayed inode operations, and the following Oops happened: [ 9792.654889] ------------[ cut here ]------------ [ 9792.654898] WARNING: at fs/btrfs/extent-tree.c:5681 btrfs_alloc_free_block+0xca/0x27c [btrfs]() [ 9792.654899] Hardware name: To Be Filled By O.E.M. [ 9792.654900] Modules linked in: btrfs zlib_deflate libcrc32c ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables arc4 rt61pci rt2x00pci rt2x00lib snd_hda_codec_hdmi mac80211 snd_hda_codec_realtek cfg80211 snd_hda_intel edac_core snd_seq rfkill pcspkr serio_raw snd_hda_codec eeprom_93cx6 edac_mce_amd sp5100_tco i2c_piix4 k10temp snd_hwdep snd_seq_device snd_pcm floppy r8169 xhci_hcd mii snd_timer snd soundcore snd_page_alloc ipv6 firewire_ohci pata_acpi ata_generic firewire_core pata_via crc_itu_t radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan] [ 9792.654919] Pid: 2762, comm: rm Tainted: G W 2.6.39+ #1 [ 9792.654920] Call Trace: [ 9792.654922] [<ffffffff81053c4a>] warn_slowpath_common+0x83/0x9b [ 9792.654925] [<ffffffff81053c7c>] warn_slowpath_null+0x1a/0x1c [ 9792.654933] [<ffffffffa038e747>] btrfs_alloc_free_block+0xca/0x27c [btrfs] [ 9792.654945] [<ffffffffa03b8562>] ? map_extent_buffer+0x6e/0xa8 [btrfs] [ 9792.654953] [<ffffffffa038189b>] __btrfs_cow_block+0xfc/0x30c [btrfs] [ 9792.654963] [<ffffffffa0396aa6>] ? btrfs_buffer_uptodate+0x47/0x58 [btrfs] [ 9792.654970] [<ffffffffa0382e48>] ? read_block_for_search+0x94/0x368 [btrfs] [ 9792.654978] [<ffffffffa0381ba9>] btrfs_cow_block+0xfe/0x146 [btrfs] [ 9792.654986] [<ffffffffa03848b0>] btrfs_search_slot+0x14d/0x4b6 [btrfs] [ 9792.654997] [<ffffffffa03b8562>] ? map_extent_buffer+0x6e/0xa8 [btrfs] [ 9792.655022] [<ffffffffa03938e8>] btrfs_lookup_inode+0x2f/0x8f [btrfs] [ 9792.655025] [<ffffffff8147afac>] ? _cond_resched+0xe/0x22 [ 9792.655027] [<ffffffff8147b892>] ? mutex_lock+0x29/0x50 [ 9792.655039] [<ffffffffa03d41b1>] btrfs_update_delayed_inode+0x72/0x137 [btrfs] [ 9792.655051] [<ffffffffa03d4ea2>] btrfs_run_delayed_items+0x90/0xdb [btrfs] [ 9792.655062] [<ffffffffa039a69b>] btrfs_commit_transaction+0x228/0x654 [btrfs] [ 9792.655064] [<ffffffff8106e8da>] ? remove_wait_queue+0x3a/0x3a [ 9792.655075] [<ffffffffa03a2fa5>] btrfs_evict_inode+0x14d/0x202 [btrfs] [ 9792.655077] [<ffffffff81132bd6>] evict+0x71/0x111 [ 9792.655079] [<ffffffff81132de0>] iput+0x12a/0x132 [ 9792.655081] [<ffffffff8112aa3a>] do_unlinkat+0x106/0x155 [ 9792.655083] [<ffffffff81127b83>] ? path_put+0x1f/0x23 [ 9792.655085] [<ffffffff8109c53c>] ? audit_syscall_entry+0x145/0x171 [ 9792.655087] [<ffffffff81128410>] ? putname+0x34/0x36 [ 9792.655090] [<ffffffff8112b441>] sys_unlinkat+0x29/0x2b [ 9792.655092] [<ffffffff81482c42>] system_call_fastpath+0x16/0x1b [ 9792.655093] ---[ end trace 02b696eb02b3f768 ]--- This patch fix it by setting the reservation of the transaction handle to the correct one. Reported-by: Josef Bacik <josef@redhat.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-17 14:54:18 -04:00
Maarten Lankhorst	9fe6a50fb7	btrfs: Remove unused sysfs code Removes code no longer used. The sysfs file itself is kept, because the btrfs developers expressed interest in putting new entries to sysfs. Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-17 14:54:18 -04:00
David Sterba	3ed4498caf	btrfs: fix dereference of ERR_PTR value smatch reports: btrfs_recover_log_trees error: 'wc.replay_dest' dereferencing possible ERR_PTR() Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-17 14:54:17 -04:00
Chris Mason	e038dca803	Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into for-linus Conflicts: fs/btrfs/transaction.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-17 14:16:13 -04:00
Linus Torvalds	01eff85b09	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: make log devices with write back caches work xfs: fix ->mknod() return value on xfs_get_acl() failure	2011-06-17 10:37:41 -07:00
Chris Mason	7585717f30	Btrfs: fix relocation races The recent commit to get rid of our trans_mutex introduced some races with block group relocation. The problem is that relocation needs to do some record keeping about each root, and it was relying on the transaction mutex to coordinate things in subtle ways. This fix adds a mutex just for the relocation code and makes sure it doesn't have a big impact on normal operations. The race is really fixed in btrfs_record_root_in_trans, which is where we step back and wait for the relocation code to finish accounting setup. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-17 13:36:58 -04:00
David Howells	879669961b	KEYS/DNS: Fix ____call_usermodehelper() to not lose the session keyring ____call_usermodehelper() now erases any credentials set by the subprocess_inf::init() function. The problem is that commit `17f60a7da1` ("capabilites: allow the application of capability limits to usermode helpers") creates and commits new credentials with prepare_kernel_cred() after the call to the init() function. This wipes all keyrings after umh_keys_init() is called. The best way to deal with this is to put the init() call just prior to the commit_creds() call, and pass the cred pointer to init(). That means that umh_keys_init() and suchlike can modify the credentials _before_ they are published and potentially in use by the rest of the system. This prevents request_key() from working as it is prevented from passing the session keyring it set up with the authorisation token to /sbin/request-key, and so the latter can't assume the authority to instantiate the key. This causes the in-kernel DNS resolver to fail with ENOKEY unconditionally. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Tested-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-17 09:40:48 -07:00
Linus Torvalds	8b97b21e0f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd * git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd: proc: Fix Oops on stat of /proc/<zombie pid>/ns/net	2011-06-16 15:02:20 -07:00
Trond Myklebust	ee7b75fc4f	NFSv4: Fix a readdir regression Commit `7ebb9315` (NFS: use secinfo when crossing mountpoints) introduces a regression when decoding an NFSv4 readdir entry that sets the rdattr_error field. By treating the resulting value as if it is a decoding error, the current code may cause us to skip valid readdir entries. Reported-by: Andy Adamson <andros@netapp.com> Cc: stable@kernel.org [2.6.39] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-16 13:24:47 -04:00
Linus Torvalds	8dac6bee32	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: AFS: Use i_generation not i_version for the vnode uniquifier AFS: Set s_id in the superblock to the volume name vfs: Fix data corruption after failed write in __block_write_begin() afs: afs_fill_page reads too much, or wrong data VFS: Fix vfsmount overput on simultaneous automount fix wrong iput on d_inode introduced by `e6bc45d65d` Delay struct net freeing while there's a sysfs instance refering to it afs: fix sget() races, close leak on umount ubifs: fix sget races ubifs: split allocation of ubifs_info into a separate function fix leak in proc_set_super()	2011-06-16 10:21:59 -07:00
Christoph Hellwig	a27a263bae	xfs: make log devices with write back caches work There's no reason not to support cache flushing on external log devices. The only thing this really requires is flushing the data device first both in fsync and log commits. A side effect is that we also have to remove the barrier write test during mount, which has been superflous since the new FLUSH+FUA code anyway. Also use the chance to flush the RT subvolume write cache before the fsync commit, which is required for correct semantics. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-06-16 10:52:39 -05:00
David Howells	d6e43f751f	AFS: Use i_generation not i_version for the vnode uniquifier Store the AFS vnode uniquifier in the i_generation field, not the i_version field of the inode struct. i_version can then be given the AFS data version number. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-16 11:44:48 -04:00
David Howells	2e41ae225f	AFS: Set s_id in the superblock to the volume name Set s_id in the superblock to the name of the AFS volume that this superblock corresponds to. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-16 11:44:47 -04:00
Jan Kara	f9f07b6c13	vfs: Fix data corruption after failed write in __block_write_begin() I've got a report of a file corruption from fsxlinux on ext3. The important operations to the page were: mapwrite to a hole partial write to the page read - found the page zeroed from the end of the normal write The culprit seems to be that if get_block() fails in __block_write_begin() (e.g. transient ENOSPC in ext3), the function does ClearPageUptodate(page). Thus when we retry the write, the logic in __block_write_begin() thinks zeroing of the page is needed and overwrites old data. In fact, I don't see why we should ever need to zero the uptodate bit here - either the page was uptodate when we entered __block_write_begin() and it should stay so when we leave it, or it was not uptodate and noone had right to set it uptodate during __block_write_begin() so it remains !uptodate when we leave as well. So just remove clearing of the bit. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-16 11:44:46 -04:00
Anton Blanchard	5e7f23373b	afs: afs_fill_page reads too much, or wrong data afs_fill_page should read the page that is about to be written but the current implementation has a number of issues. If we aren't extending the file we always read PAGE_CACHE_SIZE at offset 0. If we are extending the file we try to read the entire file. Change afs_fill_page to read PAGE_CACHE_SIZE at the right offset, clamped to i_size. While here, avoid calling afs_fill_page when we are doing a PAGE_CACHE_SIZE write. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-16 11:44:46 -04:00
Al Viro	8aef188452	VFS: Fix vfsmount overput on simultaneous automount [Kudos to dhowells for tracking that crap down] If two processes attempt to cause automounting on the same mountpoint at the same time, the vfsmount holding the mountpoint will be left with one too few references on it, causing a BUG when the kernel tries to clean up. The problem is that lock_mount() drops the caller's reference to the mountpoint's vfsmount in the case where it finds something already mounted on the mountpoint as it transits to the mounted filesystem and replaces path->mnt with the new mountpoint vfsmount. During a pathwalk, however, we don't take a reference on the vfsmount if it is the same as the one in the nameidata struct, but do_add_mount() doesn't know this. The fix is to make sure we have a ref on the vfsmount of the mountpoint before calling do_add_mount(). However, if lock_mount() doesn't transit, we're then left with an extra ref on the mountpoint vfsmount which needs releasing. We can handle that in follow_managed() by not making assumptions about what we can and what we cannot get from lookup_mnt() as the current code does. The callers of follow_managed() expect that reference to path->mnt will be grabbed iff path->mnt has been changed. follow_managed() and follow_automount() keep track of whether such reference has been grabbed and assume that it'll happen in those and only those cases that'll have us return with changed path->mnt. That assumption is almost correct - it breaks in case of racing automounts and in even harder to hit race between following a mountpoint and a couple of mount --move. The thing is, we don't need to make that assumption at all - after the end of loop in follow_manage() we can check if path->mnt has ended up unchanged and do mntput() if needed. The BUG can be reproduced with the following test program: #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> #include <unistd.h> #include <sys/wait.h> int main(int argc, char **argv) { int pid, ws; struct stat buf; pid = fork(); stat(argv[1], &buf); if (pid > 0) wait(&ws); return 0; } and the following procedure: (1) Mount an NFS volume that on the server has something else mounted on a subdirectory. For instance, I can mount / from my server: mount warthog:/ /mnt -t nfs4 -r On the server /data has another filesystem mounted on it, so NFS will see a change in FSID as it walks down the path, and will mark /mnt/data as being a mountpoint. This will cause the automount code to be triggered. !!! Do not look inside the mounted fs at this point !!! (2) Run the above program on a file within the submount to generate two simultaneous automount requests: /tmp/forkstat /mnt/data/testfile (3) Unmount the automounted submount: umount /mnt/data (4) Unmount the original mount: umount /mnt At this point the kernel should throw a BUG with something like the following: BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12] Note that the bug appears on the root dentry of the original mount, not the mountpoint and not the submount because sys_umount() hasn't got to its final mntput_no_expire() yet, but this isn't so obvious from the call trace: [<ffffffff8117cd82>] shrink_dcache_for_umount+0x69/0x82 [<ffffffff8116160e>] generic_shutdown_super+0x37/0x15b [<ffffffffa00fae56>] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs] [<ffffffff811617f3>] kill_anon_super+0x1d/0x7e [<ffffffffa00d0be1>] nfs4_kill_super+0x60/0xb6 [nfs] [<ffffffff81161c17>] deactivate_locked_super+0x34/0x83 [<ffffffff811629ff>] deactivate_super+0x6f/0x7b [<ffffffff81186261>] mntput_no_expire+0x18d/0x199 [<ffffffff811862a8>] mntput+0x3b/0x44 [<ffffffff81186d87>] release_mounts+0xa2/0xbf [<ffffffff811876af>] sys_umount+0x47a/0x4ba [<ffffffff8109e1ca>] ? trace_hardirqs_on_caller+0x1fd/0x22f [<ffffffff816ea86b>] system_call_fastpath+0x16/0x1b as do_umount() is inlined. However, you can see release_mounts() in there. Note also that it may be necessary to have multiple CPU cores to be able to trigger this bug. Tested-by: Jeff Layton <jlayton@redhat.com> Tested-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-16 11:28:16 -04:00
Török Edwin	50338b889d	fix wrong iput on d_inode introduced by `e6bc45d65d` Git bisection shows that commit `e6bc45d65d` causes BUG_ONs under high I/O load: kernel BUG at fs/inode.c:1368! [ 2862.501007] Call Trace: [ 2862.501007] [<ffffffff811691d8>] d_kill+0xf8/0x140 [ 2862.501007] [<ffffffff81169c19>] dput+0xc9/0x190 [ 2862.501007] [<ffffffff8115577f>] fput+0x15f/0x210 [ 2862.501007] [<ffffffff81152171>] filp_close+0x61/0x90 [ 2862.501007] [<ffffffff81152251>] sys_close+0xb1/0x110 [ 2862.501007] [<ffffffff814c14fb>] system_call_fastpath+0x16/0x1b A reliable way to reproduce this bug is: Login to KDE, run 'rsnapshot sync', and apt-get install openjdk-6-jdk, and apt-get remove openjdk-6-jdk. The buggy part of the patch is this: struct inode inode = NULL; ..... - if (nd.last.name[nd.last.len]) - goto slashes; inode = dentry->d_inode; - if (inode) - ihold(inode); + if (nd.last.name[nd.last.len] \|\| !inode) + goto slashes; + ihold(inode) ... if (inode) iput(inode); / truncate the inode here */ If nd.last.name[nd.last.len] is nonzero (and thus goto slashes branch is taken), and dentry->d_inode is non-NULL, then this code now does an additional iput on the inode, which is wrong. Fix this by only setting the inode variable if nd.last.name[nd.last.len] is 0. Reference: https://lkml.org/lkml/2011/6/15/50 Reported-by: Norbert Preining <preining@logic.at> Reported-by: Török Edwin <edwintorok@gmail.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-16 11:27:39 -04:00
Linus Torvalds	13fca640bb	Revert "fs/exec.c: use BUILD_BUG_ON for VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP" This reverts commit `7f81c8890c`. It turns out that it's not actually a build-time check on x86-64 UML, which does some seriously crazy stuff with VM_STACK_FLAGS. The VM_STACK_FLAGS define depends on the arch-supplied VM_STACK_DEFAULT_FLAGS value, and on x86-64 UML we have arch/um/sys-x86_64/shared/sysdep/vm-flags.h: #define VM_STACK_DEFAULT_FLAGS \ (test_thread_flag(TIF_IA32) ? vm_stack_flags32 : vm_stack_flags) #define VM_STACK_DEFAULT_FLAGS vm_stack_flags (yes, seriously: two different #define's for that thing, with the first one being inside an "#ifdef TIF_IA32") It's possible that it is UML that should just be fixed in this area, but for now let's just undo the (very small) optimization. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Richard Weinberger <richard@nod.at> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-15 21:53:52 -07:00
Michal Hocko	7f81c8890c	fs/exec.c: use BUILD_BUG_ON for VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP Commit `a8bef8ff6e` ("mm: migration: avoid race between shift_arg_pages() and rmap_walk() during migration by not migrating temporary stacks") introduced a BUG_ON() to ensure that VM_STACK_FLAGS and VM_STACK_INCOMPLETE_SETUP do not overlap. The check is a compile time one, so BUILD_BUG_ON is more appropriate. Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-15 20:03:59 -07:00
Eric W. Biederman	793925334f	proc: Fix Oops on stat of /proc/<zombie pid>/ns/net Don't call iput with the inode half setup to be a namespace filedescriptor. Instead rearrange the code so that we don't initialize ei->ns_ops until after I ns_ops->get succeeds, preventing us from invoking ns_ops->put when ns_ops->get failed. Reported-by: Ingo Saitz <Ingo.Saitz@stud.uni-hannover.de> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2011-06-15 14:35:29 -07:00
Fred Isaman	9e2dfdb308	nfs4.1: mark layout as bad on error path in _pnfs_return_layout Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-15 14:36:33 -04:00
Josef Bacik	ed0ca14021	Btrfs: set no_trans_join after trying to expand the transaction We can lockup if we try to allow new writers join the transaction and we have flushoncommit set or have a pending snapshot. This is because we set no_trans_join and then loop around and try to wait for ordered extents again. The problem is the ordered endio stuff needs to join the transaction, which it can't do because no_trans_join is set. So instead wait until after this loop to set no_trans_join and then make sure to wait for num_writers == 1 in case anybody got started in between us exiting the loop and setting no_trans_join. This could easily be reproduced by mounting -o flushoncommit and running xfstest 13. It cannot be reproduced with this patch. Thanks, Reported-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-15 13:24:47 -04:00
Josef Bacik	8351583e3f	Btrfs: protect the pending_snapshots list with trans_lock Currently there is nothing protecting the pending_snapshots list on the transaction. We only hold the directory mutex that we are snapshotting and a read lock on the subvol_sem, so we could race with somebody else creating a snapshot in a different directory and end up with list corruption. So protect this list with the trans_lock. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-15 13:24:46 -04:00
Josef Bacik	71d7aed014	Btrfs: fix path leakage on subvol deletion The delayed ref patch accidently removed the btrfs_free_path in btrfs_unlink_subvol, this puts it back and means we don't leak a path. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-15 13:24:45 -04:00
Fred Isaman	ea0ded748b	nfs4.1: prevent race that allowed use of freed layout in _pnfs_return_layout mark_matching_lsegs_invalid could put the last ref to the layout, so the get_layout_hdr needs to be called first. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-15 12:39:23 -04:00
Benny Halevy	1ed3a8539a	NFSv4.1: need to put_layout_hdr on _pnfs_return_layout error path We always get a reference on the layout header and we rely on nfs4_layoutreturn_release to put it. If we hit an allocation error before starting the rpc proc we bail out early without dereferncing the layout header properly. Signed-off-by: Benny Halevy <benny@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-15 11:52:15 -04:00
David Howells	c7fd06228b	NFS: (d)printks should use %zd for ssize_t arguments (d)printks should use %zd for ssize_t arguments not %ld, otherwise they might get a warning. I see the following with MN10300. fs/nfs/objlayout/objlayout.c: In function 'objlayout_read_done': fs/nfs/objlayout/objlayout.c:294: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t' Signed-off-by: David Howells <dhowells@redhat.com> cc: Trond Myklebust <Trond.Myklebust@netapp.com> cc: linux-nfs@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-15 11:24:31 -04:00
Benny Halevy	d771e3a43e	NFSv4.1: fix break condition in pnfs_find_lseg The break condition to skip out of the loop got broken when cmp_layout was change. Essentially, we want to stop looking once we know no layout on the remainder of the list can match the first byte of the looked-up range. Reported-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Benny Halevy <benny@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-15 11:24:31 -04:00
Fred Isaman	a2e1d4f2e5	nfs4.1: fix several problems with _pnfs_return_layout _pnfs_return_layout had the following problems: - it did not call pnfs_free_lseg_list on all paths - it unintentionally did a forgetful return when there was no outstanding io - it raced with concurrent LAYOUTGETS Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-15 11:24:30 -04:00
Andy Adamson	cec765cf58	NFSv4.1: allow zero fh array in filelayout decode layout Signed-off-by: Andy Adamson <andros@netapp.com> cc:stable@kernel.org [2.6.39] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-15 11:24:30 -04:00
Andy Adamson	533eb4611c	NFSv4.1: allow nfs_fhget to succeed with mounted on fileid Commit `28331a46d8` "Ensure we request the ordinary fileid when doing readdirplus" changed the meaning of NFS_ATTR_FATTR_FILEID which used to be set when FATTR4_WORD1_MOUNTED_ON_FILED was requested. Allow nfs_fhget to succeed with only a mounted on fileid when crossing a mountpoint or a referral. Ask for the fileid of the absent file system if mounted_on_fileid is not supported. Signed-off-by: Andy Adamson <andros@netapp.com> cc:stable@kernel.org [2.6.39] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-15 11:24:29 -04:00
Trond Myklebust	1d92a08da2	NFSv4.1: Fix a refcounting issue in the pNFS device id cache When we add something to the global device id cache, we need to bump the reference count, so that the cache itself holds a reference. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-15 11:24:29 -04:00
Benny Halevy	c9c30dd5f7	NFSv4.1: deprecate headerpadsz in CREATE_SESSION We don't support header padding yet so better off ditching it Reported-by: Sid Moore <learnmost@gmail.com> Signed-off-by: Benny Halevy <benny@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-15 11:24:28 -04:00
Peng Tao	0f66b5984d	NFS41: do not update isize if inode needs layoutcommit nfs_update_inode will update isize if there is no queued pages. For pNFS, layoutcommit is supposed to change file size on server, the same effect as queued pages. nfs_update_inode may be called when dirty pages are written back (nfsi->npages==0) but layoutcommit is not sent, and it will change client file size according to server file size. Then client ends up losing what it just writes back in pNFS path. So we should skip updating client file size if file needs layoutcommit. Signed-off-by: Peng Tao <peng_tao@emc.com> Cc: stable@kernel.org [2.6.39] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-15 11:24:28 -04:00
Trond Myklebust	0b760113a3	NLM: Don't hang forever on NLM unlock requests If the NLM daemon is killed on the NFS server, we can currently end up hanging forever on an 'unlock' request, instead of aborting. Basically, if the rpcbind request fails, or the server keeps returning garbage, we really want to quit instead of retrying. Tested-by: Vasily Averin <vvs@sw.ru> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2011-06-15 11:24:27 -04:00
Weston Andros Adamson	9e3bd4e24e	NFS: fix umount of pnfs filesystems Unmounting a pnfs filesystem hangs using filelayout and possibly others. This fixes the use of the rcu protected node by making use of a new 'tmpnode' for the temporary purge list. Also, the spinlock shouldn't be held when calling synchronize_rcu(). Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-06-15 11:23:02 -04:00
Steve French	1252b3013b	[CIFS] update cifs version to 1.73 Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-06-14 16:19:54 +00:00
Al Viro	c46a131c0c	xfs: fix ->mknod() return value on xfs_get_acl() failure ->mknod() should return negative on errors and PTR_ERR() gives already negative value... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-06-14 11:02:13 -05:00
Steve French	040d15c867	[CIFS] trivial cleanup fscache cFYI and cERROR messages ... for uniformity and cleaner debug logs. Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-06-14 15:51:18 +00:00
Max Asbock	1123d93963	timerfd: Fix wakeup of processes when timer is cancelled on clock change Currently processes waiting with poll on cancelable timerfd timers are not woken up when the timers are canceled. When the system time is set the clock_was_set() function calls timerfd_clock_was_set() to cancel and wake up processes waiting on potential cancelable timerfd timers. However the wake up currently has no effect because in the case of timerfd_read it is dependent on ctx->ticks not being 0. timerfd_poll also requires ctx->ticks being non zero. As a consequence processes waiting on cancelable timers only get woken up when the timers expire. This patch fixes this by incrementing ctx->ticks before calling wake_up. Signed-off-by: Max Asbock <masbock@linux.vnet.ibm.com> Cc: kay.sievers@vrfy.org Cc: virtuoso@slind.org Cc: johnstul <johnstul@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1307985512.4710.41.camel@w-amax.beaverton.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-06-14 11:46:14 +02:00
Sage Weil	d7f124f129	ceph: fix sync and dio writes across stripe boundaries We were iterating across stripe boundaries properly, but not moving the write buffer pointer forward. This caused us to rewrite the same data after the break. Fix by adjusting the data pointer forward, and recalculating the io and buffer alignment after the break. Signed-off-by: Sage Weil <sage@newdream.net>	2011-06-13 16:26:22 -07:00
Sage Weil	773e9b4426	ceph: fix page alignment corrections dd if=/dev/urandom of=/mnt/fs_depot/dd10 bs=500 seek=8388 count=1 dd if=/mnt/fs_depot/dd10 of=/root/dd10out bs=500 skip=8388 count=1 Reported-by: Henry C Chang <henry.cy.chang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-06-13 16:26:10 -07:00
Jeff Layton	8d1bca328b	cifs: correctly handle NULL tcon pointer in CIFSTCon Long ago (in commit `00e485b0`), I added some code to handle share-level passwords in CIFSTCon. That code ignored the fact that it's legit to pass in a NULL tcon pointer when connecting to the IPC$ share on the server. This wasn't really a problem until recently as we only called CIFSTCon this way when the server returned -EREMOTE. With the introduction of commit `c1508ca2` however, it gets called this way on every mount, causing an oops when share-level security is in effect. Fix this by simply treating a NULL tcon pointer as if user-level security were in effect. I'm not aware of any servers that protect the IPC$ share with a specific password anyway. Also, add a comment to the top of CIFSTCon to ensure that we don't make the same mistake again. Cc: <stable@kernel.org> Reported-by: Martijn Uffing <mp3project@sarijopen.student.utwente.nl> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-06-13 20:34:34 +00:00
Jeff Layton	3e71551364	cifs: show sec= option in /proc/mounts Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-06-13 20:34:34 +00:00
Jeff Layton	7fdbaa1b8d	cifs: don't allow cifs_reconnect to exit with NULL socket pointer It's possible for the following set of events to happen: cifsd calls cifs_reconnect which reconnects the socket. A userspace process then calls cifs_negotiate_protocol to handle the NEGOTIATE and gets a reply. But, while processing the reply, cifsd calls cifs_reconnect again. Eventually the GlobalMid_Lock is dropped and the reply from the earlier NEGOTIATE completes and the tcpStatus is set to CifsGood. cifs_reconnect then goes through and closes the socket and sets the pointer to zero, but because the status is now CifsGood, the new socket is not created and cifs_reconnect exits with the socket pointer set to NULL. Fix this by only setting the tcpStatus to CifsGood if the tcpStatus is CifsNeedNegotiate, and by making sure that generic_ip_connect is always called at least once in cifs_reconnect. Note that this is not a perfect fix for this issue. It's still possible that the NEGOTIATE reply is handled after the socket has been closed and reconnected. In that case, the socket state will look correct but it no NEGOTIATE was performed on it be for the wrong socket. In that situation though the server should just shut down the socket on the next attempted send, rather than causing the oops that occurs today. Cc: <stable@kernel.org> # .38.x: fd88ce9: [CIFS] cifs: clarify the meaning of tcpStatus == CifsGood Reported-and-Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-06-13 20:34:33 +00:00
Pavel Shilovsky	cd51875d53	CIFS: Fix sparse error cifs_sb_master_tlink was declared as inline, but without a definition. Remove the declaration and move the definition up. Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-06-13 20:34:33 +00:00
Jan Kara	de1b794130	jbd2: Fix oops in jbd2_journal_remove_journal_head() jbd2_journal_remove_journal_head() can oops when trying to access journal_head returned by bh2jh(). This is caused for example by the following race: TASK1 TASK2 jbd2_journal_commit_transaction() ... processing t_forget list __jbd2_journal_refile_buffer(jh); if (!jh->b_transaction) { jbd_unlock_bh_state(bh); jbd2_journal_try_to_free_buffers() jbd2_journal_grab_journal_head(bh) jbd_lock_bh_state(bh) __journal_try_to_free_buffer() jbd2_journal_put_journal_head(jh) jbd2_journal_remove_journal_head(bh); jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and buffer is not part of any transaction and thus frees journal_head before TASK1 gets to doing so. Note that even buffer_head can be released by try_to_free_buffers() after jbd2_journal_put_journal_head() which adds even larger opportunity for oops (but I didn't see this happen in reality). Fix the problem by making transactions hold their own journal_head reference (in b_jcount). That way we don't have to remove journal_head explicitely via jbd2_journal_remove_journal_head() and instead just remove journal_head when b_jcount drops to zero. The result of this is that [__]jbd2_journal_refile_buffer(), [__]jbd2_journal_unfile_buffer(), and __jdb2_journal_remove_checkpoint() can free journal_head which needs modification of a few callers. Also we have to be careful because once journal_head is removed, buffer_head might be freed as well. So we have to get our own buffer_head reference where it matters. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-06-13 15:38:22 -04:00
Linus Torvalds	ffdb8f1bfb	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: unwind canceled flock state ceph: fix ENOENT logic in striped_read ceph: fix short sync reads from the OSD ceph: fix sync vs canceled write ceph: use ihold when we already have an inode ref	2011-06-13 11:21:50 -07:00
Linus Torvalds	c78a9b9b8e	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: ftrace: Revert `8ab2b7efd` ftrace: Remove unnecessary disabling of irqs kprobes/trace: Fix kprobe selftest for gcc 4.6 ftrace: Fix possible undefined return code oprofile, dcookies: Fix possible circular locking dependency oprofile: Fix locking dependency in sync_start() oprofile: Free potentially owned tasks in case of errors oprofile, x86: Add comments to IBS LVT offset initialization	2011-06-13 10:45:49 -07:00
Linus Torvalds	33a538833f	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix problem in setting checkpoint interval nilfs2: fix missing block address termination in btree node shrinking nilfs2: fix incorrect block address termination in node concatenation	2011-06-13 10:32:24 -07:00
Chris Mason	f4c4401621	Btrfs: drop the delalloc_bytes check in shrink_delalloc Even when delalloc_bytes is zero, we may need to sleep while waiting for delalloc space. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-13 11:30:47 -04:00
Chris Mason	ac08aedfa5	Btrfs: check the return value from set_anon_super Al Viro noticed we weren't checking for set_anon_super failures. This adds the required checks. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-13 11:28:50 -04:00
Tejun Heo	d4c208b86b	block: use the passed in @bdev when claiming if partno is zero `6b4517a791` (block: implement bd_claiming and claiming block) introduced claiming block to support O_EXCL blkdev opens properly. bd_start_claiming() looks up the part 0 bdev and starts claiming block. The function assumed that there is only one part 0 bdev and always used bdget_disk(disk, 0) to look it up; unfortunately, this isn't true for some drivers (floppy) which use multiple block devices to denote different operating parameters for the same physical device. There can be multiple part 0 bdev's for the same device number. This incorrect assumption caused the wrong bdev to be used during claiming leading to unbalanced bd_holders as reported in the following bug. https://bugzilla.kernel.org/show_bug.cgi?id=28522 This patch updates bd_start_claiming() such that it uses the bdev specified as argument if its partno is zero. Note that this means that different bdev's can be used for the same device and O_EXCL check can be effectively bypassed. It has always been broken that way and floppy is fortunately on its way out. Leave that breakage alone. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Alex Villacis Lasso <avillaci@ceibo.fiec.espol.edu.ec> Tested-by: Alex Villacis Lasso <avillaci@ceibo.fiec.espol.edu.ec> Cc: stable@kernel.org # >= v2.6.36 Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-06-13 12:45:48 +02:00
Tao Ma	1fb74cda1b	jbd2: Remove obsolete parameters in the comments for some jbd2 functions credits isn't a parameter for jbd2_journal_get_write_access and jbd2_journal_get_undo_access. So remove the corresponding comments. Acked-by: Jan Kara <jack@suse.cz> Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-06-12 22:44:10 -04:00
Al Viro	a685e08987	Delay struct net freeing while there's a sysfs instance refering to it * new refcount in struct net, controlling actual freeing of the memory * new method in kobj_ns_type_operations (->drop_ns()) * ->current_ns() semantics change - it's supposed to be followed by corresponding ->drop_ns(). For struct net in case of CONFIG_NET_NS it bumps the new refcount; net_drop_ns() decrements it and calls net_free() if the last reference has been dropped. Method renamed to ->grab_current_ns(). * old net_free() callers call net_drop_ns() instead. * sysfs_exit_ns() is gone, along with a large part of callchain leading to it; now that the references stored in ->ns[...] stay valid we do not need to hunt them down and replace them with NULL. That fixes problems in sysfs_lookup() and sysfs_readdir(), along with getting rid of sb->s_instances abuse. Note that struct net shutdown logics has not changed - net_cleanup() is called exactly when it used to be called. The only thing postponed by having a sysfs instance refering to that struct net is actual freeing of memory occupied by struct net. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-12 17:45:41 -04:00
Al Viro	dde194a64b	afs: fix sget() races, close leak on umount * set ->s_fs_info in set() callback passed to sget() * allocate the thing and set it up enough for afs_test_super() before making it visible * have it freed in ->kill_sb() (current tree simply leaks it) * have ->put_super() leave ->s_fs_info->volume alone; it's too early for dropping it; do that from ->kill_sb() after having called kill_anon_super(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-12 17:45:36 -04:00
Al Viro	d251ed271d	ubifs: fix sget races * allocate ubifs_info in ->mount(), fill it enough for sb_test() and set ->s_fs_info to it in set() callback passed to sget(). * do not free it in ->put_super(); do that in ->kill_sb() after we'd done kill_anon_super(). * don't free it in ubifs_fill_super() either - deactivate_locked_super() done by caller when ubifs_fill_super() returns an error will take care of that sucker. * get rid of kludge with passing ubi to ubifs_fill_super() in ->s_fs_info; we only need it in alloc_ubifs_info(), so ubifs_fill_super() will need only ubifs_info. Which it will find in ->s_fs_info just fine, no need to reassign anything... As the result, sb_test() becomes safe to apply to all superblocks that can be found by sget() (and a kludge with temporary use of ->s_fs_info to store a pointer to very different structure goes away). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-12 17:45:34 -04:00
Al Viro	b1c27ab3f9	ubifs: split allocation of ubifs_info into a separate function preparation to ubifs sget() race fixes Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-12 17:45:32 -04:00
Al Viro	ff78fca2a0	fix leak in proc_set_super() set_anon_super() can fail... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-12 17:45:28 -04:00
Linus Torvalds	3c25fa740e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: use join_transaction in btrfs_evict_inode() Btrfs - use %pU to print fsid Btrfs: fix extent state leak on failed nodatasum reads btrfs: fix unlocked access of delalloc_inodes Btrfs: avoid stack bloat in btrfs_ioctl_fs_info() btrfs: remove 64bit alignment padding to allow extent_buffer to fit into one fewer cacheline Btrfs: clear current->journal_info on async transaction commit Btrfs: make sure to recheck for bitmaps in clusters btrfs: remove unneeded includes from scrub.c btrfs: reinitialize scrub workers btrfs: scrub: errors in tree enumeration Btrfs: don't map extent buffer if path->skip_locking is set Btrfs: unlock the trans lock properly Btrfs: don't map extent buffer if path->skip_locking is set Btrfs: fix duplicate checking logic Btrfs: fix the allocator loop logic Btrfs: fix bitmap regression Btrfs: don't commit the transaction if we dont have enough pinned bytes Btrfs: noinline the cluster searching functions Btrfs: cache bitmaps when searching for a cluster	2011-06-12 11:06:36 -07:00
Li Zefan	30b4caf5d7	Btrfs: use join_transaction in btrfs_evict_inode() The WARN_ON() in start_transaction() was triggered while balancing. The cause is btrfs_relocate_chunk() started a transaction and then called iput() on the inode that stores free space cache, and iput() called btrfs_start_transaction() again. Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-11 08:31:55 -04:00
Ryusuke Konishi	071d73cfe5	nilfs2: fix problem in setting checkpoint interval Checkpoint generation interval of nilfs goes wrong after user has changed the interval parameter with nilfs-tune tool. segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds segctord starting. Construction interval = 0 seconds, CP frequency < 30 seconds This turned out to be caused by a trivial bug in initialization code of log writer. This will fix it. Reported-by: Andrea Gelmini <andrea.gelmini@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-06-11 15:51:15 +09:00
Ryusuke Konishi	d40990537c	nilfs2: fix missing block address termination in btree node shrinking nilfs_btree_delete function does not terminate part of virtual block addresses when shrinking the last remaining child node into the root node. The missing address termination causes that dead btree node blocks persist and chip away free disk space. This fixes the leak bug on the btree node deletion. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-06-11 15:51:15 +09:00
Ryusuke Konishi	fe744fdb74	nilfs2: fix incorrect block address termination in node concatenation nilfs_btree_delete function wrongly terminates virtual block address of the btree node held by its parent at index 0. When concatenating the index-0 node with its right sibling node, nilfs_btree_delete terminates the block address of index-0 node instead of the right sibling node which should be deleted. This bug not only wears disk space in the long run, but also causes file system corruption. This will fix it. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-06-11 15:51:15 +09:00
Ilya Dryomov	22b63a2971	Btrfs - use %pU to print fsid Get rid of FIXME comment. Uuids from dmesg are now the same as uuids given by btrfs-progs. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-10 19:02:04 -04:00
Jan Schmidt	08d2f347e8	Btrfs: fix extent state leak on failed nodatasum reads When encountering an EIO while reading from a nodatasum extent, we insert an error record into the inode's failure tree. btrfs_readpage_end_io_hook returns early for nodatasum inodes. We'd better clear the failure tree in that case, otherwise the kernel complains about BUG extent_state: Objects remaining on kmem_cache_close() on rmmod. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-10 19:00:53 -04:00
Chris Mason	0e735872fb	Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/arne/btrfs-unstable-arne into for-linus	2011-06-10 18:58:08 -04:00
David Sterba	5be76758f3	btrfs: fix unlocked access of delalloc_inodes list_splice_init will make delalloc_inodes empty, but without a spinlock around, this may produce corrupted list head, accessed in many placess, The race window is very tight and nobody seems to have hit it so far. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-10 18:57:11 -04:00
Li Zefan	027ed2f004	Btrfs: avoid stack bloat in btrfs_ioctl_fs_info() The size of struct btrfs_ioctl_fs_info_args is as big as 1KB, so don't declare the variable on stack. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-10 18:57:10 -04:00
richard kennedy	9eb9104c66	btrfs: remove 64bit alignment padding to allow extent_buffer to fit into one fewer cacheline Reorder extent_buffer to remove 8 bytes of alignment padding on 64 bit builds. This shrinks its size to 128 bytes allowing it to fit into one fewer cache lines and allows more objects per slab in its kmem_cache. slabinfo extent_buffer reports :- before:- Sizes (bytes) Slabs ---------------------------------- Object : 136 Total : 123 SlabObj: 136 Full : 121 SlabSiz: 4096 Partial: 0 Loss : 0 CpuSlab: 2 Align : 8 Objects: 30 after :- Object : 128 Total : 4 SlabObj: 128 Full : 2 SlabSiz: 4096 Partial: 0 Loss : 0 CpuSlab: 2 Align : 8 Objects: 32 Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-10 18:57:10 -04:00
Sage Weil	38e880540f	Btrfs: clear current->journal_info on async transaction commit Normally current->jouranl_info is cleared by commit_transaction. For an async snap or subvol creation, though, it runs in a work queue. Clear it in btrfs_commit_transaction_async() to avoid leaking a non-NULL journal_info when we return to userspace. When the actual commit runs in the other thread it won't care that it's current->journal_info is already NULL. Signed-off-by: Sage Weil <sage@newdream.net> Tested-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-10 16:42:29 -04:00
Chris Mason	38e8788066	Btrfs: make sure to recheck for bitmaps in clusters Josef recently changed the free extent cache to look in the block group cluster for any bitmaps before trying to add a new bitmap for the same offset. This avoids BUG_ON()s due covering duplicate ranges. But it didn't go quite far enough. A given free range might span between one or more bitmaps or free space entries. The code has looping to cover this, but it doesn't check for clustered bitmaps every time. This shuffles our gotos to check for a bitmap in the cluster for every new bitmap entry we try to add. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-10 16:36:57 -04:00
Arne Jansen	6eef312588	btrfs: remove unneeded includes from scrub.c Signed-off-by: Arne Jansen <sensille@gmx.net>	2011-06-10 14:59:52 +02:00
Arne Jansen	632dd772fc	btrfs: reinitialize scrub workers Scrub starts the workers each time a scrub starts and stops them after it finished. This patch adds an initialization for the workers before each start, otherwise the workers behave strangely. Signed-off-by: Arne Jansen <sensille@gmx.net>	2011-06-10 12:14:13 +02:00
Arne Jansen	8c51032f97	btrfs: scrub: errors in tree enumeration due to the semantics of btrfs_search_slot the path can point to an invalid slot when ret > 0. This condition went unnoticed, which in turn could have led to an incomplete scrubbing. Signed-off-by: Arne Jansen <sensille@gmx.net>	2011-06-10 12:14:13 +02:00
Josef Bacik	ad3e34bba4	Btrfs: don't map extent buffer if path->skip_locking is set Arne's scrub stuff exposed a problem with mapping the extent buffer in reada_for_search. He searches the commit root with multiple threads and with skip_locking set, so we can race and overwrite node->map_token since node isn't locked. So fix this so that we only map the extent buffer if we don't already have a map_token and skip_locking isn't set. Without this patch scrub would panic almost immediately, with the patch it doesn't panic anymore. Thanks, Reported-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-10 12:14:12 +02:00
Mathias Krause	dac853ae89	exec: delay address limit change until point of no return Unconditionally changing the address limit to USER_DS and not restoring it to its old value in the error path is wrong because it prevents us using kernel memory on repeated calls to this function. This, in fact, breaks the fallback of hard coded paths to the init program from being ever successful if the first candidate fails to load. With this patch applied switching to USER_DS is delayed until the point of no return is reached which makes it possible to have a multi-arch rootfs with one arch specific init binary for each of the (hard coded) probed paths. Since the address limit is already set to USER_DS when start_thread() will be invoked, this redundancy can be safely removed. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-09 12:50:05 -07:00
Josef Bacik	3473f3c06a	Btrfs: unlock the trans lock properly In btrfs_wait_for_commit if we came upon a transaction that had committed we just exited, but that's bad since we are holding the trans_lock. So break instead so that the lock is dropped. Thanks, Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-09 10:15:17 -04:00
Josef Bacik	25b8b936ed	Btrfs: don't map extent buffer if path->skip_locking is set Arne's scrub stuff exposed a problem with mapping the extent buffer in reada_for_search. He searches the commit root with multiple threads and with skip_locking set, so we can race and overwrite node->map_token since node isn't locked. So fix this so that we only map the extent buffer if we don't already have a map_token and skip_locking isn't set. Without this patch scrub would panic almost immediately, with the patch it doesn't panic anymore. Thanks, Reported-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-09 10:12:07 -04:00
Ingo Molnar	5f127133ee	Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent	2011-06-09 09:14:34 +02:00
Linus Torvalds	d21131bb0a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: trivial: add space in fsc error message cifs: silence printk when establishing first session on socket CIFS ACL support needs CONFIG_KEYS, so depend on it possible memory corruption in cifs_parse_mount_options() cifs: make CIFS depend on CRYPTO_ECB cifs: fix the kernel release version in the default security warning message	2011-06-08 13:54:29 -07:00
Josef Bacik	f6a398298d	Btrfs: fix duplicate checking logic When merging my code into the integration test the second check for duplicate entries got screwed up. This patch fixes it by dropping ret2 and just using ret for the return value, and checking if we got an error before adding the bitmap to the local list. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-08 16:37:29 -04:00
Josef Bacik	723bda2083	Btrfs: fix the allocator loop logic I was testing with empty_cluster = 0 to try and reproduce a problem and kept hitting early enospc panics. This was because our loop logic was a little confused. So this is what I did 1) Make the loop variable the ultimate decider on wether we should loop again isntead of checking to see if we had an uncached bg, empty size or empty cluster. 2) Increment loop before checking to see what we are on to make the loop definitions make more sense. 3) If we are on the chunk alloc loop don't set empty_size/empty_cluster to 0 unless we didn't actually allocate a chunk. If we did allocate a chunk we should be able to easily setup a new cluster so clearing empty_size/empty_cluster makes us less efficient. This kept me from hitting panics while trying to reproduce the other problem. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-08 16:37:29 -04:00
Josef Bacik	2cdc342c20	Btrfs: fix bitmap regression In cleaning up the clustering code I accidently introduced a regression by adding bitmap entries to the cluster rb tree. The problem is if we've maxed out the number of bitmaps we can have for the block group we can only add free space to the bitmaps, but since the bitmap is on the cluster we can't find it and we try to create another one. This would result in a panic because the total bitmaps was bigger than the max bitmaps that were allowed. This patch fixes this by checking to see if we have a cluster, and then looking at the cluster rb tree to see if it has a bitmap entry and if it does and that space belongs to that bitmap, go ahead and add it to that bitmap. I could hit this panic every time with an fs_mark test within a couple of minutes. With this patch I no longer hit the panic and fs_mark goes to completion. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-08 16:37:28 -04:00
Josef Bacik	f2bb8f5cfb	Btrfs: don't commit the transaction if we dont have enough pinned bytes I noticed when running an enospc test that we would get stuck committing the transaction in check_data_space even though we truly didn't have enough space. So check to see if bytes_pinned is bigger than num_bytes, if it's not don't commit the transaction. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-08 15:08:31 -04:00
Josef Bacik	3de85bb95c	Btrfs: noinline the cluster searching functions When profiling the find cluster code it's hard to tell where we are spending our time because the bitmap and non-bitmap functions get inlined by the compiler, so make that not happen. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-08 15:08:30 -04:00
Josef Bacik	86d4a77ba3	Btrfs: cache bitmaps when searching for a cluster If we are looking for a cluster in a particularly sparse or fragmented block group, we will do a lot of looping through the free space tree looking for various things, and if we need to look at bitmaps we will endup doing the whole dance twice. So instead add the bitmap entries to a temporary list so if we have to do the bitmap search we can just look through the list of entries we've found quickly instead of having to loop through the entire tree again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-08 15:08:28 -04:00
Jeff Layton	83fb086e0e	cifs: trivial: add space in fsc error message Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-06-08 16:03:29 +00:00
Sage Weil	0c1f91f271	ceph: unwind canceled flock state If we request a lock and then abort (e.g., ^C), we need to send a matching unlock request to the MDS to unwind our lock attempt to avoid indefinitely blocking other clients. Reported-by: Brian Chrisman <brchrisman@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-06-07 21:36:45 -07:00
Sage Weil	0e98728fa3	ceph: fix ENOENT logic in striped_read Getting ENOENT is equivalent to reading 0 bytes. Make that correction before setting up the hit_stripe and was_short flags. Fixes the following case: dd if=/dev/zero of=/mnt/fs_depot/dd3 bs=1 seek=1048576 count=0 dd if=/mnt/fs_depot/dd3 of=/root/ddout1 skip=8 bs=500 count=2 iflag=direct Reported-by: Henry C Chang <henry.cy.chang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-06-07 21:34:16 -07:00
Sage Weil	c3cd62839a	ceph: fix short sync reads from the OSD If we get a short read from the OSD because the object is small, we need to zero the remainder of the buffer. For O_DIRECT reads, the attempted range is not trimmed to i_size by the VFS, so we were actually looping indefinitely. Fix by trimming by i_size, and the unconditionally zeroing the trailing range. Reported-by: Jeff Wu <cpwu@tnsoft.com.cn> Signed-off-by: Sage Weil <sage@newdream.net>	2011-06-07 21:34:14 -07:00
Sage Weil	70b666c3b4	ceph: use ihold when we already have an inode ref We should use ihold whenever we already have a stable inode ref, even when we aren't holding i_lock. This avoids adding new and unnecessary locking dependencies. Signed-off-by: Sage Weil <sage@newdream.net>	2011-06-07 21:34:11 -07:00
Linus Torvalds	9c125d2af8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6: fat: Fix corrupt inode flags when remove ATTR_SYS flag	2011-06-07 19:04:21 -07:00
Linus Torvalds	d205df9955	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Processes waiting on inode glock that no processes are holding	2011-06-07 18:44:10 -07:00
Linus Torvalds	8397345172	Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: vfs: make unlink() and rmdir() return ENOENT in preference to EROFS lmLogOpen() broken failure exit usb: remove bad dput after dentry_unhash more conservative S_NOSEC handling	2011-06-07 18:36:59 -07:00
Theodore Ts'o	e6bc45d65d	vfs: make unlink() and rmdir() return ENOENT in preference to EROFS If user space attempts to remove a non-existent file or directory, and the file system is mounted read-only, return ENOENT instead of EROFS. Either error code is arguably valid/correct, but ENOENT is a more specific error message. Reported-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-06-07 08:51:14 -04:00
Al Viro	9054760ff5	lmLogOpen() broken failure exit Callers of lmLogOpen() expect it to return -E... on failure exits, which is what it returns, except for the case of blkdev_get_by_dev() failure. It that case lmLogOpen() return the error with the wrong sign... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>	2011-06-07 08:50:59 -04:00
Jeff Layton	9c4843ea57	cifs: silence printk when establishing first session on socket When signing is enabled, the first session that's established on a socket will cause a printk like this to pop: CIFS VFS: Unexpected SMB signature This is because the key exchange hasn't happened yet, so the signature field is bogus. Don't try to check the signature on the socket until the first session has been established. Also, eliminate the specific check for SMB_COM_NEGOTIATE since this check covers that case too. Cc: stable@kernel.org Cc: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-06-07 00:57:05 +00:00
Casey Bodley	7d751f6f8c	nfsd: link returns nfserr_delay when breaking lease fix for commit `4795bb37ef`, nfsd: break lease on unlink, link, and rename if the LINK operation breaks a delegation, it returns NFS4ERR_NOENT (which is not a valid error in rfc 5661) instead of NFS4ERR_DELAY. the return value of nfsd_break_lease() in nfsd_link() must be converted from host_err to err Signed-off-by: Casey Bodley <cbodley@citi.umich.edu> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-06-06 18:46:56 -04:00
Randy Dunlap	be1f4084b4	nfsd: v4 support requires CRYPTO nfsd V4 support uses crypto interfaces, so select CRYPTO to fix build errors in 2.6.39: ERROR: "crypto_destroy_tfm" [fs/nfsd/nfsd.ko] undefined! ERROR: "crypto_alloc_base" [fs/nfsd/nfsd.ko] undefined! Reported-by: Wakko Warner <wakko@animx.eu.org> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-06-06 18:37:35 -04:00
J. Bruce Fields	b084f598df	nfsd: fix dependency of nfsd on auth_rpcgss Commit `b0b0c0a26e` "nfsd: add proc file listing kernel's gss_krb5 enctypes" added an nunnecessary dependency of nfsd on the auth_rpcgss module. It's a little ad hoc, but since the only piece of information nfsd needs from rpcsec_gss_krb5 is a single static string, one solution is just to share it with an include file. Cc: stable@kernel.org Reported-by: Michael Guntsche <mike@it-loops.com> Cc: Kevin Coffman <kwc@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-06-06 15:07:15 -04:00
Darren Salt	243e2dd38e	CIFS ACL support needs CONFIG_KEYS, so depend on it Build fails if CONFIG_KEYS is not selected. Signed-off-by: Darren Salt <linux@youmustbejoking.demon.co.uk> Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-06-06 16:58:16 +00:00
Vasily Averin	957df4535d	possible memory corruption in cifs_parse_mount_options() error path after mountdata check frees uninitialized mountdata_copy Signed-off-by: Vasily Averin <vvs@sw.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-06-06 15:31:29 +00:00
Lukas Czerner	a9c667f8f0	ext4: fixed tracepoints cleanup While creating fixed tracepoints for ext3, basically by porting them from ext4, I found a lot of useless retyping, wrong type usage, useless variable passing and other inconsistencies in the ext4 fixed tracepoint code. This patch cleans the fixed tracepoint code for ext4 and also simplify some of them. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-06-06 09:51:52 -04:00
Lukas Czerner	c03f8aa9ab	ext4: use FIEMAP_EXTENT_LAST flag for last extent in fiemap Currently we are not marking the extent as the last one (FIEMAP_EXTENT_LAST) if there is a hole at the end of the file. This is because we just do not check for it right now and continue searching for next extent. But at the point we hit the hole at the end of the file, it is too late. This commit adds check for the allocated block in subsequent extent and if there is no more extents (block = EXT_MAX_BLOCKS) just flag the current one as the last one. This behaviour has been spotted unintentionally by 252 xfstest, when the test hangs out, because of wrong loop condition. However on other filesystems (like xfs) it will exit anyway, because we notice the last extent flag and exit. With this patch xfstest 252 does not hang anymore, ext4 fiemap implementation still reports bad extent type in some cases, however this seems to be different issue. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-06-06 00:06:52 -04:00
Lukas Czerner	f17722f917	ext4: Fix max file size and logical block counting of extent format file Kazuya Mio reported that he was able to hit BUG_ON(next == lblock) in ext4_ext_put_gap_in_cache() while creating a sparse file in extent format and fill the tail of file up to its end. We will hit the BUG_ON when we write the last block (2^32-1) into the sparse file. The root cause of the problem lies in the fact that we specifically set s_maxbytes so that block at s_maxbytes fit into on-disk extent format, which is 32 bit long. However, we are not storing start and end block number, but rather start block number and length in blocks. It means that in order to cover extent from 0 to EXT_MAX_BLOCK we need EXT_MAX_BLOCK+1 to fit into len (because we counting block 0 as well) - and it does not. The only way to fix it without changing the meaning of the struct ext4_extent members is, as Kazuya Mio suggested, to lower s_maxbytes by one fs block so we can cover the whole extent we can get by the on-disk extent format. Also in many places EXT_MAX_BLOCK is used as length instead of maximum logical block number as the name suggests, it is all a bit messy. So this commit renames it to EXT_MAX_BLOCKS and change its usage in some places to actually be maximum number of blocks in the extent. The bug which this commit fixes can be reproduced as follows: dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((232-2)) sync dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((232-1)) Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-06-06 00:05:17 -04:00
Yongqiang Yang	5def136025	ext4: correct comments for ext4_free_blocks() metadata is not parameter of ext4_free_blocks() any more. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-06-05 23:26:40 -04:00
Linus Torvalds	e6ece70732	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits) btrfs: fix uninitialized variable warning btrfs: add helper for fs_info->closing Btrfs: add mount -o inode_cache btrfs: scrub: add explicit plugging btrfs: use btrfs_ino to access inode number Btrfs: don't save the inode cache if we are deleting this root btrfs: false BUG_ON when degraded Btrfs: don't save the inode cache in non-FS roots Btrfs: make sure we don't overflow the free space cache crc page Btrfs: fix uninit variable in the delayed inode code btrfs: scrub: don't reuse bios and pages Btrfs: leave spinning on lookup and map the leaf Btrfs: check for duplicate entries in the free space cache Btrfs: don't try to allocate from a block group that doesn't have enough space Btrfs: don't always do readahead Btrfs: try not to sleep as much when doing slow caching Btrfs: kill BTRFS_I(inode)->block_group Btrfs: don't look at the extent buffer level 3 times in a row Btrfs: map the node block when looking for readahead targets Btrfs: set range_start to the right start in count_range_bits ...	2011-06-05 06:17:23 +09:00
Tejun Heo	6dfca32984	job control: make task_clear_jobctl_pending() clear TRAPPING automatically JOBCTL_TRAPPING indicates that ptracer is waiting for tracee to (re)transit into TRACED. task_clear_jobctl_pending() must be called when either tracee enters TRACED or the transition is cancelled for some reason. The former is achieved by explicitly calling task_clear_jobctl_pending() in ptrace_stop() and the latter by calling it at the end of do_signal_stop(). Calling task_clear_jobctl_trapping() at the end of do_signal_stop() limits the scope TRAPPING can be used and is fragile in that seemingly unrelated changes to tracee's control flow can lead to stuck TRAPPING. We already have task_clear_jobctl_pending() calls on those cancelling events to clear JOBCTL_STOP_PENDING. Cancellations can be handled by making those call sites use JOBCTL_PENDING_MASK instead and updating task_clear_jobctl_pending() such that task_clear_jobctl_trapping() is called automatically if no stop/trap is pending. This patch makes the above changes and removes the fallback task_clear_jobctl_trapping() call from do_signal_stop(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2011-06-04 18:17:11 +02:00
Tejun Heo	3759a0d94c	job control: introduce JOBCTL_PENDING_MASK and task_clear_jobctl_pending() This patch introduces JOBCTL_PENDING_MASK and replaces task_clear_jobctl_stop_pending() with task_clear_jobctl_pending() which takes an extra @mask argument. JOBCTL_PENDING_MASK is currently equal to JOBCTL_STOP_PENDING but future patches will add more bits. recalc_sigpending_tsk() is updated to use JOBCTL_PENDING_MASK instead. task_clear_jobctl_pending() takes @mask which in subset of JOBCTL_PENDING_MASK and clears the relevant jobctl bits. If JOBCTL_STOP_PENDING is set, other STOP bits are cleared together. All task_clear_jobctl_stop_pending() users are updated to call task_clear_jobctl_pending() with JOBCTL_STOP_PENDING which is functionally identical to task_clear_jobctl_stop_pending(). This patch doesn't cause any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com>	2011-06-04 18:17:10 +02:00

... 3 4 5 6 7 ...

23523 Commits (6b27f62fc750d85bc6fc3718b3b38ec60edc2d74)