linux

Commit Graph

Author	SHA1	Message	Date
Trond Myklebust	44950b67a6	NFSv4.1: Ensure that we initialise the session when following a referral Put the code that is common to both the referral and ordinary mount cases into a common helper routine. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-06-22 13:22:53 -04:00
Andy Adamson	f799bdb355	nfs4 use mandatory attribute file type in nfs4_get_root S_ISDIR(fsinfo.fattr->mode) checks the file type rather than the mode bits, so we should be checking for the NFS_ATTR_FATTR_TYPE fattr property. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2010-06-22 13:17:43 -04:00
Sage Weil	17c688c3df	ceph: delay umount until all mds requests drop inode+dentry refs This fixes a race between handle_reply finishing an mds request, signalling completion, and then dropping the request structing and its dentry+inode refs, and pre_umount function waiting for requests to finish before letting the vfs tear down the dcache. If umount was delayed waiting for mds requests, we could race and BUG in shrink_dcache_for_umount_subtree because of a slow dput. This delays umount until the msgr queue flushes, which means handle_reply will exit and will have dropped the ceph_mds_request struct. I'm assuming the VFS has already ensured that its calls have all completed and those request refs have thus been dropped as well (I haven't seen that race, at least). Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-21 16:11:50 -07:00
Sage Weil	d69ed05a80	ceph: handle splice_dentry/d_materialize_unique error in readdir_prepopulate Handle a splice_dentry failure (due to a d_materialize_unique error) without crashing. (Also, report the error code.) Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-21 16:04:10 -07:00
Sage Weil	cebc5be6b6	ceph: fix crush map update decoding If the incremental osdmap has a new crush map, advance the position after decoding so that we can parse the rest of the osdmap properly. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-17 10:22:48 -07:00
Jeff Layton	8a224d4894	cifs: remove bogus first_time check in NTLMv2 session setup code This bug appears to be the result of a cut-and-paste mistake from the NTLMv1 code. The function to generate the MAC key was commented out, but not the conditional above it. The conditional then ended up causing the session setup key not to be copied to the buffer unless this was the first session on the socket, and that made all but the first NTLMv2 session setup fail. Fix this by removing the conditional and all of the commented clutter that made it difficult to see. Cc: Stable <stable@kernel.org> Reported-by: Gunther Deschner <gdeschne@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2010-06-16 13:40:18 -04:00
Jeff Layton	47c78b7f40	cifs: don't call cifs_new_fileinfo unless cifs_open succeeds It's currently possible for cifs_open to fail after it has already called cifs_new_fileinfo. In that situation, the new fileinfo will be leaked as the caller doesn't call fput. That in turn leads to a busy inodes after umount problem since the fileinfo holds an extra inode reference now. Shuffle cifs_open around a bit so that it only calls cifs_new_fileinfo if it's going to succeed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>	2010-06-16 13:40:17 -04:00
Suresh Jayaraman	d9d5d8df95	cifs: don't ignore cifs_posix_open_inode_helper return value ...and ensure that we propagate the error back to avoid any surprises. Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com>	2010-06-16 13:40:17 -04:00
Jeff Layton	db460242bf	cifs: clean up arguments to cifs_open_inode_helper ...which takes a ton of unneeded arguments and does a lot more pointer dereferencing than is really needed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>	2010-06-16 13:40:17 -04:00
Jeff Layton	6ca9f3bae8	cifs: pass instantiated filp back after open call The current scheme of sticking open files on a list and assuming that cifs_open will scoop them off of it is broken and leads to "Busy inodes after umount..." errors at unmount time. The problem is that there is no guarantee that cifs_open will always be called after a ->lookup or ->create operation. If there are permissions or other problems, then it's quite likely that it won't be called. Fix this by fully instantiating the filp whenever the file is created and pass that filp back to the VFS. If there is a problem, the VFS can clean up the references. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>	2010-06-16 13:40:16 -04:00
Jeff Layton	2422f676fb	cifs: move cifs_new_fileinfo call out of cifs_posix_open Having cifs_posix_open call cifs_new_fileinfo is problematic and inconsistent with how "regular" opens work. It's also buggy as cifs_reopen_file calls this function on a reconnect, which creates a new struct cifsFileInfo that just gets leaked. Push it out into the callers. This also allows us to get rid of the "mnt" arg to cifs_posix_open. Finally, in the event that a cifsFileInfo isn't or can't be created, we always want to close the filehandle out on the server as the client won't have a record of the filehandle and can't actually use it. Make sure that CIFSSMBClose is called in those cases. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>	2010-06-16 13:40:16 -04:00
Steve French	0933a95dfd	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6	2010-06-16 13:19:36 +00:00
Tao Ma	1739da4054	ocfs2: Limit default local alloc size within bitmap range. In commit `6b82021b9e`, we increase our local alloc size and calculate how much megabytes we can get according to group size and volume size. But we also need to check the maximum bits a local alloc block bitmap can have. With a bs=512, cs=32K, local volume with 160G, it calculate 96MB while the maximum local alloc size is only 76M. So the bitmap will overflow and corrupt the system truncate log file. See bug http://oss.oracle.com/bugzilla/show_bug.cgi?id=1262 Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-06-15 16:50:43 -07:00
Tao Ma	40f165f416	ocfs2: Move orphan scan work to ocfs2_wq. We used to let orphan scan work in the default work queue, but there is a corner case which will make the system deadlock. The scenario is like this: 1. set heartbeat threadshold to 200. this will allow us to have a great chance to have a orphan scan work before our quorum decision. 2. mount node 1. 3. after 1~2 minutes, mount node 2(in order to make the bug easier to reproduce, better add maxcpus=1 to kernel command line). 4. node 1 do orphan scan work. 5. node 2 do orphan scan work. 6. node 1 do orphan scan work. After this, node 1 hold the orphan scan lock while node 2 know node 1 is the master. 7. ifdown eth2 in node 2(eth2 is what we do ocfs2 interconnection). Now when node 2 begins orphan scan, the system queue is blocked. The root cause is that both orphan scan work and quorum decision work will use the system event work queue. orphan scan has a chance of blocking the event work queue(in dlm_wait_for_node_death) so that there is no chance for quorum decision work to proceed. This patch resolve it by moving orphan scan work to ocfs2_wq. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-06-15 15:43:48 -07:00
Julia Lawall	6469272c35	fs/ocfs2/dlm: Add missing spin_unlock Add a spin_unlock missing on the error path. Unlock as in the other code that leads to the leave label. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression E1; @@ * spin_lock(E1,...); <+... when != E1 if (...) { ... when != E1 * return ...; } ...+> * spin_unlock(E1,...); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-06-15 15:43:46 -07:00
Jens Axboe	575f552012	Merge branch 'for-jens' of git://git.drbd.org/linux-2.6-drbd into for-linus	2010-06-14 12:54:57 +02:00
Michael Ellerman	9f069af5b6	of: Drop properties with "/" in their name Some bogus firmwares include properties with "/" in their name. This causes problems when creating the /proc/device-tree file system, because the slash is taken to indicate a directory. We don't care about those properties, and we don't want to encourage them, so just throw them away when creating /proc/device-tree. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Tested-by: Christian Kujau <lists@nerdbynature.de> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2010-06-13 18:12:24 -06:00
Sage Weil	ae32be3134	ceph: fix message memory leak, uninitialized variable We need to properly initialize skip, as not all alloc_msg op instances set it. Also, BUG if someone says skip but also allocates a message. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-13 10:34:36 -07:00
Sage Weil	4a32f93d29	ceph: fix map handler error path Don't leak message if we receive an unexpected message type. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-13 10:34:36 -07:00
Yehuda Sadeh	0cf5537b15	ceph: some endianity fixes Fix some problems that came up with sparse. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-13 10:34:36 -07:00
Jeff Layton	12420ac341	cifs: implement drop_inode superblock op The standard behavior for drop_inode is to delete the inode when the last reference to it is put and the nlink count goes to 0. This helps keep inodes that are still considered "not deleted" in cache as long as possible even when there aren't dentries attached to them. When server inode numbers are disabled, it's not possible for cifs_iget to ever match an existing inode (since inode numbers are generated via iunique). In this situation, cifs can keep a lot of inodes in cache that will never be used again. Implement a drop_inode routine that deletes the inode if server inode numbers are disabled on the mount. This helps keep the cifs inode caches down to a more manageable size when server inode numbers are disabled. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-06-12 02:06:52 +00:00
Jeff Layton	ed0e3ace57	cifs: don't attempt busy-file rename unless it's in same directory Busy-file renames don't actually work across directories, so we need to limit this code to renames within the same dir. This fixes the bug detailed here: https://bugzilla.redhat.com/show_bug.cgi?id=591938 Signed-off-by: Jeff Layton <jlayton@redhat.com> CC: Stable <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-06-12 01:45:36 +00:00
Linus Torvalds	b25b550bb1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: The file argument for fsync() is never null Btrfs: handle ERR_PTR from posix_acl_from_xattr() Btrfs: avoid BUG when dropping root and reference in same transaction Btrfs: prohibit a operation of changing acl's mask when noacl mount option used Btrfs: should add a permission check for setfacl Btrfs: btrfs_lookup_dir_item() can return ERR_PTR Btrfs: btrfs_read_fs_root_no_name() returns ERR_PTRs Btrfs: unwind after btrfs_start_transaction() errors Btrfs: btrfs_iget() returns ERR_PTR Btrfs: handle kzalloc() failure in open_ctree() Btrfs: handle error returns from btrfs_lookup_dir_item() Btrfs: Fix BUG_ON for fs converted from extN Btrfs: Fix null dereference in relocation.c Btrfs: fix remap_file_pages error Btrfs: uninitialized data is check_path_shared() Btrfs: fix fallocate regression Btrfs: fix loop device on top of btrfs	2010-06-11 14:18:47 -07:00
Dan Carpenter	6f902af400	Btrfs: The file argument for fsync() is never null The "file" argument for fsync is never null so we can remove this check. What drew my attention here is that 7ea8085910e: "drop unused dentry argument to ->fsync" introduced an unconditional dereference at the start of the function and that generated a smatch warning. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 15:57:40 -04:00
Dan Carpenter	834e74759a	Btrfs: handle ERR_PTR from posix_acl_from_xattr() posix_acl_from_xattr() returns both ERR_PTRs and null, but it's OK to pass null values to set_cached_acl() Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 15:57:39 -04:00
Sage Weil	15e7000095	Btrfs: avoid BUG when dropping root and reference in same transaction If btrfs_ioctl_snap_destroy() deletes a snapshot but finishes with end_transaction(), the cleaner kthread may come in and drop the root in the same transaction. If that's the case, the root's refs still == 1 in the tree when btrfs_del_root() deletes the item, because commit_fs_roots() hasn't updated it yet (that happens during the commit). This wasn't a problem before only because btrfs_ioctl_snap_destroy() would commit the transaction before dropping the dentry reference, so the dead root wouldn't get queued up until after the fs root item was updated in the btree. Since it is not an error to drop the root reference and the root in the same transaction, just drop the BUG_ON() in btrfs_del_root(). Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 15:57:39 -04:00
Shi Weihua	731e3d1b43	Btrfs: prohibit a operation of changing acl's mask when noacl mount option used when used Posix File System Test Suite(pjd-fstest) to test btrfs, some cases about setfacl failed when noacl mount option used. I simplified used commands in pjd-fstest, and the following steps can reproduce it. ------------------------ # cd btrfs-part/ # mkdir aaa # setfacl -m m::rw aaa <- successed, but not expected by pjd-fstest. ------------------------ I checked ext3, a warning message occured, like as: setfacl: aaa/: Operation not supported Certainly, it's expected by pjd-fstest. So, i compared acl.c of btrfs and ext3. Based on that, a patch created. Fortunately, it works. Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 15:57:38 -04:00
Shi Weihua	2f26afba46	Btrfs: should add a permission check for setfacl On btrfs, do the following ------------------ # su user1 # cd btrfs-part/ # touch aaa # getfacl aaa # file: aaa # owner: user1 # group: user1 user::rw- group::rw- other::r-- # su user2 # cd btrfs-part/ # setfacl -m u::rwx aaa # getfacl aaa # file: aaa # owner: user1 # group: user1 user::rwx <- successed to setfacl group::rw- other::r-- ------------------ but we should prohibit it that user2 changing user1's acl. In fact, on ext3 and other fs, a message occurs: setfacl: aaa: Operation not permitted This patch fixed it. Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 15:57:37 -04:00
Dan Carpenter	cf1e99a4e0	Btrfs: btrfs_lookup_dir_item() can return ERR_PTR btrfs_lookup_dir_item() can return either ERR_PTRs or null. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 15:57:37 -04:00
Dan Carpenter	3140c9a34b	Btrfs: btrfs_read_fs_root_no_name() returns ERR_PTRs btrfs_read_fs_root_no_name() returns ERR_PTRs on error so I added a check for that. It's not clear to me if it can also return NULL pointers or not so I left the original NULL pointer check as is. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 15:57:36 -04:00
Dan Carpenter	d327099a23	Btrfs: unwind after btrfs_start_transaction() errors This was added by a22285a6a3: "Btrfs: Integrate metadata reservation with start_transaction". If we goto out here then we skip all the unwinding and there are locks still held etc. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 15:57:35 -04:00
Dan Carpenter	4cbd1149fb	Btrfs: btrfs_iget() returns ERR_PTR btrfs_iget() returns an ERR_PTR() on failure and not null. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 15:57:35 -04:00
Dan Carpenter	676e4c8639	Btrfs: handle kzalloc() failure in open_ctree() Unwind and return -ENOMEM if the allocation fails here. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 15:57:34 -04:00
Dan Carpenter	fb4f6f910c	Btrfs: handle error returns from btrfs_lookup_dir_item() If btrfs_lookup_dir_item() fails, we should can just let the mount fail with an error. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 15:57:33 -04:00
Yan, Zheng	3bf84a5a83	Btrfs: Fix BUG_ON for fs converted from extN Tree blocks can live in data block groups in FS converted from extN. So it's easy to trigger the BUG_ON. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 15:48:35 -04:00
Yan, Zheng	046f264f6b	Btrfs: Fix null dereference in relocation.c Fix a potential null dereference in relocation.c Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Acked-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 15:48:34 -04:00
Linus Torvalds	63c70a0d7b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: try to send partial cap release on cap message on missing inode ceph: release cap on import if we don't have the inode ceph: fix misleading/incorrect debug message ceph: fix atomic64_t initialization on ia64 ceph: fix lease revocation when seq doesn't match ceph: fix f_namelen reported by statfs ceph: fix memory leak in statfs ceph: fix d_subdirs ordering problem	2010-06-11 09:52:23 -07:00
Miao Xie	058a457ef0	Btrfs: fix remap_file_pages error when we use remap_file_pages() to remap a file, remap_file_pages always return error. It is because btrfs didn't set VM_CAN_NONLINEAR for vma. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 11:46:12 -04:00
Dan Carpenter	0e4dcbef1c	Btrfs: uninitialized data is check_path_shared() refs can be used with uninitialized data if btrfs_lookup_extent_info() fails on the first pass through the loop. In the original code if that happens then check_path_shared() probably returns 1, this patch changes it to return 1 for safety. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 11:46:12 -04:00
Josef Bacik	8360977972	Btrfs: fix fallocate regression Seems that when btrfs_fallocate was converted to use the new ENOSPC stuff we dropped passing the mode to the function that actually does the preallocation. This breaks anybody who wants to use FALLOC_FL_KEEP_SIZE. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 11:46:12 -04:00
Miao Xie	4a001071d3	Btrfs: fix loop device on top of btrfs We cannot use the loop device which has been connected to a file in the btrf The reproduce steps is following: # dd if=/dev/zero of=vdev0 bs=1M count=1024 # losetup /dev/loop0 vdev0 # mkfs.btrfs /dev/loop0 ... failed to zero device start -5 The reason is that the btrfs don't implement either ->write_begin or ->write the VFS API, so we fix it by setting ->write to do_sync_write(). Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-06-11 11:46:11 -04:00
Christoph Hellwig	29cb48594b	writeback: fix pin_sb_for_writeback We need to check for s_instances to make sure we don't bother working against a filesystem that is beeing unmounted, and we need to call put_super to make sure a superblock is freed when we race against umount. Also no need to keep sb_lock after we got a reference on it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-11 12:58:08 +02:00
Christoph Hellwig	334132ae92	writeback: add missing requeue_io in writeback_inodes_wb In "writeback: fix writeback_inodes_wb from writeback_inodes_sb" I accidentally removed the requeue_io if we need to skip a superblock because we can't pin it. Add it back, otherwise we're getting spurious lockups after multiple xfstests runs. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-11 12:58:08 +02:00
Christoph Hellwig	c5444198ca	writeback: simplify and split bdi_start_writeback bdi_start_writeback now never gets a superblock passed, so we can just remove that case. And to further untangle the code and flatten the call stack split it into two trivial helpers for it's two callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-11 12:58:08 +02:00
Christoph Hellwig	b8c2f3474f	writeback: simplify wakeup_flusher_threads bdi_writeback_all only has one caller, so fold it to simplify the code and flatten the call stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-11 12:58:07 +02:00
Christoph Hellwig	d19de7edf5	writeback: fix writeback_inodes_wb from writeback_inodes_sb When we call writeback_inodes_wb from writeback_inodes_sb we always have s_umount held, which currently makes the whole operation a no-op. But if we are called to write out inodes for a specific superblock we always have s_umount held, so replace the incorrect logic checking for WB_SYNC_ALL which only worked by coincidence with the proper check for an explicit superblock argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-11 12:58:07 +02:00
Christoph Hellwig	cf37e97247	writeback: enforce s_umount locking in writeback_inodes_sb Make sure that not only sync_filesystem but all callers of writeback_inodes_sb have the superblock protected against remount. As-is this disables all functionality for these callers, but the next patch relies on this locking to fix writeback_inodes_sb for sync_filesystem. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-11 12:58:07 +02:00
Christoph Hellwig	3c4d716538	writeback: queue work on stack in writeback_inodes_sb If we want to rely on s_umount in the caller we need to wait for completion of the I/O submission before returning to the caller. Refactor bdi_sync_writeback into a bdi_queue_work_onstack helper and use it for this case. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-11 12:58:07 +02:00
Christoph Hellwig	7f0e7bed93	writeback: fix writeback completion notifications The code dealing with bdi_work->state and completion of a bdi_work is a major mess currently. This patch makes sure we directly use one set of flags to deal with it, and use it consistently, which means: - always notify about completion from the rcu callback. We only ever wait for it from on-stack callers, so this simplification does not even cause a theoretical slowdown currently. It also makes sure we don't miss out on the notification if we ever add other callers to wait for it. - make earlier completion notification depending on the on-stack allocation, not the sync mode. If we introduce new callers that want to do WB_SYNC_NONE writeback from on-stack callers this will be nessecary. Also rename bdi_wait_on_work_clear to bdi_wait_on_work_done and inline a few small functions into their only caller to make the code understandable. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-11 12:58:07 +02:00
Sage Weil	2b2300d62e	ceph: try to send partial cap release on cap message on missing inode If we have enough memory to allocate a new cap release message, do so, so that we can send a partial release message immediately. This keeps us from making the MDS wait when the cap release it needs is in a partially full release message. If we fail because of ENOMEM, oh well, they'll just have to wait a bit longer. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-10 13:30:25 -07:00
Sage Weil	3d7ded4d81	ceph: release cap on import if we don't have the inode If we get an IMPORT that give us a cap, but we don't have the inode, queue a release (and try to send it immediately) so that the MDS doesn't get stuck waiting for us. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-10 13:30:07 -07:00
Sage Weil	9dbd412f56	ceph: fix misleading/incorrect debug message Nothing is released here: the caps message is simply ignored in this case. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-10 13:29:59 -07:00
Jeff Mahoney	00d5643e7c	ceph: fix atomic64_t initialization on ia64 bdi_seq is an atomic_long_t but we're using ATOMIC_INIT, which causes build failures on ia64. This patch fixes it to use ATOMIC_LONG_INIT. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-10 13:29:50 -07:00
Miklos Szeredi	6db40cf047	pipe: fix check in "set size" fcntl As it stands this check compares the number of pages to the page size. This makes no sense and makes the fcntl fail in almost any sane case. Fix it by checking if nr_pages is not zero (it can become zero only if arg is too big and round_pipe_size() overflows). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-10 19:08:34 +02:00
Miklos Szeredi	1d862f4122	pipe: fix pipe buffer resizing pipe_set_size() needs to copy pipe bufs from the old circular buffer to the new. The current code gets this wrong in multiple ways, resulting in oops. Test program is available here: http://www.kernel.org/pub/linux/kernel/people/mszeredi/piperesize/ Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-10 19:08:34 +02:00
Jens Axboe	3e6c05052c	block: remove duplicate BUG_ON() in bd_finish_claiming() We do the same BUG_ON() just a line later when calling into __bd_abort_claiming(). Reported-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-10 19:08:34 +02:00
Nick Piggin	b0018361c3	block: bd_start_claiming cleanup I don't like the subtle multi-context code in bd_claim (ie. detects where it has been called based on bd_claiming). It seems clearer to just require a new function to finish a 2-part claim. Also improve commentary in bd_start_claiming as to how it should be used. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-10 19:08:34 +02:00
Nick Piggin	cf3425707e	block: bd_start_claiming fix module refcount bd_start_claiming has an unbalanced module_put introduced in `6b4517a79`. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-10 19:08:34 +02:00
Linus Torvalds	b95a568093	Merge branch 'for-2.6.35' of git://linux-nfs.org/~bfields/linux * 'for-2.6.35' of git://linux-nfs.org/~bfields/linux: nfsd4: shut down callback queue outside state lock nfsd: nfsd_setattr needs to call commit_metadata	2010-06-09 12:43:04 -07:00
Dave Chinner	254c8c2dbf	xfs: remove nr_to_write writeback windup. Now that the background flush code has been fixed, we shouldn't need to silently multiply the wbc->nr_to_write to get good writeback. Remove that code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-08 18:12:44 -07:00
J. Bruce Fields	44b56603c4	Merge branch 'for-2.6.34-incoming' into for-2.6.35-incoming	2010-06-08 20:05:18 -04:00
J. Bruce Fields	c3935e3049	nfsd4: shut down callback queue outside state lock This reportedly causes a lockdep warning on nfsd shutdown. That looks like a false positive to me, but there's no reason why this needs the state lock anyway. Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-06-08 19:33:52 -04:00
Linus Torvalds	3975d16760	Merge git://git.infradead.org/~dwmw2/mtd-2.6.35 * git://git.infradead.org/~dwmw2/mtd-2.6.35: jffs2: update ctime when changing the file's permission by setfacl jffs2: Fix NFS race by using insert_inode_locked() jffs2: Fix in-core inode leaks on error paths mtd: Fix NAND submenu mtd/r852: update card detect early. mtd/r852: Fixes in case of DMA timeout mtd/r852: register IRQ as last step drivers/mtd: Use memdup_user docbook: make mtd nand module init static	2010-06-07 17:10:06 -07:00
Jan Kara	1c24d06f8e	jffs2: update ctime when changing the file's permission by setfacl jffs2 didn't update the ctime of the file when its permission was changed. Steps to reproduce: # touch aaa # stat -c %Z aaa 1275289822 # setfacl -m 'u::x,g::x,o::x' aaa # stat -c %Z aaa 1275289822 <- unchanged But, according to the spec of the ctime, jffs2 must update it. Port of ext3 patch by Miao Xie <miaox@cn.fujitsu.com>. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2010-06-06 11:27:10 +01:00
Linus Torvalds	78b36558b7	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Fix remaining racy updates of EXT4_I(inode)->i_flags ext4: Make sure the MOVE_EXT ioctl can't overwrite append-only files	2010-06-05 10:07:25 -07:00
Dmitry Monakhov	84a8dce271	ext4: Fix remaining racy updates of EXT4_I(inode)->i_flags A few functions were still modifying i_flags in a racy manner. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-06-05 11:51:27 -04:00
Linus Torvalds	6c5de280b6	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: improve xfs_isilocked xfs: skip writeback from reclaim context xfs: remove done roadmap item from xfs-delayed-logging-design.txt xfs: fix race in inode cluster freeing failing to stale inodes xfs: fix access to upper inodes without inode64 xfs: fix might_sleep() warning when initialising per-ag tree fs/xfs/quota: Add missing mutex_unlock xfs: remove duplicated #include xfs: convert more trace events to DEFINE_EVENT xfs: xfs_trace.c: remove duplicated #include xfs: Check new inode size is OK before preallocating xfs: clean up xlog_align xfs: cleanup log reservation calculactions xfs: be more explicit if RT mount fails due to config xfs: replace E2BIG with EFBIG where appropriate	2010-06-05 07:33:05 -07:00
Linus Torvalds	7926e0bfbb	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: remove obsolete declarations of cache constructor and destructor nilfs2: fix style issue in nilfs_destroy_cachep	2010-06-05 07:31:13 -07:00
Linus Torvalds	7f0d384caf	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: Minix: Clean up left over label fix truncate inode time modification breakage fix setattr error handling in sysfs, configfs fcntl: return -EFAULT if copy_to_user fails wrong type for 'magic' argument in simple_fill_super() fix the deadlock in qib_fs mqueue doesn't need make_bad_inode()	2010-06-04 21:12:39 -07:00
Linus Torvalds	d2dd328b7f	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: (27 commits) block: make blk_init_free_list and elevator_init idempotent block: avoid unconditionally freeing previously allocated request_queue pipe: change /proc/sys/fs/pipe-max-pages to byte sized interface pipe: change the privilege required for growing a pipe beyond system max pipe: adjust minimum pipe size to 1 page block: disable preemption before using sched_clock() cciss: call BUG() earlier Preparing 8.3.8rc2 drbd: Reduce verbosity drbd: use drbd specific ratelimit instead of global printk_ratelimit drbd: fix hang on local read errors while disconnected drbd: Removed the now empty w_io_error() function drbd: removed duplicated #includes drbd: improve usage of MSG_MORE drbd: need to set socket bufsize early to take effect drbd: improve network latency, TCP_QUICKACK drbd: Revert "drbd: Create new current UUID as late as possible" brd: support discard Revert "writeback: fix WB_SYNC_NONE writeback from umount" Revert "writeback: ensure that WB_SYNC_NONE writeback with sb pinned is sync" ...	2010-06-04 15:37:44 -07:00
Linus Torvalds	f9196e7c03	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: fix setattr error handling in sysfs, configfs kobject: free memory if netlink_kernel_create() fails lib/kobject_uevent.c: fix CONIG_NET=n warning	2010-06-04 15:27:27 -07:00
Mike Frysinger	1da083c9b2	flat: fix unmap len in load error path The data chunk is mmaped with 'len' which remains unchanged, so use that when unmapping in the error path rather than trying to recalculate (and incorrectly so) the value used originally. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Acked-by: David McCullough <davidm@snapgear.com> Acked-by: Greg Ungerer <gerg@uclinux.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-04 15:21:45 -07:00
Mike Frysinger	2e94de8acb	fs/binfmt_flat.c: split the stack & data alignments The stack and data have different alignment requirements, so don't force them to wear the same shoe. Increase the data alignment to match that which the elf2flt linker script has always been using: 0x20 bytes. Not only does this bring the kernel loader in line with the toolchain, but it also fixes a swath of gcc tests which try to force larger alignment values but randomly fail when the FLAT loader fails to deliver. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: David Woodhouse <David.Woodhouse@intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: David McCullough <davidm@snapgear.com> Acked-by: Greg Ungerer <gerg@uclinux.org> Cc: Paul Mundt <lethal@linux-sh.org> Tested-by: Michal Simek <monstr@monstr.eu> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Jie Zhang <jie@codesourcery.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-04 15:21:44 -07:00
Heiko Carstens	7cbe17701a	fs/compat_rw_copy_check_uvector: add missing compat_ptr call A call to access_ok is missing a compat_ptr conversion. Introduced with `b83733639a` "compat: factor out compat_rw_copy_check_uvector from compat_do_readv_writev" fs/compat.c: In function 'compat_rw_copy_check_uvector': fs/compat.c:629: warning: passing argument 1 of '__access_ok' makes pointer from integer without a cast Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-04 15:21:44 -07:00
Andrew Hendry	01afaf6198	Minix: Clean up left over label Remove a left over fail label. Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-06-04 17:16:30 -04:00
Nick Piggin	af5a30d8cf	fix truncate inode time modification breakage mtime and ctime should be changed only if the file size has actually changed. Patches changing ext2 and tmpfs from vmtruncate to new truncate sequence has caused regressions where they always update timestamps. There is some strange cases in POSIX where truncate(2) must not update times unless the size has acutally changed, see `6e656be89`. This area is all still rather buggy in different ways in a lot of filesystems and needs a cleanup and audit (ideally the vfs will provide a simple attribute or call to direct all filesystems exactly which attributes to change). But coming up with the best solution will take a while and is not appropriate for rc anyway. So fix recent regression for now. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-06-04 17:16:30 -04:00
Nick Piggin	8718d36cf9	fix setattr error handling in sysfs, configfs sysfs and configfs setattr functions have error cases after the generic inode's attributes have been changed. Fix consistency by changing the generic inode attributes only when it is guaranteed to succeed. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-06-04 17:16:29 -04:00
Dan Carpenter	5b54470dad	fcntl: return -EFAULT if copy_to_user fails copy_to_user() returns the number of bytes remaining, but we want to return -EFAULT. ret = fcntl(fd, F_SETOWN_EX, NULL); With the original code ret would be 8 here. V2: Takuya Yoshikawa pointed out a similar issue in f_getown_ex() Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-06-04 17:16:28 -04:00
Roberto Sassu	7d683a0999	wrong type for 'magic' argument in simple_fill_super() It's used to superblock ->s_magic, which is unsigned long. Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> Reviewed-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: Eric Paris <eparis@redhat.com> CC: stable@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-06-04 17:16:28 -04:00
Nick Piggin	75de46b98d	fix setattr error handling in sysfs, configfs sysfs and configfs setattr functions have error cases after the generic inode's attributes have been changed. Fix consistency by changing the generic inode attributes only when it is guaranteed to succeed. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-06-04 13:27:53 -07:00
Alex Elder	1bf7dbfde8	Merge branch 'master' into for-linus	2010-06-04 13:22:30 -05:00
Sage Weil	1e5ea23df1	ceph: fix lease revocation when seq doesn't match If the client revokes a lease with a higher seq than what we have, keep the mds's seq, so that it honors our release. Otherwise, we can hang indefinitely. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-04 10:05:40 -07:00
Linus Torvalds	03cd373981	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: fix page refcount leak	2010-06-03 07:20:28 -07:00
Jens Axboe	ff9da691c0	pipe: change /proc/sys/fs/pipe-max-pages to byte sized interface This changes the interface to be based on bytes instead. The API matches that of F_SETPIPE_SZ in that it rounds up the passed in size so that the resulting page array is a power-of-2 in size. The proc file is renamed to /proc/sys/fs/pipe-max-size to reflect this change. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-03 14:54:39 +02:00
Jens Axboe	419f8367ea	pipe: change the privilege required for growing a pipe beyond system max Change it to CAP_SYS_RESOURCE, as that more accurately models what we want to control. Suggested-by: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-03 12:45:28 +02:00
Jens Axboe	6a6ca57de9	pipe: adjust minimum pipe size to 1 page We don't need to pages to guarantee the POSIX requirement that upto a page size write must be atomic to an empty pipe. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-03 12:44:30 +02:00
David Woodhouse	e72e6497e7	jffs2: Fix NFS race by using insert_inode_locked() New inodes need to be locked as we're creating them, so they don't get used by other things (like NFSd) before they're ready. Pointed out by Al Viro. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2010-06-03 08:22:04 +01:00
David Woodhouse	f324e4cb2c	jffs2: Fix in-core inode leaks on error paths Pointed out by Al Viro. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2010-06-03 08:03:39 +01:00
Christoph Hellwig	f936972949	xfs: improve xfs_isilocked Use rwsem_is_locked to make the assertations for shared locks work. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-06-03 16:22:29 +10:00
Christoph Hellwig	070ecdca54	xfs: skip writeback from reclaim context Allowing writeback from reclaim context causes massive problems with stack overflows as we can call into the writeback code which tends to be a heavy stack user both in the generic code and XFS from random contexts that perform memory allocations. Follow the example of btrfs (and in slightly different form ext4) and refuse to write out data from reclaim context. This issue should really be handled by the VM so that we can tune better for this case, but until we get it sorted out there we have to hack around this in each filesystem with a complex writeback path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-06-03 16:22:29 +10:00
Dave Chinner	5b257b4a1f	xfs: fix race in inode cluster freeing failing to stale inodes When an inode cluster is freed, it needs to mark all inodes in memory as XFS_ISTALE before marking the buffer as stale. This is eeded because the inodes have a different life cycle to the buffer, and once the buffer is torn down during transaction completion, we must ensure none of the inodes get written back (which is what XFS_ISTALE does). Unfortunately, xfs_ifree_cluster() has some bugs that lead to inodes not being marked with XFS_ISTALE. This shows up when xfs_iflush() is called on these inodes either during inode reclaim or tail pushing on the AIL. The buffer is read back, but no longer contains inodes and so triggers assert failures and shutdowns. This was reproducable with at run.dbench10 invocation from xfstests. There are two main causes of xfs_ifree_cluster() failing. The first is simple - it checks in-memory inodes it finds in the per-ag icache to see if they are clean without holding the flush lock. if they are clean it skips them completely. However, If an inode is flushed delwri, it will appear clean, but is not guaranteed to be written back until the flush lock has been dropped. Hence we may have raced on the clean check and the inode may actually be dirty. Hence always mark inodes found in memory stale before we check properly if they are clean. The second is more complex, and makes the first problem easier to hit. Basically the in-memory inode scan is done with full knowledge it can be racing with inode flushing and AIl tail pushing, which means that inodes that it can't get the flush lock on might not be attached to the buffer after then in-memory inode scan due to IO completion occurring. This is actually documented in the code as "needs better interlocking". i.e. this is a zero-day bug. Effectively, the in-memory scan must be done while the inode buffer is locked and Io cannot be issued on it while we do the in-memory inode scan. This ensures that inodes we couldn't get the flush lock on are guaranteed to be attached to the cluster buffer, so we can then catch all in-memory inodes and mark them stale. Now that the inode cluster buffer is locked before the in-memory scan is done, there is no need for the two-phase update of the in-memory inodes, so simplify the code into two loops and remove the allocation of the temporary buffer used to hold locked inodes across the phases. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-06-03 16:22:29 +10:00
Theodore Ts'o	1f5a81e41f	ext4: Make sure the MOVE_EXT ioctl can't overwrite append-only files Dan Roseberg has reported a problem with the MOVE_EXT ioctl. If the donor file is an append-only file, we should not allow the operation to proceed, lest we end up overwriting the contents of an append-only file. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dan Rosenberg <dan.j.rosenberg@gmail.com>	2010-06-02 22:04:39 -04:00
Sage Weil	558d3499bd	ceph: fix f_namelen reported by statfs We were setting f_namelen in kstatfs to PATH_MAX instead of NAME_MAX. That disagrees with ceph_lookup behavior (which checks against NAME_MAX), and also makes the pjd posix test suite spit out ugly errors because with can't clean up its temporary files. Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-01 16:56:03 -07:00
Yehuda Sadeh	205475679a	ceph: fix memory leak in statfs Freeing the statfs request structure when required. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-01 16:56:02 -07:00
Henry C Chang	13a4214cd9	ceph: fix d_subdirs ordering problem We misused list_move_tail() to order the dentry in d_subdirs. This will screw up the d_subdirs order. This bug can be reliably reproduced by: 1. mount ceph fs. 2. on ceph fs, git clone git://ceph.newdream.net/git/ceph.git 3. Run autogen.sh in ceph directory. (Note: Errors only occur at the first time you run autogen.sh.) Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-06-01 16:55:55 -07:00
Christoph Hellwig	b160fdabe9	nfsd: nfsd_setattr needs to call commit_metadata The conversion of write_inode_now calls to commit_metadata in commit `f501912a35` missed out the call in nfsd_setattr. But without this conversion we can't guarantee that a SETATTR request has actually been commited to disk with XFS, which causes a regression from 2.6.32 (only for NFSv2, but anyway). Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-06-01 19:17:50 -04:00
Dan Carpenter	08a66859e6	FS-Cache: Remove unneeded null checks fscache_write_op() makes unnecessary checks of the page variable to see if it is NULL. It can't be NULL at those points as the kernel would already have crashed a little higher up where we examined page->index. Furthermore, unless radix_tree_gang_lookup_tag() can return 1 but no page, a NULL pointer crash should not be encountered there as we can only get there if r_t_g_l_t() returned 1. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-01 13:32:11 -07:00
Jeff Layton	06b43672a9	cifs: fix page refcount leak Commit `315e995c63` is causing OOM kills when stress-testing a CIFS filesystem. The VFS readpages operation takes a page reference. The older code just handed this reference off to the page cache, but the new code takes an extra one. The simplest fix is to put the new reference after add_to_page_cache_lru. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-06-01 17:15:52 +00:00
Denis Kirjanov	037776fcbe	AFS: Fix possible null pointer dereference in afs_alloc_server() Fix a possible null pointer dereference in afs_alloc_server(): the server pointer is NULL if there was an allocation failure, and under such a condition, we can't dereference it in the _leave() statement. Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-01 09:26:36 -07:00
Takuya Yoshikawa	e30c7c3b30	binfmt_elf_fdpic: Fix clear_user() error handling clear_user() returns the number of bytes that could not be copied rather than an error code. So we should return -EFAULT rather than directly returning the results. Without this patch, positive values may be returned to elf_fdpic_map_file() and the following error handlings do not function as expected. 1. ret = elf_fdpic_map_file_constdisp_on_uclinux(params, file, mm); if (ret < 0) return ret; 2. ret = elf_fdpic_map_file_by_direct_mmap(params, file, mm); if (ret < 0) return ret; Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Mike Frysinger <vapier@gentoo.org> CC: Alexander Viro <viro@zeniv.linux.org.uk> CC: Andrew Morton <akpm@linux-foundation.org> CC: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com> CC: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-06-01 08:11:06 -07:00
Jens Axboe	b4ca761577	Merge branch 'master' into for-linus Conflicts: fs/pipe.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-01 12:42:12 +02:00
Jens Axboe	0e3c9a2284	Revert "writeback: fix WB_SYNC_NONE writeback from umount" This reverts commit `e913fc825d`. We are investigating a hang associated with the WB_SYNC_NONE changes, so revert them for now. Conflicts: fs/fs-writeback.c mm/page-writeback.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-01 11:08:43 +02:00
Jens Axboe	f17625b318	Revert "writeback: ensure that WB_SYNC_NONE writeback with sb pinned is sync" This reverts commit `7c8a3554c6`. We are investigating a hang associated with the WB_SYNC_NONE changes, so revert them for now. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-06-01 11:05:22 +02:00
Ryusuke Konishi	c29684d683	nilfs2: remove obsolete declarations of cache constructor and destructor The commit `41c88bd7` ("nilfs2: cleanup multi kmem_cache_{create,destroy} code") consolidated slab constructors and destructors used in nilfs, but it left some declarations in header files. This gets rid of the obsolete declarations. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-31 20:50:29 +09:00
Ryusuke Konishi	84cb099985	nilfs2: fix style issue in nilfs_destroy_cachep This gets rid of unwanted space chars in front of conditional sentences of nilfs_destroy_cachep(). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-31 20:50:29 +09:00
Linus Torvalds	003386fff3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: mm: export generic_pipe_buf_() to modules fuse: support splice() reading from fuse device fuse: allow splice to move pages mm: export remove_from_page_cache() to modules mm: export lru_cache_add_() to modules fuse: support splice() writing to fuse device fuse: get page reference for readpages fuse: use get_user_pages_fast() fuse: remove unneeded variable	2010-05-30 09:16:14 -07:00
Linus Torvalds	d28619f156	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: quota: Convert quota statistics to generic percpu_counter ext3 uses rb_node = NULL; to zero rb_root. quota: Fixup dquot_transfer reiserfs: Fix resuming of quotas on remount read-write pohmelfs: Remove dead quota code ufs: Remove dead quota code udf: Remove dead quota code quota: rename default quotactl methods to dquot_ quota: explicitly set ->dq_op and ->s_qcop quota: drop remount argument to ->quota_on and ->quota_off quota: move unmount handling into the filesystem quota: kill the vfs_dq_off and vfs_dq_quota_on_remount wrappers quota: move remount handling into the filesystem ocfs2: Fix use after free on remount read-only Fix up conflicts in fs/ext4/super.c and fs/ufs/file.c	2010-05-30 09:11:11 -07:00
Linus Torvalds	b612a05537	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: clean up on forwarded aborted mds request ceph: fix leak of osd authorizer ceph: close out mds, osd connections before stopping auth ceph: make lease code DN specific fs/ceph: Use ERR_CAST ceph: renew auth tickets before they expire ceph: do not resend mon requests on auth ticket renewal ceph: removed duplicated #includes ceph: avoid possible null dereference ceph: make mds requests killable, not interruptible sched: add wait_for_completion_killable_timeout	2010-05-30 08:56:39 -07:00
Sage Weil	2a8e5e3637	ceph: clean up on forwarded aborted mds request If an mds request is aborted (timeout, SIGKILL), it is left registered to keep our state in sync with the mds. If we get a forward notification, though, we know the request didn't succeed and we can unregister it safely. We were trying to resend it, but then bailing out (and not unregistering) in __do_request. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-29 09:42:05 -07:00
Sage Weil	79494d1b9b	ceph: fix leak of osd authorizer Release the ceph_authorizer when releasing osd state. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-29 09:42:04 -07:00
Sage Weil	a922d38fd1	ceph: close out mds, osd connections before stopping auth The auth module (part of the mon_client) is needed to free any ceph_authorizer(s) used by the mds and osd connections. Flush the msgr workqueue before stopping monc to ensure that the destroy_authorizer auth op is available when those connections are closed out. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-29 09:42:03 -07:00
Sage Weil	dd1c905736	ceph: make lease code DN specific The lease code includes a mask in the CEPH_LOCK_* namespace, but that namespace is changing, and only one mask (formerly _DN == 1) is used, so hard code for that value for now. If we ever extend this code to handle leases over different data types we can extend it accordingly. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-29 09:12:42 -07:00
Julia Lawall	7e34bc524e	fs/ceph: Use ERR_CAST Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes more clear what is the purpose of the operation, which otherwise looks like a no-op. In the case of fs/ceph/inode.c, ERR_CAST is not needed, because the type of the returned value is the same as the type of the enclosing function. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ type T; T x; identifier f; @@ T f (...) { <+... - ERR_PTR(PTR_ERR(x)) + x ...+> } @@ expression x; @@ - ERR_PTR(PTR_ERR(x)) + ERR_CAST(x) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-29 09:12:41 -07:00
Sage Weil	a41359fa35	ceph: renew auth tickets before they expire We were only requesting renewal after our tickets expire; do so before that. Most of the low-level logic for this was already there; just use it. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-29 09:12:39 -07:00
Sage Weil	09c4d6a7d4	ceph: do not resend mon requests on auth ticket renewal We only want to send pending mon requests when we successfully authenticate. If we are already authenticated, like when we renew our ticket, there is no need to resend pending requests. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-29 09:12:38 -07:00
Andrea Gelmini	984c76908e	ceph: removed duplicated #includes fs/ceph/auth.c: linux/slab.h is included more than once. fs/ceph/super.h: linux/slab.h is included more than once. Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-29 09:12:37 -07:00
Sage Weil	e95e9a7ae4	ceph: avoid possible null dereference ac->ops may be null; use protocol id in error message instead. Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-29 09:12:36 -07:00
Sage Weil	aa91647c89	ceph: make mds requests killable, not interruptible The underlying problem is that many mds requests can't be restarted. For example, a restarted create() would return -EEXIST if the original request succeeds. However, we do not want a hung MDS to hang the client too. So, use the _killable wait_for_completion variants to abort on SIGKILL but nothing else. Signed-off-by: Sage Weil <sage@newdream.net>	2010-05-29 09:12:35 -07:00
Linus Torvalds	9a90e09854	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (27 commits) ACPI: Don't let acpi_pad needlessly mark TSC unstable drivers/acpi/sleep.h: Checkpatch cleanup ACPI: Minor cleanup eliminating redundant PMTIMER_TICKS to NS conversion ACPI: delete unused c-state promotion/demotion data strucutures ACPI: video: fix acpi_backlight=video ACPI: EC: Use kmemdup drivers/acpi: use kasprintf ACPI, APEI, EINJ injection parameters support Add x64 support to debugfs ACPI, APEI, Use ERST for persistent storage of MCE ACPI, APEI, Error Record Serialization Table (ERST) support ACPI, APEI, Generic Hardware Error Source memory error support ACPI, APEI, UEFI Common Platform Error Record (CPER) header Unified UUID/GUID definition ACPI Hardware Error Device (PNP0C33) support ACPI, APEI, PCIE AER, use general HEST table parsing in AER firmware_first setup ACPI, APEI, Document for APEI ACPI, APEI, EINJ support ACPI, APEI, HEST table parsing ACPI, APEI, APEI supporting infrastructure ...	2010-05-28 14:42:18 -07:00
Christoph Hellwig	fb3b504ade	xfs: fix access to upper inodes without inode64 If a filesystem is mounted without the inode64 mount option we should still be able to access inodes not fitting into 32 bits, just not created new ones. For this to work we need to make sure the inode cache radix tree is initialized for all allocation groups, not just those we plan to allocate inodes from. This patch makes sure we initialize the inode cache radix tree for all allocation groups, and also cleans xfs_initialize_perag up a bit to separate the inode32 logical from the general perag structure setup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-05-28 15:19:56 -05:00
Dave Chinner	9b98b6f3e1	xfs: fix might_sleep() warning when initialising per-ag tree The use of radix_tree_preload() only works if the radix tree was initialised without the __GFP_WAIT flag. The per-ag tree uses GFP_NOFS, so does not trigger allocation of new tree nodes from the preloaded array. Hence it enters the allocator with a spinlock held and triggers the might_sleep() warnings. Reported-by; Chris Mason <chris.mason@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-05-28 15:19:50 -05:00
Julia Lawall	38e712ab3d	fs/xfs/quota: Add missing mutex_unlock Add a mutex_unlock missing on the error path. The use of this lock is balanced elsewhere in the file. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression E1; @@ * mutex_lock(E1,...); <+... when != E1 if (...) { ... when != E1 * return ...; } ...+> * mutex_unlock(E1,...); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-05-28 15:19:41 -05:00
Huang Weiyi	3bd0946eb1	xfs: remove duplicated #include Remove duplicated #include('s) in fs/xfs/linux-2.6/xfs_quotaops.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-05-28 15:19:36 -05:00
Li Zefan	f8adb4d574	xfs: convert more trace events to DEFINE_EVENT Use DECLARE_EVENT_CLASS, and save ~15K: text data bss dec hex filename 171949 43028 48 215025 347f1 fs/xfs/linux-2.6/xfs_trace.o.orig 156521 43028 36 199585 30ba1 fs/xfs/linux-2.6/xfs_trace.o No change in functionality. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-05-28 15:19:31 -05:00
Huang Weiyi	292ec4cf35	xfs: xfs_trace.c: remove duplicated #include Remove duplicated #include('s) in fs/xfs/linux-2.6/xfs_trace.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-05-28 15:19:24 -05:00
Dave Chinner	07f1a4f5e8	xfs: Check new inode size is OK before preallocating The new xfsqa test 228 tries to preallocate more space than the filesystem contains. it should fail, but instead triggers an assert about lock flags. The failure is due to the size extension failing in vmtruncate() due to rlimit being set. Check this before we start the preallocation to avoid allocating space that will never be used. Also the path through xfs_vn_allocate already holds the IO lock, so it should not be present in the lock flags when the setattr fails. Hence the assert needs to take this into account. This will prevent other such callers from hitting this incorrect ASSERT. (Fixed a reference to "newsize" to read "new_size". -Alex) Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-05-28 15:19:12 -05:00
Christoph Hellwig	fdc07f44c8	xfs: clean up xlog_align Add suggested cleanups to commit 29db3370a1369541d58d692fbfb168b8a0bd7f41 from review that didn't end up being commited. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-05-28 14:58:36 -05:00
Christoph Hellwig	025101dca4	xfs: cleanup log reservation calculactions Instead of having small helper functions calling big macros do the calculations for the log reservations directly in the functions. These are mostly 1:1 from the macros execept that the macros kept the quota calculations in their callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-05-28 14:58:30 -05:00
Eric Sandeen	32891b292d	xfs: be more explicit if RT mount fails due to config Recent testers were slightly confused that a realtime mount failed due to missing CONFIG_XFS_RT; we can make that a little more obvious. V2: drop the else as suggested by Christoph Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-05-28 14:58:24 -05:00
Eric Sandeen	657a4cffde	xfs: replace E2BIG with EFBIG where appropriate Many places in the xfs code return E2BIG when they really mean EFBIG; trying to grow past 16T on a 32 bit machine, for example, says "Argument list too long" rather than "File too large" which is not particularly helpful. Some of these don't make perfect sense as EFBIG either, but still better than E2BIG IMHO. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-05-28 14:58:16 -05:00
Al Viro	49837a80b3	remove detritus left by "mm: make read_cache_page synchronous" gets minix get_dir_page() in sync with its analogs; back in 2007 Nick has switched read_cache_page() and friends to sync behaviour (i.e. they wait for the page to get unlocked, check if it's uptodate and if it isn't return ERR_PTR(-EIO) instead) and removed the duplicate logics from the callers. In case of fs/minix/dir.c he'd removed only half of that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-28 11:37:41 -04:00
Al Viro	4c9002de32	fix fs/sysv s_dirt handling got broken on ->sync_fs() conversion a year ago, nobody noticed... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-27 22:16:05 -04:00
npiggin@suse.de	459f6ed3b8	fat: convert to use the new truncate convention. Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-27 22:16:02 -04:00
npiggin@suse.de	737f2e93b9	ext2: convert to use the new truncate convention. I also have commented a possible bug in existing ext2 code, marked with XXX. Cc: linux-ext4@vger.kernel.org Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-27 22:15:57 -04:00
Nick Piggin	3322e79a38	fs: convert simple fs to new truncate Convert simple filesystems: ramfs, configfs, sysfs, block_dev to new truncate sequence. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-27 22:15:47 -04:00
npiggin@suse.de	15c6fd9786	kill spurious reference to vmtruncate Lots of filesystems calls vmtruncate despite not implementing the old ->truncate method. Switch them to use simple_setsize and add some comments about the truncate code where it seems fitting. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-27 22:15:42 -04:00
npiggin@suse.de	7bb46a6734	fs: introduce new truncate sequence Introduce a new truncate calling sequence into fs/mm subsystems. Rather than setattr > vmtruncate > truncate, have filesystems call their truncate sequence from ->setattr if filesystem specific operations are required. vmtruncate is deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced previously should be used. simple_setattr is introduced for simple in-ram filesystems to implement the new truncate sequence. Eventually all filesystems should be converted to implement a setattr, and the default code in notify_change should go away. simple_setsize is also introduced to perform just the ATTR_SIZE portion of simple_setattr (ie. changing i_size and trimming pagecache). To implement the new truncate sequence: - filesystem specific manipulations (eg freeing blocks) must be done in the setattr method rather than ->truncate. - vmtruncate can not be used by core code to trim blocks past i_size in the event of write failure after allocation, so this must be performed in the fs code. - convert usage of helpers block_write_begin, nobh_write_begin, cont_write_begin, and blockdev_direct_IO to use _newtrunc postfixed variants. These avoid calling vmtruncate to trim blocks (see previous). - inode_setattr should not be used. generic_setattr is a new function to be used to copy simple attributes into the generic inode. - make use of the better opportunity to handle errors with the new sequence. Big problem with the previous calling sequence: the filesystem is not called until i_size has already changed. This means it is not allowed to fail the call, and also it does not know what the previous i_size was. Also, generic code calling vmtruncate to truncate allocated blocks in case of error had no good way to return a meaningful error (or, for example, atomically handle block deallocation). Cc: Christoph Hellwig <hch@lst.de> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-27 22:15:33 -04:00
Randy Dunlap	7000d3c424	fs/super: fix kernel-doc warning Fix fs/super.c kernel-doc warning and function notation: Warning(fs/super.c:957): No description found for parameter 'sb' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-27 22:06:23 -04:00
Erik van der Kouwe	0ab7620a0c	fs/minix: bugfix, number of indirect block ptrs per block depends on block size The MINIX filesystem driver used a constant number of indirect block pointers in an indirect block. This worked only for filesystems with 1kb block, while the MINIX default block size is now 4kb. As a consequence, large files were read incorrectly on such filesystems and writing a large file would cause the filesystem to become corrupted. This patch computes the number of indirect block pointers based on the block size, making the driver work for each block size. I would like to thank Feiran Zheng ('Fam') for pointing out the cause of the corruption. Signed-off-by: Erik van der Kouwe <vdkouwe@cs.vu.nl> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-27 22:06:22 -04:00
Christoph Hellwig	1b061d9247	rename the generic fsync implementations We don't name our generic fsync implementations very well currently. The no-op implementation for in-memory filesystems currently is called simple_sync_file which doesn't make too much sense to start with, the the generic one for simple filesystems is called simple_fsync which can lead to some confusion. This patch renames the generic file fsync method to generic_file_fsync to match the other generic_file_* routines it is supposed to be used with, and the no-op implementation to noop_fsync to make it obvious what to expect. In addition add some documentation for both methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-27 22:06:06 -04:00
Christoph Hellwig	7ea8085910	drop unused dentry argument to ->fsync Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-27 22:05:02 -04:00
Julia Lawall	cc967be547	fs: Add missing mutex_unlock Add a mutex_unlock missing on the error path. At other exists from the function that return an error flag, the mutex is unlocked, so do the same here. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression E1; @@ * mutex_lock(E1,...); <+... when != E1 if (...) { ... when != E1 * return ...; } ...+> * mutex_unlock(E1,...); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-27 22:03:09 -04:00
Al Viro	d7065da038	get rid of the magic around f_count in aio __aio_put_req() plays sick games with file refcount. What it wants is fput() from atomic context; it's almost always done with f_count > 1, so they only have to deal with delayed work in rare cases when their reference happens to be the last one. Current code decrements f_count and if it hasn't hit 0, everything is fine. Otherwise it keeps a pointer to struct file (with zero f_count!) around and has delayed work do __fput() on it. Better way to do it: use atomic_long_add_unless( , -1, 1) instead of !atomic_long_dec_and_test(). IOW, decrement it only if it's not the last reference, leave refcount alone if it was. And use normal fput() in delayed work. I've made that atomic_long_add_unless call a new helper - fput_atomic(). Drops a reference to file if it's safe to do in atomic (i.e. if that's not the last one), tells if it had been able to do that. aio.c converted to it, __fput() use is gone. req->ki_file always contributes to refcount now. And __fput() became static. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-27 22:03:07 -04:00
Neil Brown	176306f59a	VFS: fix recent breakage of FS_REVAL_DOT Commit `1f36f774b2` broke FS_REVAL_DOT semantics. In particular, before this patch, the command ls -l in an NFS mounted directory would always check if the directory on the server had changed and if so would flush and refill the pagecache for the dir. After this patch, the same "ls -l" will repeatedly return stale date until the cached attributes for the directory time out. The following patch fixes this by ensuring the d_revalidate is called by do_last when "." is being looked-up. link_path_walk has already called d_revalidate, but in that case LOOKUP_OPEN is not set so nfs_lookup_verify_inode chooses not to do any validation. The following patch restores the original behaviour. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-27 22:03:06 -04:00
Al Viro	1eb2cbb6d5	Revert "anon_inode: set S_IFREG on the anon_inode" This reverts commit `a7cf4145bb`.	2010-05-27 22:03:05 -04:00
Linus Torvalds	105a048a4f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (27 commits) Btrfs: add more error checking to btrfs_dirty_inode Btrfs: allow unaligned DIO Btrfs: drop verbose enospc printk Btrfs: Fix block generation verification race Btrfs: fix preallocation and nodatacow checks in O_DIRECT Btrfs: avoid ENOSPC errors in btrfs_dirty_inode Btrfs: move O_DIRECT space reservation to btrfs_direct_IO Btrfs: rework O_DIRECT enospc handling Btrfs: use async helpers for DIO write checksumming Btrfs: don't walk around with task->state != TASK_RUNNING Btrfs: do aio_write instead of write Btrfs: add basic DIO read/write support direct-io: do not merge logically non-contiguous requests direct-io: add a hook for the fs to provide its own submit_bio function fs: allow short direct-io reads to be completed via buffered IO Btrfs: Metadata ENOSPC handling for balance Btrfs: Pre-allocate space for data relocation Btrfs: Metadata ENOSPC handling for tree log Btrfs: Metadata reservation for orphan inodes Btrfs: Introduce global metadata reservation ...	2010-05-27 10:43:44 -07:00
Linus Torvalds	e4ce30f377	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits) ext4: Make fsync sync new parent directories in no-journal mode ext4: Drop whitespace at end of lines ext4: Fix compat EXT4_IOC_ADD_GROUP ext4: Conditionally define compat ioctl numbers tracing: Convert more ext4 events to DEFINE_EVENT ext4: Add new tracepoints to track mballoc's buddy bitmap loads ext4: Add a missing trace hook ext4: restart ext4_ext_remove_space() after transaction restart ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warranted ext4: Avoid crashing on NULL ptr dereference on a filesystem error ext4: Use bitops to read/modify i_flags in struct ext4_inode_info ext4: Convert calls of ext4_error() to EXT4_ERROR_INODE() ext4: Convert callers of ext4_get_blocks() to use ext4_map_blocks() ext4: Add new abstraction ext4_map_blocks() underneath ext4_get_blocks() ext4: Use our own write_cache_pages() ext4: Show journal_checksum option ext4: Fix for ext4_mb_collect_stats() ext4: check for a good block group before loading buddy pages ext4: Prevent creation of files larger than RLIMIT_FSIZE using fallocate ext4: Remove extraneous newlines in ext4_msg() calls ... Fixed up trivial conflict in fs/ext4/fsync.c	2010-05-27 10:26:37 -07:00
Linus Torvalds	ade61088bc	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: Fix another nfs_wb_page() deadlock NFS: Ensure that we mark the inode as dirty if we exit early from commit NFS: Fix a lock imbalance typo in nfs_access_cache_shrinker sunrpc: fix leak on error on socket xprt setup	2010-05-27 10:18:44 -07:00
Dmitry Monakhov	f32764bd2b	quota: Convert quota statistics to generic percpu_counter Generic per-cpu counter has some memory overhead but it is negligible for modern systems and embedded systems compile without quota support. And code reuse is a good thing. This patch should fix complain from preemptive kernels which was introduced by `dde9588853`. [Jan Kara: Fixed patch to work on 32-bit archs as well] Reported-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>	2010-05-27 18:56:27 +02:00
jan Blunck	ca572727db	fs/: do not fallback to default_llseek() when readdir() uses BKL Do not use the fallback default_llseek() if the readdir operation of the filesystem still uses the big kernel lock. Since llseek() modifies file->f_pos of the directory directly it may need locking to not confuse readdir which usually uses file->f_pos directly as well Since the special characteristics of the BKL (unlocked on schedule) are not necessary in this case, the inode mutex can be used for locking as provided by generic_file_llseek(). This is only possible since all filesystems, except reiserfs, either use a directory as a flat file or with disk address offsets. Reiserfs on the other hand uses a 32bit hash off the filename as the offset so generic_file_llseek() can get used as well since the hash is always smaller than sb->s_maxbytes (= (512 << 32) - blocksize). Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Anders Larsen <al@alarsen.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:56 -07:00

1 2 3 4 5 ...

18546 Commits (8a6cfeb6deca3a8fefd639d898b0d163c0b5d368)