linux

Commit Graph

Author	SHA1	Message	Date
James Morris	a002951c97	Merge branch 'next' into for-linus	2011-03-16 09:41:17 +11:00
Steven Whitehouse	7e32d02613	GFS2: Don't use _raw version of RCU dereference As per RCU glock patch review comments, don't use the _raw version of this function here. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-03-15 08:58:17 +00:00
Maxim	6c474f7bc1	GFS2: Adding missing unlock_page() gfs2_write_begin() calls grab_cache_page_write_begin(), which returns a locked page. The corresponding error-handling path lacks an unlock_page() call: > out: > if (error == 0) > return 0; > > page_cache_release(page); The whole system hangs if gfs2_unstuff_dinode(), called from gfs2_write_begin(), fails for some reason. Reported-by: Maxim <maxim.patlasov@gmail.com> Signed-off-by: Maxim <maxim.patlasov@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-14 13:19:21 +00:00
Aneesh Kumar K.V	5fe0c23788	exportfs: Return the minimum required handle size The exportfs encode handle function should return the minimum required handle size. This helps the user find out the handle size by passing a handle size of 0 in the first step and then redoing the call with the returned handle size value. Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:28 -04:00
Steven Whitehouse	c618e87a5f	GFS2: Update to AIL list locking The previous patch missed a couple of places where the AIL list needed locking, so this fixes up those places, plus a comment is corrected too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Chinner <dchinner@redhat.com>	2011-03-14 12:40:29 +00:00
Dave Chinner	d6a079e82e	GFS2: introduce AIL lock The log lock is currently used to protect the AIL lists and the movements of buffers into and out of them. The lists are self contained and no log specific items outside the lists are accessed when starting or emptying the AIL lists. Hence the operation of the AIL does not require the protection of the log lock so split them out into a new AIL specific lock to reduce the amount of traffic on the log lock. This will also reduce the amount of serialisation that occurs when the gfs2_logd pushes on the AIL to move it forward. This reduces the impact of log pushing on sequential write throughput. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-11 11:52:25 +00:00
Benjamin Marzinski	e4a7b7b0c9	GFS2: fix block allocation check for fallocate GFS2 fallocate wasn't properly checking if blocks were already allocated. In write_empty_blocks(), if a page didn't have buffer_heads attached, GFS2 was always treating it as if there were no blocks allocated for that page. GFS2 now calls gfs2_block_map() to check if the blocks are allocated before writing them out. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-11 09:26:48 +00:00
Bob Peterson	fa1bbdea30	GFS2: Optimize glock multiple-dequeue code This is a small patch that optimizes multiple glock dequeue operations. It changes the unlock order to be more efficient and makes it easier for lock debugging tools to unravel. It also eliminates the need for the temp variable x, although that would likely be optimized out. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-11 09:24:54 +00:00
Al Viro	53fe924161	gfs2: fix d_revalidate oopsen on NFS exports can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-10 03:44:48 -05:00
Jens Axboe	4c63f5646e	Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/core Conflicts: block/blk-core.c block/blk-flush.c drivers/md/raid1.c drivers/md/raid10.c drivers/md/raid5.c fs/nilfs2/btnode.c fs/nilfs2/mdt.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-10 08:58:35 +01:00
Jens Axboe	721a9602e6	block: kill off REQ_UNPLUG With the plugging now being explicitly controlled by the submitter, callers need not pass down unplugging hints to the block layer. If they want to unplug, it's because they manually plugged on their own - in which case, they should just unplug at will. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-10 08:52:27 +01:00
Jens Axboe	7eaceaccab	block: remove per-queue plugging Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So lets kill off the old plugging along with aops->sync_page(). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-10 08:52:07 +01:00
Steven Whitehouse	0a33443b38	GFS2: Remove potential race in flock code This patch ensures that we always wait for glock demotion when dropping flocks on a file in order to prevent any race conditions associated with further flock calls or closing the file. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-09 11:14:32 +00:00
Steven Whitehouse	fc0e38dae6	GFS2: Fix glock deallocation race This patch fixes a race in deallocating glocks which was introduced in the RCU glock patch. We need to ensure that the glock count is kept correct even in the case that there is a race to add a new glock into the hash table. Also, to avoid having to wait for an RCU grace period, the glock counter can be decremented before call_rcu() is called. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-09 10:58:04 +00:00
Abhijith Das	662e3a551b	GFS2: quota allows exceeding hard limit Immediately after being synced to disk, cached quotas are zeroed out and a subsequent access of the cached quotas results in incorrect zero values. This meant that gfs2 assumed the actual usage to be the zero (or near-zero) usage values it found in the cached quotas and comparison against warn/limits never triggered a quota violation. This patch adds a new flag QDF_REFRESH that is set after a sync so that the cached quotas are forcefully refreshed from disk on a subsequent access on seeing this flag set. Resolves: rhbz#675944 Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-09 09:32:44 +00:00
James Morris	fe3fa43039	Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into next	2011-03-08 11:38:10 +11:00
Bob Peterson	4c16c36ad6	GFS2: deallocation performance patch This patch is a performance improvement to GFS2's dealloc code. Rather than update the quota file and statfs file for every single block that's stripped off in unlink function do_strip, this patch keeps track and updates them once for every layer that's stripped. This is done entirely inside the existing transaction, so there should be no risk of corruption. The other functions that deallocate blocks will be unaffected because they are using wrapper functions that do the same thing that they do today. I tested this code on my roth cluster by creating 200 files in a directory, each of which is 100MB, then on four nodes, I simultaneously deleted the files, thus competing for GFS2 resources (but different files). The commands I used were: [root@roth-01]# time for i in `seq 1 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done [root@roth-02]# time for i in `seq 2 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done [root@roth-03]# time for i in `seq 3 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done [root@roth-05]# time for i in `seq 4 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done The performance increase was significant: roth-01 roth-02 roth-03 roth-05 --------- --------- --------- --------- old: real 0m34.027 0m25.021s 0m23.906s 0m35.646s new: real 0m22.379s 0m24.362s 0m24.133s 0m18.562s Total time spent deleting: old: 118.6s new: 89.4 For this particular case, this showed a 25% performance increase for GFS2 unlinks. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-02-24 12:13:48 +00:00
Miklos Szeredi	2aa15890f3	mm: prevent concurrent unmap_mapping_range() on the same inode Michael Leun reported that running parallel opens on a fuse filesystem can trigger a "kernel BUG at mm/truncate.c:475" Gurudas Pai reported the same bug on NFS. The reason is, unmap_mapping_range() is not prepared for more than one concurrent invocation per inode. For example: thread1: going through a big range, stops in the middle of a vma and stores the restart address in vm_truncate_count. thread2: comes in with a small (e.g. single page) unmap request on the same vma, somewhere before restart_address, finds that the vma was already unmapped up to the restart address and happily returns without doing anything. Another scenario would be two big unmap requests, both having to restart the unmapping and each one setting vm_truncate_count to its own value. This could go on forever without any of them being able to finish. Truncate and hole punching already serialize with i_mutex. Other callers of unmap_mapping_range() do not, and it's difficult to get i_mutex protection for all callers. In particular ->d_revalidate(), which calls invalidate_inode_pages2_range() in fuse, may be called with or without i_mutex. This patch adds a new mutex to 'struct address_space' to prevent running multiple concurrent unmap_mapping_range() on the same mapping. [ We'll hopefully get rid of all this with the upcoming mm preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex lockbreak" patch in particular. But that is for 2.6.39 ] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reported-by: Michael Leun <lkml20101129@newton.leun.net> Reported-by: Gurudas Pai <gurudas.pai@oracle.com> Tested-by: Gurudas Pai <gurudas.pai@oracle.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-23 19:52:52 -08:00
Tejun Heo	58a69cb47e	workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable' There are two spellings in use for 'freeze' + 'able' - 'freezable' and 'freezeable'. The former is the more prominent one. The latter is mostly used by workqueue and in a few other odd places. Unify the spelling to 'freezable'. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Dmitry Torokhov <dtor@mail.ru> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Alex Dubov <oakad@yahoo.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Steven Whitehouse <swhiteho@redhat.com>	2011-02-16 17:48:59 +01:00
Abhijith Das	e79a46a030	GFS2: panics on quotacheck update Handle block allocation for forceful unstuffing of quota dinode during quota update using quotactl(). Also fix block reservation for special cases when quotas cross over block boundaries and update 2 blocks instead of 1. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-02-07 20:00:44 +00:00
Steven Whitehouse	b9c93bb7de	GFS2: Improve cluster mmap scalability The mmap system call grabs a glock when an update to atime may be required. It does this in order to ensure that the flags on the inode are uptodate, but since it will only mark atime for a future update, an exclusive lock is not required here (one will be taken later when the actual update is performed). Also, the lock can be skipped when the mount is marked noatime in addition to the original check which only looked at the noatime flag for the inode itself. This should increase the scalability of the mmap call when multiple nodes are all mmaping the same file. Reported-by: Scooter Morris <scooter@cgl.ucsf.edu> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-02-02 14:48:10 +00:00
Eric Paris	2a7dba391e	fs/vfs/security: pass last path component to LSM on inode creation SELinux would like to implement a new labeling behavior of newly created inodes. We currently label new inodes based on the parent and the creating process. This new behavior would also take into account the name of the new object when deciding the new label. This is not the (supposed) full path, just the last component of the path. This is very useful because creating /etc/shadow is different than creating /etc/passwd but the kernel hooks are unable to differentiate these operations. We currently require that userspace realize it is doing some difficult operation like that and then userspace jumps through SELinux hoops to get things set up correctly. This patch does not implement new behavior, that is obviously contained in a separate SELinux patch, but it does pass the needed name down to the correct LSM hook. If no such name exists it is fine to pass NULL. Signed-off-by: Eric Paris <eparis@redhat.com>	2011-02-01 11:12:29 -05:00
Steven Whitehouse	edae38a643	GFS2: Fix glock queue trace point Somehow this tracepoint landed up in the wrong place. This moves it to where it should be. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-01-31 09:38:12 +00:00
Steven Whitehouse	75d5cfbe4b	GFS2: Post-VFS scale update for RCU path walk We can allow a few more cases to use RCU path walking than originally allowed. It should be possible to also enable RCU path walking when the glock is already cached. That's a bit more complicated though, so it is left for a future patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Nick Piggin <npiggin@gmail.com>	2011-01-21 09:39:24 +00:00
Steven Whitehouse	bc015cb841	GFS2: Use RCU for glock hash table This has a number of advantages: - Reduces contention on the hash table lock - Makes the code smaller and simpler - Should speed up glock dumps when under load - Removes ref count changing in examine_bucket - No longer need hash chain lock in glock_put() in common case There are some further changes which this enables and which we may do in the future. One is to look at using SLAB_RCU, and another is to look at using a per-cpu counter for the per-sb glock counter, since that is touched twice in the lifetime of each glock (but only used at umount time). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-01-21 09:39:08 +00:00
Steven Whitehouse	24d9765fc1	GFS2: Fix error path in gfs2_lookup_by_inum() In the (impossible, except if there is fs corruption) error path in gfs2_lookup_by_inum() if the call to gfs2_inode_refresh() fails, it was leaving the function by calling iput() rather than iget_failed(). This would cause future lookups of the same inode to block forever. This patch fixes the problem by moving the call to gfs2_inode_refresh() into gfs2_inode_lookup() where iget_failed() is part of the error path already. Also this cleans up some unreachable code and makes gfs2_set_iop() static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-01-18 14:49:08 +00:00
Benjamin Marzinski	23c3010808	GFS2: remove iopen glocks from cache on failed deletes When a file gets deleted on GFS2, if a node can't get an exclusive lock on the file's iopen glock, it punts on actually freeing up the space, because another node is using the file. When it does this, it needs to drop the iopen glock from its cache so that the other node can get an exclusive lock on it. Now, gfs2_delete_inode() sets GL_NOCACHE before dropping the shared lock on the iopen glock in preparation for grabbing it in the exclusive state. Since the node needs the glock in the exclusive state, dropping the shared lock from the cache doesn't slow down the case where no other nodes are using the file. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-01-18 14:28:29 +00:00
Christoph Hellwig	2fe17c1075	fallocate should be a file operation Currently all filesystems except XFS implement fallocate asynchronously, while XFS forced a commit. Both of these are suboptimal - in case of O_SYNC I/O we really want our allocation on disk, especially for the !KEEP_SIZE case where we actually grow the file with user-visible zeroes. On the other hand always committing the transaction is a bad idea for fast-path uses of fallocate like for example in recent Samba versions. Given that block allocation is a data plane operation anyway change it from an inode operation to a file operation so that we have the file structure available that lets us check for O_SYNC. This also includes moving the code around for a few of the filesystems, and removes the already unneeded S_ISDIR checks given that we only wire up fallocate for regular files. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 02:25:31 -05:00
Christoph Hellwig	64c23e8687	make the feature checks in ->fallocate future proof Instead of various home grown checks that might need updates for new flags just check for any bit outside the mask of the features supported by the filesystem. This makes the check future proof for any newly added flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 02:25:30 -05:00
Linus Torvalds	275220f0fc	Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits) block: ensure that completion error gets properly traced blktrace: add missing probe argument to block_bio_complete block cfq: don't use atomic_t for cfq_group block cfq: don't use atomic_t for cfq_queue block: trace event block fix unassigned field block: add internal hd part table references block: fix accounting bug on cross partition merges kref: add kref_test_and_get bio-integrity: mark kintegrityd_wq highpri and CPU intensive block: make kblockd_workqueue smarter Revert "sd: implement sd_check_events()" block: Clean up exit_io_context() source code. Fix compile warnings due to missing removal of a 'ret' variable fs/block: type signature of major_to_index(int) to major_to_index(unsigned) block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p) cfq-iosched: don't check cfqg in choose_service_tree() fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors cdrom: export cdrom_check_events() sd: implement sd_check_events() sr: implement sr_check_events() ...	2011-01-13 10:45:01 -08:00
Josef Bacik	9ecf639a96	Gfs2: fail if we try to use hole punch Gfs2 doesn't have the ability to punch holes yet, so make sure we return EOPNOTSUPP if we try to use hole punching through fallocate. This support can be added later. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:16:44 -05:00
Al Viro	41ced6dcf3	switch gfs2, close races Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:46 -05:00
Alexey Dobriyan	57cc7215b7	headers: kobject.h redux Remove kobject.h from files which don't need it, notably, sched.h and fs.h. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-10 08:51:44 -08:00
Linus Torvalds	b4a45f5fe8	Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits) fs: scale mntget/mntput fs: rename vfsmount counter helpers fs: implement faster dentry memcmp fs: prefetch inode data in dcache lookup fs: improve scalability of pseudo filesystems fs: dcache per-inode inode alias locking fs: dcache per-bucket dcache hash locking bit_spinlock: add required includes kernel: add bl_list xfs: provide simple rcu-walk ACL implementation btrfs: provide simple rcu-walk ACL implementation ext2,3,4: provide simple rcu-walk ACL implementation fs: provide simple rcu-walk generic_check_acl implementation fs: provide rcu-walk aware permission i_ops fs: rcu-walk aware d_revalidate method fs: cache optimise dentry and inode for rcu-walk fs: dcache reduce branches in lookup path fs: dcache remove d_mounted fs: fs_struct use seqlock fs: rcu-walk for path lookup ...	2011-01-07 08:56:33 -08:00
Nick Piggin	b74c79e993	fs: provide rcu-walk aware permission i_ops Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:29 +11:00
Nick Piggin	34286d6662	fs: rcu-walk aware d_revalidate method Require filesystems be aware of .d_revalidate being called in rcu-walk mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning -ECHILD from all implementations. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:29 +11:00
Nick Piggin	fb045adb99	fs: dcache reduce branches in lookup path Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:28 +11:00
Nick Piggin	fa0d7e3de6	fs: icache RCU free inodes RCU free the struct inode. This will allow: - Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must. - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking. - Could remove some nested trylock loops in dcache code - Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping. The downsides of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (ie. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller. The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:26 +11:00
Nick Piggin	b1e6a015a5	fs: change d_hash for rcu-walk Change d_hash so it may be called from lock-free RCU lookups. See similar patch for d_compare for details. For in-tree filesystems, this is just a mechanical change. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:20 +11:00
Nick Piggin	fe15ce446b	fs: change d_delete semantics Change d_delete from a dentry deletion notification to a dentry caching advice, more like ->drop_inode. Require it to be constant and idempotent, and not take d_lock. This is how all existing filesystems use the callback anyway. This makes fine grained dentry locking of dput and dentry lru scanning much simpler. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:18 +11:00
Steven Whitehouse	846f404552	GFS2: Don't flush delete workqueue when releasing the transaction lock There is no requirement to flush the delete workqueue before a gfs2 filesystem is suspended. The workqueue's work will just be suspended along with the rest of the tasks on the filesystem. This resolves a deadlock situation where the transaction lock's demotion code was trying to flush the delete workqueue while at the same time, the workqueue was waiting for the transaction lock. The delete workqueue is flushed by gfs2_make_fs_ro() already, so that umount/remount are correctly protected anyway. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-12-16 15:18:48 +00:00
Bob Peterson	bcd7278d8a	GFS2: fsck.gfs2 reported statfs error after gfs2_grow When you do gfs2_grow it failed to take the very last rgrp into account when adding up the new free space due to an off-by-one error. It was not reading the last rgrp from the rindex because of a check for "<=" that should have been "<". Therefore, fsck.gfs2 was finding (and fixing) an error with the system statfs file. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2010-12-07 18:55:07 +00:00
Steven Whitehouse	47a25380e3	GFS2: Merge glock state fields into a bitfield We can only merge the fields into a bitfield if the locking rules for them are the same. In this case gl_spin covers all of the fields (write side) but a couple of them are used with GLF_LOCK as the read side lock, which should be ok since we know that the field in question won't be changing at the time. The gl_req setting has to be done earlier (in glock.c) in order to place it under gl_spin. The gl_reply setting also has to be brought under gl_spin in order to comply with the new rules. This saves 4*sizeof(unsigned int) per glock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>	2010-11-30 15:49:31 +00:00
Steven Whitehouse	e06dfc4928	GFS2: Fix uninitialised error value in previous patch Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-30 15:46:02 +00:00
Benjamin Marzinski	086d8334cf	GFS2: fix recursive locking during rindex truncates When you truncate the rindex file, you need to avoid calling gfs2_rindex_hold, since you already hold it. However, if you haven't already read in the resource groups, you need to do that. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-30 15:41:54 +00:00
Benjamin Marzinski	0489b3f5eb	GFS2: reread rindex when necessary to grow rindex When GFS2 grew the filesystem, it was never rereading the rindex file during the grow. This is necessary for large grows when the filesystem is almost full, and GFS2 needs to use some of the space allocated earlier in the grow to complete it. Now, if GFS2 fails to reserve the necessary space and the rindex file is not uptodate, it rereads it. Also, the only difference between gfs2_ri_update() and gfs2_ri_update_special() was that gfs2_ri_update_special() didn't clear out the existing resource groups, since you knew that it was only called when there were no resource groups. Attempting to clear out the resource groups when there are none takes almost no time, and rarely happens, so I simply removed gfs2_ri_update_special(). Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-30 15:34:18 +00:00
Steven Whitehouse	0b1246e677	GFS2: Remove duplicate #defines from glock.h There are a number of duplicated #defines in glock.h plus one which is unused. This removes the extra definitions. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-30 15:33:04 +00:00
Steven Whitehouse	921169ca2f	GFS2: Clean up of gdlm_lock function The DLM never returns -EAGAIN in response to dlm_lock(), and even if it did, the test in gdlm_lock() was wrong anyway. Once that test is removed, it is possible to greatly simplify this code by simply using a "normal" error return code (0 for success). We then no longer need the LM_OUT_ASYNC return code which can be removed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-30 10:31:48 +00:00
Abhijith Das	802ec9b668	GFS2: Allow gfs2 to update quota usage values through the quotactl interface With this patch the gfs2_set_dqblk() function will be able to update the quota usage block count (FS_DQ_BCOUNT) in addition to the already supported FS_DQ_BHARD (limit) and FS_DQ_BSOFT (warn) fields of the dquot structure. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-30 10:31:27 +00:00
Joe Perches	edc221d00b	GFS2: fs/gfs2/glock.h: Add __attribute__((format(printf,2,3)) to gfs2_print_dbg Functions that use printf formatting, especially those that use %pV, should have their uses of printf format and arguments checked by the compiler. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-30 10:31:05 +00:00
Joe Perches	5e69069c1a	GFS2: fs/gfs2/glock.c: Use printf extension %pV Using %pV reduces the number of printk calls and eliminates any possible message interleaving from other printk calls. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-30 10:30:41 +00:00
Steven Whitehouse	2ae51ed7b5	GFS2: Clean up duplicated setattr code While preparing the last patch I noticed that the gfs2_setattr_simple code had been duplicated into two other places. This patch updates those to call gfs2_setattr_simple rather than open coding it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-30 10:30:19 +00:00
Steven Whitehouse	9e55cd5372	GFS2: Remove unreachable calls to vmtruncate Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-30 10:22:48 +00:00
Joe Perches	cc18152eb7	GFS2: fs/gfs2/glock.c: Convert sprintf_symbol to %pS Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-30 10:22:19 +00:00
Steven Whitehouse	d2115778c7	GFS2: Change two WQ_RESCUERs into WQ_MEM_RECLAIM The WQ_RESCUER flag should only be used internally to the workqueue implementation. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Tejun Heo <tj@kernel.org>	2010-11-30 10:21:55 +00:00
Jens Axboe	f30195c502	Merge branch 'cleanup-bd_claim' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into for-2.6.38/core	2010-11-27 19:49:18 +01:00
Abhijith Das	14870b4575	GFS2: Userland expects quota limit/warn/usage in 512b blocks Userland programs using the quotactl() syscall assume limit/warn/usage block counts in 512b basic blocks which were instead being read/written in fs blocksize in gfs2. With this patch, gfs2 correctly interacts with the syscall using 512b blocks. Signed-off-by: Abhi Das <adas@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-19 11:20:29 +00:00
Steven Whitehouse	044b9414c7	GFS2: Fix inode deallocation race This area of the code has always been a bit delicate due to the subtleties of lock ordering. The problem is that for "normal" alloc/dealloc, we always grab the inode locks first and the rgrp lock later. In order to ensure no races in looking up the unlinked, but still allocated inodes, we need to hold the rgrp lock when we do the lookup, which means that we can't take the inode glock. The solution is to borrow the technique already used by NFS to solve what is essentially the same problem (given an inode number, look up the inode carefully, checking that it really is in the expected state). We cannot do that directly from the allocation code (lock ordering again) so we give the job to the pre-existing delete workqueue and carry on with the allocation as normal. If we find there is no space, we do a journal flush (required anyway if space from a deallocation is to be released) which should block against the pending deallocations, so we should always get the space back. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-15 12:44:42 +00:00
Tejun Heo	d4d7762995	block: clean up blkdev_get() wrappers and their users After recent blkdev_get() modifications, open_by_devnum() and open_bdev_exclusive() are simple wrappers around blkdev_get(). Replace them with blkdev_get_by_dev() and blkdev_get_by_path(). blkdev_get_by_dev() is identical to open_by_devnum(). blkdev_get_by_path() is slightly different in that it doesn't automatically add %FMODE_EXCL to @mode. All users are converted. Most conversions are mechanical and don't introduce any behavior difference. There are several exceptions. * btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no reason to OR it explicitly on blkdev_put(). * gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in sb->s_mode. * With the above changes, sb->s_mode now always should contain FMODE_EXCL. WARN_ON_ONCE() added to kill_block_super() to detect errors. The new blkdev_get_*() functions are with proper docbook comments. While at it, add function description to blkdev_get() too. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Neil Brown <neilb@suse.de> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Joern Engel <joern@lazybastard.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: reiserfs-devel@vger.kernel.org Cc: xfs-masters@oss.sgi.com Cc: Alexander Viro <viro@zeniv.linux.org.uk>	2010-11-13 11:55:18 +01:00
Tejun Heo	e525fd89d3	block: make blkdev_get/put() handle exclusive access Over time, block layer has accumulated a set of APIs dealing with bdev open, close, claim and release. * blkdev_get/put() are the primary open and close functions. * bd_claim/release() deal with exclusive open. * open/close_bdev_exclusive() are a combination of open and claim and the other way around, respectively. * bd_link/unlink_disk_holder() to create and remove holder/slave symlinks. * open_by_devnum() wraps bdget() + blkdev_get(). The interface is a bit confusing and the decoupling of open and claim makes it impossible to properly guarantee exclusive access as in-kernel open + claim sequence can disturb the existing exclusive open even before the block layer knows the current open is for another exclusive access. Reorganize the interface such that, * blkdev_get() is extended to include exclusive access management. @holder argument is added and, if @FMODE_EXCL is specified, it will gain exclusive access atomically w.r.t. other exclusive accesses. * blkdev_put() is similarly extended. It now takes @mode argument and if @FMODE_EXCL is set, it releases an exclusive access. Also, when the last exclusive claim is released, the holder/slave symlinks are removed automatically. * bd_claim/release() and close_bdev_exclusive() are no longer necessary and either made static or removed. * bd_link_disk_holder() remains the same but bd_unlink_disk_holder() is no longer necessary and removed. * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev() and blkdev_get(). It also has an unexpected extra bdev_read_only() test which probably should be moved into blkdev_get(). * open_by_devnum() is modified to take @holder argument and pass it to blkdev_get(). Most of bdev open/close operations are unified into blkdev_get/put() and most exclusive accesses are tested atomically at the open time (as it should).
This cleans up code and removes some, both valid and invalid, but unnecessary all the same, corner cases. open_bdev_exclusive() and open_by_devnum() can use further cleanup - rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop special features. Well, let's leave them for another day. Most conversions are straight-forward. drbd conversion is a bit more involved as there was some reordering, but the logic should stay the same. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Brown <neilb@suse.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Philipp Reisner <philipp.reisner@linbit.com> Cc: Peter Osterlund <petero2@telia.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Alex Elder <aelder@sgi.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: dm-devel@redhat.com Cc: drbd-dev@lists.linbit.com Cc: Leo Chen <leochen@broadcom.com> Cc: Scott Branden <sbranden@broadcom.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Joern Engel <joern@logfs.org> Cc: reiserfs-devel@vger.kernel.org Cc: Alexander Viro <viro@zeniv.linux.org.uk>	2010-11-13 11:55:17 +01:00
Christoph Hellwig	51ee4b84f5	locks: let the caller free file_lock on ->setlease failure The caller allocated it, the caller should free it. The only issue so far is that we could change the flp pointer even on an error return if the fl_change callback failed. But we can simply move the flp assignment after the fl_change invocation, as the callers don't care about the flp return value if the setlease call failed. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-31 06:35:15 -07:00
J. Bruce Fields	05fa3135fd	locks: fix setlease methods to free passed-in lock We modified setlease to require the caller to allocate the new lease in the case of creating a new lease, but forgot to fix up the filesystem methods. Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-30 18:08:15 -07:00
Al Viro	8bcbbf0009	convert gfs2 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-29 04:17:16 -04:00
Linus Torvalds	426e1f5cec	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits) split invalidate_inodes() fs: skip I_FREEING inodes in writeback_sb_inodes fs: fold invalidate_list into invalidate_inodes fs: do not drop inode_lock in dispose_list fs: inode split IO and LRU lists fs: switch bdev inode bdi's correctly fs: fix buffer invalidation in invalidate_list fsnotify: use dget_parent smbfs: use dget_parent exportfs: use dget_parent fs: use RCU read side protection in d_validate fs: clean up dentry lru modification fs: split __shrink_dcache_sb fs: improve DCACHE_REFERENCED usage fs: use percpu counter for nr_dentry and nr_dentry_unused fs: simplify __d_free fs: take dcache_lock inside __d_path fs: do not assign default i_ino in new_inode fs: introduce a per-cpu last_ino allocator new helper: ihold() ...	2010-10-26 17:58:44 -07:00
Wu Fengguang	1b430beee5	writeback: remove nonblocking/encountered_congestion references This removes more dead code that was somehow missed by commit `0d99519efe` (writeback: remove unused nonblocking and congestion checks). There is no behavior change except for the removal of two entries from one of the ext4 tracing interfaces. The nonblocking checks in ->writepages are no longer used because the flusher now prefers to block on get_request_wait() rather than skip inodes on IO congestion. The latter will lead to more seeky IO. The nonblocking checks in ->writepage are no longer used because they are redundant with the WB_SYNC_NONE check. We no longer set ->nonblocking in VM page out and page migration, because a) it's effectively redundant with WB_SYNC_NONE in current code b) its old semantic of "Don't get stuck on request queues" is mis-behavior: that would skip some dirty inodes on congestion and page out others, which is unfair in terms of LRU age. Inspired by Christoph Hellwig. Thanks! Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: David Howells <dhowells@redhat.com> Cc: Sage Weil <sage@newdream.net> Cc: Steve French <sfrench@samba.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:05 -07:00
Al Viro	7de9c6ee3e	new helper: ihold() Clones an existing reference to inode; caller must already hold one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:26:11 -04:00
Al Viro	9dcefee508	gfs2: invalidate_inodes() is no-op there In fill_super() we hadn't MS_ACTIVE set yet, so there won't be any inodes with zero i_count sitting around. In put_super() we already have MS_ACTIVE removed and we had called invalidate_inodes() since then. So again there won't be any inodes with zero i_count... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:23:01 -04:00
Christoph Hellwig	ebdec241d5	fs: kill block_prepare_write __block_write_begin and block_prepare_write are identical except for slightly different calling conventions. Convert all callers to the __block_write_begin calling conventions and drop block_prepare_write. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:18:20 -04:00
Linus Torvalds	91b745016c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: remove in_workqueue_context() workqueue: Clarify that schedule_on_each_cpu is synchronous memory_hotplug: drop spurious calls to flush_scheduled_work() shpchp: update workqueue usage pciehp: update workqueue usage isdn/eicon: don't call flush_scheduled_work() from diva_os_remove_soft_isr() workqueue: add and use WQ_MEM_RECLAIM flag workqueue: fix HIGHPRI handling in keep_working() workqueue: add queue_work and activate_work trace points workqueue: prepare for more tracepoints workqueue: implement flush[_delayed]_work_sync() workqueue: factor out start_flush_work() workqueue: cleanup flush/cancel functions workqueue: implement alloc_ordered_workqueue() Fix up trivial conflict in fs/gfs2/main.c as per Tejun	2010-10-22 17:13:10 -07:00
Linus Torvalds	a2887097f2	Merge branch 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits) xen-blkfront: disable barrier/flush write support Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c block: remove BLKDEV_IFL_WAIT aic7xxx_old: removed unused 'req' variable block: remove the BH_Eopnotsupp flag block: remove the BLKDEV_IFL_BARRIER flag block: remove the WRITE_BARRIER flag swap: do not send discards as barriers fat: do not send discards as barriers ext4: do not send discards as barriers jbd2: replace barriers with explicit flush / FUA usage jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier jbd: replace barriers with explicit flush / FUA usage nilfs2: replace barriers with explicit flush / FUA usage reiserfs: replace barriers with explicit flush / FUA usage gfs2: replace barriers with explicit flush / FUA usage btrfs: replace barriers with explicit flush / FUA usage xfs: replace barriers with explicit flush / FUA usage block: pass gfp_mask and flags to sb_issue_discard dm: convey that all flushes are processed as empty ...	2010-10-22 17:07:18 -07:00
Linus Torvalds	092e0e7e52	Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: vfs: make no_llseek the default vfs: don't use BKL in default_llseek llseek: automatically add .llseek fop libfs: use generic_file_llseek for simple_attr mac80211: disallow seeks in minstrel debug code lirc: make chardev nonseekable viotape: use noop_llseek raw: use explicit llseek file operations ibmasmfs: use generic_file_llseek spufs: use llseek in all file operations arm/omap: use generic_file_llseek in iommu_debug lkdtm: use generic_file_llseek in debugfs net/wireless: use generic_file_llseek in debugfs drm: use noop_llseek	2010-10-22 10:52:56 -07:00
Linus Torvalds	79f14b7c56	Merge branch 'vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: (30 commits) BKL: remove BKL from freevxfs BKL: remove BKL from qnx4 autofs4: Only declare function when CONFIG_COMPAT is defined autofs: Only declare function when CONFIG_COMPAT is defined ncpfs: Lock socket in ncpfs while setting its callbacks fs/locks.c: prepare for BKL removal BKL: Remove BKL from ncpfs BKL: Remove BKL from OCFS2 BKL: Remove BKL from squashfs BKL: Remove BKL from jffs2 BKL: Remove BKL from ecryptfs BKL: Remove BKL from afs BKL: Remove BKL from USB gadgetfs BKL: Remove BKL from autofs4 BKL: Remove BKL from isofs BKL: Remove BKL from fat BKL: Remove BKL from ext2 filesystem BKL: Remove BKL from do_new_mount() BKL: Remove BKL from cgroup BKL: Remove BKL from NTFS ...	2010-10-22 10:52:01 -07:00
Jens Axboe	fa251f8990	Merge branch 'v2.6.36-rc8' into for-2.6.37/barrier Conflicts: block/blk-core.c drivers/block/loop.c mm/swapfile.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-19 09:13:04 +02:00
Andrea Gelmini	33027af637	GFS2: fixed typo Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-10-18 14:38:07 +01:00
Arnd Bergmann	6038f373a3	llseek: automatically add .llseek fop All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode i, struct file f) { <+... ( nonseekable_open(...) 
\| nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... 
+.llseek = no_llseek, /* nonseekable / }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, / open uses nonseekable / }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, / we have seq_read / }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, / readdir is present / }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, / read accesses f_pos / }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, / write accesses f_pos / }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... 
+.llseek = noop_llseek, / read and write both use no f_pos / }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, / write uses no f_pos / }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, / read uses no f_pos / }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, / no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>	2010-10-15 15:53:27 +02:00
Tejun Heo	6370a6ad3b	workqueue: add and use WQ_MEM_RECLAIM flag Add WQ_MEM_RECLAIM flag which currently maps to WQ_RESCUER, mark WQ_RESCUER as internal and replace all external WQ_RESCUER usages to WQ_MEM_RECLAIM. This makes the API users express the intent of the workqueue instead of indicating the internal mechanism used to guarantee forward progress. This is also to make it cleaner to add more semantics to WQ_MEM_RECLAIM. For example, if deemed necessary, memory reclaim workqueues can be made highpri. This patch doesn't introduce any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Steven Whitehouse <swhiteho@redhat.com>	2010-10-11 15:20:26 +02:00
Steven Whitehouse	134669854e	GFS2: Fix type mapping for demote_rq interface Mostly the glock operations follow the type of the glock. The one exception is the transaction glock, so we need to check for that directly. Reported-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-10-06 09:58:44 +01:00
Arnd Bergmann	b89f432133	fs/locks.c: prepare for BKL removal This prepares the removal of the big kernel lock from the file locking code. We still use the BKL as long as fs/lockd uses it and ceph might sleep, but we can flip the definition to a private spinlock as soon as that's done. All users outside of fs/lockd get converted to use lock_flocks() instead of lock_kernel() where appropriate. Based on an earlier patch to use a spinlock from Matthew Wilcox, who has attempted this a few times before, the earliest patch from over 10 years ago turned it into a semaphore, which ended up being slower than the BKL and was subsequently reverted. Someone should do some serious performance testing when this becomes a spinlock, since this has caused problems before. Using a spinlock should be at least as good as the BKL in theory, but who knows... Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Matthew Wilcox <willy@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Sage Weil <sage@newdream.net> Cc: linux-kernel@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org	2010-10-05 11:02:04 +02:00
Bob Peterson	46290341cd	GFS2 fatal: filesystem consistency error on rename This patch fixes a GFS2 problem whereby the first rename after a mount can result in a file system consistency error being flagged improperly and cause the file system to withdraw. The problem is that the rename code tries to run the rgrp list with function gfs2_blk2rgrpd before the rgrp list is guaranteed to be read in from disk. The patch makes the rename function hold the rindex glock (as the gfs2_unlink code does today) which reads in the rgrp list if need be. There were a total of three places in the rename code that improperly referenced the rgrp list without the rindex glock and this patch fixes all three. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-30 17:23:03 +01:00
Steven Whitehouse	feb47ca931	GFS2: Improve journal allocation via sysfs Recently a feature was added to GFS2 to allow journal id allocation via sysfs. This patch builds upon that so that a negative journal id will be treated as an error code to be passed back as the return code from mount. This allows termination of the mount process if there is a failure. Also, the process has been updated so that the kernel will wait for a journal id, even in the "spectator" case. This is required in order to avoid mounting a filesystem in case there is an error while joining the cluster. In the spectator case, 0 is written into the file to indicate that all is well, and that mount should continue. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-29 15:04:18 +01:00
Steven Whitehouse	43f74c1995	GFS2: Add "norecovery" mount option as a synonym for "spectator" XFS supports the "norecovery" mount option which is basically the same as the GFS2 spectator mode. This adds support for "norecovery" as a synonym for spectator mode, which is hopefully a more obvious description of what it actually does. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-29 14:24:41 +01:00
Steven Whitehouse	c741c45512	GFS2: Fix spectator umount issue The tests further down the recovery function relating to unlocking the journal need to be updated to match the initial test. Also, a test in the umount code which was surplus to requirements has been removed. Umounting spectator mounts now works correctly, as expected. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-29 14:20:52 +01:00
Steven Whitehouse	d594845106	GFS2: Fix compiler warning from previous patch This shouldn't really be required, but gcc can't tell that "al" is only accessed when initialised. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-28 10:17:47 +01:00
Benjamin Marzinski	bf97b6734e	GFS2: reserve more blocks for transactions Some of the functions in GFS2 were not reserving space in the transaction for the resource group header and the resource groups bitblocks that get added when you do allocation. GFS2 now makes sure to reserve space for the resource group header and either all the bitblocks in the resource group, or one for each block that it may allocate, whichever is smaller using the new gfs2_rg_blocks() inline function. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-28 09:44:24 +01:00
Steven Whitehouse	d0795f9123	GFS2: Fix journal check for spectator mounts When checking journals for spectator mounts, we cannot rely on the journal being locked, whatever its jid might be. This patch ensures that we always get the journal locks when checking journals for a spectator mount. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-27 15:58:11 +01:00
Steven Whitehouse	c80dbb58f9	GFS2: Remove upgrade mount option This option has never done anything useful. Also at the same time this cleans up the sb checks which are done at mount time. The debug option will be accepted, but ignored in future. Since it didn't do anything, there didn't seem much point in retaining it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-24 09:55:07 +01:00
Steven Whitehouse	c2048b003c	GFS2: Remove localcaching mount option This option defaulted to on for lock_nolock mounts and off otherwise. The only function was to avoid the revalidation of dentries. In the cluster case, that is entirely pointless and liable to cause coherency problems. The patch changes the revalidation to depend upon whether the fs is a local or cluster fs (i.e. it follows the existing default behaviour). I very much doubt anybody ever used this option as there is no reason to. Even so we will continue to accept it on the mount command line, but ignore it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-23 14:00:31 +01:00
Steven Whitehouse	f57a024ed2	GFS2: Remove ignore_local_fs mount argument This has been a no-op for a very long time now. I'm pretty sure nobody uses it, but just in case we'll still accept it on the command line, but ignore it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-23 13:41:42 +01:00
Steven Whitehouse	8d1235852b	GFS2: Make . and .. qstrs constant Rather than calculating the qstrs for . and .. each time we need them, it's better to keep a constant version of these and just refer to them when required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reviewed-by: Christoph Hellwig <hch@infradead.org>	2010-09-20 11:21:09 +01:00
Steven Whitehouse	9fa0ea9f26	GFS2: Use new workqueue scheme The recovery workqueue can be freezable since we want it to finish what it is doing if the system is to be frozen (although why you'd want to freeze a cluster node is beyond me since it will result in it being ejected from the cluster). It does still make sense for single node GFS2 filesystems though. The glock workqueue will benefit from being able to run more work items concurrently. A test running postmark shows improved performance and multi-threaded workloads are likely to benefit even more. It needs to be high priority because the latency directly affects the latency of filesystem glock operations. The delete workqueue is similar to the recovery workqueue in that it must not get blocked by memory allocations, and may run for a long time. Potentially other GFS2 threads might also be converted to workqueues, but I'll leave that for a later patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Tejun Heo <tj@kernel.org>	2010-09-20 11:20:36 +01:00
Steven Whitehouse	1fea7c25a0	GFS2: Update handling of DLM return codes to match reality GFS2's idea of which return codes it needs to handle was based upon those listed in dlm.h. Those didn't cover all the possible codes and listed some which never happen. This updates GFS2 to handle all the codes which can actually be returned from the DLM under various circumstances. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:20:12 +01:00
Steven Whitehouse	7b5e3d5fcf	GFS2: Don't enforce min hold time when two demotes occur in rapid succession Due to the design of the VFS, it is quite usual for operations on GFS2 to consist of a lookup (requiring a shared lock) followed by an operation requiring an exclusive lock. If a remote node has cached an exclusive lock, then it will receive two demote events in rapid succession firstly for a shared lock and then to unlocked. The existing min hold time code was triggering in this case, even if the node was otherwise idle since the state change time was being updated by the initial demote. This patch introduces logic to skip the min hold timer in the case that a "double demote" of this kind has occurred. The min hold timer will still be used in all other cases. A new glock flag is introduced which is used to keep track of whether there have been any newly queued holders since the last glock state change. The min hold time is only applied if the flag is set. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Tested-by: Abhijith Das <adas@redhat.com>	2010-09-20 11:19:50 +01:00
Steven Whitehouse	fe08d5a897	GFS2: Fix whitespace in previous patch Removes the offending space Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:19:35 +01:00
Benjamin Marzinski	3921120e75	GFS2: fallocate support This patch adds support for fallocate to gfs2. Since the gfs2 does not support uninitialized data blocks, it must write out zeros to all the blocks. However, since it does not need to lock any pages to read from, gfs2 can write out the zero blocks much more efficiently. On a moderately full filesystem, fallocate works around 5 times faster on average. The fallocate call also allows gfs2 to add blocks to the file without changing the filesize, which will make it possible for gfs2 to preallocate space for the rindex file, so that gfs2 can grow a completely full filesystem. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:19:17 +01:00
Steven Whitehouse	9a3f236d40	GFS2: Add a bug trap in allocation code This adds a check to ensure that if we reach the block allocator that we don't try and proceed if there is no alloc structure hanging off the inode. This should only happen if there is a bug in GFS2. The error return code is distinctive in order that it will be easily spotted. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:18:59 +01:00
Steven Whitehouse	820969f353	GFS2: No longer experimental I think the time has arrived to remove the experimental tag from GFS2. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:18:46 +01:00
Steven Whitehouse	a2e0f79939	GFS2: Remove i_disksize With the update of the truncate code, ip->i_disksize and inode->i_size are merely copies of each other. This means we can remove ip->i_disksize and use inode->i_size exclusively reducing the size of a GFS2 inode by 8 bytes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-20 11:18:29 +01:00
Steven Whitehouse	ff8f33c8b3	GFS2: New truncate sequence This updates GFS2's truncate code to use the new truncate sequence correctly. This is a stepping stone to being able to remove ip->i_disksize in favour of using i_size everywhere now that the two sizes are always identical. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Christoph Hellwig <hch@lst.de>	2010-09-20 11:18:16 +01:00
Steven Whitehouse	5f4874903d	GFS2: gfs2_logd should be using interruptible waits Looks like this crept in, in a recent update. Reported-by: Krzysztof Urbaniak <urban@bash.org.pl> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-17 14:00:10 +01:00
Christoph Hellwig	dd3932eddf	block: remove BLKDEV_IFL_WAIT All the blkdev_issue_* helpers can only sanely be used for synchronous caller. To issue cache flushes or barriers asynchronously the caller needs to set up a bio by itself with a completion callback to move the asynchronous state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always specified when calling blkdev_issue_* and also remove the now unused flags argument to blkdev_issue_flush and blkdev_issue_zeroout. For blkdev_issue_discard we need to keep it for the secure discard flag, which gains a more descriptive name and loses the bitops vs flag confusion. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-16 20:52:58 +02:00
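
Note on commit 6038f373a3 (llseek: automatically add .llseek fop): the rule list at the top of its semantic patch amounts to a small decision procedure. The following is a minimal userspace sketch of that procedure, not kernel code and not the Coccinelle patch itself. The struct fops_facts type, its fields, and pick_llseek() are hypothetical names introduced only for illustration; the returned strings are the helper names mentioned in the commit message. The sketch also checks the rules in a fixed priority order, which is a simplification of the patch's "depends on" clauses.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical summary of what is known about one file_operations
 * initializer. None of these fields exist in the kernel; they stand in
 * for the facts the semantic patch derives by matching source code.
 */
struct fops_facts {
	bool has_llseek;          /* .llseek is already set explicitly   */
	bool open_is_nonseekable; /* .open ends up in nonseekable_open() */
	bool read_is_seq_read;    /* .read is seq_read                   */
	bool has_readdir;         /* .readdir is present                 */
	bool read_touches_pos;    /* .read uses its offset argument      */
	bool write_touches_pos;   /* .write uses its offset argument     */
};

/*
 * Return the name of the llseek helper the rules would add, or NULL
 * when an explicit .llseek is already present and nothing is added.
 */
const char *pick_llseek(const struct fops_facts *f)
{
	if (f->has_llseek)
		return NULL;
	if (f->open_is_nonseekable)
		return "no_llseek";
	if (f->read_is_seq_read)
		return "seq_lseek";
	if (f->has_readdir || f->read_touches_pos || f->write_touches_pos)
		return "default_llseek";
	/* Neither read nor write looks at the file position. */
	return "noop_llseek";
}

/* Usage example: a driver whose read and write ignore the offset. */
void pick_llseek_example(void)
{
	struct fops_facts f = { .has_llseek = false };

	assert(strcmp(pick_llseek(&f), "noop_llseek") == 0);
}
```

Under these assumptions, noop_llseek is only the fallback: it is chosen when no other rule applies, which matches the commit's advice that new drivers should prefer no_llseek together with nonseekable_open.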


1078 Commits (61aaff49e20fdb700f1300a49962bc76effc77fc)