Commit graph

1053 commits

Author SHA1 Message Date
Steven Whitehouse
c83ae9cad8 GFS2: Add an AIL writeback tracepoint
Add a tracepoint for monitoring writeback of the AIL.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 09:01:58 +01:00
Steven Whitehouse
4667a0ec32 GFS2: Make writeback more responsive to system conditions
This patch adds writeback_control to writing back the AIL
list. This means that we can then take advantage of the
information we get in ->write_inode() in order to set off
some pre-emptive writeback.

In addition, the AIL code is cleaned up a bit to make it
a bit simpler to understand.

There is still more which can usefully be done in this area,
but this is a good start at least.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 09:01:37 +01:00
Steven Whitehouse
f42ab08529 GFS2: Optimise glock lru and end of life inodes
The GLF_LRU flag introduced in the previous patch can be
used to check if a glock is on the lru list when a new
holder is queued and if so remove it, without having first
to get the lru_lock.

The main purpose of this patch however is to optimise the
glocks left over when an inode at end of life is being
evicted. Previously such glocks were left with the GLF_LFLUSH
flag set, so that when reclaimed, each one required a log flush.
This patch resets the GLF_LFLUSH flag when there is nothing
left to flush thus preventing later log flushes as glocks are
reused or demoted.

In order to do this, we need to keep track of the number of
revokes which are outstanding, and also to clear the GLF_LFLUSH
bit after a log commit when only revokes have been processed.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 09:01:17 +01:00
Steven Whitehouse
627c10b7e4 GFS2: Improve tracing support (adds two flags)
This adds support for two new flags. One keeps track of whether
the glock is on the LRU list or not. The other isn't really a
flag as such, but an indication of whether the glock has an
attached object or not. This indication is reported without
any locking, which is ok since we do not dereference the object
pointer but merely report whether it is NULL or not.

Also, this fixes one place where a tracepoint was missing, which
was at the point we remove deallocated blocks from the journal.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 09:00:59 +01:00
Steven Whitehouse
dba898b02d GFS2: Clean up fsync()
This patch is designed to clean up GFS2's fsync
implementation and ensure that it really does get everything on
disk. Since ->write_inode() has been updated, we can call that
via the vfs library function sync_inode_metadata() and the only
remaining thing that has to be done is to ensure that we get
any revoke records in the log after the inode has been written back.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 09:00:41 +01:00
Steven Whitehouse
efc1a9c2a7 GFS2: Remove unused macro
The buffer_in_io() macro has been unused for some time,
so remove it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 09:00:24 +01:00
Steven Whitehouse
29687a2ac8 GFS2: Alter point of entry to glock lru list for glocks with an address_space
Rather than allowing the glocks to be scheduled for possible
reclaim as soon as they have exited the journal, this patch
delays their entry to the list until the glocks in question
are no longer in use.

This means that we will rely on the vm for writeback of all
dirty data and metadata from now on. When glocks are added
to the lru list they should be freeable much faster since all
the I/O required to free them should have already been completed.

This should lead to much better I/O patterns under low memory
conditions.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:59:48 +01:00
Steven Whitehouse
5ac048bb7e GFS2: Use filemap_fdatawrite() to write back the AIL
In order to ensure that the mapping stats (and thus the bdi) are correctly
updated, this patch changes the AIL writeback to use the filemap_datawrite
function. This helps prevent stalls in balance_dirty_pages() due to
large amounts of dirty metadata when there is little or no dirty data
around.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:59:25 +01:00
Steven Whitehouse
1027efaa23 GFS2: Make ->write_inode() really write
The GFS2 ->write_inode function should be more aggressive at writing
back to the filesystem. This adopts the XFS system of returning
-EAGAIN when the writeback has not been completely done. Also, we
now kick off in-place writeback when called with WB_SYNC_NONE,
but we only wait for it and flush the log when WB_SYNC_ALL is
requested.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:55:07 +01:00
Bob Peterson
556bb17998 GFS2: move function foreach_leaf to gfs2_dir_exhash_dealloc
The previous patches made function gfs2_dir_exhash_dealloc do nothing
but call function foreach_leaf.  This patch simplifies the code by
moving the entire function foreach_leaf into gfs2_dir_exhash_dealloc.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:54:44 +01:00
Bob Peterson
ec038c826b GFS2: pass leaf_bh into leaf_dealloc
Function foreach_leaf used to look up the leaf block address and get
a buffer_head.  Then it would call leaf_dealloc which did the same
lookup.  This patch combines the two operations by making foreach_leaf
pass the leaf bh to leaf_dealloc.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:54:26 +01:00
Bob Peterson
d24a7a439a GFS2: Combine transaction from gfs2_dir_exhash_dealloc
At the end of function gfs2_dir_exhash_dealloc, it was setting the dinode
type to "file" to prevent directory corruption in case of a crash.
It was doing so in its own journal transaction.  This patch makes the
change occur when the last call is make to leaf_dealloc, since it needs
to rewrite the directory dinode at that time anyway.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:53:56 +01:00
Bob Peterson
0d95326d9b GFS2: remove *leaf_call_t and simplify leaf_dealloc
Since foreach_leaf is only called with leaf_dealloc as its only possible
call function, we can simplify the code by making it call leaf_dealloc
directly.  This simplifies the code and eliminates the need for
leaf_call_t, the generic call method.  This is a first small step in
simplifying the directory leaf deallocation code.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:53:35 +01:00
Bob Peterson
95c8e17f2f GFS2: Dump better debug info if a bitmap inconsistency is detected
On rare occasions we encounter gfs2 problems where an
invalid bitmap state transition is attempted.  For example,
trying to "unlink" a free block.  In these cases, there
is really no useful information logged to debug the problem.
This patch adds more debug details that should allow us to
more closely examine the problem and possibly solve it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-20 08:53:12 +01:00
Bob Peterson
44ad37d69b GFS2: filesystem hang caused by incorrect lock order
This patch fixes a deadlock in GFS2 where two processes are trying
to reclaim an unlinked dinode:
One holds the inode glock and calls gfs2_lookup_by_inum trying to look
up the inode, which it can't, due to I_FREEING.  The other has set
I_FREEING from vfs and is at the beginning of gfs2_delete_inode
waiting for the glock, which is held by the first.  The solution is to
add a new non_block parameter to the gfs2_iget function that causes it
to return -ENOENT if the inode is being freed.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-18 15:23:50 +01:00
Steven Whitehouse
001e8e8df4 GFS2: Don't try to deallocate unlinked inodes when mounted ro
This adds a couple of missing tests to avoid read-only nodes
from attempting to deallocate unlinked inodes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Michel Andre de la Porte <madelaporte@ubi.com>
2011-04-18 15:23:12 +01:00
Benjamin Marzinski
0ee532062f GFS2: directly write blocks past i_size
GFS2 was relying on the writepage code to write out the zeroed data for
fallocate.  However, with FALLOC_FL_KEEP_SIZE set, this may be past i_size.
If it is, it will be ignored.  To work around this, gfs2 now calls
write_dirty_buffer directly on the buffer_heads when FALLOC_FL_KEEP_SIZE
is set, and it's writing past i_size.

This version is just a cleanup of my last version

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-18 15:22:52 +01:00
Bob Peterson
deab72d379 GFS2: write_end error path fails to unlock transaction lock
I did an audit of gfs2's transaction glock for bugzilla bug
658619 and ran across this:

In function gfs2_write_end, in the unlikely event that
gfs2_meta_inode_buffer returns an error, the code may forget
to unlock the transaction lock because the "failed" label
appears after the call to function gfs2_trans_end.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-04-18 15:22:35 +01:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Linus Torvalds
6c51038900 Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
  Documentation/iostats.txt: bit-size reference etc.
  cfq-iosched: removing unnecessary think time checking
  cfq-iosched: Don't clear queue stats when preempt.
  blk-throttle: Reset group slice when limits are changed
  blk-cgroup: Only give unaccounted_time under debug
  cfq-iosched: Don't set active queue in preempt
  block: fix non-atomic access to genhd inflight structures
  block: attempt to merge with existing requests on plug flush
  block: NULL dereference on error path in __blkdev_get()
  cfq-iosched: Don't update group weights when on service tree
  fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
  block: Require subsystems to explicitly allocate bio_set integrity mempool
  jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
  jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
  fs: make fsync_buffers_list() plug
  mm: make generic_writepages() use plugging
  blk-cgroup: Add unaccounted time to timeslice_used.
  block: fixup plugging stubs for !CONFIG_BLOCK
  block: remove obsolete comments for blkdev_issue_zeroout.
  blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
  ...

Fix up conflicts in fs/{aio.c,super.c}
2011-03-24 10:16:26 -07:00
Serge E. Hallyn
2e14967075 userns: rename is_owner_or_cap to inode_owner_or_capable
And give it a kernel-doc comment.

[akpm@linux-foundation.org: btrfs changed in linux-next]
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23 19:47:13 -07:00
Linus Torvalds
a44f99c7ef Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6
* 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6: (25 commits)
  video: change to new flag variable
  scsi: change to new flag variable
  rtc: change to new flag variable
  rapidio: change to new flag variable
  pps: change to new flag variable
  net: change to new flag variable
  misc: change to new flag variable
  message: change to new flag variable
  memstick: change to new flag variable
  isdn: change to new flag variable
  ieee802154: change to new flag variable
  ide: change to new flag variable
  hwmon: change to new flag variable
  dma: change to new flag variable
  char: change to new flag variable
  fs: change to new flag variable
  xtensa: change to new flag variable
  um: change to new flag variables
  s390: change to new flag variable
  mips: change to new flag variable
  ...

Fix up trivial conflict in drivers/hwmon/Makefile
2011-03-20 18:14:55 -07:00
matt mooney
0ccd234ca0 fs: change to new flag variable
Replace EXTRA_CFLAGS with ccflags-y. And change ntfs-objs to ntfs-y
for cleaner conditional inclusion.

Signed-off-by: matt mooney <mfm@muteddisk.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
2011-03-17 14:02:57 +01:00
Linus Torvalds
0f6e0e8448 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (33 commits)
  AppArmor: kill unused macros in lsm.c
  AppArmor: cleanup generated files correctly
  KEYS: Add an iovec version of KEYCTL_INSTANTIATE
  KEYS: Add a new keyctl op to reject a key with a specified error code
  KEYS: Add a key type op to permit the key description to be vetted
  KEYS: Add an RCU payload dereference macro
  AppArmor: Cleanup make file to remove cruft and make it easier to read
  SELinux: implement the new sb_remount LSM hook
  LSM: Pass -o remount options to the LSM
  SELinux: Compute SID for the newly created socket
  SELinux: Socket retains creator role and MLS attribute
  SELinux: Auto-generate security_is_socket_class
  TOMOYO: Fix memory leak upon file open.
  Revert "selinux: simplify ioctl checking"
  selinux: drop unused packet flow permissions
  selinux: Fix packet forwarding checks on postrouting
  selinux: Fix wrong checks for selinux_policycap_netpeer
  selinux: Fix check for xfrm selinux context algorithm
  ima: remove unnecessary call to ima_must_measure
  IMA: remove IMA imbalance checking
  ...
2011-03-16 09:15:43 -07:00
Linus Torvalds
3ae2a1ce2e Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
  GFS2: Don't use _raw version of RCU dereference
  GFS2: Adding missing unlock_page()
  GFS2: Update to AIL list locking
  GFS2: introduce AIL lock
  GFS2: fix block allocation check for fallocate
  GFS2: Optimize glock multiple-dequeue code
  GFS2: Remove potential race in flock code
  GFS2: Fix glock deallocation race
  GFS2: quota allows exceeding hard limit
  GFS2: deallocation performance patch
  GFS2: panics on quotacheck update
  GFS2: Improve cluster mmap scalability
  GFS2: Fix glock queue trace point
  GFS2: Post-VFS scale update for RCU path walk
  GFS2: Use RCU for glock hash table
2011-03-16 08:58:43 -07:00
James Morris
a002951c97 Merge branch 'next' into for-linus 2011-03-16 09:41:17 +11:00
Steven Whitehouse
7e32d02613 GFS2: Don't use _raw version of RCU dereference
As per RCU glock patch review comments, don't use the _raw
version of this function here.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-03-15 08:58:17 +00:00
Maxim
6c474f7bc1 GFS2: Adding missing unlock_page()
gfs2_write_begin() calls grab_cache_page_write_begin() that returns *locked*
page. Correspondent error-handling path lacks for unlock_page() call:

> out:
> 	if (error == 0)
> 		return 0;
>
> 	page_cache_release(page);

The whole system hangs if gfs2_unstuff_dinode() called from gfs2_write_begin()
failed for some reason.

Reported-by: Maxim <maxim.patlasov@gmail.com>
Signed-off-by: Maxim <maxim.patlasov@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-03-14 13:19:21 +00:00
Aneesh Kumar K.V
5fe0c23788 exportfs: Return the minimum required handle size
The exportfs encode handle function should return the minimum required
handle size. This helps user to find out the handle size by passing 0
handle size in the first step and then redoing to the call again with
the returned handle size value.

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:28 -04:00
Steven Whitehouse
c618e87a5f GFS2: Update to AIL list locking
The previous patch missed a couple of places where the AIL list
needed locking, so this fixes up those places, plus a comment
is corrected too.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Chinner <dchinner@redhat.com>
2011-03-14 12:40:29 +00:00
Dave Chinner
d6a079e82e GFS2: introduce AIL lock
The log lock is currently used to protect the AIL lists and
the movements of buffers into and out of them. The lists
are self contained and no log specific items outside the
lists are accessed when starting or emptying the AIL lists.

Hence the operation of the AIL does not require the protection
of the log lock so split them out into a new AIL specific lock
to reduce the amount of traffic on the log lock. This will
also reduce the amount of serialisation that occurs when
the gfs2_logd pushes on the AIL to move it forward.

This reduces the impact of log pushing on sequential write
throughput.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-03-11 11:52:25 +00:00
Benjamin Marzinski
e4a7b7b0c9 GFS2: fix block allocation check for fallocate
GFS2 fallocate wasn't properly checking if a blocks were already allocated.
In write_empty_blocks(), if a page didn't have buffer_heads attached, GFS2
was always treating it as if there were no blocks allocated for that page.
GFS2 now calls gfs2_block_map() to check if the blocks are allocated before
writing them out.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-03-11 09:26:48 +00:00
Bob Peterson
fa1bbdea30 GFS2: Optimize glock multiple-dequeue code
This is a small patch that optimizes multiple glock dequeue
operations.  It changes the unlock order to be more efficient
and makes it easier for lock debugging tools to unravel.  It
also eliminates the need for the temp variable x, although
that would likely be optimized out.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-03-11 09:24:54 +00:00
Al Viro
53fe924161 gfs2: fix d_revalidate oopsen on NFS exports
can't blindly check nd->flags in ->d_revalidate()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10 03:44:48 -05:00
Jens Axboe
4c63f5646e Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/core
Conflicts:
	block/blk-core.c
	block/blk-flush.c
	drivers/md/raid1.c
	drivers/md/raid10.c
	drivers/md/raid5.c
	fs/nilfs2/btnode.c
	fs/nilfs2/mdt.c

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:58:35 +01:00
Jens Axboe
721a9602e6 block: kill off REQ_UNPLUG
With the plugging now being explicitly controlled by the
submitter, callers need not pass down unplugging hints
to the block layer. If they want to unplug, it's because they
manually plugged on their own - in which case, they should just
unplug at will.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:52:27 +01:00
Jens Axboe
7eaceaccab block: remove per-queue plugging
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:52:07 +01:00
Steven Whitehouse
0a33443b38 GFS2: Remove potential race in flock code
This patch ensures that we always wait for glock demotion when
dropping flocks on a file in order to prevent any race
conditions associated with further flock calls or closing
the file.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-03-09 11:14:32 +00:00
Steven Whitehouse
fc0e38dae6 GFS2: Fix glock deallocation race
This patch fixes a race in deallocating glocks which was introduced
in the RCU glock patch. We need to ensure that the glock count is
kept correct even in the case that there is a race to add a new
glock into the hash table. Also, to avoid having to wait for an
RCU grace period, the glock counter can be decremented before
call_rcu() is called.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-03-09 10:58:04 +00:00
Abhijith Das
662e3a551b GFS2: quota allows exceeding hard limit
Immediately after being synced to disk, cached quotas are zeroed out and a
subsequent access of the cached quotas results in incorrect zero values. This
meant that gfs2 assumed the actual usage to be the zero (or near-zero) usage
values it found in the cached quotas and comparison against warn/limits never
triggered a quota violation.

This patch adds a new flag QDF_REFRESH that is set after a sync so that the
cached quotas are forcefully refreshed from disk on a subsequent access on
seeing this flag set.

Resolves: rhbz#675944
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-03-09 09:32:44 +00:00
James Morris
fe3fa43039 Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into next 2011-03-08 11:38:10 +11:00
Bob Peterson
4c16c36ad6 GFS2: deallocation performance patch
This patch is a performance improvement to GFS2's dealloc code.
Rather than update the quota file and statfs file for every
single block that's stripped off in unlink function do_strip,
this patch keeps track and updates them once for every layer
that's stripped.  This is done entirely inside the existing
transaction, so there should be no risk of corruption.
The other functions that deallocate blocks will be unaffected
because they are using wrapper functions that do the same
thing that they do today.

I tested this code on my roth cluster by creating 200
files in a directory, each of which is 100MB, then on
four nodes, I simultaneously deleted the files, thus competing
for GFS2 resources (but different files).  The commands
I used were:

[root@roth-01]# time for i in `seq 1 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
[root@roth-02]# time for i in `seq 2 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
[root@roth-03]# time for i in `seq 3 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
[root@roth-05]# time for i in `seq 4 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done

The performance increase was significant:

             roth-01     roth-02     roth-03     roth-05
             ---------   ---------   ---------   ---------
old: real    0m34.027    0m25.021s   0m23.906s   0m35.646s
new: real    0m22.379s   0m24.362s   0m24.133s   0m18.562s

Total time spent deleting:
old: 118.6s
new:  89.4

For this particular case, this showed a 25% performance increase for
GFS2 unlinks.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-02-24 12:13:48 +00:00
Miklos Szeredi
2aa15890f3 mm: prevent concurrent unmap_mapping_range() on the same inode
Michael Leun reported that running parallel opens on a fuse filesystem
can trigger a "kernel BUG at mm/truncate.c:475"

Gurudas Pai reported the same bug on NFS.

The reason is, unmap_mapping_range() is not prepared for more than
one concurrent invocation per inode.  For example:

  thread1: going through a big range, stops in the middle of a vma and
     stores the restart address in vm_truncate_count.

  thread2: comes in with a small (e.g. single page) unmap request on
     the same vma, somewhere before restart_address, finds that the
     vma was already unmapped up to the restart address and happily
     returns without doing anything.

Another scenario would be two big unmap requests, both having to
restart the unmapping and each one setting vm_truncate_count to its
own value.  This could go on forever without any of them being able to
finish.

Truncate and hole punching already serialize with i_mutex.  Other
callers of unmap_mapping_range() do not, and it's difficult to get
i_mutex protection for all callers.  In particular ->d_revalidate(),
which calls invalidate_inode_pages2_range() in fuse, may be called
with or without i_mutex.

This patch adds a new mutex to 'struct address_space' to prevent
running multiple concurrent unmap_mapping_range() on the same mapping.

[ We'll hopefully get rid of all this with the upcoming mm
  preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
  lockbreak" patch in particular.  But that is for 2.6.39 ]

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reported-by: Michael Leun <lkml20101129@newton.leun.net>
Reported-by: Gurudas Pai <gurudas.pai@oracle.com>
Tested-by: Gurudas Pai <gurudas.pai@oracle.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-23 19:52:52 -08:00
Tejun Heo
58a69cb47e workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable'
There are two spellings in use for 'freeze' + 'able' - 'freezable' and
'freezeable'.  The former is the more prominent one.  The latter is
mostly used by workqueue and in a few other odd places.  Unify the
spelling to 'freezable'.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Steven Whitehouse <swhiteho@redhat.com>
2011-02-16 17:48:59 +01:00
Abhijith Das
e79a46a030 GFS2: panics on quotacheck update
Handle block allocation for forceful unstuffing of quota dinode during quota
update using quotactl(). Also fix block reservation for special cases when
quotas cross over block boundaries and update 2 blocks instead of 1.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-02-07 20:00:44 +00:00
Steven Whitehouse
b9c93bb7de GFS2: Improve cluster mmap scalability
The mmap system call grabs a glock when an update to atime maybe
required. It does this in order to ensure that the flags on the
inode are uptodate, but since it will only mark atime for a future
update, an exclusive lock is not required here (one will be taken
later when the actual update is performed).

Also, the lock can be skipped when the mount is marked noatime in
addition to the original check which only looked at the noatime
flag for the inode itself.

This should increase the scalability of the mmap call when multiple
nodes are all mmaping the same file.

Reported-by: Scooter Morris <scooter@cgl.ucsf.edu>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-02-02 14:48:10 +00:00
Eric Paris
2a7dba391e fs/vfs/security: pass last path component to LSM on inode creation
SELinux would like to implement a new labeling behavior of newly created
inodes.  We currently label new inodes based on the parent and the creating
process.  This new behavior would also take into account the name of the
new object when deciding the new label.  This is not the (supposed) full path,
just the last component of the path.

This is very useful because creating /etc/shadow is different than creating
/etc/passwd but the kernel hooks are unable to differentiate these
operations.  We currently require that userspace realize it is doing some
difficult operation like that and than userspace jumps through SELinux hoops
to get things set up correctly.  This patch does not implement new
behavior, that is obviously contained in a seperate SELinux patch, but it
does pass the needed name down to the correct LSM hook.  If no such name
exists it is fine to pass NULL.

Signed-off-by: Eric Paris <eparis@redhat.com>
2011-02-01 11:12:29 -05:00
Steven Whitehouse
edae38a643 GFS2: Fix glock queue trace point
Somehow this tracepoint landed up in the wrong place. This moves it
to where it should be.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-01-31 09:38:12 +00:00
Steven Whitehouse
75d5cfbe4b GFS2: Post-VFS scale update for RCU path walk
We can allow a few more cases to use RCU path walking than
originally allowed. It should be possible to also enable
RCU path walking when the glock is already cached. Thats
a bit more complicated though, so left for a future patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Nick Piggin <npiggin@gmail.com>
2011-01-21 09:39:24 +00:00
Steven Whitehouse
bc015cb841 GFS2: Use RCU for glock hash table
This has a number of advantages:

 - Reduces contention on the hash table lock
 - Makes the code smaller and simpler
 - Should speed up glock dumps when under load
 - Removes ref count changing in examine_bucket
 - No longer need hash chain lock in glock_put() in common case

There are some further changes which this enables and which
we may do in the future. One is to look at using SLAB_RCU,
and another is to look at using a per-cpu counter for the
per-sb glock counter, since that is touched twice in the
lifetime of each glock (but only used at umount time).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-01-21 09:39:08 +00:00