linux/fs/xfs/linux-2.6
Zach Brown 8459d86aff [PATCH] dio: only call aio_complete() after returning -EIOCBQUEUED
The only time it is safe to call aio_complete() is when the ->ki_retry
function returns -EIOCBQUEUED to the AIO core.  direct_io_worker() has
historically done this by relying on its caller to translate positive return
codes into -EIOCBQUEUED for the aio case.  It did this by trying to keep
conditionals in sync.  direct_io_worker() knew when finished_one_bio() was
going to call aio_complete().  It would reverse the test and wait and free the
dio in the cases it thought that finished_one_bio() wasn't going to.

Not surprisingly, it ended up getting it wrong.  'ret' could be a negative
errno from the submission path but it failed to communicate this to
finished_one_bio().  direct_io_worker() would return < 0, it's callers
wouldn't raise -EIOCBQUEUED, and aio_complete() would be called.  In the
future finished_one_bio()'s tests wouldn't reflect this and aio_complete()
would be called for a second time which can manifest as an oops.

The previous cleanups have whittled the sync and async completion paths down
to the point where we can collapse them and clearly reassert the invariant
that we must only call aio_complete() after returning -EIOCBQUEUED.
direct_io_worker() will only return -EIOCBQUEUED when it is not the last to
drop the dio refcount and the aio bio completion path will only call
aio_complete() when it is the last to drop the dio refcount.
direct_io_worker() can ensure that it is the last to drop the reference count
by waiting for bios to drain.  It does this for sync ops, of course, and for
partial dio writes that must fall back to buffered and for aio ops that saw
errors during submission.

This means that operations that end up waiting, even if they were issued as
aio ops, will not call aio_complete() from dio.  Instead we return the return
code of the operation and let the aio core call aio_complete().  This is
purposely done to fix a bug where AIO DIO file extensions would call
aio_complete() before their callers have a chance to update i_size.

Now that direct_io_worker() is explicitly returning -EIOCBQUEUED its callers
no longer have to translate for it.  XFS needs to be careful not to free
resources that will be used during AIO completion if -EIOCBQUEUED is returned.
 We maintain the previous behaviour of trying to write fs metadata for O_SYNC
aio+dio writes.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: <xfs-masters@oss.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:21 -08:00
..
kmem.c [PATCH] separate bdi congestion functions from queue congestion functions 2006-10-20 10:26:35 -07:00
kmem.h [XFS] Add a greedy allocation interface, allocating within a min/max size 2006-09-28 11:03:27 +10:00
mrlock.h [XFS] lock validator: lockdep: small xfs init_rwsem() cleanup 2006-06-09 14:57:01 +10:00
mutex.h
sema.h [XFS] standardize on one sema init macro 2006-09-28 11:05:46 +10:00
spin.h
sv.h [XFS] Collapse sv_init and init_sv into just the one interface. 2006-09-28 11:05:52 +10:00
time.h
xfs_aops.c [PATCH] dio: only call aio_complete() after returning -EIOCBQUEUED 2006-12-10 09:57:21 -08:00
xfs_aops.h [PATCH] mark address_space_operations const 2006-06-28 14:59:04 -07:00
xfs_buf.c [PATCH] Use freezeable workqueues in XFS 2006-12-07 08:39:29 -08:00
xfs_buf.h [XFS] Remove several macros that are no longer used anywhere 2006-09-28 11:02:57 +10:00
xfs_cred.h
xfs_dmapi_priv.h [XFS] Remove KERNEL_VERSION macros from xfs_dmapi.h 2006-11-11 18:05:06 +11:00
xfs_export.c [XFS] Remove version 1 directory code. Never functioned on Linux, just 2006-06-20 13:04:51 +10:00
xfs_export.h [XFS] We really suck at spulling. Thanks to Chris Pascoe for fixing all 2006-03-29 08:55:14 +10:00
xfs_file.c [PATCH] xfs: change uses of f_{dentry,vfsmnt} to use f_path 2006-12-08 08:28:43 -08:00
xfs_fs_subr.c [XFS] Resolve a namespace collision on vnode/vnodeops for FreeBSD porters. 2006-06-09 17:00:52 +10:00
xfs_fs_subr.h
xfs_globals.c [XFS] Improve error handling for the zero-fsblock extent detection code. 2006-09-28 11:03:20 +10:00
xfs_globals.h
xfs_ioctl.c [PATCH] xfs: change uses of f_{dentry,vfsmnt} to use f_path 2006-12-08 08:28:43 -08:00
xfs_ioctl32.c [PATCH] xfs: change uses of f_{dentry,vfsmnt} to use f_path 2006-12-08 08:28:43 -08:00
xfs_ioctl32.h [XFS] Fix compiler warning from xfs_file_compat_invis_ioctl prototype. 2006-03-20 13:25:48 +11:00
xfs_iops.c [XFS] Update XFS for i_blksize removal from generic inode structure 2006-09-28 11:01:22 +10:00
xfs_iops.h [PATCH] Make most file operations structs in fs/ const 2006-03-28 09:16:06 -08:00
xfs_linux.h [XFS] Fix a porting botch on the realtime subvol growfs code path. 2006-09-28 11:03:53 +10:00
xfs_lrw.c [PATCH] xfs: change uses of f_{dentry,vfsmnt} to use f_path 2006-12-08 08:28:43 -08:00
xfs_lrw.h [XFS] Resolve a namespace collision on vnode/vnodeops for FreeBSD porters. 2006-06-09 17:00:52 +10:00
xfs_stats.c [PATCH] for_each_possible_cpu: xfs 2006-06-23 07:42:45 -07:00
xfs_stats.h
xfs_super.c [PATCH] Add include/linux/freezer.h and move definitions from sched.h 2006-12-07 08:39:27 -08:00
xfs_super.h [XFS] Resolve a namespace collision on vnode/vnodeops for FreeBSD porters. 2006-06-09 17:00:52 +10:00
xfs_sysctl.c [PATCH] for_each_possible_cpu: xfs 2006-06-23 07:42:45 -07:00
xfs_sysctl.h [XFS] Add degframentation exclusion support 2006-06-09 14:54:19 +10:00
xfs_version.h
xfs_vfs.c [XFS] Remove version 1 directory code. Never functioned on Linux, just 2006-06-20 13:04:51 +10:00
xfs_vfs.h [XFS] remove accidentally reintroduced vfs unmount flag, unneeded in 2006-09-28 10:59:06 +10:00
xfs_vnode.c [PATCH] inode-diet: Eliminate i_blksize from the inode structure 2006-09-27 08:26:18 -07:00
xfs_vnode.h [XFS] remove bhv_lookup, _range version works aswell and has more useful 2006-09-28 10:58:52 +10:00