linux/fs
David Howells c61ea31dac CacheFiles: Fix occasional EIO on call to vfs_unlink()
Fix an occasional EIO returned by a call to vfs_unlink():

	[ 4868.465413] CacheFiles: I/O Error: Unlink failed
	[ 4868.465444] FS-Cache: Cache cachefiles stopped due to I/O error
	[ 4947.320011] CacheFiles: File cache on md3 unregistering
	[ 4947.320041] FS-Cache: Withdrawing cache "mycache"
	[ 5127.348683] FS-Cache: Cache "mycache" added (type cachefiles)
	[ 5127.348716] CacheFiles: File cache on md3 registered
	[ 7076.871081] CacheFiles: I/O Error: Unlink failed
	[ 7076.871130] FS-Cache: Cache cachefiles stopped due to I/O error
	[ 7116.780891] CacheFiles: File cache on md3 unregistering
	[ 7116.780937] FS-Cache: Withdrawing cache "mycache"
	[ 7296.813394] FS-Cache: Cache "mycache" added (type cachefiles)
	[ 7296.813432] CacheFiles: File cache on md3 registered

What happens is this:

 (1) A cached NFS file is seen to have become out of date, so NFS retires the
     object and immediately acquires a new object with the same key.

 (2) Retirement of the old object is done asynchronously - so the lookup/create
     to generate the new object may be done first.

     This can be a problem as the old object and the new object must exist at
     the same point in the backing filesystem (i.e. they must have the same
     pathname).

 (3) The lookup for the new object sees that a backing file already exists,
     checks to see whether it is valid and sees that it isn't.  It then deletes
     that file and creates a new one on disk.

 (4) The retirement phase for the old file is then performed.  It tries to
     delete the dentry it has, but ext4_unlink() returns -EIO because the inode
     attached to that dentry no longer matches the inode number associated with
     the filename in the parent directory.

The trace below shows this quite well.

	[md5sum] ==> __fscache_relinquish_cookie(ffff88002d12fb58{NFS.fh,ffff88002ce62100},1)
	[md5sum] ==> __fscache_acquire_cookie({NFS.server},{NFS.fh},ffff88002ce62100)

NFS has retired the old cookie and asked for a new one.

	[kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_ACTIVE,24})
	[kslowd] <== fscache_object_state_machine() [->OBJECT_DYING]
	[kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_INIT,0})
	[kslowd] <== fscache_object_state_machine() [->OBJECT_LOOKING_UP]
	[kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_DYING,24})
	[kslowd] <== fscache_object_state_machine() [->OBJECT_RECYCLING]

The old object (OBJ52) is going through the terminal states to get rid of it,
whilst the new object - (OBJ53) - is coming into being.

	[kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_LOOKING_UP,0})
	[kslowd] ==> cachefiles_walk_to_object({ffff88003029d8b8},OBJ53,@68,)
	[kslowd] lookup '@68'
	[kslowd] next -> ffff88002ce41bd0 positive
	[kslowd] advance
	[kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA'
	[kslowd] next -> ffff8800369faac8 positive

The new object has looked up the subdir in which the file would be in (getting
dentry ffff88002ce41bd0) and then looked up the file itself (getting dentry
ffff8800369faac8).

	[kslowd] validate 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA'
	[kslowd] ==> cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA')
	[kslowd] remove ffff8800369faac8 from ffff88002ce41bd0
	[kslowd] unlink stale object
	[kslowd] <== cachefiles_bury_object() = 0

It then checks the file's xattrs to see if it's valid.  NFS says that the
auxiliary data indicate the file is out of date (obvious to us - that's why NFS
ditched the old version and got a new one).  CacheFiles then deletes the old
file (dentry ffff8800369faac8).

	[kslowd] redo lookup
	[kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA'
	[kslowd] next -> ffff88002cd94288 negative
	[kslowd] create -> ffff88002cd94288{ffff88002cdaf238{ino=148247}}

CacheFiles then redoes the lookup and gets a negative result in a new dentry
(ffff88002cd94288) which it then creates a file for.

	[kslowd] ==> cachefiles_mark_object_active(,OBJ53)
	[kslowd] <== cachefiles_mark_object_active() = 0
	[kslowd] === OBTAINED_OBJECT ===
	[kslowd] <== cachefiles_walk_to_object() = 0 [148247]
	[kslowd] <== fscache_object_state_machine() [->OBJECT_AVAILABLE]

The new object is then marked active and the state machine moves to the
available state - at which point NFS can start filling the object.

	[kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_RECYCLING,20})
	[kslowd] ==> fscache_release_object()
	[kslowd] ==> cachefiles_drop_object({OBJ52,2})
	[kslowd] ==> cachefiles_delete_object(,OBJ52{ffff8800369faac8})

The old object, meanwhile, goes on with being retired.  If allocation occurs
first, cachefiles_delete_object() has to wait for dir->d_inode->i_mutex to
become available before it can continue.

	[kslowd] ==> cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA')
	[kslowd] remove ffff8800369faac8 from ffff88002ce41bd0
	[kslowd] unlink stale object
	EXT4-fs warning (device sda6): ext4_unlink: Inode number mismatch in unlink (148247!=148193)
	CacheFiles: I/O Error: Unlink failed
	FS-Cache: Cache cachefiles stopped due to I/O error

CacheFiles then tries to delete the file for the old object, but the dentry it
has (ffff8800369faac8) no longer points to a valid inode for that directory
entry, and so ext4_unlink() returns -EIO when de->inode does not match i_ino.

	[kslowd] <== cachefiles_bury_object() = -5
	[kslowd] <== cachefiles_delete_object() = -5
	[kslowd] <== fscache_object_state_machine() [->OBJECT_DEAD]
	[kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_AVAILABLE,0})
	[kslowd] <== fscache_object_state_machine() [->OBJECT_ACTIVE]

(Note that the above trace includes extra information beyond that produced by
the upstream code).

The fix is to note when an object that is being retired has had its object
deleted preemptively by a replacement object that is being created, and to
skip the second removal attempt in such a case.

Reported-by: Greg M <gregm@servu.net.au>
Reported-by: Mark Moseley <moseleymark@gmail.com>
Reported-by: Romain DEGEZ <romain.degez@smartjog.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-11 10:07:53 -07:00
..
9p 9p: add bdi backing to mount session 2010-04-22 11:42:00 +02:00
adfs include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
affs include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
afs Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block 2010-04-28 07:56:05 -07:00
autofs include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
autofs4 autofs4-2.6.34-rc1 - fix link_count usage 2010-05-10 09:48:10 -07:00
befs include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
bfs pass writeback_control to ->write_inode 2010-03-05 13:25:52 -05:00
btrfs btrfs: convert to using bdi_setup_and_register() 2010-04-26 10:27:54 +02:00
cachefiles CacheFiles: Fix occasional EIO on call to vfs_unlink() 2010-05-11 10:07:53 -07:00
ceph ceph: remove bad auth_x kmem_cache 2010-05-03 10:49:25 -07:00
cifs cifs: add bdi backing to mount session 2010-04-22 12:09:48 +02:00
coda coda: add bdi backing to mount session 2010-04-22 12:12:40 +02:00
configfs include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
cramfs
debugfs include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
devpts include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
dlm include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
ecryptfs ecryptfs: add bdi backing to mount session 2010-04-22 12:22:04 +02:00
efs
exofs exofs: Fix "add bdi backing to mount session" fall out 2010-04-29 20:35:29 +02:00
exportfs
ext2 ext2: symlink must be handled via filesystem specific operation 2010-04-12 21:11:25 +02:00
ext3 ext3: symlink must be handled via filesystem specific operation 2010-04-12 21:11:39 +02:00
ext4 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 2010-04-25 10:01:51 -07:00
fat Merge branch 'master' into export-slabh 2010-04-05 11:37:28 +09:00
freevxfs include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
fscache fs-cache: order the debugfs stats correctly 2010-04-07 08:38:05 -07:00
fuse include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
gfs2 include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
hfs include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
hfsplus include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
hostfs include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
hpfs include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
hppfs hppfs can use existing proc_mnt, no need for do_kern_mount() in there 2010-03-03 14:08:00 -05:00
hugetlbfs Untangling ima mess, part 1: alloc_file() 2009-12-16 12:16:47 -05:00
isofs include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
jbd include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
jbd2 include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
jffs2 include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
jfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 2010-04-21 12:30:07 -07:00
lockd include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
logfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs 2010-04-21 12:31:12 -07:00
minix include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
ncpfs ncpfs: add bdi backing to mount session 2010-04-22 12:31:11 +02:00
nfs NFS: Fix RCU issues in the NFSv4 delegation code 2010-05-01 12:37:18 -04:00
nfs_common include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
nfsd nfsd4: bug in read_buf 2010-04-26 15:39:08 -04:00
nilfs2 nilfs2: fix sync silent failure 2010-05-03 07:36:01 -07:00
nls
notify Inotify: Fix build failure in inotify user support 2010-04-30 10:14:56 -07:00
ntfs include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
ocfs2 ocfs2: Avoid a gcc warning in ocfs2_wipe_inode(). 2010-05-03 19:15:49 -07:00
omfs include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
openpromfs
partitions include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
proc procfs: fix tid fdinfo 2010-04-27 16:26:03 -07:00
qnx4 fs/qnx4: decrement sizeof size in strncmp 2010-02-04 11:55:46 +01:00
quota quota: Convert __DQUOT_PARANOIA symbol to standard config option 2010-04-20 18:25:25 +02:00
ramfs include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
reiserfs reiserfs: fix corruption during shrinking of xattrs 2010-04-24 11:31:24 -07:00
romfs fix leak in romfs_fill_super() 2010-01-26 22:22:26 -05:00
smbfs smbfs: add bdi backing to mount session 2010-04-22 12:37:07 +02:00
squashfs squashfs: fix potential buffer over-run on 4K block file systems 2010-04-25 02:09:05 +01:00
sysfs include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
sysv pass writeback_control to ->write_inode 2010-03-05 13:25:52 -05:00
ubifs include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
udf udf: add speciffic ->setattr callback 2010-04-08 15:35:20 +02:00
ufs ufs: make solaris fsck happy 2010-03-12 15:52:35 -08:00
xfs xfs: add a shrinker to background inode reclaim 2010-04-29 16:22:13 -05:00
Kconfig Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2010-03-19 09:43:06 -07:00
Kconfig.binfmt
Makefile Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2010-03-19 09:43:06 -07:00
aio.c aio: remove unused field 2009-12-16 07:20:13 -08:00
anon_inodes.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
attr.c fs: use rlimit helpers 2010-03-06 11:26:29 -08:00
bad_inode.c
binfmt_aout.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
binfmt_elf.c coredump: pass mm->flags as a coredump parameter for consistency 2010-03-06 11:26:46 -08:00
binfmt_elf_fdpic.c Remove redundant check for CONFIG_MMU 2010-04-27 09:01:26 -07:00
binfmt_em86.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
binfmt_flat.c uclinux: error message when FLAT reloc symbol is invalid, v2 2010-04-21 13:28:49 +10:00
binfmt_misc.c
binfmt_script.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
binfmt_som.c Split 'flush_old_exec' into two functions 2010-01-29 08:22:01 -08:00
bio-integrity.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
bio.c Merge branch 'master' into for-linus 2010-03-19 08:05:10 +01:00
block_dev.c fs/block_dev.c: fix performance regression in O_DIRECT|O_SYNC writes to block devices 2010-04-24 11:31:26 -07:00
buffer.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-03-12 16:04:50 -08:00
char_dev.c
compat.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
compat_binfmt_elf.c elf coredump: replace ELF_CORE_EXTRA_* macros by functions 2010-03-06 11:26:45 -08:00
compat_ioctl.c pktcdvd: improve BKL and compat_ioctl.c usage 2010-04-29 08:44:37 -07:00
dcache.c fix race in d_splice_alias() 2010-03-03 14:13:08 -05:00
dcookies.c
direct-io.c dio: fix use-after-free 2009-12-17 04:52:13 -05:00
drop_caches.c
eventfd.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
eventpoll.c anonfd: Allow making anon files read-only 2009-12-22 12:27:34 -05:00
exec.c coredump: suppress uid comparison test if core output files are pipes 2010-03-06 11:26:46 -08:00
fcntl.c fs: use rlimit helpers 2010-03-06 11:26:29 -08:00
fifo.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
file.c fs: use rlimit helpers 2010-03-06 11:26:29 -08:00
file_table.c vfs: take f_lock on modifying f_mode after open time 2010-03-06 11:26:25 -08:00
filesystems.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
fs-writeback.c Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block 2010-04-09 11:50:29 -07:00
fs_struct.c
generic_acl.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
inode.c dquot: move dquot initialization responsibility into the filesystem 2010-03-05 00:20:30 +01:00
internal.h Take vfsmount_lock to fs/internal.h 2010-03-03 14:07:59 -05:00
ioctl.c Cleanup generic block based fiemap 2010-04-23 10:39:48 -07:00
ioprio.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
libfs.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
locks.c Merge branch 'for-next' into for-linus 2010-03-08 16:55:37 +01:00
mbcache.c
mpage.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
namei.c Restore LOOKUP_DIRECTORY hint handling in final lookup on open() 2010-03-26 12:41:05 -04:00
namespace.c vfs: add NOFOLLOW flag to umount(2) 2010-03-03 14:08:00 -05:00
nfsctl.c Switch may_open() and break_lease() to passing O_... 2010-03-03 13:00:21 -05:00
no-block.c
open.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
pipe.c fs: no games with DCACHE_UNHASHED 2009-12-17 10:51:40 -05:00
pnode.c Kill CL_PROPAGATION, sanitize fs/pnode.c:get_source() 2010-03-03 13:00:22 -05:00
pnode.h VFS: Clean up shared mount flag propagation 2010-03-03 14:07:55 -05:00
posix_acl.c
read_write.c do_sync_read/write() should set kiocb.ki_nbytes to be consistent 2010-03-24 16:43:29 -07:00
read_write.h
readdir.c
select.c Add generic sys_old_select() 2010-03-12 15:52:32 -08:00
seq_file.c seq_file: fix new kernel-doc warnings 2010-03-07 15:48:26 -08:00
signalfd.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
splice.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
stack.c VFS/fsstack: handle 32-bit smp + preempt + large files in fsstack_copy_inode_size 2009-12-17 10:58:17 -05:00
stat.c Add unlocked version of inode_add_bytes() function 2009-12-23 13:33:54 +01:00
super.c fs: fs/super.c needs to include backing-dev.h for !CONFIG_BLOCK 2010-04-29 20:33:35 +02:00
sync.c Catch filesystems lacking s_bdi 2010-04-25 08:54:42 +02:00
timerfd.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
utimes.c
xattr.c sanitize xattr handler prototypes 2009-12-16 12:16:49 -05:00
xattr_acl.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00