linux/include
Al Viro 8f7b0ba1c8 Fix inotify watch removal/umount races
Inotify watch removals suck violently.

To kick the watch out we need (in this order) inode->inotify_mutex and
ih->mutex.  That's fine if we have a hold on inode; however, for all
other cases we need to make damn sure we don't race with umount.  We can
*NOT* just grab a reference to a watch - inotify_unmount_inodes() will
happily sail past it and we'll end up with a reference to an inode
potentially outliving its superblock.

Ideally we just want to grab an active reference to superblock if we
can; that will make sure we won't go into inotify_unmount_inodes() until
we are done.  Cleanup is just deactivate_super().

However, that leaves a messy case - what if we *are* racing with
umount() and active references to superblock can't be acquired anymore?
We can bump ->s_count, grab ->s_umount, which will almost certainly wait
until the superblock is shut down and the watch in question is pining
for fjords.  That's fine, but there is a problem - we might have hit the
window between ->s_active getting to 0 / ->s_count dropping below S_BIAS
(i.e. the moment when superblock is past the point of no return and is
heading for shutdown) and the moment when deactivate_super() acquires
->s_umount.

We could just do drop_super(), yield() and retry, but that's rather
antisocial and this stuff is luser-triggerable.  OTOH, having grabbed
->s_umount and having found that we'd got there first (i.e. that
->s_root is non-NULL) we know that we won't race with
inotify_unmount_inodes().
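
The three outcomes described so far can be modelled in user space.  This
is only a sketch, not the kernel code: struct fake_sb, its two flags,
enum pin_result and classify_pin() are hypothetical names made up for
illustration, and the locking (->s_count bump, waiting on ->s_umount) is
reduced to comments.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for a superblock; not the kernel struct. */
struct fake_sb {
	bool can_take_active;	/* an active reference can still be acquired */
	bool has_root;		/* ->s_root is non-NULL once ->s_umount is held */
};

/* Hypothetical outcome codes, named here for illustration only. */
enum pin_result {
	PIN_DEAD = 0,		/* fs already shut down, the watch is gone */
	PIN_ACTIVE = 1,		/* active reference; undo with deactivate_super() */
	PIN_PASSIVE = 2,	/* got ->s_umount first; undo with drop_super() */
};

static enum pin_result classify_pin(const struct fake_sb *sb)
{
	/* Best case: umount cannot start until we are done. */
	if (sb->can_take_active)
		return PIN_ACTIVE;

	/*
	 * Racing with umount.  The real code would bump ->s_count and
	 * block on ->s_umount here; this model only looks at the result.
	 */
	if (!sb->has_root)
		return PIN_DEAD;

	/* We won the race for ->s_umount; the superblock stays put. */
	return PIN_PASSIVE;
}
```

In the PIN_PASSIVE case the superblock is pinned, but nothing in this
model says whether the watch itself survived.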

So we could grab a reference to watch and do the rest as above, just
with drop_super() instead of deactivate_super(), right? Wrong.  We had
to drop ih->mutex before we could grab ->s_umount.  So the watch
could've been gone already.

That still can be dealt with - we need to save watch->wd, do idr_find()
and compare its result with our pointer.  If they match, we either have
the damn thing still alive or we'd lost not one but two races at once,
the watch had been killed and a new one got created with the same ->wd
at the same address.  That couldn't have happened in inotify_destroy(),
but inotify_rm_wd() could run into that.  Still, "new one got created"
is not a problem - we have every right to kill it or leave it alone,
whatever's more convenient.

So we can use idr_find(...) == watch && watch->inode->i_sb == sb as
"grab it and kill it" check.  If it's been our original watch, we are
fine, if it's a newcomer - nevermind, just pretend that we'd won the
race and kill the fscker anyway; we are safe since we know that its
superblock won't be going away.
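
A user-space sketch of that check, under stated assumptions: the fake_*
structures, the fixed-size slot array standing in for the idr, and
watch_still_killable() are all hypothetical and exist only to show the
shape of the comparison.  The real code does this under ih->mutex with
idr_find().

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_WD 8	/* arbitrary size for this model */

/* Hypothetical stand-ins; none of these are the kernel structures. */
struct fake_sb { int id; };
struct fake_inode { struct fake_sb *i_sb; };
struct fake_watch { int wd; struct fake_inode *inode; };

/* Models the idr: slot wd holds the watch pointer or NULL. */
struct fake_handle { struct fake_watch *slots[MAX_WD]; };

static struct fake_watch *fake_idr_find(const struct fake_handle *ih, int wd)
{
	if (wd < 0 || wd >= MAX_WD)
		return NULL;
	return ih->slots[wd];
}

/*
 * saved_wd must have been copied from watch->wd before the mutex was
 * dropped.  The watch is only dereferenced after the lookup has shown
 * that a live watch currently sits at that address.
 */
static bool watch_still_killable(const struct fake_handle *ih,
				 const struct fake_watch *watch,
				 int saved_wd,
				 const struct fake_sb *sb)
{
	const struct fake_watch *found = fake_idr_find(ih, saved_wd);

	return found == watch && found->inode->i_sb == sb;
}
```

The pointer comparison comes first on purpose: if the lookup returns
something else (or nothing), the old pointer is never followed.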

And yes, this is far beyond mere "not very pretty"; so's the entire
concept of inotify to start with.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-15 12:26:44 -08:00
acpi Merge branch 'ec' into release 2008-11-11 21:17:26 -05:00
asm-arm
asm-frv ide: fix support for IDE PCI controllers using MMIO on frv 2008-10-17 18:09:14 +02:00
asm-generic Fix __pfn_to_page(pfn) for CONFIG_DISCONTIGMEM=y 2008-11-08 10:02:48 -08:00
asm-h8300
asm-m32r
asm-m68k proc: move /proc/hardware to m68k-specific code 2008-10-23 14:24:03 +04:00
asm-mn10300
asm-x86 x86: add iomap_atomic*()/iounmap_atomic() on 32-bit using fixmaps 2008-10-31 10:12:38 +01:00
asm-xtensa Merge git://git.kernel.org/pub/scm/linux/kernel/git/czankel/xtensa-2.6 2008-10-23 09:16:56 -07:00
crypto
drm drm/i915: Filter pci devices based on PCI_CLASS_DISPLAY_VGA 2008-11-11 18:02:12 +10:00
keys
linux Fix inotify watch removal/umount races 2008-11-15 12:26:44 -08:00
math-emu math-emu: Fix thinko in _FP_DIV 2008-10-22 22:09:59 -07:00
media V4L/DVB (9335): videobuf: split unregister bus creating self-contained frontend de-allocator 2008-10-21 14:32:08 -02:00
mtd
net net: unix: fix inflight counting bug in garbage collector 2008-11-09 11:17:33 -08:00
pcmcia
rdma
rxrpc
scsi scsi: make sure that scsi_init_shared_tag_map() doesn't overwrite existing map 2008-10-27 19:25:30 +01:00
sound Merge branches 'topic/fix/misc' and 'topic/fix/hda' into for-linus 2008-11-10 17:58:46 +01:00
trace
video atmel_lcdfb: change irq_base definition to allow error reporting 2008-11-12 17:17:16 -08:00
xen
Kbuild