linux/include
Nick Piggin e8c82c2e23 mm lockless pagecache barrier fix
An XFS workload showed up a bug in the lockless pagecache patch. Basically it
would go into an "infinite" loop, although it would sometimes be able to break
out of the loop! The reason is a missing compiler barrier in the "increment
reference count unless it was zero" case of the lockless pagecache protocol in
the gang lookup functions.

This would cause the compiler to use a cached value of struct page pointer to
retry the operation with, rather than reload it. So the page might have been
removed from pagecache and freed (refcount==0) but the lookup would not correctly
notice the page is no longer in pagecache, and keep attempting to increment the
refcount and failing, until the page gets reallocated for something else. This
isn't a data corruption because the condition will be detected if the page has
been reallocated. However it can result in a lockup.

Linus points out that ACCESS_ONCE is also required in that pointer load, even
if it's absence is not causing a bug on our particular build. The most general
way to solve this is just to put an rcu_dereference in radix_tree_deref_slot.

Assembly of find_get_pages,
before:
.L220:
        movq    (%rbx), %rax    #* ivtmp.1162, tmp82
        movq    (%rax), %rdi    #, prephitmp.1149
.L218:
        testb   $1, %dil        #, prephitmp.1149
        jne     .L217   #,
        testq   %rdi, %rdi      # prephitmp.1149
        je      .L203   #,
        cmpq    $-1, %rdi       #, prephitmp.1149
        je      .L217   #,
        movl    8(%rdi), %esi   # <variable>._count.counter, c
        testl   %esi, %esi      # c
        je      .L218   #,

after:
.L212:
        movq    (%rbx), %rax    #* ivtmp.1109, tmp81
        movq    (%rax), %rdi    #, ret
        testb   $1, %dil        #, ret
        jne     .L211   #,
        testq   %rdi, %rdi      # ret
        je      .L197   #,
        cmpq    $-1, %rdi       #, ret
        je      .L211   #,
        movl    8(%rdi), %esi   # <variable>._count.counter, c
        testl   %esi, %esi      # c
        je      .L212   #,

(notice the obvious infinite loop in the first example, if page->count remains 0)

Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-05 18:31:12 -08:00
..
acpi cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t 2009-01-03 19:15:40 +01:00
asm-arm
asm-frv frv: define __fls 2009-01-03 16:14:05 +10:30
asm-generic Merge branch 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-01-02 11:44:09 -08:00
asm-h8300
asm-m32r m32r: define __fls 2009-01-03 16:16:54 +10:30
asm-m68k m68k: define __fls 2009-01-01 10:12:18 +10:30
asm-mn10300 mn10300: define __fls 2009-01-03 16:19:03 +10:30
asm-xtensa xtensa: define __fls 2009-01-03 16:21:08 +10:30
crypto crypto: aes - Precompute tables 2008-12-25 11:05:13 +11:00
drm drm: Add a debug node for vblank state. 2008-12-29 17:47:27 +10:00
keys
linux mm lockless pagecache barrier fix 2009-01-05 18:31:12 -08:00
math-emu
media V4L/DVB (10141): v4l2: debugging API changed to match against driver name instead of ID. 2009-01-02 17:11:52 -02:00
mtd
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2008-12-28 12:49:40 -08:00
pcmcia
rdma
rxrpc
scsi [SCSI] fcoe: Fibre Channel over Ethernet 2008-12-29 11:24:33 -06:00
sound V4L/DVB (10135): v4l2: introduce v4l2_file_operations. 2009-01-02 17:11:12 -02:00
trace sched, trace: update trace_sched_wakeup() 2008-12-25 13:10:21 +01:00
video video: sh_mobile_lcdcfb deferred io support 2008-12-22 18:44:48 +09:00
xen xen: clean up asm/xen/hypervisor.h 2008-12-16 21:50:31 +01:00
Kbuild