linux/include
Oleg Nesterov 55c888d6d0 [PATCH] timers fixes/improvements
This patch tries to solve the following problems:

1. del_timer_sync() is racy. The timer can fire again after
   del_timer_sync() has checked all cpus and before it rechecks
   timer_pending().

2. It has scalability problems. All cpus are scanned to determine
   if the timer is running on that cpu.

   With this patch del_timer_sync is O(1) and no slower than plain
   del_timer(pending_timer), unless it has to actually wait for
   completion of the currently running timer.

   The only restriction is that a recurring timer should not use
   add_timer_on().

3. Timers are not serialized with respect to themselves.

   If CPU_0 does mod_timer(jiffies+1) while the timer is currently
   running on CPU_1, it is quite possible that a local interrupt on
   CPU_0 will start that timer before it has finished on CPU_1.

4. The timer locking is suboptimal. __mod_timer() takes 3 locks
   at once and still requires wmb() in del_timer/run_timers.

   The new implementation takes 2 locks sequentially and does not
   need memory barriers.

Currently, ->base != NULL means that the timer is pending. In that case
->base->lock is used to lock the timer. __mod_timer() also takes timer->lock
because ->base can be NULL.

This patch uses timer->entry.next != NULL as the indication that the timer
is pending. So, when the timer is deleted, it does __list_del() followed by
entry->next = NULL instead of list_del().
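
The pending test described above can be modelled in plain userspace C. This is
only a sketch: struct timer_list is reduced to its list linkage, list_init()
is a hypothetical stand-in for the kernel's list initializer, and
detach_timer() is an illustrative helper name, not a claim about the patch's
exact code.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list node, modelled on the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Assumption: a stripped-down timer where only the list linkage matters. */
struct timer_list {
    struct list_head entry;
};

/* Hypothetical helper: make an empty circular list. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Join the two neighbours; the removed node's own pointers are left untouched. */
static void __list_del(struct list_head *prev, struct list_head *next)
{
    next->prev = prev;
    prev->next = next;
}

/* A timer is pending exactly when it is linked into some list. */
static int timer_pending(const struct timer_list *timer)
{
    return timer->entry.next != NULL;
}

/* Unlink the timer and clear the pending indication in one step. */
static void detach_timer(struct timer_list *timer)
{
    assert(timer_pending(timer));
    __list_del(timer->entry.prev, timer->entry.next);
    timer->entry.next = NULL;
}
```

With this layout a timer whose entry.next is NULL is simply "not pending",
regardless of what ->base points at.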

The ->base field is used for hashed locking only. It is initialized
in init_timer(), which sets ->base = per_cpu(tvec_bases). When
tvec_bases.lock is held, all timers which are tied to this base via
timer->base are locked, and the base itself is locked too.

So __run_timers/migrate_timers can safely modify all timers which could
be found on ->tvX lists (pending timers).

When the timer's base is locked and the timer has been removed from its
->entry list (which means that __run_timers/migrate_timers can't see this
timer), it is possible to set timer->base = NULL and drop the lock: the timer
remains locked.

This patch adds lock_timer_base() helper, which waits for ->base != NULL,
locks the ->base, and checks it is still the same.
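
A userspace sketch of that retry loop, under stated assumptions: the kernel
spinlock is replaced by a toy C11 atomic_flag lock, interrupt disabling is
left out, and base_init(), base_lock() and base_unlock() are hypothetical
helper names introduced only for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct timer_list;

/* Stand-in for the lock/running_timer pair that forms a timer base. */
struct timer_base {
    atomic_flag lock;                  /* toy spinlock, not the kernel's spinlock_t */
    struct timer_list *running_timer;
};

struct timer_list {
    _Atomic(struct timer_base *) base; /* NULL while the timer is between bases */
};

static void base_init(struct timer_base *base)
{
    atomic_flag_clear(&base->lock);
    base->running_timer = NULL;
}

static void base_lock(struct timer_base *base)
{
    while (atomic_flag_test_and_set_explicit(&base->lock, memory_order_acquire))
        ;  /* spin until the current holder releases the lock */
}

static void base_unlock(struct timer_base *base)
{
    atomic_flag_clear_explicit(&base->lock, memory_order_release);
}

/* Wait for a non-NULL base, lock it, and verify the timer did not migrate. */
static struct timer_base *lock_timer_base(struct timer_list *timer)
{
    assert(timer != NULL);
    for (;;) {
        struct timer_base *base = atomic_load(&timer->base);

        if (base != NULL) {
            base_lock(base);
            if (base == atomic_load(&timer->base))
                return base;       /* still the same base: timer is locked */
            base_unlock(base);     /* base changed while we waited, retry */
        }
        /* base == NULL: another CPU is moving the timer, keep polling */
    }
}
```

The recheck after taking the lock is what makes it safe for another CPU to
set ->base = NULL and drop the old lock in the meantime.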

__mod_timer() schedules the timer on the local CPU and changes its base.
However, it does not lock both old and new bases at once. It locks the
timer via lock_timer_base(), deletes the timer, sets ->base = NULL, and
unlocks old base. Then __mod_timer() locks new_base, sets ->base = new_base,
and adds this timer. This simplifies the code, because an AB-BA deadlock is not
possible. __mod_timer() also ensures that the timer's base is not changed
while the timer's handler is running on the old base.
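
The sequence above can be sketched in userspace C. The assumptions are
significant: a toy atomic_flag lock replaces the spinlock, a single list
stands in for the ->tvX vectors, the target base is passed in as a parameter
instead of being the local CPU's base, and mod_timer_sketch(), base_init(),
base_lock() and base_unlock() are hypothetical names. The locks_held counters
are instrumentation added to show that the two locks are never held together.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };
struct timer_list;

struct timer_base {
    atomic_flag lock;                  /* toy spinlock */
    struct timer_list *running_timer;
    struct list_head pending;          /* stand-in for the tvX vectors */
};

struct timer_list {
    struct list_head entry;
    unsigned long expires;
    _Atomic(struct timer_base *) base;
};

/* Per-thread instrumentation: how many base locks this thread holds. */
static _Thread_local int locks_held, max_locks_held;

static void base_init(struct timer_base *b)
{
    atomic_flag_clear(&b->lock);
    b->running_timer = NULL;
    b->pending.next = b->pending.prev = &b->pending;
}

static void base_lock(struct timer_base *b)
{
    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ;
    if (++locks_held > max_locks_held)
        max_locks_held = locks_held;
}

static void base_unlock(struct timer_base *b)
{
    locks_held--;
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

static struct timer_base *lock_timer_base(struct timer_list *t)
{
    for (;;) {
        struct timer_base *b = atomic_load(&t->base);
        if (b != NULL) {
            base_lock(b);
            if (b == atomic_load(&t->base))
                return b;
            base_unlock(b);            /* timer migrated meanwhile, retry */
        }
    }
}

/* Returns 1 if the timer was pending, 0 otherwise. */
static int mod_timer_sketch(struct timer_list *t, unsigned long expires,
                            struct timer_base *new_base)
{
    struct timer_base *base = lock_timer_base(t);
    int was_pending = t->entry.next != NULL;

    if (was_pending) {                 /* unlink, keep the timer locked */
        t->entry.prev->next = t->entry.next;
        t->entry.next->prev = t->entry.prev;
        t->entry.next = NULL;
    }
    /* Do not migrate while the handler runs on the old base. */
    if (base != new_base && base->running_timer != t) {
        atomic_store(&t->base, NULL);  /* timer stays locked via NULL base */
        base_unlock(base);
        base_lock(new_base);
        atomic_store(&t->base, new_base);
        base = new_base;
    }
    t->expires = expires;
    t->entry.prev = base->pending.prev;  /* append to the chosen base */
    t->entry.next = &base->pending;
    base->pending.prev->next = &t->entry;
    base->pending.prev = &t->entry;
    base_unlock(base);
    return was_pending;
}
```

Because the old lock is released before the new one is taken, no lock
ordering between bases has to be defined in this model.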

__run_timers() and del_timer() do not change ->base anymore; they only clear
the pending flag.

So del_timer_sync() can test timer->base->running_timer == timer to detect
whether it is running or not.
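
A sketch of how such a check might look, again in userspace C and under
assumptions: the lock is a toy atomic_flag spinlock, try_del_timer() and
del_timer_sync_sketch() are hypothetical names, and the busy retry loop is a
simplification of whatever waiting the real code does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };
struct timer_list;

struct timer_base {
    atomic_flag lock;                  /* toy spinlock */
    struct timer_list *running_timer;  /* timer whose handler is executing, or NULL */
};

struct timer_list {
    struct list_head entry;
    _Atomic(struct timer_base *) base;
};

static void base_lock(struct timer_base *b)
{
    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ;
}

static void base_unlock(struct timer_base *b)
{
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

static struct timer_base *lock_timer_base(struct timer_list *t)
{
    for (;;) {
        struct timer_base *b = atomic_load(&t->base);
        if (b != NULL) {
            base_lock(b);
            if (b == atomic_load(&t->base))
                return b;
            base_unlock(b);
        }
    }
}

/*
 * One attempt: -1 if the handler is currently running on the timer's base,
 * 1 if a pending timer was removed, 0 if it was neither running nor pending.
 */
static int try_del_timer(struct timer_list *timer)
{
    struct timer_base *base = lock_timer_base(timer);
    int ret = -1;

    assert(base != NULL);
    if (base->running_timer != timer) {
        ret = 0;
        if (timer->entry.next != NULL) {
            timer->entry.prev->next = timer->entry.next;
            timer->entry.next->prev = timer->entry.prev;
            timer->entry.next = NULL;  /* clear the pending indication */
            ret = 1;
        }
    }
    base_unlock(base);
    return ret;
}

/* Retry until the handler is no longer running; only one base is inspected. */
static int del_timer_sync_sketch(struct timer_list *timer)
{
    int ret;

    do {
        ret = try_del_timer(timer);
    } while (ret < 0);
    return ret;
}
```

Only the timer's own base is examined, which is why no scan over all cpus is
needed in this model.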

We don't need timer_list->lock anymore, so this patch kills it.

We also don't need barriers. del_timer() and __run_timers() used smp_wmb()
before clearing the timer's pending flag. It was needed because __mod_timer()
did not lock old_base if the timer was not pending, so __mod_timer()->list_add()
could race with del_timer()->list_del(). With this patch these functions are
serialized through base->lock.

One problem: TIMER_INITIALIZER can't use per_cpu(tvec_bases), so this patch
adds a global

        struct timer_base_s {
                spinlock_t lock;
                struct timer_list *running_timer;
        } __init_timer_base;

which is used by TIMER_INITIALIZER. The corresponding fields in tvec_t_base_s
struct are replaced by struct timer_base_s t_base.

It is indeed ugly, but it can't cause scalability problems. The global
__init_timer_base.lock is used only when __mod_timer() is called for the first
time AND the timer was initialized at compile time. After that the timer
migrates to the local CPU.
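
A rough userspace illustration of why a global base helps a static
initializer. The struct layouts are simplified, the int lock is a placeholder
for spinlock_t, the TIMER_INITIALIZER argument list shown here is an
assumption, and init_timer_sketch(), demo_fn() and static_timer are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };
struct timer_list;

struct timer_base_s {
    int lock;                          /* placeholder for spinlock_t */
    struct timer_list *running_timer;
};

struct timer_list {
    struct list_head entry;
    unsigned long expires;
    void (*function)(unsigned long);
    unsigned long data;
    struct timer_base_s *base;
};

/* One global base: its address is a constant expression, unlike per-CPU data. */
struct timer_base_s __init_timer_base;

/* Assumed macro shape: entry is left zeroed, so the timer is not pending. */
#define TIMER_INITIALIZER(_function, _expires, _data) { \
        .function = (_function),                        \
        .expires = (_expires),                          \
        .data = (_data),                                \
        .base = &__init_timer_base,                     \
    }

/* Runtime counterpart: the caller supplies a per-CPU base (hypothetical API). */
static void init_timer_sketch(struct timer_list *timer,
                              struct timer_base_s *cpu_base)
{
    assert(cpu_base != NULL);
    timer->entry.next = NULL;          /* not pending */
    timer->base = cpu_base;
}

static void demo_fn(unsigned long data)
{
    (void)data;                        /* hypothetical no-op handler */
}

/* Compile-time initialized timer: it can only point at the global base. */
static struct timer_list static_timer = TIMER_INITIALIZER(demo_fn, 10, 0);
```

A timer set up this way shares __init_timer_base only until something moves
it to a per-CPU base.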

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Renaud Lienhart <renaud.lienhart@free.fr>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:16 -07:00
acpi Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
asm-alpha [PATCH] remove non-DISCONTIG use of pgdat->node_mem_map 2005-06-23 09:45:00 -07:00
asm-arm Merge master.kernel.org:/home/rmk/linux-2.6-arm 2005-06-22 14:51:06 -07:00
asm-arm26 [PATCH] Remove obsolete HAVE_ARCH_GET_SIGNAL_TO_DELIVER? 2005-06-12 20:43:21 -07:00
asm-cris [PATCH] asm/signal.h unification 2005-05-04 07:33:15 -07:00
asm-frv [PATCH] asm/signal.h unification 2005-05-04 07:33:15 -07:00
asm-generic [PATCH] x86/x86_64: pcibus_to_node 2005-06-23 09:45:08 -07:00
asm-h8300 [PATCH] h8300 build error fix 2005-06-06 14:42:23 -07:00
asm-i386 [PATCH] xen: x86: Rename usermode macro 2005-06-23 09:45:14 -07:00
asm-ia64 [PATCH] ia64: Selectable Timer Interrupt Frequency 2005-06-23 09:45:10 -07:00
asm-m32r [PATCH] m32r: build fix for asm-m32r/topology.h 2005-06-23 09:45:08 -07:00
asm-m68k [PATCH] asm/signal.h unification 2005-05-04 07:33:15 -07:00
asm-m68knommu [PATCH] asm/signal.h unification 2005-05-04 07:33:15 -07:00
asm-mips [PATCH] mips: add vr41xx gpio support 2005-06-21 18:46:32 -07:00
asm-parisc [PATCH] remove non-DISCONTIG use of pgdat->node_mem_map 2005-06-23 09:45:00 -07:00
asm-ppc [PATCH] ppc32: Clean up NUM_TLBCAMS usage for Freescale Book-E PPC's 2005-06-21 18:46:24 -07:00
asm-ppc64 [PATCH] ppc64: pcibus_to_node fix 2005-06-23 09:45:08 -07:00
asm-s390 [PATCH] smp_processor_id() cleanup 2005-06-21 18:46:13 -07:00
asm-sh [PATCH] Hugepage consolidation 2005-06-21 18:46:15 -07:00
asm-sh64 [PATCH] Hugepage consolidation 2005-06-21 18:46:15 -07:00
asm-sparc [PATCH] smp_processor_id() cleanup 2005-06-21 18:46:13 -07:00
asm-sparc64 [PATCH] Hugepage consolidation 2005-06-21 18:46:15 -07:00
asm-um [PATCH] smp_processor_id() cleanup 2005-06-21 18:46:13 -07:00
asm-v850 [PATCH] asm/signal.h unification 2005-05-04 07:33:15 -07:00
asm-x86_64 [PATCH] xen: x86_64: Add macro for debugreg 2005-06-23 09:45:14 -07:00
linux [PATCH] timers fixes/improvements 2005-06-23 09:45:16 -07:00
math-emu Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
media [PATCH] dvb: modified dvb_register_adapter() to avoid kmalloc/kfree 2005-05-17 07:59:33 -07:00
mtd Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
net [X25]: Fast select with no restriction on response 2005-06-22 22:16:17 -07:00
pcmcia Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
rxrpc Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
scsi Automatic merge of ../scsi-misc-2.6-old/ 2005-05-26 14:14:55 -04:00
sound [ALSA] Add const prefix 2005-06-22 12:28:54 +02:00
video [PATCH] Clean-up and bug fix for tdfxfb framebuffer size detection 2005-05-01 08:59:25 -07:00