linux/include
Nick Piggin 9d55c369bb fs: implement faster dentry memcmp
The standard memcmp function on a Westmere system shows up hot in
profiles in the `git diff` workload (both parallel and single threaded),
and it is likely due to the costs associated with trapping into
microcode, and little opportunity to improve memory access (dentry
name is not likely to take up more than a cacheline).

So replace it with an open-coded byte comparison. This increases code
size by 8 bytes in the critical __d_lookup_rcu function, but the
speedup is huge, averaging 10 runs of each:

git diff st   user   sys   elapsed  CPU
before        1.15   2.57  3.82      97.1
after         1.14   2.35  3.61      96.8

git diff mt   user   sys   elapsed  CPU
before        1.27   3.85  1.46     349
after         1.26   3.54  1.43     333

Elapsed time for single threaded git diff at 95.0% confidence:
        -0.21  +/- 0.01
        -5.45% +/- 0.24%

It's -0.66% +/- 0.06% elapsed time on my Opteron, so rep cmp costs on the
fam10h seem to be relatively smaller, but there is still a win.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:32 +11:00
..
acpi ACPI: video: fix build for CONFIG_ACPI=n 2010-12-11 02:01:46 -05:00
asm-generic asm-generic/stat.h: support 64-bit file time_t for stat() 2010-11-01 15:31:29 -04:00
crypto
drm drm/i915: announce to userspace that the bsd ring is coherent 2010-12-05 10:40:39 +00:00
keys
linux fs: implement faster dentry memcmp 2011-01-07 17:50:32 +11:00
math-emu
media [media] wm8775: Revert changeset fcb9757333 to avoid a regression 2011-01-03 09:09:56 -02:00
mtd
net Revert "ipv4: Allow configuring subnets as local addresses" 2010-12-23 12:03:57 -08:00
pcmcia
rdma IB/core: Add VLAN support for IBoE 2010-10-25 10:20:39 -07:00
rxrpc
scsi SCSI host lock push-down 2010-11-16 13:33:23 -08:00
sound ARM: mach-shmobile: ap4evb: FSI clock use proper process for HDMI 2010-11-24 15:29:56 +09:00
trace ext4: Add new ext4 inode tracepoints 2010-11-08 13:51:33 -05:00
video fbdev: da8xx: punt duplicated FBIO_WAITFORVSYNC define 2010-11-16 10:14:22 +09:00
xen xen: Provide a variant of __RING_SIZE() that is an integer constant expression 2010-12-15 12:34:28 -08:00
Kbuild