Commit Graph

41623 Commits (8571a19c4ac140f1a507f3e7eb716892afa27109)

Author SHA1 Message Date
David S. Miller 8571a19c4a Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2011-01-27 16:00:37 -08:00
Eric Dumazet ccf434380d net: fix dev_seq_next()
Commit c6d14c8456 (net: Introduce for_each_netdev_rcu() iterator)
added a race in dev_seq_next().

The rcu_dereference() call should be done _before_ testing the end of
list, or we might return a wrong net_device if a concurrent thread
changes net_device list under us.

Note : discovered thanks to a sparse warning :

net/core/dev.c:3919:9: error: incompatible types in comparison expression
(different address spaces)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 15:02:56 -08:00
David S. Miller 144001bddc inetpeer: Mark metrics as "new" in fresh inetpeer entries.
Set the RTAX_LOCKED metric to INETPEER_METRICS_NEW (basically,
all ones) on fresh inetpeer entries.

This way code can determine if default metrics have been loaded
in from a routing table entry already.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 13:52:16 -08:00
David S. Miller 606598237c inetpeer: Add metrics storage to inetpeer entries.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 13:48:26 -08:00
David S. Miller 62fa8a846d net: Implement read-only protection and COW'ing of metrics.
Routing metrics are now copy-on-write.

Initially a route entry points it's metrics at a read-only location.
If a routing table entry exists, it will point there.  Else it will
point at the all zero metric place-holder called 'dst_default_metrics'.

The writeability state of the metrics is stored in the low bits of the
metrics pointer, we have two bits left to spare if we want to store
more states.

For the initial implementation, COW is implemented simply via kmalloc.
However future enhancements will change this to place the writable
metrics somewhere else, in order to increase sharing.  Very likely
this "somewhere else" will be the inetpeer cache.

Note also that this means that metrics updates may transiently fail
if we cannot COW the metrics successfully.

But even by itself, this patch should decrease memory usage and
increase cache locality especially for routing workloads.  In those
cases the read-only metric copies stay in place and never get written
to.

TCP workloads where metrics get updated, and those rare cases where
PMTU triggers occur, will take a very slight performance hit.  But
that hit will be alleviated when the long-term writable metrics
move to a more sharable location.

Since the metrics storage went from a u32 array of RTAX_MAX entries to
what is essentially a pointer, some retooling of the dst_entry layout
was necessary.

Most importantly, we need to preserve the alignment of the reference
count so that it doesn't share cache lines with the read-mostly state,
as per Eric Dumazet's alignment assertion checks.

The only non-trivial bit here is the move of the 'flags' member into
the writeable cacheline.  This is OK since we are always accessing the
flags around the same moment when we made a modification to the
reference count.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-26 20:51:05 -08:00
David S. Miller b4e69ac670 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-01-26 13:49:30 -08:00
David S. Miller 9b6941d8b1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2011-01-26 11:49:49 -08:00
Michał Mirosław acd1130e87 net: reduce and unify printk level in netdev_fix_features()
Reduce printk() levels to KERN_INFO in netdev_fix_features() as this will
be used by ethtool and might spam dmesg unnecessarily.

This converts the function to use netdev_info() instead of plain printk().

As a side effect, bonding and bridge devices will now log dropped features
on every slave device change.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 15:45:15 -08:00
Michał Mirosław 04ed3e741d net: change netdev->features to u32
Quoting Ben Hutchings: we presumably won't be defining features that
can only be enabled on 64-bit architectures.

Occurences found by `grep -r` on net/, drivers/net, include/

[ Move features and vlan_features next to each other in
  struct netdev, as per Eric Dumazet's suggestion -DaveM ]

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 15:32:47 -08:00
Ben Hutchings c445477d74 net: RPS: Enable hardware acceleration of RFS
Allow drivers for multiqueue hardware with flow filter tables to
accelerate RFS.  The driver must:

1. Set net_device::rx_cpu_rmap to a cpu_rmap of the RX completion
IRQs (in queue order).  This will provide a mapping from CPUs to the
queues for which completions are handled nearest to them.

2. Implement net_device_ops::ndo_rx_flow_steer.  This operation adds
or replaces a filter steering the given flow to the given RX queue, if
possible.

3. Periodically remove filters for which rps_may_expire_flow() returns
true.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 14:53:01 -08:00
Ben Hutchings c39649c331 lib: cpu_rmap: CPU affinity reverse-mapping
When initiating I/O on a multiqueue and multi-IRQ device, we may want
to select a queue for which the response will be handled on the same
or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add
library functions to support a generic reverse-mapping from CPUs to
objects with affinity and the specific case where the objects are
IRQs.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 14:51:56 -08:00
David S. Miller c2df88cbb4 Merge branch 'irq/numa' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-01-24 14:30:13 -08:00
David S. Miller 5bdc22a565 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/sched/sch_hfsc.c
	net/sched/sch_htb.c
	net/sched/sch_tbf.c
2011-01-24 14:09:35 -08:00
David S. Miller e92427b289 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 2011-01-24 13:17:06 -08:00
Linus Torvalds ce84d539ce Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  RTC: Remove Kconfig symbol for UIE emulation
  RTC: Properly handle rtc_read_alarm error propagation and fix bug
  RTC: Propagate error handling via rtc_timer_enqueue properly
  acpi_pm: Clear pmtmr_ioport if acpi_pm initialization fails
  rtc: Cleanup removed UIE emulation declaration
  hrtimers: Notify hrtimer users of switches to NOHZ mode
2011-01-25 05:25:55 +10:00
Linus Torvalds d315777b32 Merge branch 'BUG_ON' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* 'BUG_ON' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  Remove MAYBE_BUILD_BUG_ON
  BUILD_BUG_ON: make it handle more cases
2011-01-24 19:58:39 +10:00
Rusty Russell 1765e3a493 Remove MAYBE_BUILD_BUG_ON
Now BUILD_BUG_ON() can handle optimizable constants, we don't need
MAYBE_BUILD_BUG_ON any more.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-01-24 14:45:11 +10:30
Rusty Russell 7ef88ad561 BUILD_BUG_ON: make it handle more cases
BUILD_BUG_ON used to use the optimizer to do code elimination or fail
at link time; it was changed to first the size of a negative array (a
nicer compile time error), then (in
8c87df457c) to a bitfield.

This forced us to change some non-constant cases to MAYBE_BUILD_BUG_ON();
as Jan points out in that commit, it didn't work as intended anyway.

bitfields: needs a literal constant at parse time, and can't be put under
	"if (__builtin_constant_p(x))" for example.
negative array: can handle anything, but if the compiler can't tell it's
	a constant, silently has no effect.
link time: breaks link if the compiler can't determine the value, but the
	linker output is not usually as informative as a compiler error.

If we use the negative-array-size method *and* the link time trick,
we get the ability to use BUILD_BUG_ON() under __builtin_constant_p()
branches, and maximal ability for the compiler to detect errors at
build time.

We also document it thoroughly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jan Beulich <JBeulich@novell.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
2011-01-24 14:45:10 +10:30
Linus Walleij b75be4204e param: add null statement to compiled-in module params
Add an unused struct declaration statement requiring a
terminating semicolon to the compile-in case to provoke an
error if __MODULE_INFO() is used without the terminating
semicolon. Previously MODULE_ALIAS("foo") (no semicolon)
compiled fine if MODULE was not selected.

Cc: Dan Carpenter <error27@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-01-24 14:32:52 +10:30
Rusty Russell 3b90a5b292 module: fix linker error for MODULE_VERSION when !MODULE and CONFIG_SYSFS=n
lib/built-in.o:(__modver+0x8): undefined reference to `__modver_version_show'
lib/built-in.o:(__modver+0x2c): undefined reference to `__modver_version_show'

Simplest to just not emit anything: if they've disabled SYSFS they probably
want the smallest kernel possible.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-01-24 14:32:52 +10:30
Dmitry Torokhov e94965ed5b module: show version information for built-in modules in sysfs
Currently only drivers that are built as modules have their versions
shown in /sys/module/<module_name>/version, but this information might
also be useful for built-in drivers as well. This especially important
for drivers that do not define any parameters - such drivers, if
built-in, are completely invisible from userspace.

This patch changes MODULE_VERSION() macro so that in case when we are
compiling built-in module, version information is stored in a separate
section. Kernel then uses this data to create 'version' sysfs attribute
in the same fashion it creates attributes for module parameters.

Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-01-24 14:32:51 +10:30
Ben Hutchings cd7eab44e9 genirq: Add IRQ affinity notifiers
When initiating I/O on a multiqueue and multi-IRQ device, we may want
to select a queue for which the response will be handled on the same
or a nearby CPU.  This requires a reverse-map of IRQ affinity.  Add a
notification mechanism to support this.

This is based closely on work by Thomas Gleixner <tglx@linutronix.de>.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: linux-net-drivers@solarflare.com
Cc: Tom Herbert <therbert@google.com>
Cc: David Miller <davem@davemloft.net>
LKML-Reference: <1295470904.11126.84.camel@bwh-desktop>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-01-22 17:34:25 +01:00
Linus Torvalds 13a3cec844 Merge branch 'media_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
* 'media_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (101 commits)
  [media] staging/lirc: fix mem leaks and ptr err usage
  [media] hdpvr: reduce latency of i2c read/write w/recycled buffer
  [media] hdpvr: enable IR part
  [media] rc/mceusb: timeout should be in ns, not us
  [media] v4l2-device: fix 'use-after-freed' oops
  [media] v4l2-dev: don't memset video_device.dev
  [media] zoran: use video_device_alloc instead of kmalloc
  [media] w9966: zero device state after a detach
  [media] v4l: Fix a use-before-set in the control framework
  [media] v4l: Include linux/videodev2.h in media/v4l2-ctrls.h
  [media] DocBook/v4l: update V4L2 revision and update copyright years
  [media] DocBook/v4l: fix validation error in dev-rds.xml
  [media] v4l2-ctrls: queryctrl shouldn't attempt to replace V4L2_CID_PRIVATE_BASE IDs
  [media] v4l2-ctrls: fix missing 'read-only' check
  [media] pvrusb2: Provide more information about IR units to lirc_zilog and ir-kbd-i2c
  [media] ir-kbd-i2c: Add back defaults setting for Zilog Z8's at addr 0x71
  [media] lirc_zilog: Update TODO.lirc_zilog
  [media] lirc_zilog: Add Andy Walls to copyright notice and authors list
  [media] lirc_zilog: Remove useless struct i2c_driver.command function
  [media] lirc_zilog: Remove unneeded tests for existence of the IR Tx function
  ...
2011-01-21 16:50:31 -08:00
Linus Torvalds 5bf7a6503f Merge branch 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: note the nested NOT_RUNNING test in worker_clr_flags() isn't a noop
  workqueue: relax lockdep annotation on flush_work()
2011-01-21 13:38:57 -08:00
Linus Torvalds 0f5c2ac58f Merge branch 'irq-cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (37 commits)
  um: Use generic irq Kconfig
  tile: Use generic irq Kconfig
  sparc: Use generic irq Kconfig
  score: Use generic irq Kconfig
  powerpc: Use generic irq Kconfig
  parisc: Use generic irq Kconfig
  mn10300: Use generic irq Kconfig
  microblaze: Use generic irq Kconfig
  m68knommu: Use generic irq Kconfig
  ia64: Use generic irq Kconfig
  frv: Use generic irq Kconfig
  blackfin: Use generic irq Kconfig
  alpha: Use generic irq Kconfig
  genirq: Remove __do_IRQ
  m32r: Convert to generic irq Kconfig
  m32r: Convert usrv platform irq handling
  m32r: Convert opsput_lcdpld irq chip
  m32r: Convert opsput lanpld irq chip
  m32r: Convert opsput pld irq chip
  m32r: Convert opsput irq chip
  ...
2011-01-21 13:38:26 -08:00
Bruno Randolf 59eb21a650 cfg80211: Extend channel to frequency mapping for 802.11j
Extend channel to frequency mapping for 802.11j Japan 4.9GHz band, according to
IEEE802.11 section 17.3.8.3.2 and Annex J. Because there are now overlapping
channel numbers in the 2GHz and 5GHz band we can't map from channel to
frequency without knowing the band. This is no problem as in most contexts we
know the band. In places where we don't know the band (and WEXT compatibility)
we assume the 2GHz band for channels below 14.

This patch does not implement all channel to frequency mappings defined in
802.11, it's just an extension for 802.11j 20MHz channels. 5MHz and 10MHz
channels as well as 802.11y channels have been omitted.

The following drivers have been updated to reflect the API changes:
iwl-3945, iwl-agn, iwmc3200wifi, libertas, mwl8k, rt2x00, wl1251, wl12xx.
The drivers have been compile-tested only.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: Brian Prodoehl <bprodoehl@gmail.com>
Acked-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-21 15:34:17 -05:00
Michal Simek 3dece370ec mm: System without MMU do not need pte_mkwrite
The patch "thp: export maybe_mkwrite" (commit 14fd403f21) breaks
systems without MMU.

Error log:

    CC      arch/microblaze/mm/init.o
  In file included from include/linux/mman.h:14,
                   from arch/microblaze/mm/consistent.c:24:
  include/linux/mm.h: In function 'maybe_mkwrite':
  include/linux/mm.h:482: error: implicit declaration of function 'pte_mkwrite'
  include/linux/mm.h:482: error: incompatible types in assignment

Signed-off-by: Michal Simek <monstr@monstr.eu>
CC: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-21 08:40:30 -08:00
John Stultz aa0be0f465 RTC: Propagate error handling via rtc_timer_enqueue properly
In cases where RTC hardware does not support alarms, the virtualized
RTC interfaces did not have a way to propagate the error up to userland.

This patch extends rtc_timer_enqueue so it catches errors from the hardware
and returns them upwards to the virtualized interfaces. To simplify error
handling, it also internalizes the management of the timer->enabled bit
into rtc_timer_enqueue and rtc_timer_remove.

Also makes rtc_timer_enqueue and rtc_timer_remove static.

Reported-by: David Daney <ddaney@caviumnetworks.com>
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Diagnosed-by: David Daney <ddaney@caviumnetworks.com>
Tested-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
LKML-Reference: <1295565973-14358-1-git-send-email-john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-01-21 17:38:19 +01:00
John Stultz 1daeddd596 rtc: Cleanup removed UIE emulation declaration
rtc_dev_update_irq_enable_emul was removed in commit
042620a018 (UIE emulation is
now handled via hrtimer), but the declaration was missed.
This patch cleans it up.

Signed-off-by: John Stultz <john.stultz@linaro.org>
LKML-Reference: <1294939849-20608-1-git-send-email-john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-01-21 17:38:18 +01:00
Linus Torvalds 4843456c5c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
  quota: Fix deadlock during path resolution
2011-01-21 07:33:37 -08:00
Thomas Gleixner 1c77ff22f5 genirq: Remove __do_IRQ
All architectures are finally converted. Remove the cruft.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Michal Simek <monstr@monstr.eu>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jeff Dike <jdike@addtoit.com>
2011-01-21 11:55:31 +01:00
Eric Dumazet 9190b3b320 net_sched: accurate bytes/packets stats/rates
In commit 44b8288308 (net_sched: pfifo_head_drop problem), we fixed
a problem with pfifo_head drops that incorrectly decreased
sch->bstats.bytes and sch->bstats.packets

Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a
previously enqueued packet, and bstats cannot be changed, so
bstats/rates are not accurate (over estimated)

This patch changes the qdisc_bstats updates to be done at dequeue() time
instead of enqueue() time. bstats counters no longer account for dropped
frames, and rates are more correct, since enqueue() bursts dont have
effect on dequeue() rate.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 23:31:33 -08:00
David S. Miller 686a295553 net: Add safe reverse SKB queue walkers.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 22:47:32 -08:00
Linus Torvalds 2b1caf6ed7 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  smp: Allow on_each_cpu() to be called while early_boot_irqs_disabled status to init/main.c
  lockdep: Move early boot local IRQ enable/disable status to init/main.c
2011-01-20 18:30:37 -08:00
Rafael J. Wysocki 2d6d9fd3a5 ACPI: Introduce acpi_os_ioremap()
Commit ca9b600be3 ("ACPI / PM: Make suspend_nvs_save() use
acpi_os_map_memory()") attempted to prevent the code in osl.c and nvs.c
from using different ioremap() variants by making the latter use
acpi_os_map_memory() for mapping the NVS pages.  However, that also
requires acpi_os_unmap_memory() to be used for unmapping them, which
causes synchronize_rcu() to be executed many times in a row
unnecessarily and introduces substantial delays during resume on some
systems.

Instead of using acpi_os_map_memory() for mapping the NVS pages in nvs.c
introduce acpi_os_ioremap() calling ioremap_cache() and make the code in
both osl.c and nvs.c use it.

Reported-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 18:30:17 -08:00
Linus Torvalds 8d99641f6c Merge branch 'akpm'
* akpm:
  kernel/smp.c: consolidate writes in smp_call_function_interrupt()
  kernel/smp.c: fix smp_call_function_many() SMP race
  memcg: correctly order reading PCG_USED and pc->mem_cgroup
  backlight: fix 88pm860x_bl macro collision
  drivers/leds/ledtrig-gpio.c: make output match input, tighten input checking
  MAINTAINERS: update Atmel AT91 entry
  mm: fix truncate_setsize() comment
  memcg: fix rmdir, force_empty with THP
  memcg: fix LRU accounting with THP
  memcg: fix USED bit handling at uncharge in THP
  memcg: modify accounting function for supporting THP better
  fs/direct-io.c: don't try to allocate more than BIO_MAX_PAGES in a bio
  mm: compaction: prevent division-by-zero during user-requested compaction
  mm/vmscan.c: remove duplicate include of compaction.h
  memblock: fix memblock_is_region_memory()
  thp: keep highpte mapped until it is no longer needed
  kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT
2011-01-20 17:02:14 -08:00
KAMEZAWA Hiroyuki ca3e021417 memcg: fix USED bit handling at uncharge in THP
Now, under THP:

at charge:
  - PageCgroupUsed bit is set to all page_cgroup on a hugepage.
    ....set to 512 pages.
at uncharge
  - PageCgroupUsed bit is unset on the head page.

So, some pages will remain with "Used" bit.

This patch fixes that Used bit is set only to the head page.
Used bits for tail pages will be set at splitting if necessary.

This patch adds this lock order:
   compound_lock() -> page_cgroup_move_lock().

[akpm@linux-foundation.org: fix warning]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:06 -08:00
Shan Wei d18046b3cd dccp: clean up unused DCCP_STATE_MASK definition
Remove unused DCCP_STATE_MASK macro.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 17:01:09 -08:00
Eric Dumazet a2da570d62 net_sched: RCU conversion of stab
This patch converts stab qdisc management to RCU, so that we can perform
the qdisc_calculate_pkt_len() call before getting qdisc lock.

This shortens the lock's held time in __dev_xmit_skb().

This permits more qdiscs to get TCQ_F_CAN_BYPASS status, avoiding lot of
cache misses and so reducing latencies.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Jesper Dangaard Brouer <hawk@diku.dk>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Jamal Hadi Salim <hadi@cyberus.ca>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 16:59:32 -08:00
Eric Dumazet fd245a4adb net_sched: move TCQ_F_THROTTLED flag
In commit 3711210576 (net: QDISC_STATE_RUNNING dont need atomic bit
ops) I moved QDISC_STATE_RUNNING flag to __state container, located in
the cache line containing qdisc lock and often dirtied fields.

I now move TCQ_F_THROTTLED bit too, so that we let first cache line read
mostly, and shared by all cpus. This should speedup HTB/CBQ for example.

Not using test_bit()/__clear_bit()/__test_and_set_bit allows to use an
"unsigned int" for __state container, reducing by 8 bytes Qdisc size.

Introduce helpers to hide implementation details.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Jesper Dangaard Brouer <hawk@diku.dk>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Jamal Hadi Salim <hadi@cyberus.ca>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 16:59:32 -08:00
Linus Torvalds e589501cb9 Merge branch 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  ACPICA: Update version to 20110112
  ACPICA: Update all ACPICA copyrights and signons to 2011
  ACPICA: Fix issues/fault with automatic "serialized" method support
  ACPICA: Debugger: Lock namespace for duration of a namespace dump
  ACPICA: Fix namespace race condition
  ACPICA: Fix memory leak in acpi_ev_asynch_execute_gpe_method().
2011-01-20 16:28:34 -08:00
Patrick McHardy 2f1e317672 netfilter: nf_conntrack: fix linker error with NF_CONNTRACK_TIMESTAMP=n
net/built-in.o: In function `nf_conntrack_init_net':
net/netfilter/nf_conntrack_core.c:1521:
	undefined reference to `nf_conntrack_tstamp_init'
net/netfilter/nf_conntrack_core.c:1531:
	undefined reference to `nf_conntrack_tstamp_fini'

Add dummy inline functions for the =n case to fix this.

Reported-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-20 20:46:52 +01:00
Jan Engelhardt 06988b0693 netfilter: xtables: add missing header inclusions for headers_check
Resolve these warnings on `make headers_check`:

usr/include/linux/netfilter/xt_CT.h:7: found __[us]{8,16,32,64} type
without #include <linux/types.h>
...

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2011-01-20 17:50:17 +01:00
Patrick McHardy 4cda47d2e7 Merge branch 'connlimit' of git://dev.medozas.de/linux 2011-01-20 14:17:32 +01:00
Jan Engelhardt ba12b130a6 netfilter: xtables: remove duplicate member
Accidentally missed removing the old out-of-union "inverse" member,
which caused the struct size to change which then gives size mismatch
warnings when using an old iptables.

It is interesting to see that gcc did not warn about this before.
(Filed http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47376 )

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2011-01-20 14:01:12 +01:00
Tejun Heo 2ce802f62b lockdep: Move early boot local IRQ enable/disable status to init/main.c
During early boot, local IRQ is disabled until IRQ subsystem is
properly initialized.  During this time, no one should enable
local IRQ and some operations which usually are not allowed with
IRQ disabled, e.g. operations which might sleep or require
communications with other processors, are allowed.

lockdep tracked this with early_boot_irqs_off/on() callbacks.
As other subsystems need this information too, move it to
init/main.c and make it generally available.  While at it,
toggle the boolean to early_boot_irqs_disabled instead of
enabled so that it can be initialized with %false and %true
indicates the exceptional condition.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20110120110635.GB6036@htj.dyndns.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-20 13:32:33 +01:00
Patrick McHardy 82d800d8e7 Merge branch 'connlimit' of git://dev.medozas.de/linux
Conflicts:
	Documentation/feature-removal-schedule.txt

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-20 10:33:55 +01:00
David S. Miller a07aa004c8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2011-01-20 00:06:15 -08:00
Jan Engelhardt 5d84492864 netfilter: xtables: remove extraneous header that slipped in
Commit 0b8ad87 (netfilter: xtables: add missing header files to export
list) erroneously added this.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-20 08:48:15 +01:00
John Fastabend b8970f0bfc net_sched: implement a root container qdisc sch_mqprio
This implements a mqprio queueing discipline that by default creates
a pfifo_fast qdisc per tx queue and provides the needed configuration
interface.

Using the mqprio qdisc the number of tcs currently in use along
with the range of queues alloted to each class can be configured. By
default skbs are mapped to traffic classes using the skb priority.
This mapping is configurable.

Configurable parameters,

struct tc_mqprio_qopt {
	__u8    num_tc;
	__u8    prio_tc_map[TC_BITMASK + 1];
	__u8    hw;
	__u16   count[TC_MAX_QUEUE];
	__u16   offset[TC_MAX_QUEUE];
};

Here the count/offset pairing give the queue alignment and the
prio_tc_map gives the mapping from skb->priority to tc.

The hw bit determines if the hardware should configure the count
and offset values. If the hardware bit is set then the operation
will fail if the hardware does not implement the ndo_setup_tc
operation. This is to avoid undetermined states where the hardware
may or may not control the queue mapping. Also minimal bounds
checking is done on the count/offset to verify a queue does not
exceed num_tx_queues and that queue ranges do not overlap. Otherwise
it is left to user policy or hardware configuration to create
useful mappings.

It is expected that hardware QOS schemes can be implemented by
creating appropriate mappings of queues in ndo_tc_setup().

One expected use case is drivers will use the ndo_setup_tc to map
queue ranges onto 802.1Q traffic classes. This provides a generic
mechanism to map network traffic onto these traffic classes and
removes the need for lower layer drivers to know specifics about
traffic types.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 23:31:11 -08:00