Commit graph

5057 commits

Author SHA1 Message Date
Patrick McHardy
bf742482d7 [NET]: dev: introduce generic net_device address lists
Introduce struct dev_addr_list and list maintenance functions
based on dev_mc_list and the related functions. This will be
used by follow-up patches for both multicast and secondary
unicast addresses.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:54 -07:00
Patrick McHardy
75ebe8f736 [NET]: dev_mcast: unexport dev_mc_upload
dev_mc_add/dev_mc_delete take care of uploading the list when
necessary and thats the only interface other code should use.
Also remove two incorrect calls in DECnet.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:53 -07:00
Stephen Hemminger
d212f87b06 [NET]: IPV6 checksum offloading in network devices
The existing model for checksum offload does not correctly handle
devices that can offload IPV4 and IPV6 only. The NETIF_F_HW_CSUM flag
implies device can do any arbitrary protocol.

This patch:
 * adds NETIF_F_IPV6_CSUM for those devices
 * fixes bnx2 and tg3 devices that need it
 * add NETIF_F_IPV6_CSUM to ipv6 output (incl GSO)
 * fixes assumptions about NETIF_F_ALL_CSUM in nat
 * adjusts bridge union of checksumming computation

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:52 -07:00
Masahide NAKAMURA
d3d6dd3ada [XFRM]: Add module alias for transformation type.
It is clean-up for XFRM type modules and adds aliases with its
protocol:
 ESP, AH, IPCOMP, IPIP and IPv6 for IPsec
 ROUTING and DSTOPTS for MIPv6

It is almost the same thing as XFRM mode alias, but it is added
new defines XFRM_PROTO_XXX for preprocessing since some protocols
are defined as enum.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Acked-by: Ingo Oeser <netdev@axxeo.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:43 -07:00
Masahide NAKAMURA
59fbb3a61e [IPV6] MIP6: Loadable module support for MIPv6.
This patch makes MIPv6 loadable module named "mip6".

Here is a modprobe.conf(5) example to load it automatically
when user application uses XFRM state for MIPv6:

alias xfrm-type-10-43 mip6
alias xfrm-type-10-60 mip6

Some MIPv6 feature is not included by this modular, however,
it should not be affected to other features like either IPsec
or IPv6 with and without the patch.
We may discuss XFRM, MH (RAW socket) and ancillary data/sockopt
separately for future work.

Loadable features:
* MH receiving check (to send ICMP error back)
* RO header parsing and building (i.e. RH2 and HAO in DSTOPTS)
* XFRM policy/state database handling for RO

These are NOT covered as loadable:
* Home Address flags and its rule on source address selection
* XFRM sub policy (depends on its own kernel option)
* XFRM functions to receive RO as IPv6 extension header
* MH sending/receiving through raw socket if user application
  opens it (since raw socket allows to do so)
* RH2 sending as ancillary data
* RH2 operation with setsockopt(2)

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:42 -07:00
Masahide NAKAMURA
136ebf08b4 [IPV6] MIP6: Kill unnecessary ifdefs.
Kill unnecessary CONFIG_IPV6_MIP6.

o It is redundant for RAW socket to keep MH out with the config then
  it can handle any protocol.
o Clean-up at AH.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:41 -07:00
Patrick McHardy
2371baa4bd [RTNETLINK]: Fix rtnetlink compat attribute patch
Sent the wrong patch previously.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:40 -07:00
Patrick McHardy
afdc3238ec [RTNETLINK]: Add nested compat attribute
Add a nested compat attribute type that can be used to convert
attributes that contain a structure to nested attributes in a
backwards compatible way.

The attribute looks like this:

struct {
        [ compat contents ]
        struct rtattr {
                .rta_len        = total size,
                .rta_type       = type,
        } rta;
        struct old_structure struct;

        [ nested top-level attribute ]
        struct rtattr {
                .rta_len        = nest size,
                .rta_type       = type,
        } nest_attr;

        [ optional 0 .. n nested attributes ]
        struct rtattr {
                .rta_len        = private attribute len,
                .rta_type       = private attribute typ,
        } nested_attr;
        struct nested_data data;
};

Since both userspace and kernel deal correctly with attributes that are
larger than expected old versions will just parse the compat part and
ignore the rest.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:39 -07:00
Patrick McHardy
1092cb2197 [NETLINK]: attr: add nested compat attribute type
Add a nested compat attribute type that can be used to convert
attributes that contain a structure to nested attributes in a
backwards compatible way.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:38 -07:00
Patrick McHardy
334a8132d9 [SKBUFF]: Keep track of writable header len of headerless clones
Currently NAT (and others) that want to modify cloned skbs copy them,
even if in the vast majority of cases its not necessary because the
skb is a clone made by TCP and the portion NAT wants to modify is
actually writable because TCP release the header reference before
cloning.

The problem is that there is no clean way for NAT to find out how
long the writable header area is, so this patch introduces skb->hdr_len
to hold this length. When a headerless skb is cloned skb->hdr_len
is set to the current headroom, for regular clones it is copied from
the original. A new function skb_clone_writable(skb, len) returns
whether the skb is writable up to len bytes from skb->data. To avoid
enlarging the skb the mac_len field is reduced to 16 bit and the
new hdr_len field is put in the remaining 16 bit.

I've done a few rough benchmarks of NAT (not with this exact patch,
but a very similar one). As expected it saves huge amounts of system
time in case of sendfile, bringing it down to basically the same
amount as without NAT, with sendmsg it only helps on loopback,
probably because of the large MTU.

Transmit a 1GB file using sendfile/sendmsg over eth0/lo with and
without NAT:

- sendfile eth0, no NAT:	sys     0m0.388s
- sendfile eth0, NAT:		sys     0m1.835s
- sendfile eth0: NAT + path:	sys     0m0.370s	(~ -80%)

- sendfile lo, no NAT:		sys     0m0.258s
- sendfile lo, NAT:		sys     0m2.609s
- sendfile lo, NAT + patch:	sys     0m0.260s	(~ -90%)

- sendmsg eth0, no NAT:		sys     0m2.508s
- sendmsg eth0, NAT:		sys     0m2.539s
- sendmsg eth0, NAT + patch:	sys     0m2.445s	(no change)

- sendmsg lo, no NAT:		sys	0m2.151s
- sendmsg lo, NAT:		sys     0m3.557s
- sendmsg lo, NAT + patch:	sys     0m2.159s	(~ -40%)

I expect other users can see a similar performance improvement,
packet mangling iptables targets, ipip and ip_gre come to mind ..

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:37 -07:00
Krishna Kumar
e50c41b53d [NET]: qdisc_restart - couple of optimizations.
Changes :

- netif_queue_stopped need not be called inside qdisc_restart as
  it has been called already in qdisc_run() before the first skb
  is sent, and in __qdisc_run() after each intermediate skb is
  sent (note : we are the only sender, so the queue cannot get
  stopped while the tx lock was got in the ~LLTX case).

- BUG_ON((int) q->q.qlen < 0) was a relic from old times when -1
  meant more packets are available, and __qdisc_run used to loop
  when qdisc_restart() returned -1. During those days, it was
  necessary to make sure that qlen is never less than zero, since
  __qdisc_run would get into an infinite loop if no packets are on
  the queue and this bug in qdisc was there (and worse - no more
  skbs could ever get queue'd as we hold the queue lock too). With
  Herbert's recent change to return values, this check is not
  required.  Hopefully Herbert can validate this change. If at all
  this is required, it should be added to skb_dequeue (in failure
  case), and not to qdisc_qlen.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:36 -07:00
Krishna Kumar
6c1361a6f2 [NET]: qdisc_restart - readability changes plus one bug fix.
New changes :

- Incorporated Peter Waskiewicz's comments.
- Re-added back one warning message (on driver returning wrong value).

Previous changes :

- Converted to use switch/case code which looks neater.

- "if (ret == NETDEV_TX_LOCKED && lockless)" is buggy, and the lockless
  check should be removed, since driver will return NETDEV_TX_LOCKED only
  if lockless is true and driver has to do the locking. In the original
  code as well as the latest code, this code can result in a bug where
  if LLTX is not set for a driver (lockless == 0) but the driver is written
  wrongly to do a trylock (despite LLTX being set), the driver returns
  LOCKED. But since lockless is zero, the packet is requeue'd instead of
  calling collision code which will issue warning and free up the skb.
  Instead this skb will be retried with this driver next time, and the same
  result will ensue. Removing this check will catch these driver bugs instead
  of hiding the problem. I am keeping this change to readability section
  since :
  	a. it is confusing to check two things as it is; and
  	b. it is difficult to keep this check in the changed 'switch' code.

- Changed some names, like try_get_tx_pkt to dev_dequeue_skb (as that is
  the work being done and easier to understand) and do_dev_requeue to
  dev_requeue_skb, merged handle_dev_cpu_collision and tx_islocked to
  dev_handle_collision (handle_dev_cpu_collision is a small routine with only
  one caller, so there is no need to have two separate routines which also
  results in getting rid of two macros, etc.

- Removed an XXX comment as it should never fail (I suspect this was related
  to batch skb WIP, Jamal ?). Converted some functions to original coding
  style of having the return values and the function name on same line, eg
  prio2list.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:35 -07:00
Gerrit Renker
49d66a70cf [CCID3]: Fix a bug in the send time processing
ccid3_hc_tx_send_packet currently returns 0 when the time difference between
current time and t_nom is less than 1000 microseconds.

In this case the packet is sent immediately; but, unlike other packets that can
be emitted on first attempt, it will not have its window counter updated and
its options set as required. This is a bug.

Fix: Require the time difference to be at least 1000 microseconds. The
algorithm then converges: time differences > 1000 microseconds trigger the
timer in dccp_write_xmit; after timer expiry this function is tried again; when
the time difference is less than 1000, the packet will have its options added
and window counter updated as required.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-07-10 22:15:34 -07:00
Gerrit Renker
8132da4d41 [CCID3]: Sending time: update to ktime_t
This updates the computation of t_nom and t_last_win_count to use the newer
gettimeofday interface.

Committer note: used ktime_to_timeval to set the 'now' variable to t_ld in
                ccid3hctx_no_feedback_timer

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-07-10 22:15:27 -07:00
Arnaldo Carvalho de Melo
dd36a9aba4 loss_interval: make struct dccp_li_hist_entry private
net/dccp/ccids/lib/loss_interval.c is the only place where this struct is used.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2007-07-10 22:15:24 -07:00
Arnaldo Carvalho de Melo
cc4d6a3a34 loss_interval: Nuke dccp_li_hist
It had just a slab cache, so, for the sake of simplicity just make
dccp_trfc_lib module init routine create the slab cache, no need for users of
the lib to create a private loss_interval object.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2007-07-10 22:15:23 -07:00
Arnaldo Carvalho de Melo
c70b729e66 loss_interval: Make dccp_li_hist_entry_{new,delete} private
Not used outside the loss_interval code anymore.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2007-07-10 22:15:22 -07:00
Arnaldo Carvalho de Melo
8c281780c6 loss_interval: unexport dccp_li_hist_interval_new
Now its only used inside the loss_interval code.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2007-07-10 22:15:21 -07:00
Arnaldo Carvalho de Melo
cc0a910b94 [DCCP] loss_interval: Move ccid3_hc_rx_update_li to loss_interval
Renaming it to dccp_li_update_li.

Also based on previous work by Ian McDonald.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-07-10 22:15:20 -07:00
Arnaldo Carvalho de Melo
878ac60023 [CCID3]: Pass ccid3_li_hist to ccid3_hc_rx_update_li
Now ccid3_hc_rx_update_li is ready to be moved to
net/dccp/ccids/lib/loss_interval, it uses the same interface as the other
functions there.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2007-07-10 22:15:19 -07:00
Arnaldo Carvalho de Melo
d83258a3da Remove accesses to ccid3_hc_rx_sock in ccid3_hc_rx_{update,calc_first}_li
This is a preparatory patch for moving these loss interval functions from
net/dccp/ccids/ccid3.c to net/dccp/ccids/lib/loss_interval.c.

Based on a patch by Ian McDonald.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-07-10 22:15:18 -07:00
Ian McDonald
6bc7efe8ef loss_interval: Fix timeval initialisation
When compiling with EXTRA_CFLAGS=-W noticed that tstamp is not initialised
correctly in dccp_li_calc_first_li.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2007-07-10 22:15:06 -07:00
Ian McDonald
e961811fcd Fix dccp_sum_coverage
When compiling with EXTRA_CFLAGS=-W notice that we have signed/unsigned issue
in dccp.h.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
2007-07-10 22:15:05 -07:00
Ian McDonald
b2f41ff413 ccid3: Update copyrights
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-07-10 22:15:04 -07:00
Patrick McHardy
07b5b17e15 [VLAN]: Use rtnl_link API
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:03 -07:00
Patrick McHardy
a4bf3af4ac [VLAN]: Introduce symbolic constants for flag values
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:02 -07:00
Patrick McHardy
b020cb4885 [VLAN]: Keep track of number of QoS mappings
Keep track of the number of configured ingress/egress QoS mappings to
avoid iteration while calculating the netlink attribute size.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:01 -07:00
Patrick McHardy
734423cf38 [VLAN]: Use 32 bit value for skb->priority mapping
skb->priority has only 32 bits and even VLAN uses 32 bit values in its API.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:00 -07:00
Patrick McHardy
2ae0bf69b7 [VLAN]: Return proper error codes in register_vlan_device
The returned device is unused, return proper error codes instead and avoid
having the ioctl handler guess the error.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:14:59 -07:00
Patrick McHardy
e89fe42cd0 [VLAN]: Move device registation to seperate function
Move device registration and configuration of the underlying device to a
seperate function.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:14:58 -07:00
Patrick McHardy
c1d3ee9925 [VLAN]: Split up device checks
Move the checks of the underlying device to a seperate function.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:14:57 -07:00
Patrick McHardy
42429aaee5 [VLAN]: Move vlan_group allocation to seperate function
Move group allocation to a seperate function to clean up the code a bit
and allocate groups before registering the device. Device registration
is globally visible and causes netlink events, so we shouldn't fail
afterwards.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:14:40 -07:00
Patrick McHardy
2f4284a406 [VLAN]: Move some device intialization code to dev->init callback
Move some device initialization code to new dev->init callback to make
it shareable with netlink. Additionally this fixes a minor bug, dev->iflink
is set after registration, which causes an incorrect value in the initial
netlink message.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:14:39 -07:00
Patrick McHardy
c17d8874f9 [VLAN]: Convert name-based configuration functions to struct netdevice *
Move the device lookup and checks to the ioctl handler under the RTNL and
change all name-based interfaces to take a struct net_device * instead.

This allows to use them from a netlink interface, which identifies devices
based on ifindex not name. It also avoids races between the ioctl interface
and the (upcoming) netlink interface since now all changes happen under the
RTNL.

As a nice side effect this greatly simplifies error handling in the helper
functions and fixes a number of incorrect error codes like -EINVAL for
device not found.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:14:38 -07:00
Patrick McHardy
38f7b870d4 [RTNETLINK]: Link creation API
Add rtnetlink API for creating, changing and deleting software devices.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:14:20 -07:00
Patrick McHardy
0157f60c0c [RTNETLINK]: Split up rtnl_setlink
Split up rtnl_setlink into a function performing validation and a function
performing the actual changes. This allows to share the modifcation logic
with rtnl_newlink, which is introduced by the next patch.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:14:16 -07:00
Larry Finger
b3d88ad49a [MAC80211]: Add support for SIOCGIWRATE ioctl
At present, transmission rate information for mac80211 is available only
if verbose debugging is turned on, and then only in the logs. This patch
implements the SIOCGIWRATE ioctl, which adds the current transmission rate to
the output of iwconfig.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:14:07 -07:00
Herbert Xu
a7ab4b501f [TCPv4]: Improve BH latency in /proc/net/tcp
Currently the code for /proc/net/tcp disable BH while iterating
over the entire established hash table.  Even though we call
cond_resched_softirq for each entry, we still won't process
softirq's as regularly as we would otherwise do which results
in poor performance when the system is loaded near capacity.

This anomaly comes from the 2.4 code where this was all in a
single function and the local_bh_disable might have made sense
as a small optimisation.

The cost of each local_bh_disable is so small when compared
against the increased latency in keeping it disabled over a
large but mostly empty TCP established hash table that we
should just move it to the individual read_lock/read_unlock
calls as we do in inet_diag.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:06:20 -07:00
Jamal Hadi Salim
c716a81ab9 [NET_SCHED]: Cleanup readability of qdisc restart
Over the years this code has gotten hairier. Resulting in many long
discussions over long summer days and patches that get it wrong.
This patch helps tame that code so normal people will understand it.

Thanks to Thomas Graf, Peter J. waskiewicz Jr, and Patrick McHardy
for their valuable reviews.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:06:16 -07:00
Allan Stephens
05646c9110 [TIPC]: Optimize stream send routine to avoid fragmentation
This patch enhances TIPC's stream socket send routine so that
it avoids transmitting data in chunks that require fragmentation
and reassembly, thereby improving performance at both the
sending and receiving ends of the connection.

The "maximum packet size" hint that records MTU info allows
the socket to decide how big a chunk it should send; in the
event that the hint has become stale, fragmentation may still
occur, but the data will be passed correctly and the hint will
be updated in time for the following send.  Note: The 66060 byte
pseudo-MTU used for intra-node connections requires the send
routine to perform an additional check to ensure it does not
exceed TIPC"s limit of 66000 bytes of user data per chunk.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Jon Paul Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:06:12 -07:00
Allan Stephens
5eee6a6dc9 [TIPC]: Use standard socket "not implemented" routines
This patch modifies TIPC's socket API to utilize existing
generic routines to indicate unsupported operations, rather
than adding similar TIPC-specific routines.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Jon Paul Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:06:09 -07:00
Allan Stephens
f3ec75f627 [TIPC]: Improved support for Ethernet traffic filtering
This patch simplifies TIPC's Ethernet receive routine to take
advantage of information already present in each incoming sk_buff
indicating whether the packet was explicitly sent to the interface,
has been broadcast to all interfaces, or was picked up because the
interface is in promiscous mode.

This new approach also fixes the problem of TIPC accepting unwanted
traffic through UML's multicast-based Ethernet interfaces (which
deliver traffic in a promiscuous manner even if the interface is
not configured to be promiscuous).

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Jon Paul Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:06:02 -07:00
David S. Miller
e06e7c6158 [IPV4]: The scheduled removal of multipath cached routing support.
With help from Chris Wedgwood.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:05:57 -07:00
Linus Torvalds
6ed911fb04 Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (40 commits)
  bonding/bond_main.c: make 2 functions static
  ps3: gigabit ethernet driver for PS3, take3
  [netdrvr] Fix dependencies for ax88796 ne2k clone driver
  eHEA: Capability flag for DLPAR support
  Remove sk98lin ethernet driver.
  sunhme.c:quattro_pci_find() must be __devinit
  bonding / ipv6: no addrconf for slaves separately from master
  atl1: remove write-only var in tx handler
  macmace: use "unsigned long flags;"
  Cleanup usbnet_probe() return value handling
  netxen: deinline and sparse fix
  eeprom_93cx6: shorten pulse timing to match spec (bis)
  phylib: Add Marvell 88E1112 phy id
  phylib: cleanup marvell.c a bit
  AX88796 network driver
  IOC3: Switch to pci refcounting safe APIs
  e100: Fix Tyan motherboard e100 not receiving IPMI commands
  QE Ethernet driver writes to wrong register to mask interrupts
  rrunner.c:rr_init() must be __devinit
  tokenring/3c359.c:xl_init() must be __devinit
  ...
2007-07-10 14:56:22 -07:00
Jay Vosburgh
c2edacf80e bonding / ipv6: no addrconf for slaves separately from master
At present, when a device is enslaved to bonding, if ipv6 is
active then addrconf will be initated on the slave (because it is closed
then opened during the enslavement processing).  This causes DAD and RS
packets to be sent from the slave.  These packets in turn can confuse
switches that perform ipv6 snooping, causing them to incorrectly update
their forwarding tables (if, e.g., the slave being added is an inactve
backup that won't be used right away) and direct traffic away from the
active slave to a backup slave (where the incoming packets will be
dropped).

	This patch alters the behavior so that addrconf will only run on
the master device itself.  I believe this is logically correct, as it
prevents slaves from having an IPv6 identity independent from the
master.  This is consistent with the IPv4 behavior for bonding.

	This is accomplished by (a) having bonding set IFF_SLAVE sooner
in the enslavement processing than currently occurs (before open, not
after), and (b) having ipv6 addrconf ignore UP and CHANGE events on
slave devices.

	The eql driver also uses the IFF_SLAVE flag.  I inspected eql,
and I believe this change is reasonable for its usage of IFF_SLAVE, but
I did not test it.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-10 12:41:19 -04:00
Jens Axboe
cf8208d0ea sendfile: convert nfsd to splice_direct_to_actor()
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:14 +02:00
Akinobu Mita
67c4f7aa9e [PATCH] softmac: use list_for_each_entry
Cleanup using list_for_each_entry.

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Joe Jezak <josejx@gentoo.org>
Cc: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-07-08 22:16:37 -04:00
David Woodhouse
1c39858b5d Fix use-after-free oops in Bluetooth HID.
When cleaning up HIDP sessions, we currently close the ACL connection
before deregistering the input device. Closing the ACL connection
schedules a workqueue to remove the associated objects from sysfs, but
the input device still refers to them -- and if the workqueue happens to
run before the input device removal, the kernel will oops when trying to
look up PHYSDEVPATH for the removed input device.

Fix this by deregistering the input device before closing the
connections.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-07 12:22:37 -07:00
Jarek Poplawski
25442cafb8 [NETPOLL]: Fixups for 'fix soft lockup when removing module'
>From my recent patch:

> >    #1
> >    Until kernel ver. 2.6.21 (including) cancel_rearming_delayed_work()
> >    required a work function should always (unconditionally) rearm with
> >    delay > 0 - otherwise it would endlessly loop. This patch replaces
> >    this function with cancel_delayed_work(). Later kernel versions don't
> >    require this, so here it's only for uniformity.

But Oleg Nesterov <oleg@tv-sign.ru> found:

> But 2.6.22 doesn't need this change, why it was merged?
> 
> In fact, I suspect this change adds a race,
...

His description was right (thanks), so this patch reverts #1.

Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-05 17:42:44 -07:00
Adrian Bunk
94b83419e5 [NET]: net/core/netevent.c should #include <net/netevent.h>
Every file should include the headers containing the prototypes for
its global functions.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-05 17:40:27 -07:00