Commit Graph

44 Commits (c01cd429fc118c5db92475c5f08b307718aa4efc)

Author SHA1 Message Date
Patrick McHardy c01cd429fc [NETFILTER]: nf_queue: move queueing related functions/struct to seperate header
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:10 -08:00
Patrick McHardy f9d8928f83 [NETFILTER]: nf_queue: remove unused data pointer
Remove the data pointer from struct nf_queue_handler. It has never been used
and is useless for the only handler that really matters, nfnetlink_queue,
since the handler is shared between all instances.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:10 -08:00
Patrick McHardy e3ac529815 [NETFILTER]: nf_queue: make queue_handler const
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:09 -08:00
Alexey Dobriyan 7351a22a3a [NETFILTER]: ip{,6}_queue: convert to seq_file interface
I plan to kill ->get_info which means killing proc_net_create().

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:08:20 -08:00
Herbert Xu 2ca7b0ac02 [NETFILTER]: Avoid skb_copy/pskb_copy/skb_realloc_headroom
This patch replaces unnecessary uses of skb_copy, pskb_copy and
skb_realloc_headroom by functions such as skb_make_writable and
pskb_expand_head.

This allows us to remove the double pointers later.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:28 -07:00
Herbert Xu 37d4187922 [NETFILTER]: Do not copy skb in skb_make_writable
Now that all callers of netfilter can guarantee that the skb is not shared,
we no longer have to copy the skb in skb_make_writable.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:27 -07:00
Denis V. Lunev cd40b7d398 [NET]: make netlink user -> kernel interface synchronious
This patch make processing netlink user -> kernel messages synchronious.
This change was inspired by the talk with Alexey Kuznetsov about current
netlink messages processing. He says that he was badly wrong when introduced 
asynchronious user -> kernel communication.

The call netlink_unicast is the only path to send message to the kernel
netlink socket. But, unfortunately, it is also used to send data to the
user.

Before this change the user message has been attached to the socket queue
and sk->sk_data_ready was called. The process has been blocked until all
pending messages were processed. The bad thing is that this processing
may occur in the arbitrary process context.

This patch changes nlk->data_ready callback to get 1 skb and force packet
processing right in the netlink_unicast.

Kernel -> user path in netlink_unicast remains untouched.

EINTR processing for in netlink_run_queue was changed. It forces rtnl_lock
drop, but the process remains in the cycle until the message will be fully
processed. So, there is no need to use this kludges now.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 21:15:29 -07:00
Stephen Hemminger b95cce3576 [NET]: Wrap hard_header_parse
Wrap the hard_header_parse function to simplify next step of
header_ops conversion.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:51 -07:00
Eric W. Biederman b4b510290b [NET]: Support multiple network namespaces with netlink
Each netlink socket will live in exactly one network namespace,
this includes the controlling kernel sockets.

This patch updates all of the existing netlink protocols
to only support the initial network namespace.  Request
by clients in other namespaces will get -ECONREFUSED.
As they would if the kernel did not have the support for
that netlink protocol compiled in.

As each netlink protocol is updated to be multiple network
namespace safe it can register multiple kernel sockets
to acquire a presence in the rest of the network namespaces.

The implementation in af_netlink is a simple filter implementation
at hash table insertion and hash table look up time.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:09 -07:00
Eric W. Biederman e9dc865340 [NET]: Make device event notification network namespace safe
Every user of the network device notifiers is either a protocol
stack or a pseudo device.  If a protocol stack that does not have
support for multiple network namespaces receives an event for a
device that is not in the initial network namespace it quite possibly
can get confused and do the wrong thing.

To avoid problems until all of the protocol stacks are converted
this patch modifies all netdev event handlers to ignore events on
devices that are not in the initial network namespace.

As the rest of the code is made network namespace aware these
checks can be removed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:09 -07:00
Eric W. Biederman 457c4cbc5a [NET]: Make /proc/net per network namespace
This patch makes /proc/net per network namespace.  It modifies the global
variables proc_net and proc_net_stat to be per network namespace.
The proc_net file helpers are modified to take a network namespace argument,
and all of their callers are fixed to pass &init_net for that argument.
This ensures that all of the /proc/net files are only visible and
usable in the initial network namespace until the code behind them
has been updated to be handle multiple network namespaces.

Making /proc/net per namespace is necessary as at least some files
in /proc/net depend upon the set of network devices which is per
network namespace, and even more files in /proc/net have contents
that are relevant to a single network namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:06 -07:00
Patrick McHardy af65bdfce9 [NETLINK]: Switch cb_lock spinlock to mutex and allow to override it
Switch cb_lock to mutex and allow netlink kernel users to override it
with a subsystem specific mutex for consistent locking in dump callbacks.
All netlink_dump_start users have been audited not to rely on any
side-effects of the previously used spinlock.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:03 -07:00
Arnaldo Carvalho de Melo 27d7ff46a3 [SK_BUFF]: Introduce skb_copy_to_linear_data{_offset}
To clearly state the intent of copying to linear sk_buffs, _offset being a
overly long variant but interesting for the sake of saving some bytes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-04-25 22:28:29 -07:00
Patrick McHardy 1b53d9042c [NETFILTER]: Remove changelogs and CVS IDs
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:35 -07:00
Arnaldo Carvalho de Melo b529ccf279 [NETLINK]: Introduce nlmsg_hdr() helper
For the common "(struct nlmsghdr *)skb->data" sequence, so that we reduce the
number of direct accesses to skb->data and for consistency with all the other
cast skb member helpers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:34 -07:00
Arnaldo Carvalho de Melo 27a884dc3c [SK_BUFF]: Convert skb->tail to sk_buff_data_t
So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
on 64bit architectures, allowing us to combine the 4 bytes hole left by the
layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
:-)

Many calculations that previously required that skb->{transport,network,
mac}_header be first converted to a pointer now can be done directly, being
meaningful as offsets or pointers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:28 -07:00
Eric Dumazet b7aa0bf70c [NET]: convert network timestamps to ktime_t
We currently use a special structure (struct skb_timeval) and plain
'struct timeval' to store packet timestamps in sk_buffs and struct
sock.

This has some drawbacks :
- Fixed resolution of micro second.
- Waste of space on 64bit platforms where sizeof(struct timeval)=16

I suggest using ktime_t that is a nice abstraction of high resolution
time services, currently capable of nanosecond resolution.

As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits
a 8 byte shrink of this structure on 64bit architectures. Some other
structures also benefit from this size reduction (struct ipq in
ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...)

Once this ktime infrastructure adopted, we can more easily provide
nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or
SO_TIMESTAMPNS/SCM_TIMESTAMPNS)

Note : this patch includes a bug correction in
compat_sock_get_timestamp() where a "err = 0;" was missing (so this
syscall returned -ENOENT instead of 0)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
CC: John find <linux.kernel@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:34 -07:00
Eric W. Biederman 0b4d414714 [PATCH] sysctl: remove insert_at_head from register_sysctl
The semantic effect of insert_at_head is that it would allow new registered
sysctl entries to override existing sysctl entries of the same name.  Which is
pain for caching and the proc interface never implemented.

I have done an audit and discovered that none of the current users of
register_sysctl care as (excpet for directories) they do not register
duplicate sysctl entries.

So this patch simply removes the support for overriding existing entries in
the sys_sysctl interface since no one uses it or cares and it makes future
enhancments harder.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Corey Minyard <minyard@acm.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:59 -08:00
YOSHIFUJI Hideaki e905a9edab [NET] IPV4: Fix whitespace errors.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:19:39 -08:00
Thomas Graf 82e91ffef6 [NET]: Turn nfmark into generic mark
nfmark is being used in various subsystems and has become
the defacto mark field for all kinds of packets. Therefore
it makes sense to rename it to `mark' and remove the
dependency on CONFIG_NETFILTER.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:21:38 -08:00
Patrick McHardy d8a585d78e [NETFILTER]: Use pskb_trim in {ip,ip6,nfnetlink}_queue
Based on patch by James D. Nurmi:

I've got some code very dependant on nfnetlink_queue, and turned up a
large number of warns coming from skb_trim.  While it's quite possibly
my code, having not seen it on older kernels made me a bit suspect.

Anyhow, based on some googling I turned up this thread:
http://lkml.org/lkml/2006/8/13/56

And believe the issue to be related, so attached is a small patch to
the kernel -- not sure if this is completely correct, but for anyone
else hitting the WARN_ON(1) in skbuff.h, it might be helpful..

Signed-off-by: James D. Nurmi <jdnurmi@gmail.com>

Ported to ip6_queue and nfnetlink_queue and added return value
checks.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-11-15 21:18:48 -08:00
Brian Haley 1192e403e9 [NETFILTER]: make some netfilter globals __read_mostly
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:19:58 -07:00
Brian Haley 94aec08ea4 [NETFILTER]: Change tunables to __read_mostly
Change some netfilter tunables to __read_mostly.  Also fixed some
incorrect file reference comments while I was in there.

(this will be my last __read_mostly patch unless someone points out
something else that needs it)

Signed-off-by: Brian Haley <brian.haley@hp.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 15:18:54 -07:00
Patrick McHardy 84fa7933a3 [NET]: Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE
Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose
checksum still needs to be completed) and CHECKSUM_COMPLETE (for
incoming packets, device supplied full checksum).

Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:53:53 -07:00
Darrel Goeddel c7bdb545d2 [NETLINK]: Encapsulate eff_cap usage within security framework.
This patch encapsulates the usage of eff_cap (in netlink_skb_params) within
the security framework by extending security_netlink_recv to include a required
capability parameter and converting all direct usage of eff_caps outside
of the lsm modules to use the interface.  It also updates the SELinux
implementation of the security_netlink_send and security_netlink_recv
hooks to take advantage of the sid in the netlink_skb_params struct.
This also enables SELinux to perform auditing of netlink capability checks.
Please apply, for 2.6.18 if possible.

Signed-off-by: Darrel Goeddel <dgoeddel@trustedcs.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by:  James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:57:55 -07:00
Patrick McHardy ef47c6a7b8 [NETFILTER]: ip_queue/nfnetlink_queue: drop bridge port references when dev disappears
When a device that is acting as a bridge port is unregistered, the
ip_queue/nfnetlink_queue notifier doesn't check if its one of
physindev/physoutdev and doesn't release the references if it is.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:57:48 -07:00
Patrick McHardy 32292a7ff1 [NETFILTER]: Fix section mismatch warnings
Fix section mismatch warnings caused by netfilter's init_or_cleanup
functions used in many places by splitting the init from the cleanup
parts.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-09 22:25:34 -07:00
Andrew Morton 65b4b4e81a [NETFILTER]: Rename init functions.
Every netfilter module uses `init' for its module_init() function and
`fini' or `cleanup' for its module_exit() function.

Problem is, this creates uninformative initcall_debug output and makes
ctags rather useless.

So go through and rename them all to $(filename)_init and
$(filename)_fini.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-28 17:02:48 -08:00
Arjan van de Ven 4a3e2f711a [NET] sem2mutex: net/
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:33:17 -08:00
Thomas Graf 850a9a4e3c [NETFILTER] ip_queue: Fix wrong skb->len == nlmsg_len assumption
The size of the skb carrying the netlink message is not
equivalent to the length of the actual netlink message
due to padding. ip_queue matches the length of the payload
against the original packet size to determine if packet
mangling is desired, due to the above wrong assumption
arbitary packets may not be mangled depening on their
original size.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-07 14:56:12 -08:00
Herbert Xu 325ed82393 [NET]: Fix packet timestamping.
I've found the problem in general.  It affects any 64-bit
architecture.  The problem occurs when you change the system time.

Suppose that when you boot your system clock is forward by a day.
This gets recorded down in skb_tv_base.  You then wind the clock back
by a day.  From that point onwards the offset will be negative which
essentially overflows the 32-bit variables they're stored in.

In fact, why don't we just store the real time stamp in those 32-bit
variables? After all, we're not going to overflow for quite a while
yet.

When we do overflow, we'll need a better solution of course.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03 13:57:23 -07:00
Patrick McHardy 066286071d [NETLINK]: Add "groups" argument to netlink_kernel_create
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 16:01:11 -07:00
Patrick McHardy a61bbcf28a [NET]: Store skb->timestamp as offset to a base timestamp
Reduces skb size by 8 bytes on 64-bit.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:58:24 -07:00
Harald Welte bbd86b9fc4 [NETFILTER]: add /proc/net/netfilter interface to nf_queue
This patch adds a /proc/net/netfilter/nf_queue file, similar to the
recently-added /proc/net/netfilter/nf_log.  It indicates which queue
handler is registered to which protocol family.  This is useful since
there are now multiple queue handlers in the treee (ip[6]_queue,
nfnetlink_queue).

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:51:18 -07:00
Harald Welte 210a9ebef2 [NETFILTER]: ip{6}_queue: prevent unregistration race with nfnetlink_queue
Since nfnetlink_queue can override ip{6}_queue as queue handlers, we
can no longer blindly unregister whoever is registered for PF_INET[6],
but only unregister ourselves.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:51:08 -07:00
Harald Welte 0ab43f8499 [NETFILTER]: Core changes required by upcoming nfnetlink_queue code
- split netfiler verdict in 16bit verdict and 16bit queue number
- add 'queuenum' argument to nf_queue_outfn_t and its users ip[6]_queue
- move NFNL_SUBSYS_ definitions from enum to #define
- introduce autoloading for nfnetlink subsystem modules
- add MODULE_ALIAS_NFNL_SUBSYS macro
- add nf_unregister_queue_handlers() to register all handlers for a given
  nf_queue_outfn_t
- add more verbose DEBUGP macro definition to nfnetlink.c
- make nfnetlink_subsys_register fail if subsys already exists
- add some more comments and debug statements to nfnetlink.c

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:36:49 -07:00
Harald Welte 2cc7d57309 [NETFILTER]: Move reroute-after-queue code up to the nf_queue layer.
The rerouting functionality is required by the core, therefore it has
to be implemented by the core and not in individual queue handlers.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:36:19 -07:00
Harald Welte 4fdb3bb723 [NETLINK]: Add properly module refcounting for kernel netlink sockets.
- Remove bogus code for compiling netlink as module
- Add module refcounting support for modules implementing a netlink
  protocol
- Add support for autoloading modules that implement a netlink protocol
  as soon as someone opens a socket for that protocol

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:35:08 -07:00
Harald Welte 089af26c70 [NETFILTER]: Rename skb_ip_make_writable() to skb_make_writable()
There is nothing IPv4-specific in it.  In fact, it was already used by
IPv6, too...  Upcoming nfnetlink_queue code will use it for any kind
of packet.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:34:40 -07:00
Harald Welte 6869c4d8e0 [NETFILTER]: reduce netfilter sk_buff enlargement
As discussed at netconf'05, we're trying to save every bit in sk_buff.
The patch below makes sk_buff 8 bytes smaller.  I did some basic
testing on my notebook and it seems to work.

The only real in-tree user of nfcache was IPVS, who only needs a
single bit.  Unfortunately I couldn't find some other free bit in
sk_buff to stuff that bit into, so I introduced a separate field for
them.  Maybe the IPVS guys can resolve that to further save space.

Initially I wanted to shrink pkt_type to three bits (PACKET_HOST and
alike are only 6 values defined), but unfortunately the bluetooth code
overloads pkt_type :(

The conntrack-event-api (out-of-tree) uses nfcache, but Rusty just
came up with a way how to do it without any skb fields, so it's safe
to remove it.

- remove all never-implemented 'nfcache' code
- don't have ipvs code abuse 'nfcache' field. currently get's their own
  compile-conditional skb->ipvs_property field.  IPVS maintainers can
  decide to move this bit elswhere, but nfcache needs to die.
- remove skb->nfcache field to save 4 bytes
- move skb->nfctinfo into three unused bits to save further 4 bytes

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:31:04 -07:00
Patrick McHardy 66a79a19a7 [NETFILTER]: Fix HW checksum handling in ip_queue/ip6_queue
The checksum needs to be filled in on output, after mangling a packet
ip_summed needs to be reset.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:10:35 -07:00
Harald Welte 9bb7bc942d [NETFILTER]: Fix deadlock with ip_queue and tcp local input path.
When we have ip_queue being used from LOCAL_IN, then we end up with a
situation where the verdicts coming back from userspace traverse the TCP
input path from syscall context.  While this seems to work most of the
time, there's an ugly deadlock:

syscall context is interrupted by the timer interrupt.  When the timer
interrupt leaves, the timer softirq get's scheduled and calls
tcp_delack_timer() and alike.  They themselves do bh_lock_sock(sk),
which is already held from somewhere else -> boom.

I've now tested the suggested solution by Patrick McHardy and Herbert Xu to
simply use local_bh_{en,dis}able().

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-30 15:35:26 -07:00
Herbert Xu 2a0a6ebee1 [NETLINK]: Synchronous message processing.
Let's recap the problem.  The current asynchronous netlink kernel
message processing is vulnerable to these attacks:

1) Hit and run: Attacker sends one or more messages and then exits
before they're processed.  This may confuse/disable the next netlink
user that gets the netlink address of the attacker since it may
receive the responses to the attacker's messages.

Proposed solutions:

a) Synchronous processing.
b) Stream mode socket.
c) Restrict/prohibit binding.

2) Starvation: Because various netlink rcv functions were written
to not return until all messages have been processed on a socket,
it is possible for these functions to execute for an arbitrarily
long period of time.  If this is successfully exploited it could
also be used to hold rtnl forever.

Proposed solutions:

a) Synchronous processing.
b) Stream mode socket.

Firstly let's cross off solution c).  It only solves the first
problem and it has user-visible impacts.  In particular, it'll
break user space applications that expect to bind or communicate
with specific netlink addresses (pid's).

So we're left with a choice of synchronous processing versus
SOCK_STREAM for netlink.

For the moment I'm sticking with the synchronous approach as
suggested by Alexey since it's simpler and I'd rather spend
my time working on other things.

However, it does have a number of deficiencies compared to the
stream mode solution:

1) User-space to user-space netlink communication is still vulnerable.

2) Inefficient use of resources.  This is especially true for rtnetlink
since the lock is shared with other users such as networking drivers.
The latter could hold the rtnl while communicating with hardware which
causes the rtnetlink user to wait when it could be doing other things.

3) It is still possible to DoS all netlink users by flooding the kernel
netlink receive queue.  The attacker simply fills the receive socket
with a single netlink message that fills up the entire queue.  The
attacker then continues to call sendmsg with the same message in a loop.

Point 3) can be countered by retransmissions in user-space code, however
it is pretty messy.

In light of these problems (in particular, point 3), we should implement
stream mode netlink at some point.  In the mean time, here is a patch
that implements synchronous processing.  

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-03 14:55:09 -07:00
Linus Torvalds 1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00