Similar to the nfnetlink_queue fixes:
The peer_pid must be checked in all cases when a logging instance exists,
additionally we must check whether an instance exists before attempting
to configure it to avoid NULL ptr dereferences.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Whatever that comment tries to say, I don't get it and it looks like
a leftover from the time when RCU wasn't used properly.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
nf_nat_setup_info gets the hook number and translates that to the
manip type to perform. This is a relict from the time when one
manip per hook could exist, the exact hook number doesn't matter
anymore, its converted to the manip type. Most callers already
know what kind of NAT they want to perform, so pass the maniptype
in directly.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the timer is late its timeout might be before the current time,
in which case a very large value is dumped.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for SCTP to ctnetlink.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for James Morris' connsecmark.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for master tuple event notification and
dumping. Conntrackd needs this information to recover related
connections appropriately.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The combination of NAT and helpers may produce TCP sequence adjustments.
In failover setups, this information needs to be replicated in order to
achieve a successful recovery of mangled, related connections. This patch is
particularly useful for conntrackd, see:
http://people.netfilter.org/pablo/conntrack-tools/
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When terminating DSL connections for an assortment of random customers, I've
found it necessary to use iptables to clamp the MSS used for connections to
work around the various ICMP blackholes in the greater net. Unfortunately,
the current behaviour in Linux is imperfect and actually make things worse,
so I'm proposing the following: increasing the MSS in a packet can never be
a good thing, so make --set-mss only lower the MSS in a packet.
Yes, I am aware of --clamp-mss-to-pmtu, but it doesn't work for outgoing
connections from clients (ie web traffic), as it only looks at the PMTU on
the destination route, not the source of the packet (the DSL interfaces in
question have a 1442 byte MTU while the destination ethernet interface is
1500 -- there are problematic hosts which use a 1300 byte MTU). Reworking
that is probably a good idea at some point, but it's more work than this is.
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Old userspace doesn't support revision 1, especially for IPv6, which
is only available in the SVN snapshot.
Add compat support for revision 0.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current netfilter SVN version includes support for this, so enable
it in the kernel as well.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make xt_compat_match_from_user return an int to make it usable in the
*tables iterator macros and kill a now unnecessary wrapper function.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the NF_CONNTRACK_ENABLED option. It was meant for a smoother upgrade
to nf_conntrack, people having reconfigured their kernel at least once since
ip_conntrack was removed will have the NF_CONNTRACK option already set.
People upgrading from older kernels have to reconfigure a lot anyway.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The queueing core doesn't care about the exact return value from
the queue handler, so there's no need to go through the trouble
of returning a meaningful value as long as we indicate an error.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't need a default case in nfqnl_build_packet_message(), the
copy_mode is validated when it is set. Tell the compiler about
the possible types and remove the default case. Saves 80b of
text on x86_64.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Originally I wanted to just remove the QDEBUG macro and use pr_debug, but
none of the messages seems worth keeping.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
nfqnl_set_mode takes the queue lock and calls __nfqnl_set_mode. Just move
the code from __nfqnl_set_mode to nfqnl_set_mode since there is no other
user.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use RCU for queue instances hash. Avoids multiple atomic operations
for each packet.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The peer_pid must be checked in all cases when a queue exists, currently
it is not checked if for NFQA_CFG_QUEUE_MAXLEN when a NFQA_CFG_CMD
attribute exists in some cases. Same for the queue existance check,
which can cause a NULL pointer dereference.
Also consistently return -ENODEV for "queue not found". -ENOENT would
be better, but that is already used to indicate a queued skb id doesn't
exist.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sequence counter doesn't need to be an atomic_t, just move the increment
inside the locked section.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't log "nf_hook: Verdict = QUEUE." message with NETFILTER_DEBUG=y.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move duplicated error handling to end of function and add a helper function
to release the device and module references from the queue entry.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that issue_verdict doesn't need to free the queue entries anymore,
all it does is disable local BHs and call nf_reinject. Move the BH
disabling to the okfn invocation in nf_reinject and kill the
issue_verdict functions.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move common fields for queue management to struct nf_info and rename it
to struct nf_queue_entry. The avoids one allocation/free per packet and
simplifies the code a bit.
Alternatively we could add some private room at the tail, but since
all current users use identical structs this seems easier.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
A queue entry lookup currently looks like this:
find_dequeue_entry -> __find_dequeue_entry ->
__find_entry -> cmpfn -> id_cmp
Use simple open-coded list walking and kill the cmpfn for
find_dequeue_entry. Instead add it to nfqnl_flush (after
similar cleanups) and use nfqnl_flush for both complete
flushes and flushing entries related to a device.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use list_add_tail/list_for_each_entry instead of list_add and
list_for_each_prev as a preparation for switching to RCU.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the data pointer from struct nf_queue_handler. It has never been used
and is useless for the only handler that really matters, nfnetlink_queue,
since the handler is shared between all instances.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We hold a module reference for each queued packet, so the hook that
queued the packet can't disappear. Also remove an obsolete comment
stating the opposite.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clean up
if (x) y;
constructs. We've got nothing to hide :)
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
nf_conntrack_h323 needs ip6_route_output for the call forwarding filter.
Add a ->route function to nf_afinfo and use that to avoid pulling in the
ipv6 module.
Fix the #ifdef for the IPv6 code while I'm at it - the IPv6 support is
only needed when IPv6 conntrack is enabled.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch from ipv6_find_hdr to ipv6_skip_exthdr to avoid pulling in ip6_tables
and ipv6 when only using it for IPv4.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add rate estimator match. The rate estimator match can match on
estimated rates by the RATEEST target. It supports matching on
absolute bps/pps values, comparing two rate estimators and matching
on the difference between two rate estimators.
This is what I use to route outgoing data connections from a FTP
server over two lines based on the available bandwidth:
# estimate outgoing rates
iptables -t mangle -A POSTROUTING -o eth0 -j RATEEST --rateest-name eth0 \
--rateest-interval 250ms \
--rateest-ewma 0.5s
iptables -t mangle -A POSTROUTING -o ppp0 -j RATEEST --rateest-name ppp0 \
--rateest-interval 250ms \
--rateest-ewma 0.5s
# mark based on available bandwidth
iptables -t mangle -A BALANCE -m state --state NEW \
-m helper --helper ftp \
-m rateest --rateest-delta \
--rateest1 eth0 \
--rateest-bps1 2.5mbit \
--rateest-gt \
--rateest2 ppp0 \
--rateest-bps2 2mbit \
-j CONNMARK --set-mark 0x1
iptables -t mangle -A BALANCE -m state --state NEW \
-m helper --helper ftp \
-m rateest --rateest-delta \
--rateest1 ppp0 \
--rateest-bps1 2mbit \
--rateest-gt \
--rateest2 eth0 \
--rateest-bps2 2.5mbit \
-j CONNMARK --set-mark 0x2
iptables -t mangle -A BALANCE -j CONNMARK --restore-mark
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new rate estimator target (using gen_estimator). In combination with
the rateest match (next patch) this can be used for load-based multipath
routing.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extends the xt_DSCP target by xt_TOS v1 to add support for selectively
setting and flipping any bit in the IPv4 TOS and IPv6 Priority fields.
(ipt_TOS and xt_DSCP only accepted a limited range of possible
values.)
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extends the xt_dscp match by xt_tos v1 to add support for selectively
matching any bit in the IPv4 TOS and IPv6 Priority fields. (ipt_tos
and xt_dscp only accepted a limited range of possible values.)
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>